Measuring Individual Differences in Implicit Cognition: The Implicit Association Test

An implicit association test (IAT) measures differential association of 2 target concepts with an attribute. The 2 concepts appear in a 2-choice task (e

second discrimination follows from the existence of strong associations of male names to male faces and female names to female faces. The attempt to map the same two responses ("hello" and "goodbye") in opposite ways onto the two gender contrasts is resisted by well-established associations that link the face and name domains. The (assumed) performance difference between the two versions of the combined task indeed measures the strength of gender-based associations between the face and name domains. This pair of thought experiments provides the model for a method, the implicit association test (IAT), that is potentially useful for diagnosing a wide range of socially significant associative structures. The present research sought specifically to appraise the IAT method's usefulness for measuring evaluative associations that underlie implicit attitudes (Greenwald & Banaji, 1995).
2 A few recent studies have indicated that priming measures may be sensitive enough to serve as measures of individual differences in the strength of automatic attitudinal evaluation (Dovidio & Gaertner, 1995; Fazio, Jackson, Dunton, & Williams, 1995). At the same time, other studies have indicated that priming is relatively unaffected by variations in attitude strength (Bargh et al., 1992; Chaiken & Bargh, 1993), implying that it may be limited in sensitivity to intra- or interindividual differences.
Figure 1. The IAT procedure of the present experiments involved a series of five discrimination tasks (numbered columns). A pair of target concepts and an attribute dimension are introduced in the first two steps. Categories for each of these discriminations are assigned to a left or right response, indicated by the black circles in the third row. These are combined in the third step and then recombined in the fifth step, after reversing response assignments (in the fourth step) for the target-concept discrimination. The illustration uses stimuli for the specific tasks for one of the task-order conditions of Experiment 3, with correct responses indicated as open circles.
One might appreciate the IAT's potential value as a measure of socially significant automatic associations by changing the thought experiment to one in which the to-be-distinguished faces of the first task are Black or White (e.g., "hello" to African American faces and "goodbye" to European American faces) and the second task is to classify words as pleasant or unpleasant in meaning ("hello" to pleasant words, "goodbye" to unpleasant words). The two possible combinations of these tasks can be abbreviated as Black + pleasant and White + pleasant.3 Black + pleasant should be easier than White + pleasant if there is a stronger association between Black Americans and pleasant meaning than between White Americans and pleasant meaning. If the preexisting associations are opposite in direction (which might be expected for White subjects raised in a culture imbued with pervasive residues of a history of anti-Black discrimination), the subject should find White + pleasant to be easier.
A possible property of the IAT, and one that is similar to a major virtue of cognitive priming methods, is that it may resist masking by self-presentation strategies. That is, the implicit association method may reveal attitudes and other automatic associations even for subjects who prefer not to express those attitudes.

Design of the IAT
Figure 1 describes the sequence of tasks that constitute the IAT measures in this research and illustrates this sequence with materials from the present Experiment 3. The IAT assesses the association between a target-concept discrimination and an attribute dimension. The procedure starts with introduction of the target-concept discrimination. In Figure 1, this initial discrimination is to distinguish first names that are (in the United States) recognizable as Black or African American from ones recognizable as White or European American. This and subsequent discriminations are performed by assigning one category to a response by the left hand and the other to a response by the right hand. The second step is introduction of the attribute dimension, also in the form of a two-category discrimination. For all of the present experiments, the attribute discrimination was evaluation, represented by the task of categorizing words as pleasant versus unpleasant in meaning. After this introduction to the target discrimination and to the attribute dimension, the two are superimposed in the third step, in which stimuli for target and attribute discriminations appear on alternate trials. In the fourth step, the respondent learns a reversal of response assignments for the target discrimination, and the fifth (final) step combines the attribute discrimination (not changed in response assignments) with this reversed target discrimination. If the target categories are differentially associated with the attribute dimension, the subject should find one of the combined tasks (of the third or fifth step) to be considerably easier than the other, as in the male-female thought experiments. The measure of this difficulty difference provides the measure of implicit attitudinal difference between the target categories.
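The five-step sequence described in this section can be sketched as a small data structure. This is only an illustration: the `Step` class, the `iat_sequence` function, and the choice of which category starts on the left response are assumptions made here, not part of the reported procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    left: tuple   # categories assigned to the left-hand response
    right: tuple  # categories assigned to the right-hand response


def iat_sequence(target_a, target_b, attr_a, attr_b):
    """Build the five steps; target_a and attr_a start on the left (an assumption)."""
    return [
        Step(1, "initial target-concept discrimination", (target_a,), (target_b,)),
        Step(2, "attribute discrimination", (attr_a,), (attr_b,)),
        Step(3, "first combined task", (target_a, attr_a), (target_b, attr_b)),
        # Step 4 reverses the target assignment; the attribute assignment is unchanged.
        Step(4, "reversed target-concept discrimination", (target_b,), (target_a,)),
        Step(5, "reversed combined task", (target_b, attr_a), (target_a, attr_b)),
    ]


steps = iat_sequence("Black", "White", "pleasant", "unpleasant")
for step in steps:
    print(step.number, step.name, "| left:", step.left, "| right:", step.right)
```

Comparing performance on Step 3 with performance on Step 5 is what yields the difficulty difference that the text treats as the measure of implicit attitudinal difference.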

Because the present three experiments sought to assess the IAT's ability to measure implicit attitudes, in each experiment the associated attribute dimension was evaluation (pleasant vs. unpleasant).4 Each experiment investigated attitudes that were expected to be strong enough to be automatically activated.
Experiment 1 used target concepts for which the evaluative associations were expected to be highly similar across persons. Two of these concepts were attitudinally positive (flowers and musical instruments) and two were negative (insects and weapons). Experiment 2 used two groups of subjects (Korean American and Japanese American) to assess ethnic attitudes that were assumed to be mutually opposed, stemming from the history of military subjugation of Korea by Japan in the first half of the 20th century. The IAT method was expected to reveal these opposed evaluations even for subjects who would deny, on self-report measures, any antipathy toward the out-group. Experiment 3 used the IAT to assess implicit attitudes of White subjects toward White and Black racial categories. For these subjects we expected that the IAT might reveal more attitudinal discrimination between White and Black categories than would be revealed by explicit (self-report) measures of the same racial attitudes.

Experiment 1
Experiment 1 used the IAT to assess implicit attitudes toward two pairs of target attitude concepts for which subjects were expected to have relatively uniform evaluative associations. A second purpose was to examine effects on IAT measures of several procedural variables that are intrinsic to the IAT method. Subjects in Experiment 1 responded to two target-concept discriminations: (a) flower names (e.g., rose, tulip, marigold) versus insect names (e.g., bee, wasp, horsefly) and (b) musical instrument names (e.g., violin, flute, piano) versus weapon names (e.g., gun, knife, hatchet). Each target-concept discrimination was used in combination with discrimination of pleasant-meaning words (e.g., family, happy, peace) from unpleasant-meaning words (e.g., crash, rotten, ugly). The IAT procedure was expected to reveal superior performance for combinations that were evaluatively compatible (flower + pleasant or instrument + pleasant) than for noncompatible combinations (insect + pleasant or weapon + pleasant).

Method
After being seated at a table with a desktop computer in a small room, subjects received all instructions from a computer display and provided all of their responses via the computer keyboard.

Subjects
Thirty-two (13 male and 19 female) students from introductory psychology courses at the University of Washington participated in exchange for an optional course credit.5 Data for 8 additional subjects were not included in the analysis because of their relatively high error rates, which were associated with responding more rapidly than appropriate for the task.6 Data were unusable for one additional subject who, for unknown reasons, neglected to complete the computer-administered portion of the experiment.

Materials
The experiment's three classification tasks used 150 stimulus words: 25 insect names, 25 flower names, 25 musical instrument names, 25 weapon names, 25 pleasant-meaning words, and 25 unpleasant-meaning words. The pleasant and unpleasant words were selected from norms reported by Bellezza, Greenwald, and Banaji (1986). Many of the items for the other four categories were taken from category lists provided by Battig and Montague (1969), with additional category members generated by the authors. The selected flower, insect, instrument, and weapon exemplars were ones that the authors judged to be both familiar to and unambiguously classifiable by members of the subject population. The 150 words used as stimuli in Experiment 1 are listed in Appendix A.

Apparatus
Experiment 1 was administered on IBM-compatible (80486 processor) desktop computers.7 Subjects viewed this display from a distance of about 65 cm and gave left responses with the left forefinger (using the A key) and right responses with the right forefinger (using the 5 key on the right-side numeric keypad).

Overview
Each subject completed tasks for two IAT measures in succession, one using flowers versus insects as the target-concept discrimination, and the other using musical instruments versus weapons. The first IAT used the complete sequence of five steps of Figure 1: (a) initial target-concept discrimination, (b) evaluative attribute discrimination, (c) first combined task, (d) reversed target-concept discrimination, and (e) reversed combined task. The second IAT did not need to repeat practice of the evaluative discrimination, and so included only four steps: (f) initial target-concept discrimination, (g) first combined task, (h) reversed target-concept discrimination, and (i) reversed combined task. One IAT measure of attitude was obtained by comparing performance in steps (c) and (e), and the second by comparing performance in steps (g) and (i).
4 The IAT can be used also to measure implicit stereotypes and implicit self-concept (see Greenwald & Banaji, 1995) by appropriate selection of target concept and attribute discriminations.
5 Another group of 32 subjects participated in a prior replication of Experiment 1 that, however, lacked the paper-and-pencil explicit measures that were included in the reported replication. With one minor exception (mentioned in Footnote 11), there were no discrepancies in findings between the two replications.
6 Use of data from these 8 subjects (instead of those who replaced them in the design) would have reduced power of statistical tests. As it turns out, this would not have altered any conclusions. The higher power obtained by replacing them was desirable because of the importance of identifying possible procedural influences on the IAT method.
7 The programs used for all of the present experiments were Windows 95-based and written primarily by Sean C. Draine.

Design
The two IAT measures obtained for each subject were analyzed in a design that contained five procedural variables, listed here and described more fully in the Procedure section: (a) order of the two target-concept discriminations (flowers vs. insects first or instruments vs. weapons first), (b) order of compatibility conditions within each IAT (evaluatively compatible combination of discriminations before or after noncompatible combination), (c) response key assigned to pleasant items (left or right), (d) category set sizes for discriminations (5 items or 25 items per category), and (e) interval between response and next item presentation for the combined task (100, 400, or 700 ms). The first four of these were two-level between-subjects variables that were administered factorially, such that 2 subjects received each of the 16 possible combinations; the last was a three-level within-subjects variation.
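As a check on the arithmetic of this design, the four two-level between-subjects variables can be crossed to enumerate the cells. The sketch below is illustrative only; the factor labels and variable names are chosen here for readability and do not come from the experiment's software.

```python
from itertools import product

# Four two-level between-subjects factors, as listed in the Design section.
factors = {
    "target_order": ("flowers/insects first", "instruments/weapons first"),
    "compatibility_order": ("compatible first", "noncompatible first"),
    "pleasant_key": ("left", "right"),
    "set_size": (5, 25),
}

# Full factorial crossing: one dict per cell of the design.
cells = [dict(zip(factors, levels)) for levels in product(*factors.values())]

SUBJECTS_PER_CELL = 2
assignments = [cell for cell in cells for _ in range(SUBJECTS_PER_CELL)]

# The fifth variable is within-subjects: every subject receives all three intervals.
INTERTRIAL_INTERVALS_MS = (100, 400, 700)

print(len(cells), "cells;", len(assignments), "subjects")
```

With 2 subjects in each of the 16 cells, the design accounts for the 32 subjects whose data were analyzed.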

Procedure
Trial blocks. All tasks were administered in trial blocks of 50 trials. Each trial block started with instructions that described the category discrimination(s) for the block and the assignments of response keys (left or right) to categories. Reminder labels, in the form of category names appropriately positioned to the left or right, remained on screen during each block. Each new category discrimination, in Steps (a), (b), and (f) described in the Overview section, consisted of a practice block of 50 trials followed by a block for which data were analyzed. Combined tasks consisted of a practice block followed by three blocks of data collection, each with a different intertrial interval (see next paragraph).
Timing details. The first trial started 1.5 s after the reminder display appeared. Stimuli were presented in black letters against the light gray screen background, vertically and horizontally centered in the display and remaining on screen until the subject's response. The subject's keypress response initiated a delay (intertrial interval) before the next trial's stimulus. For all simple categorization and combined-task practice trials, the intertrial interval was 400 ms. For the three blocks of combined-task data collection, the interval was either 100, 400, or 700 ms. Half of the subjects received these intervals in ascending order of blocks (100, 400, 700), and the remainder in the opposite order. Throughout the experiment, after any incorrect response, the word error immediately replaced the stimulus for 300 ms, lengthening the intertrial interval by 300 ms. At the end of each 50-trial block, subjects received a feedback summary that gave their mean response latency in milliseconds and percentage correct for the just-concluded block.
Stimuli. Words were selected randomly and without replacement (independently for each subject) until the available stimuli for a task were exhausted, at which point the stimulus pool was replaced if more trials were needed. For example, in single-discrimination tasks (a) in the 25-items-per-category condition, each 50-trial block used each of the 50 stimuli for the two categories once, and (b) in the 5-items-per-category condition, each of the 10 stimuli was used five times. Selection of subsets of five items for the 5-items-per-category conditions was counterbalanced so that all stimuli were used equally in the experiment. For the combined tasks, stimuli were selected such that (a) for subjects assigned to 25-item categories, each of the 100 possible stimuli (50 target-concept items and 50 evaluative items) appeared twice in a total of 200 combined-task trials, or (b) for those assigned to 5-item categories, each of the 20 possible stimuli appeared 10 times. In all combined tasks, items for the target-concept discrimination and the attribute discrimination appeared on alternating trials.
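The selection rule for single-discrimination blocks (random draws without replacement, with the pool replaced once exhausted) might be sketched as follows. The function name `draw_stimuli`, the placeholder item labels, and the fixed random seed are assumptions for illustration; the sketch does not cover the alternation of target-concept and attribute items used in combined tasks.

```python
import random
from collections import Counter


def draw_stimuli(pool, n_trials, rng):
    """Draw n_trials items without replacement, refilling the pool when it runs out."""
    remaining, drawn = [], []
    for _ in range(n_trials):
        if not remaining:
            remaining = list(pool)   # replace the exhausted pool
            rng.shuffle(remaining)   # new random order for this pass
        drawn.append(remaining.pop())
    return drawn


# Hypothetical 5-items-per-category condition: 10 stimuli, one 50-trial block.
pool = [f"flower_{i}" for i in range(1, 6)] + [f"insect_{i}" for i in range(1, 6)]
block = draw_stimuli(pool, 50, random.Random(0))
counts = Counter(block)
print(counts)
```

Because the pool is refilled only after every item has been used, each of the 10 stimuli appears exactly five times in a 50-trial block, matching the description in the text.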
Explicit attitude measures. After the computer tasks, subjects completed paper-and-pencil questionnaire measures of their attitudes toward the four target concepts. On the feeling thermometer, subjects were asked to describe their general level of warmth or coolness toward flowers, insects, musical instruments, and weapons (in that order) by making a mark at the appropriate position on an illustration of a thermometer. The thermometer was numerically labeled at 10-degree intervals from 0 to 99 and anchored at the 0, 50, and 99 points with the words cold or unfavorable, neutral, and warm or favorable, respectively. Next, subjects completed a set of five semantic differential items for each of the four object categories. These 7-point scales were anchored at either end by polar-opposite adjective pairs: beautiful-ugly, good-bad, pleasant-unpleasant, honest-dishonest, and nice-awful. Subjects were instructed to mark the middle of the range if they considered both anchoring adjectives to be irrelevant to the category. The semantic differential was scored by averaging the five items for each concept, scored on a scale ranging from -3 (negative) to 3 (positive).

Data Reduction
The data for each trial block included response latencies (in milliseconds) and error rates. Prior to conducting other analyses, distributions of these measures were examined, revealing the usual impurities (for speeded tasks) in the form of small proportions of extremely fast and extremely slow responses. These outlying values typically indicate, respectively, responses initiated prior to perceiving the stimulus (anticipations) and momentary inattention. The values in these tails of the latency distribution are problematic not only because they lack theoretical interest but also because they distort means and inflate variances. The solution used for these was to recode values below 300 ms to 300 ms and those above 3,000 ms to 3,000 ms.8 We then log-transformed latencies in order to use a statistic that had satisfactory stability of variance for analyses.9 Also, the first two trials of each block were dropped because of their typically lengthened latencies. Analyses of error rates are not described in detail. However, they (a) revealed relatively low error rates, averaging just under 5% in Experiment 1, and (b) were consistent with latency analyses (higher error rates were obtained for conditions that produced longer latencies), but (c) also revealed considerably weaker effects of task-compatibility combinations than were obtained in analyses of latencies.
[...] for the two levels of the only procedural variable that substantially influenced the data, whether subjects performed evaluatively compatible combinations before noncompatible ones, or [...] (noncompatible minus compatible). For the data presented in Figure 2, IAT effects averaged 129 ms when noncompatible combinations preceded compatible (upper panel) and 223 ms when compatible combinations came first. For this effect of compatibility order, F(1, 16) = 10.12, p = .006.
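The data-reduction steps described in this section (dropping the first two trials of a block, recoding latencies to the 300-3,000 ms range, and log-transforming) can be sketched in a few lines. The function names and the example latencies are hypothetical, the natural logarithm is an assumption (the text does not state the base), and computing the millisecond IAT effect from the recoded latencies is likewise an assumption.

```python
import math
from statistics import mean

FLOOR_MS, CEILING_MS = 300, 3000  # recoding bounds stated in the text
N_DROPPED = 2                     # first two trials of each block are dropped


def reduce_block(latencies_ms):
    """Return (recoded latencies in ms, log latencies) for one trial block."""
    kept = latencies_ms[N_DROPPED:]
    recoded = [min(max(x, FLOOR_MS), CEILING_MS) for x in kept]
    logged = [math.log(x) for x in recoded]  # natural log: an assumption
    return recoded, logged


def iat_effect_ms(compatible_ms, noncompatible_ms):
    """IAT effect as mean noncompatible latency minus mean compatible latency."""
    compatible, _ = reduce_block(compatible_ms)
    noncompatible, _ = reduce_block(noncompatible_ms)
    return mean(noncompatible) - mean(compatible)


# Hypothetical latencies for two short blocks (real blocks had 50 trials).
compatible = [1500, 900, 250, 600, 700, 650]
noncompatible = [1600, 1200, 800, 900, 3500, 1000]
print(reduce_block(compatible)[0])
print(iat_effect_ms(compatible, noncompatible))
```

Recoding rather than discarding the tails keeps every trial in the analysis while limiting the influence of anticipations and lapses of attention on means and variances.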

A Summary Measure of IAT Effect
In Experiment 1, IAT effects indicating more positive attitudes toward flowers than insects or toward musical instruments than weapons were expected and were also quite clearly obtained. That is, subjects performed faster for flower + pleasant or instrument + pleasant combinations than for insect + pleasant or weapon + pleasant. Using the pooled standard deviation (for compatible and noncompatible conditions) as the effect size unit and collapsing across all design factors other than order of compatibility conditions, effect sizes for the IAT effect (i.e., differences from zero) were d = 0.78 and d = 2.30, respectively, for the noncompatible first and compatible first conditions. (By convention, d = 0.8 is considered to be a large effect size.) Statistical significance tests for difference of these IAT effects from zero were, respectively, F(1, 8) = 25.62, p = .001, and F(1, 8) = 134.53, p = 10^-6 (see Footnote 10).
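An effect size that uses the pooled standard deviation of the compatible and noncompatible conditions as its unit can be sketched as below. This is a generic pooled-SD computation under stated assumptions: the text does not specify exactly how the scores entering the calculation were aggregated, and the function names and example values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two samples (weighted by degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return math.sqrt(pooled_var)


def effect_size_d(compatible, noncompatible):
    """Mean difference (noncompatible minus compatible) in pooled-SD units."""
    return (mean(noncompatible) - mean(compatible)) / pooled_sd(compatible, noncompatible)


# Hypothetical per-subject mean latencies in ms.
compatible = [600, 700, 800]
noncompatible = [800, 900, 1000]
d = effect_size_d(compatible, noncompatible)
print(d)
```

In this toy example both samples have a standard deviation of 100 ms and the means differ by 200 ms, so the difference amounts to two pooled standard deviations.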

Effects of Procedural Variables
The design had five procedural factors, one varied within-subject (intertrial interval) and four varied between-subjects: combination compatibility order (compatible combination first or second), category set size (5 or 25 items), key assignment for pleasant category (left or right key), and target-concept order (flowers vs. insects or instruments vs. weapons as the first target-concept discrimination). The main effect of combination compatibility order has already been noted and described in Figure 2. Aside from an uninterpretable four-way interaction effect, there were no other significant effects of these procedural variables.11

IAT Compared With Explicit Attitude Measures
The IAT effect index is proposed as a measure of subjects' relative implicit attitudes toward the categories under study. That is, better performance in the flower + pleasant condition than in the insect + pleasant condition is taken to indicate a stronger association between flowers and pleasant meaning than between insects and pleasant meaning and, thus, a more positive attitude toward flowers than insects. Table 1 presents data for the IAT latency measure along with corresponding attitude measures derived from the feeling thermometer and semantic differential measures. All measures are difference scores, with positive scores indicating more favorable attitudes toward flowers than insects, or toward musical instruments than weapons. For all of these measures, attitude differences were observed.
Correlations among the explicit and implicit attitude measures are shown in Table 2. The table presents correlations between measures for the flower-insect contrast above the diagonal and those for the musical instrument-weapon contrast below the diagonal. All of the correlations in Table 2 are in the expected positive direction. Notably, however, scores on the explicit measures for both the flower-insect and instrument-weapon contrasts were only weakly correlated with implicit attitude scores derived from the IAT.
10 These statistical tests were based on the log-transformed latencies. Here and elsewhere in this report, p values are reported as approximately exact values, rather than as inequalities relative to a Type I error criterion (e.g., p < .05). This follows the suggestion by Greenwald, Gonzalez, Guthrie, and Harris (1996) not to obscure information provided by p values. Values smaller than .0001 are rounded to the nearest exponent of 10. This treatment of p values notwithstanding, the primary reporting of data is in terms of descriptively more useful raw and standardized effect sizes. For comparison, analysis of untransformed latencies yielded F(1, 8) ratios of 18.97 and 72.45, ps = .002 and 10^-5, respectively. Analyses of reciprocally transformed latencies (speeds) yielded F(1, 8) ratios of 26.72 and 198.15, ps = .0009 and 10^-6, respectively.
11 Fortunately, the uninterpretable four-way interaction did not appear in the prior replication (see Footnote 5) and so appears not to call for effort at interpretation. In other respects, however, the prior replication produced IAT effects that were very similar in magnitude to those shown in Figure 2, and it also revealed the same effect of combination compatibility order that was obtained in Experiment 1.

Discussion
Experiment 1 tested the principal assumption underlying the implicit association test: that associations can be revealed by mapping two discrimination tasks alternately onto a single pair of responses. Confirming expectation, consistently superior performance was observed when associatively compatible (compared with associatively noncompatible) categories were mapped onto the same response. In Experiment 1, both flower-insect and instrument-weapon discriminations were performed more rapidly when their evaluatively positive categories (flowers or musical instruments) shared a response with pleasant-meaning words than when those categories shared a response with unpleasant-meaning words. Of importance, the data (Figure 2) indicated that compatible task combinations were performed about as rapidly as the uncombined target concept or attribute discriminations, whereas noncompatible combinations were performed considerably more slowly. These findings were clearly encouraging regarding the possibility that the IAT method can effectively measure implicit attitudes. In summary, Experiment 1's IAT measures were highly sensitive to evaluative discriminations that are well established in the connotative meaning structure of the English language.
Experiment 1 was remarkable for the near absence of moderating effects of procedural variables on the measures of evaluative associations that were revealed by the IAT procedure. The effect of task-combination compatibility was not noticeably affected (a) by intertrial intervals (100, 400, or 700 ms), (b) by the set size of categories used in discrimination tasks (5 or 25 items), (c) by the assignment of response key (left or right) to the pleasant category, or (d) by position of the IAT measure within the experiment (first or second internal replication). The variation of order in which compatible and noncompatible task combinations were performed produced a moderate effect, such that the IAT measure of differential evaluation was larger when the compatible combination was performed first. This effect is examined also in Experiments 2 and 3.
Last, Experiment 1 provides the first of a series of findings of low correlations between explicit and implicit measures (see Table 2). The correlations between explicit measures of different contrasts (flower-insect with instrument-weapon, average r = .41) and between implicit measures of different contrasts (average r = .58) were strikingly greater than those between explicit and implicit measures of the same contrast (average r = .19).12 This pattern indicates the likely presence of systematic method variance for both types of measures, along with a divergence in the constructs measured by the two types of measures. This conceptual divergence between the implicit and explicit measures is of course expected from theorization about implicit social cognition (Greenwald & Banaji, 1995), as well as from previous research findings such as those already mentioned by Dovidio and Gaertner (1995) and Fazio, Jackson, Dunton, and Williams (1995). It is also plausible, however, that these correlations are low because of relative lack of population variability in the attitudes being assessed (e.g., uniformity in liking for flowers or disliking for insects).
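Footnote 12 states that averaged correlations were computed by converting r values to Fisher's Z, averaging, and converting the average back to r. A minimal sketch of that procedure follows; the function name `average_r` and the example correlations are hypothetical and are not values from Table 2.

```python
import math


def average_r(correlations):
    """Average correlations via Fisher's Z: atanh each r, average, then tanh back."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical correlations, for illustration only.
example = [0.30, 0.50]
print(round(average_r(example), 3))
```

The Fisher transformation is simply the inverse hyperbolic tangent, so the standard library's `math.atanh` and `math.tanh` are sufficient. Averaging in Z units gives a result slightly different from the simple arithmetic mean of the r values.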

Experiment 2
Experiment 1 demonstrated the IAT's ability to detect presumed near-universal evaluative associations involving the semantic contrasts of flowers versus insects and instruments versus weapons. Perhaps because the evaluative aspects of these contrasts are so nearly uniform in the population, they are not typically considered to be attitudinal. Experiment 2 sought to extend the IAT method to a domain that is more typically attitudinal, by using it to discriminate differences between Japanese Americans and Korean Americans in their evaluative associations toward Japanese and Korean ethnic groups. The history of Japanese-Korean antagonism provided the basis for a known-groups study in which it could be expected that each ethnic group would have not only a typical in-group-directed positive attitude but also a likely negative attitude toward the out-group.13 To supplement the IAT results, we also obtained explicit measures of these ethnic attitudes along with measures intended to gauge participants' level of immersion in the cultures of their respective ethnicities.

Method

Subjects
The subjects were 17 self-described Korean American (8 female and 9 male) and 15 Japanese American (10 female and 5 male) students who participated in return for optional course credit for their introductory psychology courses at the University of Washington. Data for one of the Korean Americans were not included in analyses because of an IAT error rate of about 50%, indicative of random responding. These subjects were recruited in response to a request for volunteers belonging to the two ethnic groups. As part of the consent procedure prior to participation, subjects were informed that the experiment could reveal attitudes that they would prefer not to express and were reminded that they were free to withdraw at any time.
had been selected on the basis of their frequency in the Seattle telephone directory.Because Japanese names are typically longer than Korean names, a set of 25 truncated Japanese names was generated from the 25 selected Japanese surnames, such that for each Korean name, there was a truncated Japanese name of the same length.For example, the Japanese name Kawabashi was truncated to Kawa to match the length of the Korean name Youn while retaining the Japanese character of the name.(The three stimulus sets are presented in Appendix A.) The truncated Japanese names were used only after subjects had received several exposures to the full-length versions.Evaluative words were presented in lowercase, whereas Korean and Japanese names were presented in uppercase.The apparatus was the same as used for Experiment 1.

Procedure
IAT measures.As in Experiment l, subjects completed two IAT measures.For the first IAT measure, the target-concept discrimination was Korean names versus full-length Japanese names.For the second, the discrimination was Korean names versus truncated Japanese names.Other than the replacement of Experiment l's target-concept discriminations with the Japanese versus Korean name discrimination, Experiment 2 had only two substantial differences of procedure from Experiment I. First, the intertrial interval independent variable was dropped, and all blocks of trials were conducted with a 250 ms interval between response to one stimulus and presentation of the next.Second, combined tasks consisted of one practice block followed by two data-collection blocks (contrasted with Experiment l's use of three data-collection blocks, each with a different intertrial interval).For half of the subjects, Japanese names were initially assigned to the left key, Korean to the right; the reverse assignment was used for the remaining subjects.Throughout the experiment, all subjects responded to unpleasant words with the left key and pleasant words with the right key.(The omission of counterbalancing for key assignment was a consequence of Experiment l's finding that key assignment for the pleasant-unpleasant discrimination did not affect findings.) 
The second IAT differed from the first in (a) omitting practice of the pleasant-unpleasant discrimination (as in Experiment 1 ), (b) using the truncated Japanese names in place of the full-length ones, and (c) using opposite key assignments for the initial target-concept discrimination.The last of these three changes was instituted because of Experiment l's demonstration that order of performance for the target discrimination and its reversal influenced magnitude of observed IAT effect.The consequence of the change was that subjects who performed the first IAT with the Japanese + pleasant combination first performed the second IAT with the Korean + pleasant combination first.
Ethnic identity and attitude questionnaires.After the computer administered IAT tasks, subjects completed several paper-and-pencil questionnaire measures.The first three measures, which were prepared specifically for this experiment, assessed the extent to which subjects were involved in sociocultural networks that were ethnically Japanese or Korean.
The first measure asked subjects to provide initials of "up to twenty people, not family members, that you know."Subjects were instructed that listing close friends was preferable but that they could also list acquaintances.The instructions did not alert subjects to the researchers' interest in ethnicity of these acquaintances (information that was to be requested later), although subjects could well have been sensitized to

Materials and Apparatus
In addition to the 25 pleasant-meaning and 25 unpleasant-meaning words used in Experiment 1, 25 Korean and 25 Japanese surnames were used.These Korean and Japanese surnames were selected with the help of two Korean and two Japanese judges, who were asked to rate the typicality and ease of categorizing each of a larger set of surnames that ~2 All averaged correlations were computed by averaging the Fisher's Z conversions of r values, then reconverting the average of these Fisher Zs to r.
13 From 1905 to 1945, the Japanese occupied Korea, exploiting Koreans economically and repressing them politically.At present, Koreans are a discriminated against minority in Japan.ethnicity from both the inclusion of ethnic name discriminations in the IAT procedure and their knowledge of having been recruited by virtue of their ethnicity.After completing the next two measures, subjects were instructed to turn back to the list of initials and to mark each to indicate which of the following labels provided the best description: Korean, Korean American, Japanese, Japanese American, none of the above, or don't know.This acquaintances measure was scored to indicate the percentage of those listed who were ethnically Korean or Korean American and the percentage who were ethnically Japanese or Japanese American.
For the second measure, subjects were asked to indicate the number of members of their family who would be described by each of the following labels: Korean, Korean American, Japanese, Japanese American, and American. This yielded percentage scores of those mentioned who were ethnically Korean and ethnically Japanese, treating each Korean American as 50% Korean and 50% American, and similarly for Japanese Americans.
The third measure asked subjects to respond to eight yes-no items, four each concerned with Korean and Japanese language. These items asked, respectively, whether subjects could understand, speak, read, and write each language, each answered on a 3-point scale with 0 = no, 1 = somewhat, and 2 = yes. This yielded 9-point language scales (summing responses, range 0-8) for both the Korean and the Japanese language.
Next followed feeling thermometer and semantic differential measures of attitude toward Japanese and Koreans, which were identical to the corresponding measures of Experiment 1 except for the change of concepts for which responses were requested. All of the first five measures were scored by conversion to a difference score (Korean minus Japanese), for which positive values indicated numerically greater scores for the Korean submeasure.
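The scoring rules described for the family, language, and difference-score measures can be illustrated with a minimal sketch. The function names and the dictionary layout are hypothetical and are not taken from the original materials; only the 50% weighting, the 0-2 item coding, and the Korean-minus-Japanese direction come from the text.

```python
# Hypothetical helper names; only the scoring rules themselves come from the text.

def family_percentages(counts):
    """Return (percent Korean, percent Japanese) among listed family members.

    counts maps a label ("Korean", "Korean American", "Japanese",
    "Japanese American", "American") to a number of family members.
    Each Korean American counts as 50% Korean (and 50% American),
    and similarly for Japanese Americans.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no family members listed")
    korean = counts.get("Korean", 0) + 0.5 * counts.get("Korean American", 0)
    japanese = counts.get("Japanese", 0) + 0.5 * counts.get("Japanese American", 0)
    return 100 * korean / total, 100 * japanese / total


def language_scale(understand, speak, read, write):
    """Sum four items coded 0 = no, 1 = somewhat, 2 = yes (range 0-8)."""
    items = (understand, speak, read, write)
    if any(item not in (0, 1, 2) for item in items):
        raise ValueError("each item must be coded 0, 1, or 2")
    return sum(items)


def difference_score(korean_value, japanese_value):
    """Korean minus Japanese; positive means a greater Korean submeasure."""
    return korean_value - japanese_value


if __name__ == "__main__":
    k_pct, j_pct = family_percentages(
        {"Korean": 2, "Korean American": 2, "American": 1}
    )
    print(difference_score(k_pct, j_pct))
    print(difference_score(language_scale(2, 2, 1, 0), language_scale(0, 1, 0, 0)))
```

The same difference-score helper would apply to the thermometer and semantic differential measures, since all five measures are described as Korean-minus-Japanese differences.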
A sixth and final questionnaire measure was the 23-item Suinn-Lew Asian Self-Identity Acculturation Scale (Suinn, Rickard-Figueroa, Lew, & Vigil, 1987). Unlike the preceding five measures, all of which yielded a comparison of involvement in or attitude toward Korean and Japanese cultures, the Suinn-Lew acculturation measure indicated involvement in Asian (relative to American) culture.

Results and Discussion
IAT Effects
Figure 3 presents Experiment 2's results separately for the counterbalanced variable of order of performing the Korean + pleasant versus Japanese + pleasant combinations, and also separately for the Korean American and Japanese American subject subsamples. The expectation for Experiment 2's data was that ethnically Korean subjects would find it more difficult to perform the Japanese + pleasant than the Korean + pleasant combination (appearing as higher white than black bars in Figure 3) and that the reverse should be true for ethnically Japanese subjects (higher black than white bars). Figure 3 reveals these expected patterns (higher white than black bars in the left panels; higher black than white bars in the right panels). Using the log-latency IAT-effect measure as a dependent variable, analyses for the effect of subject ethnicity yielded F(1, 28) ratios of 28.53 and 31.93 for the subexperiments with full-length and truncated Japanese names, respectively (both ps = 10⁻⁵). There were no other significant effects in the design that also included Japanese name length (first vs. second subexperiment) and order of administration of the task combinations. The IAT effect was very similar in magnitude for the first subexperiment with full-length Japanese names (mean IAT effect = 105.3 ms) and the second one with truncated Japanese names (M = 92.8 ms), F(1, 27) = 0.58, p = .45. Also, there was a weak order effect of the same type found in Experiment 1: IAT effects were slightly larger when own-ethnicity + pleasant was performed first (M = 117.0 ms) than when other-ethnicity + pleasant was performed first (M = 84.3 ms). This difference, however, was nonsignificant, F(1, 27) = 0.37, p = .55.
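As a rough illustration of what an IAT-effect measure of this kind looks like, the sketch below takes the difference in mean latency between the two combined tasks, both in milliseconds and in natural-log units. It is only a sketch under stated assumptions: the function name is hypothetical, the labels "compatible" and "incompatible" stand for whichever combination is expected to be easier or harder for a given subject, and any trimming or recoding of extreme latencies that the actual scoring may apply is not shown.

```python
import math
from statistics import mean


def iat_effect(compatible_ms, incompatible_ms):
    """Return the IAT effect as (incompatible - compatible) mean latency.

    Assumption: each argument is a list of correct-response latencies in
    milliseconds from one combined task. No outlier handling is applied.
    """
    log_compatible = mean([math.log(x) for x in compatible_ms])
    log_incompatible = mean([math.log(x) for x in incompatible_ms])
    return {
        # Difference of mean natural-log latencies (log-latency measure).
        "log_effect": log_incompatible - log_compatible,
        # Difference of mean raw latencies, in milliseconds.
        "ms_effect": mean(incompatible_ms) - mean(compatible_ms),
    }


if __name__ == "__main__":
    # Made-up latencies for illustration only.
    print(iat_effect([600, 700], [800, 900]))
```

A positive value on either scale indicates slower responding in the combination labeled incompatible, which is the direction the text describes for the other-ethnicity + pleasant task.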

IAT Compared With Explicit Measures
Table 3 presents Korean and Japanese subject means for the log-latency IAT measure, along with those for the five paper-and-pencil measures that yielded Korean-Japanese difference scores, with all measures scored so that higher numbers were expected for Korean subjects. For example, the language score was computed by subtracting the 9-point measure of the subject's knowledge of Japanese language from the corresponding measure for the Korean language. Perhaps the most noteworthy result in the table is that the IAT's measure of ethnic attitudes discriminated Korean from Japanese subjects more effectively than did three of the five questionnaire measures. Only the language and family measures discriminated Japanese American from Korean American subjects with greater effect sizes (ds = 2.65 and 2.37) than did the two IAT measures (ds = 2.04 and 1.88).
Correlations of the two IAT log-latency measures with the other five measures of Table 3 are shown in Table 4. All but one correlation was in the expected positive direction. Surprisingly, the semantic differential was uncorrelated with the two IAT measures. This observation strongly suggests that the semantic differential and the IAT measured different constructs.
The strength of correlations of the implicit measures with the acquaintances, family, and language measures suggested the possibility of an analysis using individual differences within the Korean American and Japanese American subsamples. For this analysis, the acquaintances, family, and language measures were converted to absolute values and rescaled so that all were on a 0-100 scale. The acculturation measure was also converted to a 0-100 range. The resulting four measures were averaged to construct an index that was interpretable as measuring immersion in Asian culture. It was expected that the IAT effect measure should show greater Korean-Japanese differentiation for subjects who were immersed in their particular Asian culture (i.e., had high proportions of family members and acquaintances in that culture and were familiar with the language). The analysis to test this expectation is shown in Figure 4, where it can be seen that, indeed, IAT differentiation between the Korean and Japanese subsamples was greater with higher immersion in Asian culture. The test of significance for difference in slopes for the subsample regression functions in Figure 4 yielded an F(1, 26) of 9.83, p = .004. Remarkably, the intersection of the two regression functions near the left side of Figure 4 indicates that an IAT effect of approximately zero would be expected for subjects who had zero immersion in their Asian culture.¹⁴
Unexpectedly, the feeling thermometer explicit measure was correlated more highly with the IAT measure (average r = .59) than it was with another explicit attitude measure, the semantic differential (r = .43). The semantic differential measure itself was uncorrelated with the IAT (average r = .04) but was modestly correlated with the three ethnic identity measures (average r = .41). Although this pattern is somewhat puzzling, it does not undermine the impressive evidence for validity of the IAT provided by the data in Figure 4. There, it can be seen that the IAT was most effective in diagnosing ethnicity for subjects who were highly involved with their Asian American culture. These findings indicate that the IAT is sensitive to the expected covariation of positivity of ethnic-name-to-evaluation associations with level of exposure to the culture of one's ethnic group.

14 By contrast, neither explicit measure showed the same property. Interaction F(1, 26) ratios were 2.69, p = .11, and 0.04, p = .85, for feeling thermometer and semantic differential measures, respectively. The nonsignificant interaction effect on the thermometer was, however, directionally the same as that for the IAT.

Note (Table 4). [...] .32, .37, .46, .52, and .58. Correlations between explicit attitude measures (Nos. 1 and 2) and implicit measures (Nos. 3 and 4) are in bold, and correlations between implicit measures and ethnic identity measures (Nos. 5, 6, and 7) are in italics. IAT = implicit association test.

Experiment 3
Experiment 3 was motivated by several previous demonstrations of automatic expressions of race-related stereotypes and attitudes that are consciously disavowed by the subjects who display them (Crosby, Bromley, & Saxe, 1980; Devine, 1989; Fazio et al., 1995; Gaertner & McLaughlin, 1983; Greenwald & Banaji, 1995; Wittenbrink, Judd, & Park, 1997). This experiment used the IAT procedure to measure an implicit attitude that might not readily be detected through explicit self-report measures. Experiment 3's IAT method combined the tasks of classifying Black versus White names and discriminating pleasant versus unpleasant word meanings.

Method
Subjects
The subjects were 14 female and 12 male White American students from introductory psychology courses at the University of Washington.
The students received optional course credit in return for participation. As in Experiment 2, the pre-experiment consent procedure advised subjects that the experiment could reveal attitudes that they might find objectionable and reminded them that they could withdraw at any time.

Materials and Procedure
With the exception of two unpleasant words that were changed, the 25 pleasant-meaning and 25 unpleasant-meaning words used in Experiment 3 were the same as those used in Experiments 1 and 2. Two 50-item sets of first names were also used, one consisting of 25 male names that had been judged by introductory psychology students to be more likely to belong to White Americans than to Black Americans (e.g., Brandon, Ian, and Jed) and 25 male names that had been judged to be more likely to belong to Blacks than to Whites (e.g., Darnell, Lamar, and Malik). The other set consisted of 50 female first names, similarly selected (e.g., White: Betsy, Katie, and Nancy; Black: Ebony, Latisha, and Tawanda). Evaluative words were presented in lowercase and names were presented in uppercase. Appendix A contains the complete item lists.

Figure 4. [...] (N = 30) as a function of an index of immersion in Asian culture that combined four measures. The trend lines are the individual regression slopes for the Korean American and Japanese American subsamples. The IAT measure is the average of the two measures obtained for each subject (one using full-length and one using truncated Japanese names).

Except for the replacement of Japanese and Korean names with Black and White names, Experiment 3 was virtually identical to Experiment 2. Like Experiment 2, Experiment 3 also contained two subexperiments, the first using male names and the second using female names.
After completing the computer-administered IAT tasks, subjects responded to five questionnaire measures of race-related attitudes and beliefs. To allow subjects to know that they would be responding in privacy, they completed these questionnaires in their experimental booths and were informed that they would be placing their completed questionnaires in an unmarked envelope before returning them to the experimenter. The measures included feeling thermometer and semantic differential measures similar to those of the previous two experiments (but targeted at the racial concepts of Black and White), the Modern Racism Scale (MRS; McConahay, Hardee, & Batts, 1981), and two measures developed by Wittenbrink, Judd, and Park (1997), their Diversity and Discrimination scales. The Diversity Scale assesses attitudes about the value of multiculturalism, and the Discrimination Scale assesses beliefs about the causes and pervasiveness of discrimination in American society. Sample items from the MRS and the Diversity and Discrimination scales are provided in Appendix B.

IAT Effects
The data of Experiment 3 (see Figure 5) clearly revealed patterns consistent with the expectation that White subjects would display an implicit attitude difference between the Black and White racial categories. More specifically, the data indicated an implicit attitudinal preference for White over Black, manifest as faster responding for the White + pleasant combination (white bars in Figure 5) than for the Black + pleasant combination (black bars). The magnitude of this IAT effect averaged 179 ms over the four White + pleasant versus Black + pleasant contrasts shown in Figure 5. For the separate tests with male names and female names, respectively, Fs(1, 21) = 41.94 and 28.83, ps = 10⁻⁶ and 10⁻⁵. This finding indicates that, for the White college-student subjects of Experiment 3, there was a considerably stronger association of White (than of Black) with positive evaluation. For comparison, these effects, measured in milliseconds, were larger than those observed for the Korean-Japanese contrast in Experiment 2, and even slightly larger than those for the flower-insect and instrument-weapon contrasts in Experiment 1. However, measured in log-latency units or effect sizes, Experiment 3's IAT effects were smaller than those of Experiment 1.
There were no significant effects of order of administering task combinations in Experiment 3, Fs(1, 21) = 0.03 and 2.01, ps = .86 and .17, respectively, for the tests with male and female names. The direction of this weak and nonsignificant effect indicated, once again, that IAT effects are slightly larger when an evaluatively compatible task combination precedes an evaluatively noncompatible one. (This assumes that for the White subjects of Experiment 3, it is appropriate to call the White +

IAT Compared With Explicit Measures
Table 5 presents the IAT measures from the two subexperiments (for male and female names) along with the feeling thermometer and semantic differential measures, each in the form of a difference score for which the value 0.0 indicates equivalent attitudes toward Black and White. The four measures in Table 5 were computed so that positive numbers would indicate preference for Black relative to White.

Note (Table 5). The feeling thermometer range was -99 to 99, and the semantic differential range was -6 to 6. Latency measures were transformed to natural logarithms for this analysis. IAT = implicit association test. The effect size measure d = M ÷ SD. Conventional small, medium, and large values of d are .2, .5, and .8, respectively.
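The one-sample effect size in the table note can be sketched as follows, reading the note's formula as the mean of the difference scores divided by their standard deviation. The function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the note does not say which form was used.

```python
from statistics import mean, stdev


def one_sample_d(difference_scores):
    """Effect size d = M / SD for a set of difference scores.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    return mean(difference_scores) / stdev(difference_scores)


def conventional_label(d):
    """Label |d| using the conventional .2 / .5 / .8 benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Made-up difference scores for illustration only.
    d = one_sample_d([1, 2, 3])
    print(d, conventional_label(d))
```

Because the measures are difference scores with 0.0 meaning no preference, the sign of d carries the direction of preference and its magnitude is what the benchmarks describe.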
As can be seen in Table 5, the IAT measures indicated considerably stronger relative preference for White than did either the feeling thermometer or semantic differential measure. Remarkably, the semantic differential index indicated a virtual absence of racial preference, reminiscent of the weak sensitivity of Experiment 2's semantic differential measure to Korean versus Japanese ethnicity (see Table 3). The thermometer index, along with the two IAT measures, indicated statistically significant relative preference for White. The magnitude (effect size) of the pro-White preference was approximately twice as great for the IAT measures as for the thermometer measure.
Table 6 presents the correlations involving the four measures of Table 5, along with the three additional explicit (self-report questionnaire) measures that were obtained (the MRS and the Diversity and Discrimination scale measures). Scores on the three additional explicit measures were reversed relative to their usual scoring, so that high scores on all seven measures would indicate pro-Black attitudes or beliefs. All correlations were therefore expected to be positive. The five explicit measures (feeling thermometer, semantic differential, the MRS, and the Diversity and Discrimination scales) formed a cluster that accounted for all of the correlations that were greater than .50 (average r = .50). By contrast, the average correlation of explicit measures with implicit measures was r = .14. Consistent with the results of Experiment 1, this again indicates a divergence between the constructs assessed by the implicit and explicit measures.
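The article's footnote on averaged correlations states that averages were obtained by converting each r to Fisher's Z, averaging the Z values, and converting the average back to r. A minimal sketch of that procedure follows; the function name is hypothetical, and equal weighting of the correlations is an assumption.

```python
import math


def average_correlation(rs):
    """Average correlations through Fisher's Z (unweighted).

    Fisher's Z is the inverse hyperbolic tangent of r, and the
    back-conversion is the hyperbolic tangent of the mean Z.
    """
    if not rs:
        raise ValueError("need at least one correlation")
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


if __name__ == "__main__":
    # Made-up correlations for illustration only.
    print(round(average_correlation([0.3, 0.7]), 3))
```

For positive correlations of unequal size, the Fisher-averaged value is slightly higher than the plain arithmetic mean, which is why the two methods can give different "average r" figures.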
An important purpose of Experiment 3 was to determine whether the IAT would reveal an implicit White preference among subjects who explicitly disavowed any Black-White evaluative difference. Figure 6 provides a scatter plot that relates the semantic differential measure of racial evaluative preference to the average of Experiment 3's two IAT measures. Two striking features of Figure 6 indicate that the IAT may indeed implicitly reveal explicitly disavowed prejudice. First, Figure 6 indicates that a majority of Experiment 3's White subjects (19 of 26) explicitly endorsed a position of either Black-White indifference (zero on the semantic differential) or Black preference (a positive semantic differential score). Second, it can be seen in Figure 6 that all but one of these subjects had negative IAT scores, indicating White preference. Indeed, only one of the 26 White subjects had a positive IAT score. At the same time that these findings are encouraging in regard to usefulness of the IAT to measure implicit attitudes, they are discouraging in indicating the pervasiveness of unconscious forms of prejudice.
In Experiment 3, the implicit measures were no more than weakly correlated with explicit measures of either attitude (feeling thermometer and semantic differential, average r = .17) or racist belief measures (MRS and Diversity and Discrimination scales; average r = .12). Although these correlations provide no evidence for convergent validity of the IAT, nevertheless (because of the expectation that implicit and explicit measures of attitude are not necessarily correlated) neither do they damage the case for construct validity of the IAT.
Of course, construct validity of the IAT measure cannot be assumed just from the suspicion that virtually all White Americans may have automatic negative associations to African American names. There is a plausible alternative interpretation: that Experiment 3's White college student subjects were much less familiar with the African American stimulus names than they were with the White American stimulus names. This differential familiarity, coupled with the expectation of greater liking for more familiar stimuli (Zajonc, 1968), could explain the IAT results. This possible alternative to the implicit racism interpretation is considered further in the General Discussion.

General Discussion
Each of the present three experiments produced findings consistent with the supposition that the IAT procedure is sensitive to automatic evaluative associations. These findings are encouraging in regard to usefulness of the IAT to measure implicit attitudes but do not establish that usefulness beyond doubt. Key issues still to be considered are (a) the IAT's immunity to self-presentation forces and (b) possible alternative interpretations of IAT results in terms of variables that may be confounded with evaluative differences among the categories examined in the three experiments.

Immunity to Self-Presentational Forces
All three experiments used two explicit self-report measures of attitude that could be compared with the IAT measures. These two measures were a feeling thermometer measure that used a 100-point scale single-item rating for each category used in the experiment and a semantic differential measure that averaged ratings for each category on five 7-point bipolar evaluative items. Comparison of results obtained for the IAT measures and these self-report measures provides important indications that the IAT may be more resistant to self-presentational factors than are the explicit measures. Experiment 1's attitude objects were familiar semantic categories for which evaluations are widely shared and presumably not socially sensitive. Subjects should have had little concern about being perceived as liking flowers more than insects or as liking musical instruments more than weapons. For the feeling thermometer and semantic differential explicit measures, indeed, subjects apparently had no reluctance to express these expected attitudes. Effect sizes for Experiment 1's explicit measures (mean d = 1.68) were greater than the average effect sizes for the IAT log-latency measures (mean d = 1.50; see Table 1).
Experiment 2 sought to assess socially more sensitive attitudes involving mutual ethnic regard of Japanese Americans and Korean Americans. By contrast with Experiment 1, the average effect sizes were substantially smaller for the two explicit measures (mean d = 0.49) than for IAT measures (mean d = 0.99; see Table 3).¹⁵ Experiment 3 assessed a presumably even more socially sensitive attitude domain, involving the Black-White racial evaluative contrast for White American subjects. In Experiment 3, effect sizes for the two explicit measures were even smaller (mean d = 0.30) than those in Experiment 2 and were considerably smaller than Experiment 3's IAT-measured effect sizes (mean d = 1.13).
The much greater variation across experiments in effect sizes of explicit measures, relative to those of the IAT measures, suggests that the explicit measures might have been more responsive to self-presentational forces that can mask subjects' attitudes. Because of the anonymity and privacy conditions under which both the IAT and explicit-measure data were collected in all three experiments, the self-presentation forces operating in them may belong more in the category of private self-presentation (self-presentation to self: Breckler & Greenwald, 1986; Greenwald & Breckler, 1985) than in the category of impression management (self-presentation to others).

Convergent Validity of IAT Attitude Measures
A measure's convergent validity is established by demonstrating that it displays theoretically expected correlations with other measures. In Experiment 1, an expected correlation was demonstrated in that the IAT effect measures were in agreement with common views regarding evaluative differentiations among semantic categories (such as weapons vs. musical instruments). In Experiment 2, the expected correlation was in the relationship of an IAT measure of attitude difference between Korean and Japanese ethnicities and subjects' self-described ethnic identities. Further, this correlation was moderated in theoretically expected fashion by subjects' level of immersion in the cultures of their ethnic groups (Figure 4). Unlike the known-groups design of Experiment 2, Experiment 3 had a single subject group, White Americans. For this group, the IAT indicated an implicit in-group preference (for Whites, relative to Blacks) that was expected on the basis of others' investigations of implicit attitudes (Crosby et al., 1980; Devine, 1989; Fazio et al., 1995; Gaertner & McLaughlin, 1983; Greenwald & Banaji, 1995; Wittenbrink, Judd, & Park, 1997), even though it was not expressed on the explicit (self-report) attitude measures of Experiment 3.

15 The effect sizes in Table 3 are for differences between two subject samples, Korean American and Japanese American. The mean ds of 0.49 and 0.99 were obtained by dividing Table 3's effect sizes in half, making them more directly comparable to the one-sample effect sizes available for Experiments 1 and 3.

Discriminant Validity of IAT Attitude Measures
Two issues relating to discriminant validity merit consideration. The first is evidence bearing on the supposition that the IAT and the self-report measures assessed different constructs that might be identified, respectively, as implicit and explicit attitudes. Second is evidence bearing on the possibility that the IAT procedure is sensitive (in an undesired fashion) to differential familiarity with the stimulus items used to represent target concepts.

Explicit Versus Implicit
In addition to the convergent validity evidence obtained in the form of the expected patterns of results just described, each experiment also examined correlations of IAT measures of implicit attitudes with semantic differential and feeling thermometer measures of explicit attitudes. On average, these two explicit measures were better correlated with each other (average r = .60) than they were with the IAT measures of the same attitudes (average r = .25). It is clear that these implicit-explicit correlations should be taken not as evidence for convergence among different methods of measuring attitudes but as evidence for divergence of the constructs represented by implicit versus explicit attitude measures.

Differential Familiarity With IAT Stimuli
In all three experiments, target-concept stimuli for IAT measures were words or names that were associated with naturally occurring categories. This allowed possible confounding of implicit attitude differences with any other differences that existed naturally among the stimulus words or names used for the various categories. The most obvious possible confounding was that of positive evaluation with amount of prior exposure to the target concept stimuli. This possible confounding raises a concern about discriminant validity: Does the IAT measure implicit attitude, or is it an artifact of amount of exposure to the stimuli used to represent target concepts?
In both Experiments 2 and 3, it was virtually certain that subjects were more familiar with names associated with their own ethnic group than with names associated with the contrasting group. For example, the Japanese American and Korean American subjects in Experiment 2 were undoubtedly more familiar with names of their own ethnicity than the other, and the White subjects in Experiment 3 were similarly more familiar with the White first names used in that experiment than with the contrasting Black names.
Although it is plausible that IAT measures tapped prior-exposure differences in Experiments 2 and 3, this alternative explanation cannot apply to Experiment 1. In Experiment 1, the evaluatively negative categories (insects and weapons) consisted of words that have substantially higher frequency in the language than did the words used for the evaluatively positive categories (flowers and musical instruments). Thus, even if relative familiarity of stimulus items plays some role in the IAT effect, it cannot explain the full set of findings for all three studies. This aspect of Experiment 1's design notwithstanding, it is desirable to pursue alternative strategies to resolve the discriminant validity question concerning differential item familiarity.¹⁶

Comparison of IAT With Other Automatic Evaluation Measures
The chief method previously investigated for the assessment of automatic evaluative associations is evaluative semantic priming (e.g., Bargh et al., 1992; Fazio et al., 1986; Greenwald et al., 1989). In the evaluative priming method, subjects classify each of a series of target words based on the target word's evaluative meaning, with each target word immediately preceded by a to-be-ignored prime word. Prime-target evaluative congruence facilitates responding to the target, producing variations in response latencies that can be used to measure automatic evaluation of the prime category. The more a category of words speeds judgments of positively evaluated targets or hinders judgments of negatively evaluated targets, the more evaluative positivity is indicated for that category. Studies of evaluative priming have used prime stimulus categories much like the target-concept categories of the present experiments. Perdue and Gurtman (1990) examined automatic evaluation associated with the prime categories of old and young. Perdue et al. (1990) contrasted automatic evaluation evoked by words representing concepts of in-group (such as we or us) and out-group (they or them). Fazio et al. (1995) used an evaluative priming method to assess relative automatic evaluations toward Black and White race categories.
In comparing usefulness of the IAT method with that of the priming method, it is appropriate to compare effect sizes obtained by the two procedures with similar materials. The priming studies of Fazio et al. (1986), Perdue and Gurtman (1990), Perdue et al. (1990), and Fazio et al. (1995) were considered suitable for comparison with the present research, although only one of three experiments in the last of these provided latency data that could be used for comparison. Treating each of the seven comparison priming experiments as an independent estimate, and combining them in unweighted fashion, yielded an average priming effect (latency difference for evaluative-category contrasts) of 64.0 ms, with an average effect size of d = .62. For comparison, the IAT effects in the present three experiments averaged 153.5 ms, with effect sizes averaging d = 1.21. (These figures are unweighted averages of data from the present three experiments as given in Tables 1, 3, and 5, halving the figures in Table 3 in order to treat the data from the Korean and Japanese subsamples as individual subexperiments.) This comparison suggests that the IAT method has about twice the priming method's sensitivity to evaluative differences. The implications of a doubling of effect size are substantial, perhaps chiefly because doing so permits experiments at fixed levels of statistical power to be conducted with a quarter of the sample size. Of course, it would be much superior to compare the IAT and priming methods' effect sizes in a single experiment, using the same stimulus categories with each method.

16 Preliminary findings of experiments using multiple strategies to examine the effect of item familiarity have, so far, produced findings indicating that the implicit in-group preferences observed in Experiments 2 and 3 are not artifacts of greater familiarity with in-group-related stimuli (Dasgupta, McGhee, Greenwald, & Banaji, 1998).
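The statement that doubling the effect size allows a quarter of the sample size at fixed power follows from the required sample size being proportional to 1/d². The sketch below illustrates this with a normal-approximation formula for a one-sample, two-sided test. The function names, the alpha of .05, and the power of .80 are illustrative assumptions and are not values taken from the article.

```python
import math
from statistics import NormalDist


def approx_n(d, alpha=0.05, power=0.80):
    """Unrounded sample size for a one-sample, two-sided z test.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ((z_alpha + z_power) / d) ** 2


def required_n(d, alpha=0.05, power=0.80):
    """Smallest whole number of subjects meeting the approximation."""
    return math.ceil(approx_n(d, alpha, power))


if __name__ == "__main__":
    # Average effect sizes quoted in the text for priming and for the IAT.
    for label, d in (("priming", 0.62), ("IAT", 1.21)):
        print(label, required_n(d))
    # Doubling d divides the unrounded sample size by four.
    print(approx_n(0.5) / approx_n(1.0))
```

The ratio of unrounded sample sizes depends only on the ratio of effect sizes, so the factor of four holds for any choice of alpha and power under this approximation.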
IAT measures share some important properties with semantic priming measures: (a) Both procedures measure attitude as the evaluative difference between two categories (target concepts in the IAT and priming item categories in semantic priming), and (b) the procedure juxtaposes items from categories for which an attribute is to be measured (target concepts in the IAT, or priming categories in priming) with items that have well-established attribute values (attribute categories in the IAT and target items in priming).

Effect of Procedural Variables on IAT
Order of Task-Compatibility Combinations
Experiment 1 tested the impact of five procedural factors on the IAT's sensitivity to evaluative associations. Only one procedural variable was demonstrated to moderate the IAT: the order of performing compatible and noncompatible concept-attribute combinations. When a compatible combination (for example, pleasant + flowers) precedes a noncompatible one (pleasant + insects), the IAT's measure of evaluative difference between the positive (flowers) and negative (insects) concepts is increased. Although this compatibility-order effect was statistically significant only in Experiment 1, it was also found directionally in Experiments 2 and 3. This procedural effect does not appear to undermine the IAT's sensitivity to individual differences in implicit attitudes, but it does compromise the location of a zero point. For example, a person truly characterized by no implicit attitude difference between the Black and White racial categories would appear to be mildly pro-White if given an IAT in which White + pleasant preceded Black + pleasant but would appear mildly pro-Black if this ordering were reversed. Fortunately, the effect of this procedural variable appears to be removable by reducing the number of trials used in each component of the IAT. As already mentioned, the effect was statistically nonsignificant in Experiments 2 and 3, both of which used reduced numbers of trials in the critical combined task portions of the IAT. Subsequent (as yet unreported) data collections indicate that the compatibility-order effect can be eliminated completely by further reducing the numbers of combined-task trials.

Category Set Sizes
Of the several procedural factors tested in Experiment 1 and found not to influence IAT measures, perhaps the most practically significant was the variation of 5 versus 25 items used to represent each category in Experiment 1. If the IAT can be administered equally effectively with 5-item and 25-item categories, it should be relatively easy to extend its method to new domains in which there may be relatively few items available to represent either target concepts or associated attributes. It remains possible, also, that the IAT may be successfully usable with even fewer than five items per category.

Extension of the IAT Method to Stereotypes and Self-Concept
A reason for strong interest in the IAT method is its potential for easy extension both to additional attitude-object categories and to attribute dimensions other than evaluation. For example, by using male versus female names as the target concept pair and replacing the pleasant-unpleasant attribute contrast of the present experiments with a strong-weak contrast, the IAT method can be used to assess a stereotypic differentiation between males and females on the strong-weak attribute dimension (Rudman, Greenwald, & McGhee, 1996). By using me versus not me (i.e., self vs. other) as the target-concept contrast together with the pleasant-unpleasant contrast, one can obtain a measure of evaluative associations that underlie self-esteem (Farnham & Greenwald, 1998; Farnham, Greenwald, & Banaji, in press). By combining the self-other target concepts with any of various attribute dimensions, one should also be able to determine whether each attribute dimension is associated with a person's self-concept. This last possibility offers a new method for measuring the self-schema construct that was introduced by Markus (1977).

Conclusion
Findings of three experiments consistently confirmed the usefulness of the implicit association test (IAT) for assessing differences in evaluative associations between pairs of semantic or social categories. The findings also suggested that the IAT may resist self-presentational forces that can mask personally or socially undesirable evaluative associations, such as the ethnic and racial attitudes investigated in Experiments 2 and 3. The IAT method offers the further advantage of being adaptable to assess a wide variety of associations, including those that comprise stereotypes and self-concept.

Figure 1. Schematic description and illustration of the implicit association test (IAT). The IAT procedure of the present experiments involved a series of five discrimination tasks (numbered columns). A pair of target concepts and an attribute dimension are introduced in the first two steps. Categories for each of these discriminations are assigned to a left or right response, indicated by the black circles in the third row. These are combined in the third step and then recombined in the fifth step, after reversing response assignments (in the fourth step) for the target-concept discrimination. The illustration uses stimuli for the specific tasks for one of the task-order conditions of Experiment 3, with correct responses indicated as open circles.
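The five-step sequence in the Figure 1 caption can be written down as a small data structure. The sketch below is only an illustration: the function name, the dictionary layout, the task labels, and the left/right key assignments are assumptions, and the Black/White and pleasant/unpleasant labels stand in for one possible task order of Experiment 3.

```python
def build_iat_sequence(target_left, target_right, attr_left, attr_right):
    """Return the five discrimination tasks as a list of dicts.

    Each dict lists which categories map to the left and right responses.
    The attribute assignment stays fixed; only the target concepts swap sides.
    """
    return [
        {"step": 1, "task": "initial target-concept discrimination",
         "left": [target_left], "right": [target_right]},
        {"step": 2, "task": "attribute discrimination",
         "left": [attr_left], "right": [attr_right]},
        {"step": 3, "task": "initial combined task",
         "left": [target_left, attr_left], "right": [target_right, attr_right]},
        {"step": 4, "task": "reversed target-concept discrimination",
         "left": [target_right], "right": [target_left]},
        {"step": 5, "task": "reversed combined task",
         "left": [target_right, attr_left], "right": [target_left, attr_right]},
    ]

# Hypothetical key assignment for one task-order condition.
sequence = build_iat_sequence("Black", "White", "pleasant", "unpleasant")
for block in sequence:
    print(block["step"], block["task"], block["left"], block["right"])
```

Swapping the first two arguments yields the other task order, in which the opposite concept-attribute pairing is performed first.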

Figure 2
Figure 2 displays mean latencies for the nine successive tasks of Experiment 1 (see Overview section), presented separately

Figure 2. Mean (untransformed) latency results of Experiment 1 (N = 32), separately for subjects who performed evaluatively noncompatible combinations before evaluatively compatible ones (upper panel) and those who performed compatible combinations first (lower panel). Data were combined for subjects for whom the first implicit association test (IAT) measure used a target discrimination of flowers versus insects and those for whom the first target discrimination was weapons versus instruments. Because results were indistinguishable for the two target-concept discriminations (flower vs. insect and instrument vs. weapon), data for both were collapsed over this design factor in the figure. The first block that introduced each new discrimination or combined task was treated as practice and not included in the figure. Error bars are standard deviations for the 16 subjects contributing to each mean.

Figure 3. Mean (untransformed) latency results of Experiment 2, separately for 16 Korean American and 15 Japanese American subjects and for subjects who received the two orders of presentation for own-ethnicity + pleasant and other-ethnicity + pleasant combinations. Error bars are within-cell standard deviations for the 7 to 9 observations (subjects) contributing to each mean. IAT = implicit association test.

Figure 5. Mean untransformed latency data of Experiment 3 (N = 26). Results are shown separately for subjects who performed the White + pleasant combination first (n = 13) and those who performed the Black + pleasant combination first (n = 13). Error bars are standard deviations for the 13 observations included in each mean. IAT = implicit association test.

Figure 6. Relationship of semantic differential and implicit association test (IAT) measures of Black-White evaluative preference. Data are from Experiment 3 (N = 26 White American subjects). Both measures have meaningful zero points that indicate absence of preference. The major feature of the data is the indication of substantial White preference on the IAT measure.

Table 1
Summary Statistics for Difference-Score Attitude Indexes
Note. Positive scores indicate preference for flowers relative to insects, and musical instruments relative to weapons. The thermometer range was -99 to 99, and the semantic differential range was -6 to 6. IAT = implicit association test.
a The effect size measure d = M ÷ SD. Conventional small, medium, and large values of d are .2, .5, and .8, respectively.
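The effect size in note a is the mean of the difference scores divided by their standard deviation. A minimal sketch follows. It assumes the sample (n - 1) standard deviation, which the note does not specify, and the function name and data are hypothetical.

```python
from statistics import mean, stdev

def effect_size_d(difference_scores):
    """Effect size d for one sample of difference scores: mean / SD.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    return mean(difference_scores) / stdev(difference_scores)

# Hypothetical difference scores (e.g., flower rating minus insect rating).
example_scores = [1.0, 2.0, 3.0]
example_d = effect_size_d(example_scores)
```

By the conventions quoted in the note, a value of d at or above .8 would count as large.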

Table 2
Correlations Among Implicit and Explicit Attitude Measures
Note. Above the diagonal are correlations for the flower-insect contrast, below the diagonal are those for the instrument-weapon contrast, and on the main diagonal, in italics, are correlations between corresponding measures for the two contrasts. Correlations between explicit and implicit attitude measures are printed in bold. All measures were scored so that higher scores indicate more positive attitude toward flowers or musical instruments. N = 32 for all correlations; two-tailed p values of .10, .05, .01, .005, and .001 are associated, respectively, with r values of .30, .35, .45, .49, and .56. IAT = implicit association test.
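The r thresholds quoted in this note follow from the usual t test for a correlation, t = r·sqrt(N − 2) / sqrt(1 − r²), solved for r. The sketch below checks the p = .05 entry only. The critical t value of 2.042 (two-tailed, 30 degrees of freedom) comes from standard t tables and is supplied as an input, and the function name is hypothetical.

```python
import math

def critical_r(t_critical, n):
    """Smallest |r| reaching significance, given the critical t for df = n - 2.

    Obtained by inverting t = r * sqrt(df) / sqrt(1 - r**2).
    """
    df = n - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)

# Two-tailed .05 critical t for df = 30 (from standard tables), N = 32.
r_05_n32 = critical_r(2.042, 32)
print(round(r_05_n32, 3))
```

The result, about .349, agrees with the .35 listed for p = .05.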

Table 3
Summary Statistics for Difference Scores in Comparison of Ethnicity Discrimination by Seven Measures
b Standard deviation is the pooled within-cell value for the two-group (Japanese vs. Korean) design.
c The effect size measure d is computed by dividing the Korean minus Japanese mean difference by the pooled standard deviation. Conventional small, medium, and large values for d are .2, .5, and .8, respectively.
d For t tests, degrees of freedom varied from 26 to 30 depending on sample size (see Note a).
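The two-group effect size in notes b and c can be sketched as follows. The pooling formula used here, which weights each group's variance by its degrees of freedom, is the conventional one and is assumed rather than confirmed by the notes. The function names and data are hypothetical.

```python
import math
from statistics import mean, variance

def pooled_sd(group_a, group_b):
    """Pooled within-cell SD for two groups (conventional df-weighted form)."""
    df_a, df_b = len(group_a) - 1, len(group_b) - 1
    pooled_var = (df_a * variance(group_a) + df_b * variance(group_b)) / (df_a + df_b)
    return math.sqrt(pooled_var)

def two_group_d(korean_scores, japanese_scores):
    """d = (Korean mean - Japanese mean) / pooled within-cell SD."""
    diff = mean(korean_scores) - mean(japanese_scores)
    return diff / pooled_sd(korean_scores, japanese_scores)

# Hypothetical difference scores for the two groups.
korean = [2.0, 4.0, 6.0]
japanese = [1.0, 2.0, 3.0]
example_two_group_d = two_group_d(korean, japanese)
```

With this scoring, a positive d indicates that the Korean group's mean difference score is higher than the Japanese group's.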

Table 4
Correlations Among Explicit and Implicit Measures of Ethnic Attitudes and Measures of Acculturation
Note. Measures are the same as those in Table 3, scored so that higher scores are expected for ethnically Korean than for ethnically Japanese subjects. N = 31 (16 Korean, 15 Japanese), reduced to 28, 29, or 30 for correlations involving Measures 5-7. For N = 28, two-tailed p values of .10, .05, .01, .005, and .001 are associated, respectively, with r values of

Table 5
Summary Statistics for Difference Score Attitude Indexes
Note. Positive scores indicate preference for Black relative to White.

Table 6
Correlations Among Implicit and Explicit Measures of Racial Attitudes and Explicit Measures of Racist Beliefs
Note. Scores on Measures 5-7 were reversed (relative to their usual scoring) so that high scores on all measures would indicate pro-Black attitudes or beliefs. N = 26 for all correlations; two-tailed p values of .10, .05, .01, .005, and .001 are associated, respectively, with r values of .33, .39, .50, .54, and .61. Correlations between explicit and implicit attitude measures are printed in bold, and correlations of implicit measures with racist belief measures are in italics. IAT = implicit association test.