The Implicit Association Test (IAT; Greenwald et al., 1998) is one of the most widely used measures for the implicit assessment of socio-psychological constructs. Its main field of application is social psychology, where the IAT is often employed to indirectly investigate attitudes towards different social groups. Additionally, the IAT is used to assess food and brand preferences (see Epifania et al., 2022a, for a review of the main fields of application of the IAT). In both fields, the measure provided by the IAT is used to predict behavioral outcomes, such as intergroup relations (e.g., Dovidio et al., 2002) or food choice (e.g., Perugini, 2005). However, the IAT has shown poor ability to predict behavioral outcomes (e.g., Meissner et al., 2019), potentially because of its typical scoring method (i.e., the so-called D score; Greenwald et al., 2003). If the poor ability of the IAT to predict behaviors is ascribable to its typical scoring method, the estimates obtained with more statistically sound approaches should result in better predictions. In this contribution, a Rasch analysis based on Linear Mixed-Effects Models (LMMs) is introduced to address the across-trial variability in the IAT data and to obtain reliable measures for accurate predictions of behaviors.
The IAT assesses the strength of the associations between targets and evaluative dimensions by considering the speed and accuracy with which prototypical exemplars of two targets (e.g., Coke and Pepsi images in a Coke-Pepsi IAT) and two evaluative dimensions (Good and Bad attributes) are assigned to their own category in two contrasting conditions. In one condition, Coke and Good exemplars are assigned with the same key, while Pepsi and Bad exemplars are assigned with the opposite key. In the contrasting condition, Pepsi and Good exemplars are assigned with the same key, while Coke and Bad exemplars are assigned with the opposite key. The task is expected to be easier (i.e., responses should be faster and more accurate) in the condition consistent with one’s own automatically activated association. The D score (Greenwald et al., 2003) is usually employed to express the IAT effect (i.e., the difference in the performance between the two conditions). It is an effect size measure obtained by standardizing the difference between the average response time in the two conditions by the standard deviation computed on the pooled trials of both conditions.
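The standardization described above can be sketched in a few lines of Python. This is a simplified illustration rather than the published scoring algorithm: it only divides the difference between the condition means by the standard deviation of the pooled trials, and it omits the trial exclusions and error treatment that the D score variants also apply. The function name and the response times are hypothetical.

```python
from statistics import mean, stdev

def simple_d_score(rt_condition_a, rt_condition_b):
    """Standardized difference between the mean response times of two conditions.

    Simplified sketch: no trial exclusions and no error treatment are applied.
    """
    # Standard deviation computed on the pooled trials of both conditions
    pooled_sd = stdev(rt_condition_a + rt_condition_b)
    return (mean(rt_condition_b) - mean(rt_condition_a)) / pooled_sd

# Hypothetical response times in milliseconds (not data from the study)
rt_dgmb = [650, 700, 720, 810, 690, 760]  # Dark-Good/Milk-Bad condition
rt_mgdb = [820, 900, 780, 950, 870, 910]  # Milk-Good/Dark-Bad condition

# Faster responses in DGMB than in MGDB give a positive value
d = simple_d_score(rt_dgmb, rt_mgdb)
print(round(d, 2))
```

With this ordering of the arguments, a positive value means that responses were faster in the first condition, which matches the sign convention used later for the Chocolate IAT.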
The IAT effect as expressed by the D score has been found to have poor ability to predict behaviors. This can be ascribed to different factors, including the measure provided by the D score, the construct assessed by the IAT (Meissner et al., 2019), and the type of behavioral outcomes (Perugini, 2005). Additionally, the fully-crossed structure of the IAT (Westfall et al., 2014) might compromise the predictive ability of its measure. If the fully-crossed design of the IAT and its related sources of dependency are not properly addressed, biased estimates are obtained, the importance of experimental effects is confused with random noise, and the probability of committing a Type I error is inflated (Judd et al., 2017; Wolsiefer et al., 2017). Because it averages across trials in each associative condition, the D score is highly sensitive to the across-trial variability related to stimuli heterogeneity, and it cannot address the fully-crossed design of the IAT (Wolsiefer et al., 2017). This design can be accounted for by employing Linear Mixed-Effects Models (LMMs) with appropriate random structures. Additionally, LMMs allow for obtaining parametrizations from accuracy and log-time responses that are conceptually close to the Rasch (Rasch, 1960) and the log-normal (van der Linden, 2006) models, respectively. These models disentangle the unique contribution of the respondent and the stimulus to the observed response, hence providing fine-grained information at both levels.
Information at the stimulus level allows for investigating the contribution of each stimulus to the IAT effect as well as the representativeness of each stimulus. Indeed, stimulus representativeness of its own category is a key feature for a correct functioning of the IAT (Bluemke & Friese, 2006; Nosek et al., 2005). Selecting the most informative and representative stimuli can help in reducing the across-trial variability, and could allow for designing better functioning and briefer IATs.
In this study, the predictive abilities of the estimates obtained with LMMs and the D score are compared. The predictive abilities of D scores computed on all stimuli and D scores computed only on the most (or the least) informative stimuli are compared as well. To these ends, an IAT for the implicit assessment of chocolate preference was used (Chocolate IAT). The most and the least informative stimuli are identified by considering the difference in their parameters between conditions (see, e.g., Anselmi et al., 2013). Stimuli showing a larger difference in their parameters between conditions are considered to be more informative than those with a smaller difference.
Method
Participants
Seventy-six university students (F = 71.05%, Mean age = 24.02 ± 2.88 years) volunteered to take part in the study. Respondents did not receive any incentives for their participation.
Materials and Procedure
The script used for running the experiment, the stimuli, and the data can be accessed in the Supplementary Materials section. Twenty-six attribute stimuli (13 Good and 13 Bad exemplars) and 7 chocolate images graphically modified to represent either dark or milk chocolate (7 Dark and 7 Milk chocolate images) were used. Sixty trials were presented in each associative condition (i.e., the Dark-Good/Milk-Bad, DGMB, and the Milk-Good/Dark-Bad, MGDB, conditions). No feedback followed incorrect responses.
The chocolate preferences were explicitly investigated with two items (i.e., How much do you like dark chocolate? and How much do you like milk chocolate?) evaluated on a 6-point Likert-type scale (0 = Not at all, 5 = Very much). Respondents were asked about their food habits and behaviors through 6 items (example item: I am usually on a diet, Cronbach’s α = 0.80) rated on a 4-point agreement Likert-type scale (1 = Strongly disagree, 4 = Strongly agree). High scores indicate high care for food habits. At the end of the experiment, participants were offered dark or milk chocolate. Their choices were registered after they left the laboratory.
Data Cleaning and D Score
Exclusion criteria based on accuracy (Nosek et al., 2002) and time responses (Greenwald et al., 2003) were applied. The IAT was scored with the D4 algorithm (Greenwald et al., 2003), which was computed with the online app DscoreApp (Epifania et al., 2020). Positive D scores denote a preference for dark chocolate relative to milk chocolate.
Model Specifications
According to the Rasch model (Rasch, 1960), the observed accuracy response of respondent p (p ∈ {1, . . ., P }) to stimulus s (s ∈ {1, . . ., S}) depends on respondent’s ability (i.e., the respondent’s ability parameter θ) and stimulus difficulty (i.e., the stimulus difficulty parameter b). In the IAT, the higher the ability parameter θ of respondent p, the higher the ability of respondent p to perform the categorization task. The higher the difficulty parameter b of stimulus s, the lower the probability of s to be assigned to the correct category. The probability of a correct response of respondent p to stimulus s depends on the distance between respondent and stimulus parameters (i.e., θp−bs). It is larger than .50 when θp > bs, smaller than .50 when θp < bs, and equal to .50 when θp = bs.
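The relation between the distance θp − bs and the probability of a correct response can be illustrated with the logistic function of the Rasch model. The sketch below is only an illustration of that formula; the function name and the parameter values are hypothetical.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical parameter values on the logit scale
print(rasch_probability(1.0, 1.0))   # ability equals difficulty
print(rasch_probability(2.0, 1.0))   # ability above difficulty
print(rasch_probability(0.0, 1.0))   # ability below difficulty
```

The three calls cover the three cases described above: a probability of .50 when the two parameters coincide, above .50 when the ability exceeds the difficulty, and below .50 otherwise.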
Similar to the Rasch model, in the log-normal model (van der Linden, 2006) the observed log-time response depends on the characteristics of the respondent (speed parameter τ) and those of the stimulus (time intensity parameter δ). In the IAT case, the lower the speed parameter τ of respondent p, the more time respondent p spends on the task (i.e., lower speed). The lower the time intensity parameter δ of stimulus s, the less time respondents spend in responding to stimulus s. The expected log-time response is a function of the distance between respondent and stimulus parameters (i.e., δs − τp): it is positive when δs > τp, negative when δs < τp, and zero when δs = τp.
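A minimal numerical sketch of this relation is given below, assuming that the expected log-time is exactly the difference δs − τp and that its exponential gives the median response time on the original time scale (a property of the log-normal distribution). Function names and parameter values are hypothetical.

```python
import math

def expected_log_time(delta, tau):
    """Expected log-time for time intensity delta and speed tau."""
    return delta - tau

def median_time(delta, tau):
    """Median response time implied by the log-normal model."""
    return math.exp(expected_log_time(delta, tau))

# Hypothetical values: same stimulus, a slower and a faster respondent
delta_s = 0.10
print(expected_log_time(delta_s, tau=-0.20))  # slower respondent (lower speed)
print(expected_log_time(delta_s, tau=0.30))   # faster respondent (higher speed)
print(median_time(delta_s, tau=0.10))         # delta equals tau
```

For a fixed stimulus, lowering τ raises the expected log-time, and the expected log-time is zero (a median time of one time unit) when the two parameters are equal.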
Rasch-like and log-normal parametrizations can be obtained by using Generalized Linear Mixed-Effects Models (GLMMs) with logit link functions applied to accuracy responses and Linear Mixed-Effects Models (LMMs) applied to log-time responses, respectively. In these applications, respondent and stimulus parameters are summed (i.e., θp + bs and δs + τp). This parametrization of the accuracy responses is consistent with that of the linear logistic test model (LLTM; see, e.g., Fischer, 1973; Scheiblechner, 1972). The higher the value of stimulus parameter b, the easier stimulus s is (i.e., the higher the number of correct responses registered on stimulus s is), such that parameter b is considered as an easiness parameter. The lower the value of parameter τ, the faster respondent p is. The suitability and usefulness of this approach for analyzing IAT data has already been shown (e.g., Epifania et al., 2022b).
Rasch-like and log-normal parametrizations depend on the factors specified as random, which account for the variability in the data. The fixed intercept is set at 0 (i.e., none of the levels of the fixed slope—the associative condition—is taken as the reference level). Further details on the procedure and on the random structures of the models are reported in the Appendix. Table 1 summarizes the Rasch-like and log-normal parameters attainable from each model random structure.
Table 1
Model | Rasch-Like Parametrization: Respondents | Rasch-Like Parametrization: Stimuli | Log-Normal Parametrization: Respondents | Log-Normal Parametrization: Stimuli |
---|---|---|---|---|
1 | Overall (θp) | Overall (bs) | Overall (τp) | Overall (δs) |
2 | Overall (θp) | Condition–specific (bsc) | Overall (τp) | Condition–specific (δsc) |
3 | Condition–specific (θpc) | Overall (bs) | Condition–specific (τpc) | Overall (δs) |
Note. p ∈ {1, . . ., P }, s ∈ {1, . . ., S}, c ∈ {1, . . ., C} denote any respondent, stimulus, condition (P, S, and C are the number of respondents, stimuli, and conditions, respectively.)
In Model 1, the random intercepts of respondents and stimuli are specified to account for the between–respondents and the between–stimuli variabilities across conditions. This model yields overall respondent (θp or τp) and stimulus (bs or δs) parameters across associative conditions. Model 1 is expected to be the best fitting one when low between–conditions variability is observed at both respondent and stimulus levels (i.e., neither respondents’ performance nor stimuli functioning vary between associative conditions).
Specifying stimulus random slopes in associative conditions and respondent’s random intercepts across conditions, Model 2 accounts for the within–stimuli between–conditions variability and the between–respondents across–conditions variability. This model yields overall respondent (θp or τp) and condition–specific stimulus (bsc or δsc, where c denotes the associative condition) parameters. Model 2 is expected to be the best fitting model when high within–stimuli between–conditions variability is observed. This suggests that the IAT effect is mostly due to variations in stimuli functioning between conditions. The difference between condition–specific stimulus estimates allows for investigating the contribution of each stimulus to the IAT effect.
Model 3 addresses the within–respondents between–conditions variability and the between–stimuli across–conditions variability by specifying respondent’s random slopes in associative conditions and stimulus random intercepts across conditions. Model 3 yields condition–specific respondent (θpc or τpc) and overall stimulus (bs or δs) parameters. Model 3 is expected to be the best fitting model when high within–respondents between–conditions variability is observed, suggesting that the IAT effect is mostly due to the changes in respondents’ performance between conditions. The difference between respondent condition–specific estimates allows for investigating the bias on respondents’ performance due to the IAT associative conditions.
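The three random structures can be summarized by their linear predictors. The following is a sketch based on the descriptions above, not the notation of the Appendix. Several symbols are assumptions introduced here for illustration: x_psc and t_psc denote the accuracy and time response of respondent p to stimulus s in condition c, α_c and β_c denote the fixed effects of the associative condition (with no overall intercept), and all respondent and stimulus terms are taken to be zero-mean random effects added to the fixed part, as in the summed parametrization.

```latex
\[
\begin{array}{lll}
\mathrm{Model\ 1:} & \mathrm{logit}\, P(x_{psc} = 1) = \alpha_c + \theta_p + b_s, & E[\ln t_{psc}] = \beta_c + \tau_p + \delta_s \\
\mathrm{Model\ 2:} & \mathrm{logit}\, P(x_{psc} = 1) = \alpha_c + \theta_p + b_{sc}, & E[\ln t_{psc}] = \beta_c + \tau_p + \delta_{sc} \\
\mathrm{Model\ 3:} & \mathrm{logit}\, P(x_{psc} = 1) = \alpha_c + \theta_{pc} + b_s, & E[\ln t_{psc}] = \beta_c + \tau_{pc} + \delta_s
\end{array}
\]
```

Read row by row, the only change from Model 1 is which term carries the condition index c: the stimulus term in Model 2 and the respondent term in Model 3.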
The models were applied to the Chocolate IAT data. In what follows, the models applied to accuracy responses are identified by a capital “A”. Those applied to log-time responses are identified by a capital “T”. No correction was applied to the incorrect time responses for estimating the log-normal models. Models were fitted with the lme4 package (Bates et al., 2015b) in R (Version 3.5.1, R Core Team, 2018). Simple R scripts for estimating these models from any IAT are available in the Supplementary Materials section.
Results
Two participants showed more than 25% of incorrect responses in at least one associative condition (Nosek et al., 2002). The final sample consisted of 74 participants (F = 71.62%, Mean age = 24.08 ± 2.88 years). Overall, 41.90% of the participants chose milk chocolate.
Accuracy Models
Model comparison is reported in the top panel of Table 2. BIC suggests a better fit of Model A1 compared to Model A2, whereas AIC, Log-likelihood, and Deviance suggest a better fit of Model A2. Thus, Model A2 was chosen. This model provides overall Rasch-like respondent ability (θp) and condition–specific stimulus easiness (bMGDB and bDGMB) estimates. In this application, the ability estimates θp can be considered as accuracy-based measures of the respondents’ preference. Condition MGDB showed a higher probability of correct responses (log-odds = 3.67, SE = 0.14, z = 26.15, p < .001) than condition DGMB (log-odds = 2.61, SE = 0.10, z = 27.26, p < .001). Between–respondents variability was 0.33. Stimuli showed higher variability in the MGDB condition (σ² = 0.21) than in the DGMB condition (σ² = 0.01). The condition–specific stimulus random effects were weakly correlated (r = .20).
Table 2
Model | AIC | BIC | Log-Likelihood | Deviance |
---|---|---|---|---|
Accuracy | ||||
A1 | 3627.70 | 3656.10 | −1809.90 | 3619.70 |
A2 | 3625.58 | 3668.10 | −1806.80 | 3613.60 |
A3 | Failed to converge | |||
Log-time | ||||
T1 | 7856.45 | 7891.91 | −3923.23 | 7846.45 |
T2 | Aberrant estimates | |||
T3 | 7159.23 | 7208.87 | −3572.62 | 7145.23 |
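The disagreement between AIC and BIC for the accuracy models can be traced with simple arithmetic on the deviances in Table 2. The sketch below assumes that the deviance equals −2 times the log-likelihood and that Models A1 and A2 estimate 4 and 6 parameters, respectively. These counts are not reported in the text; they are inferred here from the gap between AIC and deviance (2k). The number of observations is not needed for the comparison itself.

```python
import math

# Deviances reported in Table 2 for the accuracy models
deviance = {"A1": 3619.70, "A2": 3613.60}

# ASSUMPTION: numbers of estimated parameters, inferred from AIC - deviance = 2k
n_params = {"A1": 4, "A2": 6}

def aic(dev, k):
    """Akaike information criterion from deviance and number of parameters."""
    return dev + 2 * k

def bic(dev, k, n_obs):
    """Bayesian information criterion from deviance, parameters, and sample size."""
    return dev + k * math.log(n_obs)

aic_values = {m: aic(deviance[m], n_params[m]) for m in deviance}

# Model A2 lowers the deviance at the cost of two extra parameters
deviance_gain = deviance["A1"] - deviance["A2"]
extra_params = n_params["A2"] - n_params["A1"]

# BIC prefers A1 whenever extra_params * ln(n) exceeds the deviance gain
n_threshold = math.exp(deviance_gain / extra_params)

print(aic_values)
print(round(n_threshold, 1))
```

Under these assumptions, the AIC penalty for the two extra parameters (4 points) is smaller than the deviance gain (about 6.1 points), so AIC favors Model A2. The BIC penalty grows with the logarithm of the number of observations and exceeds the gain once there are more than roughly 21 observations, which is why BIC favors the simpler Model A1.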
The condition–specific easiness estimates are reported in Table 3.
Table 3
Good attributes | bDGMB | bMGDB | bDGMB−bMGDB | δs | Bad attributes | bDGMB | bMGDB | bDGMB−bMGDB | δs |
---|---|---|---|---|---|---|---|---|---|
joyᵃ | 2.62 | 4.02 | −1.40 | 0.01 | hateᵃ | 2.59 | 3.85 | −1.26 | 0.01 |
happinessᵃ | 2.64 | 4.03 | −1.39 | 0.02 | failureᵃ | 2.68 | 3.93 | −1.25 | 0.07 |
pleasureᵃ | 2.56 | 3.70 | −1.15 | 0.01 | terribleᵃ | 2.64 | 3.89 | −1.24 | 0.04 |
peace | 2.64 | 3.77 | −1.14 | −0.03 | disaster | 2.66 | 3.90 | −1.24 | 0.07 |
heaven | 2.63 | 3.77 | −1.14 | 0.08 | bad | 2.58 | 3.73 | −1.15 | 0.07 |
marvelous | 2.66 | 3.79 | −1.13 | 0.05 | horrible | 2.62 | 3.76 | −1.14 | 0.05 |
laughter | 2.67 | 3.76 | −1.10 | 0.06 | evil | 2.63 | 3.74 | −1.11 | 0.10 |
good | 2.66 | 3.74 | −1.08 | 0.01 | disgust | 2.60 | 3.70 | −1.11 | 0.01 |
glory | 2.57 | 3.57 | −1.00 | 0.02 | nasty | 2.59 | 3.33 | −0.74 | 0.04 |
love | 2.62 | 3.58 | −0.96 | 0.02 | ugly | 2.60 | 3.32 | −0.72 | −0.01 |
excellentᵇ | 2.64 | 3.59 | −0.95 | 0.01 | painᵇ | 2.58 | 3.23 | −0.65 | 0.05 |
beautyᵇ | 2.61 | 3.46 | −0.85 | 0.02 | annoyingᵇ | 2.58 | 3.05 | −0.47 | 0.08 |
wonderfulᵇ | 2.62 | 3.45 | −0.83 | 0.09 | agonyᵇ | 2.57 | 2.49 | 0.08 | 0.04 |
M (SD) | 2.63 (0.03) | 3.71 (0.17) | −1.09 (0.17) | 0.03 (0.03) | M (SD) | 2.61 (0.03) | 3.53 (0.41) | −0.92 (0.40) | 0.05 (0.03) |
Dark Chocolate | bDGMB | bMGDB | bDGMB−bMGDB | δs | Milk Chocolate | bDGMB | bMGDB | bDGMB−bMGDB | δs |
Dark5ᵃ | 2.56 | 3.94 | −1.38 | −0.12 | Milk3ᵃ | 2.60 | 3.95 | −1.35 | −0.04 |
Dark2ᵃ | 2.60 | 3.82 | −1.23 | −0.11 | Milk6ᵃ | 2.66 | 3.99 | −1.33 | −0.04 |
Dark6ᵃ | 2.55 | 3.72 | −1.16 | −0.10 | Milk4ᵃ | 2.53 | 3.80 | −1.27 | −0.04 |
Dark4 | 2.62 | 3.62 | −1.00 | −0.07 | Milk2 | 2.57 | 3.61 | −1.04 | −0.06 |
Dark3ᵇ | 2.58 | 3.53 | −0.95 | −0.08 | Milk5ᵇ | 2.62 | 3.64 | −1.02 | −0.05 |
Dark7ᵇ | 2.58 | 3.41 | −0.83 | −0.07 | Milk1ᵇ | 2.62 | 3.62 | −1.01 | −0.03 |
Dark1ᵇ | 2.49 | 3.27 | −0.78 | −0.11 | Milk7ᵇ | 2.54 | 3.49 | −0.95 | −0.04 |
M (SD) | 2.57 (0.03) | 3.62 (0.22) | −1.05 (0.20) | −0.10 (0.02) | M (SD) | 2.59 (0.05) | 3.73 (0.17) | −1.14 (0.17) | −0.04 (0.01) |
Note. DGMB: Dark-Good/Milk-Bad condition; MGDB: Milk-Good/Dark-Bad condition. Rows are ordered by increasing values of bDGMB−bMGDB. The easiness estimates are expressed in log-odds and the time intensity estimates in log-seconds.
ᵃ Stimuli that, according to the condition–specific easiness estimates, contributed the most to the IAT effect.
ᵇ Stimuli that, according to the condition–specific easiness estimates, contributed the least to the IAT effect.
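The identification of the most and least contributing stimuli can be sketched as a simple ranking. The example below uses the differences bDGMB−bMGDB reported in Table 3 for the dark chocolate images and assumes that the contribution of a stimulus is judged by the absolute size of this difference. The function names are hypothetical, and the sketch is not the script used in the study.

```python
# Differences b_DGMB - b_MGDB for the dark chocolate images, copied from Table 3
dark_differences = {
    "Dark1": -0.78, "Dark2": -1.23, "Dark3": -0.95, "Dark4": -1.00,
    "Dark5": -1.38, "Dark6": -1.16, "Dark7": -0.83,
}

def rank_by_contribution(differences):
    """Order stimuli from the largest to the smallest absolute difference."""
    return sorted(differences, key=lambda name: abs(differences[name]), reverse=True)

def select_stimuli(differences, n=3):
    """Return the n most and the n least contributing stimuli of a category."""
    ranked = rank_by_contribution(differences)
    return ranked[:n], ranked[-n:]

most, least = select_stimuli(dark_differences)
print(most)   # ['Dark5', 'Dark2', 'Dark6']
print(least)  # ['Dark3', 'Dark7', 'Dark1']
```

For this category, the ranking reproduces the two sets of three images flagged in Table 3 as contributing the most and the least to the IAT effect.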
Stimuli were easier in the MGDB condition than in the DGMB one, MMGDB = 3.64 ± 0.29, MDGMB = 2.60 ± 0.04; t(40) = −21.97, p < .001, 95% CI [−1.13, −0.94]. A linear model was specified to investigate the effect of the stimulus categories on the difference between condition–specific easiness estimates, which can be considered as an accuracy-based measure of the IAT effect. An overall significant effect of the stimulus categories was found, F(4, 36) = 139.80, p < .001, Adjusted R² = 0.93. Milk and Good exemplars contributed the most to the IAT effect (BMilk = −1.13, SE = 0.11, t(36) = −10.84, p < .001; BGood = −1.09, SE = 0.08, t(36) = −14.10, p < .001). Bad and Dark exemplars contributed the least (BBad = −0.92, SE = 0.07, t(36) = −11.98, p < .001; BDark = −1.05, SE = 0.11, t(36) = −9.97, p < .001).
Log-Time Models
Model comparison is reported in the bottom panel of Table 2. Model T3 was chosen, providing overall stimulus time intensity (δs) and respondent condition–specific speed estimates (τMGDB and τDGMB) of the log-normal model. Responses were faster in the MGDB condition (B = −0.36, SE = 0.02, t = −15.01) than in the DGMB condition (B = −0.12, SE = 0.03, t = −4.28). The between–stimuli variability was extremely low (σ² = 0.004). Respondents showed similar variabilities in DGMB and MGDB conditions (σ²DGMB = 0.05; σ²MGDB = 0.03), and their random effects were moderately correlated (r = .40). A linear model was specified to investigate the effect of the stimulus categories on the time intensity estimates (Table 3). An overall significant effect of the stimulus categories was found, F(4, 36) = 37.41, p < .001, Adjusted R² = 0.78. The exemplars of both targets required the least amount of time to get a response (BDark = −0.09, SE = 0.01, t(36) = −8.99, p < .001; BMilk = −0.04, SE = 0.01, t(36) = −4.09, p < .001), whereas exemplars of both evaluative dimensions required the largest amount of time (BBad = 0.05, SE = 0.01, t(36) = 6.20, p < .001; BGood = 0.03, SE = 0.01, t(36) = 3.70, p < .001).
Relationship Between Model Estimates, D Scores, and Explicit Measures
A speed-differential was obtained by taking the difference between the condition–specific speed estimates, which can be considered as a latency-based measure of the IAT effect. Positive values indicated higher speed in the DGMB condition than in the MGDB condition. Results of Pearson’s correlations between explicit measures, D scores, and model estimates are reported in Table 4. Explicit chocolate evaluations strongly correlated with D scores and condition–specific speed estimates.
Table 4
Condition | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 - Explicit Milk | ||||||
2 - Explicit Dark | −0.51∗∗∗ | |||||
3 - D score | −0.43∗∗∗ | 0.51∗∗∗ | ||||
4 - τDGMB | 0.12 | −0.43∗∗∗ | −0.60∗∗∗ | |||
5 - τMGDB | −0.36∗∗ | 0.14 | 0.42∗∗∗ | 0.42∗∗∗ | ||
6 - θp | 0.01 | 0.18 | 0.06 | 0.07 | 0.18 | |
7 - Speed-differential | −0.41∗∗∗ | 0.55∗∗∗ | 0.95∗∗∗ | −0.67∗∗∗ | 0.39∗∗∗ | 0.07 |
Note. τ: speed estimate; θ: accuracy-based measure of respondents’ preference; DGMB: Dark-Good/Milk-Bad condition; MGDB: Milk-Good/Dark-Bad condition; Speed-differential: τMGDB−τDGMB.
**p < .01. ***p < .001.
The accuracy-based measure of the respondent’s preference correlated neither with explicit chocolate evaluations nor with any of the condition–specific speed estimates or the D score. As such, it appears that these estimates cannot be considered as an indicator of the implicit preference of the respondents. High speed in the MGDB condition correlated with positive milk chocolate evaluations, and not with the dark chocolate evaluations. Similarly, high speed in the DGMB condition correlated with positive dark chocolate evaluations, and not with the milk chocolate evaluations. This suggests that the performance in each associative condition is mostly driven by the associations between one of the two chocolates and positive attributes. In this sense, the liking for each of the two chocolates plays a major role in influencing the responses.
Choice Prediction
The predictive abilities of model estimates and D scores were compared. Two data sets were created from the full-length data set by selecting the responses to the three stimuli of each category that contributed the most (stimuli in Table 3 marked with a) or the least (stimuli in Table 3 marked with b) to the IAT effect. The D4 algorithm was computed on both data sets. The predictive abilities of differential measures (i.e., D scores and speed-differential) and of their single components (i.e., MMGDB and MDGMB of the D scores, τDGMB and τMGDB of the speed-differential) were investigated. All predictors were checked for collinearity by computing Variance Inflation Factors (VIFs). The D score was collinear with the speed-differential, the two condition–specific speed estimates, and the condition–specific average response times (VIFs > 10). Condition–specific speed estimates were not collinear with each other (VIFs < 4.00), but they were collinear with condition–specific average response times. Condition–specific speed and average response times, D score, and speed-differential were not collinear with food habits and preference estimates (VIFs < 4.00). Given the high collinearity between the time-based predictors (i.e., the D score, the condition–specific speed estimates, the condition–specific average response times, and the speed-differential), they were entered in separate models. As such, eight logistic regression models were specified. Preference estimates and food habits of the respondents were included in all starting models. Each model additionally included one of the following: the D score, the speed-differential, the condition–specific speed estimates, or the condition–specific average response times. Relevant predictors were selected with backward deletion.
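To give a sense of the collinearity thresholds, the special case of two predictors can be worked out directly, since the VIF then reduces to 1 / (1 − r²), where r is the correlation between the two predictors. The sketch below applies this formula to two correlations from Table 4. It is an illustration under that two-predictor assumption, and the VIFs reported in the study may have been computed on a larger set of predictors.

```python
def vif_two_predictors(r):
    """Variance Inflation Factor when a model contains exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)

# Correlations taken from Table 4
vif_d_and_differential = vif_two_predictors(0.95)  # D score and speed-differential
vif_condition_speeds = vif_two_predictors(0.42)    # tau_DGMB and tau_MGDB

print(round(vif_d_and_differential, 2))
print(round(vif_condition_speeds, 2))
```

Under this simplification, the correlation of .95 between the D score and the speed-differential gives a VIF above 10, whereas the correlation of .42 between the two condition–specific speed estimates gives a VIF well below 4.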
Model general accuracy (i.e., percentage of choices correctly identified by the model), model dark chocolate choice (DCC) accuracy (i.e., percentage of DCCs correctly identified by the model), and model milk chocolate choice (MCC) accuracy (i.e., percentage of MCCs correctly identified by the model) were computed on the models resulting from backward deletion (Table 5).
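The three accuracy indices can be sketched as follows, assuming that each respondent is classified as choosing milk chocolate when the predicted probability of that choice is at least .50. The cut-off, the function name, and the data below are hypothetical and are not taken from the study.

```python
def choice_accuracies(observed, prob_milk, threshold=0.5):
    """General, dark chocolate choice (DCC), and milk chocolate choice (MCC) accuracy.

    observed: list of "dark" / "milk" choices
    prob_milk: predicted probabilities of choosing milk chocolate
    threshold: ASSUMED classification cut-off
    """
    predicted = ["milk" if p >= threshold else "dark" for p in prob_milk]

    def hit_rate(label=None):
        # Keep all respondents, or only those who made the given choice
        pairs = [(o, p) for o, p in zip(observed, predicted)
                 if label is None or o == label]
        return sum(o == p for o, p in pairs) / len(pairs)

    return {"general": hit_rate(), "DCC": hit_rate("dark"), "MCC": hit_rate("milk")}

# Hypothetical choices and predicted probabilities for eight respondents
observed = ["dark", "dark", "dark", "milk", "milk", "dark", "milk", "dark"]
prob_milk = [0.20, 0.35, 0.60, 0.70, 0.40, 0.10, 0.80, 0.45]

accuracies = choice_accuracies(observed, prob_milk)
print(accuracies)
```

In this toy example, six of the eight choices are classified correctly (general accuracy of 75%), including four of the five dark chocolate choices and two of the three milk chocolate choices.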
Table 5
Predictors | B | SE | Nagelkerke R2 | General | DCC | MCC |
---|---|---|---|---|---|---|
Intercept | −1.65∗∗ | 0.51 | 0.26 | 66% | 70% | 61% |
D score | −2.03∗∗∗ | 0.60 | ||||
Intercept | −1.65∗∗∗ | 0.48 | 0.26 | 68% | 72% | 61% |
Speed-differential | −5.02∗∗∗ | 1.43 | ||||
Intercept | −1.76∗∗∗ | 0.52 | 0.30 | 70% | 74% | 65% |
D score (Best) | −2.07∗∗∗ | 0.58 | ||||
Intercept | −1.23∗∗∗ | 0.42 | 0.18 | 69% | 72% | 65% |
D score (Worst) | −1.40∗∗∗ | 0.47 | ||||
Single components | ||||||
Intercept | −0.23 | 1.36 | 0.27 | 65% | 74% | 52% |
MDGMB | 0.00∗∗ | 0.01 | ||||
MMGDB | −0.01∗∗ | 0.01 | ||||
Intercept | −2.05∗ | 0.74 | 0.27 | 72% | 74% | 68% |
τDGMB | 4.73∗∗∗ | 1.48 | ||||
τMGDB | −5.99∗∗∗ | 1.98 | ||||
Intercept | −0.17 | 1.61 | 0.30 | 65% | 74% | 52% |
MDGMB (Best) | 0.00∗∗∗ | 0.01 | ||||
MMGDB (Best) | −0.01∗ | 0.01 | ||||
Intercept | 0.61 | 1.23 | 0.16 | 64% | 77% | 45% |
MDGMB (Worst) | 0.00∗ | 0.01 | ||||
MMGDB (Worst) | 0.00∗ | 0.01 |
Note. Best: Highly contributing stimuli data set; Worst: Lowly contributing stimuli data set; τ: Speed; Speed-differential: τMGDB−τDGMB; DGMB: Dark-Good/Milk-Bad condition; MGDB: Milk-Good/Dark-Bad condition; General: General accuracy of chocolate choice predictions; DCC: Dark Chocolate Choice Accuracy; MCC: Milk Chocolate Choice Accuracy.
*p < .05. **p < .01. ***p < .001.
Speed-differentials and D scores resulted in similar predictive accuracies. D scores computed on the “Best” and “Worst” data sets provided more accurate predictions than D scores computed on the full data set. The “Best” data set D scores explained the highest proportion of variance. Condition–specific speed estimates resulted in the highest MCC accuracy.
Final Remarks
This study investigated whether the predictive ability of the IAT could be enhanced with statistical models able to account for its fully-crossed structure. The results suggested that the proposed modeling framework can improve the predictive ability of the IAT while providing information on respondent’s performance and stimulus functioning. This information can be further employed to reduce the across-trial variability due to stimuli heterogeneity, thus leading to better functioning, more informative, and potentially briefer IATs.
The stimulus functioning with respect to both its own category and other categories can be investigated through stimulus time intensity estimates. The within–category variability allows for identifying the most and the least representative stimuli of each category, whereas the between–category variability suggests different times for processing target and attribute exemplars that potentially contribute to the across-trial variability.
Condition–specific easiness estimates suggested that the IAT effect in the Chocolate IAT was mostly driven by Good and Milk exemplars. Consistently, the correlations between condition–specific speed estimates and differential measures pointed at a major influence of the speed in the MGDB condition. The correlations between speed estimates and explicit chocolate evaluations further suggested that the performance in each condition was mostly influenced by positive attributes. As such, it can be speculated that the IAT effect is mostly driven by a milk chocolate preference, but the performance in each condition is mostly influenced by the associations of positive attributes with one of the two chocolates. The ability of the model estimates to disentangle the component(s) mostly involved in the performance on the IAT might have a high resonance in both marketing and applied social psychology. In the former field, it can clarify whether the obtained results are mostly due to the preference for one of two contrasting brands and help in designing ad hoc marketing campaigns. In the latter one, it can disentangle whether the performance on the IAT is mostly due to in-group preference rather than outgroup derogation. Understanding whether individuals more easily associate the in-group with positive attributes rather than the outgroup with negative ones has important practical implications.
Previous studies have stressed the sensitivity of the IAT to the stimulus properties, suggesting that valid IATs can be obtained with a small number of highly informative and representative stimuli (Bluemke & Friese, 2006; Nosek et al., 2005). In this application, the selection of highly contributing stimuli allowed for reducing the across-trial variability, such that the number of trials was minimized while the information that could be gathered from the IAT was maximized. This unveils the possibility of reducing the length of the IAT without losing information and/or impairing its validity. Reducing the stimuli heterogeneity also resulted in D scores better able to predict the behavioral outcome. The D scores computed on the most informative data set explained the highest proportion of variance and provided better predictions than the D scores computed on the full-length data set. Interestingly, the D scores computed on the least informative data set also predicted the choice better than the full-length D scores. We speculate that by reducing the stimuli heterogeneity and the across-trial variability, more reliable D scores can be obtained because the sources of error variance are reduced. Being more reliable, the D scores obtained on reduced data sets can better predict behavioral outcomes than those obtained on full data sets, which are affected by error variance. This result might further stress the sensitivity of the D score to the across-trial variability. However, further investigations on this topic are needed.
In this study, the target categories (i.e., dark chocolate and milk chocolate) were quite homogeneous. The modeling framework helped in highlighting the stimuli that functioned differently from the other stimuli of the same category and those that mostly contributed to the IAT effect (i.e., the stimuli that presented a high difference in their easiness estimates between conditions). This information contributed to a better understanding of the IAT measure and to a reduction of the across-trial variability, leading to a better prediction of the behavioral outcome. When target categories are more heterogeneous (as could be the case with, e.g., race), the proposed modeling framework can identify the malfunctioning stimuli and those that mostly contribute to the IAT effect (Epifania et al., 2021). A reduction of the across-trial variability can be expected also in the case of heterogeneous categories, but it might not directly result in better predictions of behavioral outcomes. In these cases, the heterogeneity of the categories might require a larger collection of stimuli to appropriately represent them and to efficiently predict behavioral outcomes of interest. Future studies should investigate the functioning of the proposed modeling framework when heterogeneous categories are used.
The fact that the full-length IAT and the short IATs were compared on responses from the same starting data set constitutes the main limitation of the study. In future studies, two IATs could be designed, one including only highly representative stimuli, the other one including only poorly representative stimuli. If the results are replicated with these IATs, further evidence on the importance of the representativeness of the stimuli and on the D score sensitivity to the across-trial variability would be obtained.
Other models that can concurrently account for accuracy and time responses have been applied to the IAT data, namely the Diffusion Model (DM; Klauer et al., 2007) and the Discrimination Association Model (DAM; Stefanutti et al., 2013, see also the four-counter DAM; Stefanutti et al., 2020). DM and DAM consider the performance of the respondents at the IAT as the result of different processes, each of which is expressed by its own parameter. As such, both models provide in-depth information concerning the individual differences of the respondents. However, no information at the single stimulus level is available, but only at the stimulus categories level. On the other hand, the modeling framework introduced in this contribution results in fine-grained information also at the individual stimulus level, which in turn allows for the investigation of the stimuli representativeness of their own category as well as of their contribution to the IAT effect. A limitation of this study is that it does not provide a direct comparison between the information resulting from the DAM or the DM and that resulting from the modeling framework proposed here. Such a comparison could be of interest for future studies.
The convergence failure of Model A3 and the aberrant estimates obtained with Model T2 raise concerns and should be considered as a potential drawback of the modeling framework introduced in this contribution. Convergence failure or aberrant estimates suggest that the model could not find a solution, usually because of a lack of variability in the data (i.e., the random structure of the model requires a higher variability than that observed in the data, Bates et al., 2015a). The poor variability in the accuracy performance of the respondents (SD = 0.11) might have caused the convergence failure of Model A3. Similarly, the poor variability in the response times of the stimuli (SD = 0.02) might have caused the degenerate solution of Model T2.