Multilevel models (also known as “hierarchical,” “random effects,” and “mixed effects” models) for clustered data1 have become a staple analytic approach in the psychological and educational sciences (Antonakis et al., 2021; Bauer & Sterba, 2011; Luo et al., 2021) as well as other social science disciplines (Bell & Jones, 2015; Bell et al., 2019). Not only can multilevel models (MLMs) account for non-independence in residual errors, they are also able to flexibly disentangle within- and between-cluster relationships, allowing researchers to test context-driven research questions, such as “Does therapist training predict individual-level patient mental health outcomes?” or “Does school climate predict individual-level student achievement?” Moreover, MLMs can model more than two levels of data, model data that are cross-classified or partially nested, and be useful in estimating both slope and intercept variance (i.e., relaxing the assumption of homogeneity of slopes).
Despite their popularity, quantitative methodologists have pointed out that the use of MLM can be unnecessarily complex for clustered data when MLM model assumptions are untenable (especially in smaller samples) or when the substantive research questions are focused primarily on level 1 (L1) in a 2-level hierarchy, such as “Does a patient’s own treatment fidelity predict their mental health?” or “Does a student’s own perception of school belongingness predict their academic achievement?” While a few MLM alternatives exist2, unilevel regression (also known as single-level and fixed effects models) with cluster-robust standard errors (CR-SEs) is perhaps the most convenient and flexible alternative for handling residual error non-independence without MLM-specific assumptions (Bell et al., 2019; Hayes & Cai, 2007; Huang, 2016; McNeish et al., 2017). Nonetheless, applied researchers may ignore clustering completely in circumstances when they find the variance due to clustering in the outcome (Y) is not statistically significant (Raudenbush & Bryk, 2002, pp. 63–64; Snijders & Bosker, 2012, p. 22), or if the “design effect” computed for Y is < 1.5 (Lai & Kwok, 2015). Yet, the assumption of independence for any model pertains to the residual errors, not Y itself, and dependencies in X will contribute to inducing correlated residuals, also known as the endogeneity problem (e.g., Grilli & Rampichini, 2011). Further, even when correlated residuals stemming from Y or X are appropriately handled using MLM or a unilevel alternative, researchers may be quite unaware that their L1 predictor slope estimates are still prone to bias as a result of what we call the “blended slope” problem (e.g., Hamaker & Muthén, 2020)—an issue more well known in the multilevel literature that occurs when there is clustering in X (a non-zero intraclass correlation) and different X–Y relations at L1 and L2 (which cannot be readily known a priori).
The present paper examines the “blended slope” problem in terms of how it applies differentially to unilevel regression models (Mundlak, 1978) that do or do not take clustering into account, as compared with multilevel models (Raudenbush & Bryk, 2002), with specific attention on the independent contribution of cluster dependencies in X that might be ignored by researchers if there are little or no observed dependency in Y. First, we review the problem mathematically for the 2-level case; next, we review how to fix the problem. Thereafter, we use results from a Monte Carlo simulation to illustrate how the problem manifests in different analytic approaches to modeling clustered data3. Consistent with others (Antonakis et al., 2021; Bell et al., 2019; Enders & Tofighi, 2007; Hamaker & Muthén, 2020), our simulation results also demonstrate how cluster-mean centering L1 predictors can solve the problem in both MLMs and their unilevel alternatives4.
The “Blended Slope” Problem
Fairly recent simulation evidence by McNeish (2014) appeared to suggest that X–Y regression slope coefficient estimates are equivalent for MLM and unilevel fixed effects regression models; however, the simulations were designed so that within-cluster (L1) and between-cluster (L2) X–Y relations were identical. In instances where this is not the case, other research has shown that regression coefficient estimates of L1 predictors can be greatly distorted, to varying degrees, whenever X–Y relationships differ5 at L1 and L2 (Mundlak, 1978; Scott & Holt, 1982). In this circumstance, if an L1 predictor’s slope is estimated using a regression, the estimate will be a hodgepodge of the true within- and between-cluster slopes, which we call a “blended slope” ( ), expressed as follows (Raudenbush & Bryk, 2002, p. 137):
1
In Equation 1, is a weighted sum of the within-cluster L1 X-Y relationship ( ) and the between-cluster L2 X-Y relationship ( ). It is further assumed that and are estimated as follows:
2
and
3
where i = 1, 2, 3, …, individuals within clusters, and clusters as j = 1, 2, 3, ..., clusters.
Inspecting Eq. 1 makes clear that greater between-cluster relations relative to within-cluster relations results in a blended slope that is larger than the true within-cluster slope, whereas the reverse is true when between-cluster relations are smaller than within-cluster relations. As such, the blended slope will be a biased estimator of the within-cluster slope whenever within- and between-cluster relationships differ (Figure 1A). By contrast, the blended slope is not problematic when the within- and the between-cluster relationships are similar (Figure 1B).
Figure 1A
Hypothetical X-Y Relationships for Data with J = 5 Clusters by Between-Within Relations: Within- and Between-Cluster Slopes Differ
Figure 1B
Hypothetical X-Y Relationships for Data with J = 5 Clusters by Between-Within Relations: Within- and Between-Cluster Slopes are Same
Note. In Panels 1A and 1B, the blue data clouds and individual blue lines represent within-cluster (L1) X–Y slopes (Bw), and the black solid line represents the between-cluster (L2) X–Y slopes (Bb). The red dashed line in Panel A represents the “blended slope” (Btotal) that would be estimated when either the L1 predictor is not cluster-mean centered and/or when the L2 cluster means (aggregates) of the L1 predictor are not included in the model.
Importantly, the weight, , in Equation 1 plays a role in the magnitude of slope bias. For unilevel regression models that ignore clustering altogether, this weight is defined by Raudenbush and Bryk (2002, p. 137; see also Snijders & Bosker, 2012, p. 31) as:
4
where:
5
Here, is the ratio of sum of squared deviations between clusters to the total sum of squared deviations in the predictor. That is, the proportion of total variability in X explained by clustering.
For 2-level (random intercept) MLMs, Scott and Holt (1982) showed the population weight for estimating slope bias is:
6
Here, represents the population intraclass correlation (ICC) among residual errors due to clustering when unilevel regression is conducted, represents the ICC among predictor scores due to clustering, and denotes cluster size, assuming that the within-cluster sample sizes ( ’s) are equal across clusters.
In practice, both s in Equation 6 are values (as in Equation 5), rather than ICCs. Further, the population weight in Equation 6 can be computed for samples as:
7
The foregoing now leads to the connection between the blended slope weight given in Equation 4 and Equation 7. Specifically, the term involving the correlated residual error can be reinterpreted as the proportion of observed correlated residual error that would be accounted for by the modeling method chosen. In the case of unilevel regression, correlated residual error would not be taken into account during model parameter estimation, whereas in MLM regression, residual clustering due to dependencies in Y are taken into account via the random intercept term. Thus, the formula for the blended slope weight in Equation 7 reduces to the unilevel regression weight shown in Equation 4 because the term is zero in this context:
8
therefore:
9
To summarize, the two common factors that drive the slope bias in both MLMs and MLM alternatives include: 1) greater differences in the within- and between-cluster X-Y relations (i.e., L1 vs. L2 slopes), and 2) greater dependencies in the predictor (X) due to clustering ( ). In addition, a careful comparison of the bias weight formulas for MLMs (Equation 7) and unilevel models (Equation 8) shows that L1 slope bias in MLMs will be less pronounced than in unilevel models when: (a) there are dependencies present in Y, and (b) when the data involve larger cluster sizes (m). Specifically, in MLM regression, residual clustering due to ICCY is explicitly modeled via the random intercept term such that the term yields a value less than 1 (depending on ICCY); in contrast, this term is always 1 in a unilevel regression. In short, as ICCY increases, for MLMs, the numerator in Equation 7 will become smaller and the denominator will become larger, resulting in a smaller weight of the difference in L1 and L2 slopes (Equation 1), all else being equal. Second, the impact of the decrease in the term on as ICCY increases (for MLMs) is amplified by the average cluster size (m). As m increases, all other things equal, the denominator becomes larger, resulting in an even smaller weight and a lower level of L1 slope bias. In short, when ICCY or m are relatively high, for MLM models with uncentered L1 predictors will tend toward zero, and the blended slope reduces to just the within-cluster slope.
Fixing the Problem
As shown in Equation 1, the “blended slope” ( ) is simply a weighted linear combination of within- and between-cluster components of an L1 predictor. However, if an L1 predictor (X) is cluster-mean (or “group-mean”) centered—as distinct from grand-mean centered—all clustering effects are removed and the predictor becomes the within-cluster component of X, . When the original predictor is aggregated into cluster means, it becomes the between-cluster component of X, , which now contains all cluster-specific information about X. Further, the separated components of X ( and are uncorrelated with each other. In fact, including either just , a combination of X and , or both and , eliminates the slope bias problem, both for unilevel and MLM regression approaches. Although Enders and Tofighi (2007) and others have discussed this in the multilevel literature (Bell et al., 2019; Hamaker & Muthén, 2020; Raudenbush & Bryk, 2002, pp. 31–35; Snijders & Bosker, 2012, p. 58), in practice, researchers are still not using/reporting how they are dealing with the problem, whether it be cluster-mean centering and/or use of aggregates, even in MLMs (Antonakis et al., 2021; Luo et al., 2021). More importantly, we find very rare discussion regarding the blended slope problem (or how to solve it) within the methodological literature on CR-SEs as an alternative approach to MLMs for modeling clustered data (exceptions include Bell et al., 2019, and McNeish, 2019).
Correlated Residual Error Due to X (Also Known as Endogeneity)
Putting aside the blended slope problem, it is important to note that dependencies in L1 (within-cluster) predictors, not just outcomes, can cause correlated residual errors (Cronbach, 1976; Mundlak, 1978), which is also known as the endogeneity problem (e.g., Bell & Jones, 2015; Grilli & Rampichini, 2011). Even when there are no dependencies in Y, residual error dependencies can be induced by dependencies in X via the shared relationship between X and Y. This may occur, for example, in outcomes that have more biologically based etiologies such as information processing, memory, or attention (Fiedorowicz et al., 2001), which are unlikely to vary across L2 units such as therapists or schools. Yet, individualized (L1) treatment characteristics or interventions used as predictors would likely vary across L2 units (e.g., differences among provider techniques or school policies).
Specifically, the magnitude of cluster dependency in X ( ) will be directly transferred to dependencies in the predicted values of the outcome, since . Thus, the magnitude of the residual error dependencies, , will be a function of and the proportion of variance in Y explained by X, . Of course, it is more complex in typical analyses where there are many predictors with differing dependency levels and relationships with Y and each other. However, as noted above, when L1 predictor cluster-mean centering is used, and/or L2 aggregates are added, dependencies in Xs will be removed such that dependencies in Y associated with dependencies in Xs will also be removed.
Monte Carlo Simulation
To illustrate how the blended slope problem (and associated standard error distortion) manifests in both multilevel/random effects models (MLMs) and unilevel/fixed effects models (with and without cluster-robust standard errors), we conducted a Monte Carlo simulation (an applied analysis example is featured in the Supplementary Materials). Our results will show that ignoring cluster dependencies in X when there are different Level 1 (L1) and Level 2 (L2) X–Y relations yields biased slopes and distorted standard errors, especially for unilevel models. Further, because unilevel regression with CR-SEs only adjusts standard errors for residual error clustering, the bias in the slope coefficient will still be identical to that of a regression model that does not employ CR-SEs (although models using CR-SEs have more accurate standard errors). Last, we also demonstrate how simple cluster-mean centering L1 predictors resolves the slope bias problem.
Data Generation
Data were generated using Mplus 8 (Muthén & Muthén, 2017) within the MplusAutomation package in R (Hallquist & Wiley, 2018) as 2-level, 2-predictor models in which Y and both Xs are continuous. We varied seven crossed factors, with 500 replications per cell. Factor levels were chosen to capture null conditions in addition to realistic research scenarios to maximize generalizable results. Manipulated conditions were:
-
J number of clusters (30, 100).
-
m cluster sizes (5, 100).
-
Intraclass correlation (ICC) in X (.01, .20,.40).
-
ICC in Y (.01, .20, .40).
-
Within-cluster total (.00, .20, .50), with each predictor contributing half the proportion each, and with positive and negative slopes for non-null conditions.
-
between-cluster total (.00, .20, .50), with each predictor contributing half the proportion each, and with positive and negative slopes for non-null conditions.
-
predictor-predictor correlations at each level (.00, .20, .40).
Intraclass Correlations
ICCs were induced by setting specific values at L2 for the variance of X and the residual variance of Y, each calculated a priori for every design cell. Specifically, we assumed predictors had separable within- and between-cluster effects on the outcome using latent disaggregation (Figure 2), as follows:
10
Figure 2
2-Level, 2-Predictor Model With Predictors (Xs) and Outcome (Y) Measured at L1
where the ith score in the jth cluster is a function of the sum of the conditional mean ( ), the within-cluster (cluster-mean centered, L1) and between-cluster (aggregate, L2) fixed effects of the first predictor ( and , respectively), the within- and between-cluster fixed effects of the second predictor ( and , respectively), the deviation between the cluster mean and the grand mean intercept ( ), and the deviation between the ith score and its jth cluster mean ( ).
Each of the L1 variables were fixed to have unit variances ( ), and L2 variances ( ) were set to achieve a desired ICC using the Jak (2019) approach:
11
The residual outcome variances were calculated from a desired value:
12
13
Given the data generation process, we could not set X or Y ICCs to be exactly zero; instead, our null ICCs were set to .01.
Regression Coefficients
Regression coefficients were derived for each combination of and conditions using eigenvalue decomposition (e.g., Pedhazur, 1997). Both predictors were assumed to have the same population slope values; parameter values are provided in the Supplementary Materials.
Analyses
For each condition, 500 datasets were simulated and saved using Mplus’ Monte Carlo utility (via MplusAutomation in R) assuming a 2-level, 2-predictor linear latent variable model shown in Equation 10 (Figure 2). Each dataset was analyzed using six approaches6:
-
Unilevel linear regression model without standard error adjustment, predictors un-centered (L1uc).
-
Unilevel linear regression model with cluster-robust standard errors (CR-SEs), predictors un-centered (L1CRuc).
-
Multilevel linear regression (random intercept model), predictors un-centered (L2uc).
-
Unilevel linear regression model without standard error adjustment, predictors cluster-mean centered (aggregate predictors not included) (L1c).
-
Unilevel linear regression model with cluster-robust standard errors (CR-SEs), predictors cluster-mean centered (aggregate predictors not included) (L1CRc).
-
Multilevel linear regression (random intercept model), and predictors cluster-mean centered (aggregate latent predictors were included in this analysis because it was the population generation model with which to compare with the other analyses; nevertheless, aggregates are uncorrelated with cluster-mean centered predictors and thus L1 slopes and their standard errors are unaffected by their inclusion) (L2c).
In Mplus, TYPE = GENERAL was employed for analysis approaches (1) and (4); for approaches (2) and (5) we used TYPE = COMPLEX, and for approaches (3) and (6) we used TYPE = TWOLEVEL. Maximum likelihood estimation was used for all analyses. Example code files are provided in the Supplementary Materials.
Results
For each analysis approach and condition, slope coefficient and standard error estimates were averaged across the 500 replications. Evaluation of the correlation between the results for the two predictors across all analysis approaches and conditions showed that the two estimated slopes and their standard errors were correlated at .99 or higher; thus, without loss of generality we report slope parameter estimate and standard error bias7 for the first predictor throughout.
Slope Coefficient Relative Bias
Relative bias for the L1/within-cluster predictor slope estimates was tabulated for non-null conditions in which 10% or 25% variance in Y was explained by each L1/within-cluster predictor in isolation (20% and 50% total variance explained jointly by both within-cluster predictors). Specifically, relative bias was computed as the difference between the estimated slope and the true within-cluster slope, divided by the true within-cluster slope; a relative bias of zero indicates no bias, whereas positive values indicate overestimation and negative values indicate underestimation. (Hoogland & Boomsma, 1998 suggest that relative bias exceeding ± 5% for point estimates is cause for concern.)
Table 1 displays mean relative bias across conditions for each of the six analyses, paneled by X and Y ICCs. As can be observed, for un-centered predictor analyses (top three rows), the unilevel approaches exhibited identical bias, with increased bias solely a reflection of increased dependencies in X. For MLM with un-centered predictors, the bias due to dependencies in X was much lower than in the unilevel approaches. Not surprisingly, when we visualize the bias by the true differences in the within- and between-cluster slopes (Figure 3), we see that: 1) there is no bias when the ICC in X is close to zero, 2) there is no bias when cluster-mean centering is used, and, 3) for un-centered predictor analyses, the magnitude of bias (underestimation of the within-cluster slope) increases markedly as the ICC X increases and also as the differences between L1 and L2 slopes increases (identically for positive or negative differences). As can also be observed in both Figure 3 and Figure 4A/Figure 4B (which collapses across slope difference conditions), the bias in the un-centered MLM slope disappears for larger clusters whereas the bias in the un-centered unilevel analyses is large and constant across cluster sizes (Figure 4A/Figure 4B).
Table 1
Within-Cluster (L1) Slope Coefficient Relative Bias for Analysis Approach by Y and X ICC Combinations
ICC Y = .01
|
ICC Y = .20
|
ICC Y = .40
|
||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ICC X = .01
|
ICC X = .20
|
ICC X = .40
|
ICC X = .01
|
ICC X = .20
|
ICC X = .40
|
ICC X = .01
|
ICC X = .20
|
ICC X = .40
|
||||||||||
Analysis | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) |
Un-centered Predictor | ||||||||||||||||||
1L | -.01 | (.01) | -.19 | (.04) | -.39 | (.05) | -.01 | (.05) | -.19 | (.19) | -.39 | (.24) | -.01 | (.08) | -.19 | (.32) | -.39 | (.39) |
1L-CRSE | -.01 | (.01) | -.19 | (.04) | -.39 | (.05) | -.01 | (.05) | -.19 | (.19) | -.39 | (.24) | -.01 | (.08) | -.19 | (.32) | -.39 | (.39) |
2L | -.01 | (.01) | -.08 | (.06) | -.14 | (.13) | .00 | (.02) | -.03 | (.06) | -.08 | (.09) | .00 | (.01) | -.01 | (.06) | -.04 | (.08) |
Cluster-Mean Centered Predictor | ||||||||||||||||||
1L | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) |
1L-CRSE | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) |
2L | .00 | (.00) | -.01 | (.01) | -.01 | (.01) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) | .00 | (.00) |
Note. N = 240 non-null conditions/cell, 500 replications/condition. 1L = unilevel regression; 1L-CRSE = unilevel regression with cluster-robust standard errors; 2L = multilevel random intercept model. Values closer to 0 are better; values within 0±.05 are considered acceptable.
Figure 3
Slope Coefficient Relative Bias for Analyses by Within- and Between-Cluster (L1-L2) Slope Differences when = 0
Note. N = 40 non-null conditions/cell with X-X correlations = 0 (for ease of interpretation), per analysis, 500 replications/condition. Horizontal axis reflects standardized L1-L2 true slope differences grouped as: < -0.25, -0.25 to +0.25, and > +0.25 standard deviations in differences. L1uc (CR/nCR) = unilevel regression, un-centered predictors, with and without cluster-robust standard errors; L2uc = multilevel random intercept model, un-centered predictors; L1c (CR/nCR) = unilevel regression, cluster-mean centered predictors, with and without cluster-robust standard errors; L2c = multilevel random intercept model, cluster-mean centered predictors. Solid red line delineates 0 = no bias; values within 0 ± .05 (red dashed lines) are considered acceptable.
Figure 4A
Slope Coefficient Relative Bias for Analyses by Cluster Size: ICCs by Cluster Size
Note. See note to Figure 4B.
Figure 4B
Slope Coefficient Relative Bias for Analyses by Number of Clusters: ICCs by Number of Clusters
Note. N = 360 non-null conditions/cell, per analysis, 500 replications/condition. L1uc (CR/nCR) = unilevel regression, un-centered predictors, with and without cluster-robust standard errors; L2uc = multilevel random intercept model, un-centered predictors; L1c (CR/nCR) = unilevel regression, cluster-mean centered predictors, with and without cluster-robust standard errors; L2c = multilevel random intercept model, cluster-mean centered predictors. Solid red line delineates 0 = no bias; values within 0 ± .05 (red dashed lines) are considered acceptable.
Across all of these results, the relative bias we observed was negative (underestimation) because our simulation design assumed the within-cluster predictor had a greater relationship with the outcome than the between-cluster aggregate predictor. Had we designed the simulation to have greater between-cluster (L2) predictor-outcome relations than within-cluster (L1) relations, bias would be in a positive direction (overestimation).
Last but not least, slope bias was completely removed for MLM and unilevel regression analysis approaches by cluster-mean centering the predictors.
Slope Standard Error Empirical Bias
Because standard errors are unique to each condition and analysis approach, we computed empirical bias as the ratio of the mean of the model-implied standard errors to the expected standard error, with the expected standard error based on the empirical standard deviation of the slope parameters (e.g., Bell & Jones, 2015). Values of 1 indicate no bias, positive values indicate upward bias, and negative values indicate downward bias (Hoogland & Boomsma, 1998 suggest that relative bias exceeding ±10% (below 0.9 or above 1.1) for standard errors is cause for concern).
Table 2 summarizes empirical bias for the mean slope standard error estimates. Across analyses, MLM and unilevel CR-SE approaches showed reasonable slope standard errors. Only the unilevel approach without adjustment exhibited poor standard errors: when no centering was used, the standard errors were biased downward as a function of both X and Y ICCs (Figure 5A/Figure 5B); when cluster-mean centering was used, the bias from X ICC vanished (but not bias from Y ICC).
Table 2
Within-Cluster (L1) Slope Standard Error Empirical Bias for Analysis Approach by Y and X ICC Combinations
ICC Y = .01
|
ICC Y = .20
|
ICC Y = .40
|
||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ICC X = .01
|
ICC X = .20
|
ICC X = .40
|
ICC X = .01
|
ICC X = .20
|
ICC X = .40
|
ICC X = .01
|
ICC X = .20
|
ICC X = .40
|
||||||||||
Analysis | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) | M | (SD) |
Un-centered Predictor | ||||||||||||||||||
1L | 1.01 | (.03) | .77 | (.21) | .65 | (.26) | .95 | (.06) | .65 | (.26) | .56 | (.27) | .91 | (.10) | .58 | (.28) | .49 | (.27) |
1L-CRSE | 1.01 | (.03) | .98 | (.03) | .97 | (.03) | 1.01 | (.03) | .99 | (.03) | .98 | (.03) | 1.00 | (.03) | .98 | (.03) | .97 | (.03) |
2L | 1.01 | (.03) | 1.00 | (.03) | .99 | (.04) | 1.01 | (.03) | 1.01 | (.03) | 1.00 | (.03) | 1.01 | (.02) | 1.01 | (.02) | 1.01 | (.03) |
Cluster-Mean Centered Predictor | ||||||||||||||||||
1L | 1.05 | (.03) | 1.05 | (.03) | 1.05 | (.03) | 1.22 | (.06) | 1.22 | (.06) | 1.22 | (.06) | 1.47 | (.10) | 1.47 | (.10) | 1.47 | (.10) |
1L-CRSE | 1.01 | (.03) | 1.01 | (.03) | 1.01 | (.03) | 1.01 | (.03) | 1.01 | (.03) | 1.01 | (.03) | 1.01 | (.03) | 1.01 | (.03) | 1.01 | (.03) |
2L | 1.03 | (.02) | 1.02 | (.02) | 1.02 | (.02) | 1.02 | (.02) | 1.01 | (.02) | 1.01 | (.02) | 1.02 | (.02) | 1.01 | (.02) | 1.01 | (.02) |
Note. N = 240 non-null conditions/cell, per analysis, and 500 replications/condition. 1L = unilevel regression; 1L-CRSE = unilevel regression with cluster-robust standard errors; 2L = multilevel random intercept model. Values closer to 1 are better; values within 1 ± .10 are considered acceptable.
Figure 5A
Standard Error Empirical Bias for Analyses by Cluster Size: ICCs by Cluster Size
Note. See note to Figure 5B.
Figure 5B
Standard Error Empirical Bias for Analyses by Number of Clusters: ICCs by Number of Clusters
Note. N = 360 non-null conditions/cell, per analysis, 500 replications/analysis. L1uc (CR/nCR) = unilevel regression, un-centered predictors, with and without cluster-robust standard errors; L2uc = multilevel random intercept model, un-centered predictors; L1c (CR/nCR) = unilevel regression, cluster-mean centered predictors, with and without cluster-robust standard errors; L2c = multilevel random intercept model, cluster-mean centered predictors. Solid red line delineates 1 = no bias; values within 1 ± .10 (red dashed lines) are considered acceptable.
Discussion
The present study demonstrates how the “blended slope” problem manifests differentially in unilevel and multilevel models (MLMs) for varied levels of within- and between-cluster (L1 and L2) slope values, and varied levels of dependencies in predictors (Xs) and outcomes (Ys). Our results show clearly that bias can occur in either analysis when clustering in X is ignored. What both our mathematical review and simulation results also reveal is that ignoring dependencies in predictors in MLMs is less pronounced than in unilevel models as cluster sizes increase, and as dependencies in Y increase. We consider these findings in the context of three likely data analytic circumstances that researchers might currently be implementing: 1) when a lack of dependency in Y is the decision rule for ignoring clustering altogether; 2) when a unilevel model with adjusted standard errors is used, but the researcher unwittingly uses grand-mean centering or no centering for their L1 predictors; and 3) when a multilevel model (MLM) is used but again, the researcher is unaware that there may be differences in L1 and L2 X-Y relations and fails to cluster-mean center their L1 predictors (or add L2 aggregates).
In the first circumstance, dependencies in X will cause distorted slope coefficient estimates and distorted slope standard error estimates. With respect to coefficient distortion, the predictor slope will be underestimated when the true within-cluster (L1) X-Y relation is larger than the true between-cluster (L2) X-Y relation, which is more likely to be the case when the theoretical focus is on individual differences. In other words, researchers will erroneously infer that the X-Y relationship effect size is smaller than it is in truth. In our Monte Carlo simulation design, the within-cluster relation was designed to be stronger than the between-cluster relation and hence under-estimation was observed as the difference in within- and between-cluster slopes increased (Table 1). Irrespective of slope bias, when clustering is completely ignored using unilevel regression, the slope standard errors will be downwardly biased (assuming a positive ICC). In our simulations, the standard error was underestimated by 35% when X ICC = .40 and Y ICC = .01, all else constant (Table 2). When both X and Y ICC = .40, the standard errors were underestimated by 51%. As is well known, downward standard error bias leads to a false sense of confidence in hypothesis tests and confidence interval coverage.
In the second analytic circumstance, when a researcher chooses to use unilevel regression with CR-SEs as a means for handling correlated residual errors but is unaware of the need to cluster-mean center their L1 predictors (or add L2 aggregates; e.g., Bell et al., 2019), the L1 slope coefficient bias will still be identical to that for the unilevel regression without standard error adjustment (Table 1). This is because the adjustment occurs after coefficients are estimated. As observed in our simulations, the standard errors for the CR-SE approach were similar to those for MLM (Table 2).
In the third analytic circumstance, when a researcher uses MLM to analyze their data but fails to cluster-mean center L1 predictors (or add L2 aggregates; e.g., Antonakis et al., 2021), the L1 slope bias will exhibit the same directional pattern as the other two analysis methods, but the bias will likely be smaller for two reasons. First, the MLM regression coefficient estimation directly takes into account cluster differences in the residuals (see again Equation 7). Second, as the average cluster size becomes large, especially as dependencies in Y increase, the weight for L1-L2 slope differences tends toward zero and the blended slope becomes the within-cluster slope (see again Figure 4A). In our simulations, the bias in MLM slopes when failing to cluster-mean center predictors was much lower than that of bias in unilevel slopes that did not use cluster-mean centered predictors (-14% compared to -39% on average when ICC X = .40 and ICC Y = .01; Table 1).
Why is There a Discrepancy With Earlier Work?
Applied researchers may be surprised to learn that L1 predictor slopes estimated with unilevel regression and MLM are not the same. This is because recent methodological work showed that both analysis approaches could produce the same coefficient estimates (McNeish, 2014). However, this is only the case when data are generated to have the same within- and between-cluster X-Y relationships (Figure 1B). As we review and demonstrate in this paper, L1 slope bias arises only when there is a difference in the L1 and L2 X-Y relations (Figure 1A), which in practice may be far more realistic than assuming they are the same.
Isn’t Everyone Already Using Cluster-Mean Centering?
Some readers may be under the impression that everyone is appropriately cluster-mean centering lower-level predictors (or including their aggregates) when analyzing clustered data. We do not believe this is the case. Although the centering issue has been discussed in prominent MLM textbooks (e.g., Raudenbush & Bryk, 2002; Snijders & Bosker, 2012) as well as MLM methodological literature (e.g., Enders & Tofighi, 2007), two recent meta-analyses of multilevel modeling research by Luo et al. (2021) and Antonakis et al. (2021) showed that very few researchers reported either cluster-mean centering their L1 predictors and/or including L2 aggregate predictors to ensure slope results were unbiased. More importantly, we found little guidance in the MLM alternatives literature (unilevel models for analyzing clustered data structures) about the issues involved, particularly for unilevel regression with CR-SEs (exception is Bell et al., 2019). Thus, we remain skeptical that researchers have received the message yet.
Two Caveats in Simulation Conditions
Besides the specificity of our design conditions, our simulation results were limited in two regards. First, we used latent disaggregation of lower-level regressors into within- and between-cluster (L1 and L2) predictors in our correctly specified multilevel models. Latent disaggregation mitigates against score unreliability that can result from sampling error when estimating X-Y relations at L1 and L2 (Asparouhov & Muthén, 2019). However, we did not build in extra measurement error, and so L1 estimates would not differ from L1 estimates in univariate multilevel software, such as R lme4 (Bates et al., 2015). However, L2 aggregates themselves are automatically subject to sampling error (Grilli & Rampichini, 2011), and so one would expect the difference between latent and univariate multilevel models to be located in the L2 aggregate slope estimates (which are not the focus of our paper). Indeed, when we subjected the simulated data to 2-level models estimated with R lme4 (with observed cluster means used for L1 predictor cluster-mean centering and creating L2 aggregates), we found the L1 slope estimates to be nearly perfectly correlated between Mplus and lme4 results, r > .99 (within and across all sample sizes) but the L2 aggregate slopes were less correlated, at r ≥ .86 across sample sizes (r ≥ .81 for the smallest sample size).
The second point to note is that we used continuous (normally distributed) predictors and outcomes in our simulation, as well as in our applied analysis provided in the Supplementary Materials. Nevertheless, there is no reason to believe the same biases would not persist with categorical data. In fact, categorical predictors can also be either cluster-mean centered (Enders & Tofighi, 2007, Appendix B), and/or aggregated as L2 cluster mean (percentages) that are then added to the model to ensure that slope inferences are correct.
Conclusion
Although MLM has become a staple analytic technique in the psychological and educational sciences (Antonakis et al., 2021; Bauer & Sterba, 2011; Luo et al., 2021), alternative unilevel regression approaches for analyzing clustered data when only L1 questions are of interest are also possible—and sometimes preferable—to MLM, such as unilevel regression with cluster-robust standard errors (CR-SEs) (Hayes & Cai, 2007; Huang, 2016; McNeish et al., 2017). Yet, in any clustered data, the predictor-outcome relation at L1 may easily differ from that of the relation at L2. If lower-level predictors are not cluster-mean centered, or if their aggregates are not added as predictors to the model, the L1 slope estimate will be an unintelligible mixture of the L1 and L2 components. The current study reviews and illustrates this “blended slope” problem, and its differential impact on unilevel models compared to MLMs, when left unchecked. Importantly, the problem only occurs when clustering in X is ignored, resulting in L1 slope under- or over-estimation that is especially pronounced in unilevel regression models, and to a lesser extent, in MLMs with smaller cluster sizes.
In our view, there is no utility in trying to draw inferences from a “blended slope,” as it serves neither the individual level nor the cluster level; further, it is prone to inflated Type I error rates8 in both unilevel and multilevel analyses. Simply put, researchers should always check and report both X and Y ICCs when they are dealing with clustered data, regardless of whether the nature of their data is cross-sectional or repeated measures/time series. Second, given that most datasets employ numerous predictors that vary in their magnitude of cluster dependencies as well as their correlations with the outcome and each other, it should be a routine procedure for researchers to either cluster-mean center their lower-level predictors and/or add cluster mean aggregates as additional predictors (see Supplementary Materials for R code to obtain cluster-mean centered and aggregate predictors), irrespective of whether unilevel or multilevel modeling is used to handle residual error dependencies.