In behavioral sciences, researchers are often interested in investigating the underlying causal mechanisms of treatment or intervention effects on an outcome through mediation analysis. The total effect, encoding the effect of the intervention X on the outcome Y, is then decomposed into a direct and an indirect effect. The direct effect represents the part of the effect of the predictor variable X on the outcome Y that is not transmitted by the mediator M, while the indirect effect is the part of the effect of X on Y that is transmitted through M (MacKinnon, 2012).
Realistic research settings in behavioral sciences often involve hierarchical data structures, such as (i) data nested within groups (clustered data) and (ii) repeated measures from the same subject (longitudinal data) (West et al., 2022). Multilevel mediation analysis arises when conducting mediation analysis with such hierarchical data. Various multilevel mediation models exist, depending on the level at which the predictor, mediator(s), and outcome are measured.1 A common setting is a mediation model in which the intervention (X), the mediator (M), and the outcome (Y) are all measured at Level 1 but clustered within Level 2. Examples of studies including such a model are Fitzsimmons-Craft et al. (2016), Hill et al. (2021), Tan et al. (2022), and Yu and Hu (2022). Figure 1 illustrates a mediation model, first presented by Vuorre and Bolger (2018) for a computer game study by Metcalfe and Greene (2007). Participants played a computer game catching Xs and avoiding Os while rating their perceived performance. The experiment manipulated the game by introducing a brief delay (250 ms) between mouse movement and cursor response (intervention X). The mediation question is whether this manipulation affects performance ratings indirectly, by first reducing actual performance, i.e., the hit rate (mediator M), which then influences participants' judgments of performance (outcome Y). All three variables were repeatedly measured within independent individuals ($j$), four times in the lag condition and four times in a control condition. Since we have longitudinal data within individuals, we can allow for heterogeneity in the effects between individuals (hence the subscript $j$).
Figure 1
Computer Game Example of Metcalfe and Greene (2007): A Mediation Model, First Presented by Vuorre and Bolger (2018)
How can the mediation model depicted in Figure 1 be formalized? Throughout this paper, we will assume linear relationships. For convenience, we will only consider the simple setting where no other (measured) variables are considered, although in practice, adjusting for confounders for the different relationships may be necessary to obtain causal effects. Following the approach of Kenny et al. (2003), the mediation model can be written as two Level 1 equations:
$$\begin{aligned} M_{ij} &= d_{M_j} + a_j X_{ij} + e_{M_{ij}} \\ Y_{ij} &= d_{Y_j} + b_j M_{ij} + c'_j X_{ij} + e_{Y_{ij}} \end{aligned} \tag{1}$$
Model (1) includes two random intercepts and three random slopes; the random slopes are depicted in Figure 1 as circles. It is important to note that both Kenny et al. (2003) and Bauer et al. (2006) consider all five random effects of Model (1). Hence, it could be argued that fitting all random effects is the default way to analyze such a model. However, in multilevel modeling, researchers often exclude random slopes and include only random intercepts to account for the non-independence across observations of the same individual (Heisig et al., 2017; Schielzeth & Forstmeier, 2009). Several reasons for ignoring random slopes can be found in the literature. First, there may be no expected variation in the effect between Level 2 units, which implies that there is no random slope. However, LaHuis et al. (2020) suggested that it is rarely the case that the variance of a slope is exactly zero when working with hierarchical data. Heisig et al. (2017) also demonstrated that the assumption of invariant effects of lower-level control variables is often implausible. Second, there may be insufficient information to estimate all random effects, for example when limited data are available. Challenges related to limited sample sizes can emerge at distinct levels. While small cluster sizes (i.e., a small number of measurement occasions) are problematic, especially with small to medium sample sizes at the higher level, the primary concern in multilevel modeling typically pertains to the number of individuals or clusters (Hox & McNeish, 2020; McNeish, 2014). Increasing the number of individuals or clusters is often not straightforward, as it may entail higher financial cost and more time (Maas & Hox, 2005). When limited information is available from the data, convergence issues may occur (Bryan & Jenkins, 2016; Schielzeth & Forstmeier, 2009).
Additionally, fitting (multiple) random slopes could be hindered by software limitations or insufficient computational resources (Heisig & Schaeffer, 2018). A third explanation may be that researchers have limited knowledge of multilevel modeling and are unaware of the consequences of misspecifying all the necessary random components. As noted by Schielzeth and Forstmeier (2009), there is a prevalent misconception that utilizing a multilevel model with only a random intercept adequately addresses all instances of non-independence in the data.
The consequences of ignoring random slopes have already been studied in the multilevel modeling literature. Several scholars have shown that falsely assuming fixed slopes rather than random slopes results in biased standard errors and overly narrow confidence intervals (i.e., anti-conservative inference) for the fixed effects (Barr et al., 2013; Bell et al., 2019; Berkhof & Kampen, 2004; Bryan & Jenkins, 2016; Heisig et al., 2017; Hoffman & Walters, 2022; Schmidt-Catran & Fairbrother, 2016). To the best of our knowledge, the impact of omitting random slopes in the more complex multilevel mediation model has not been studied yet. There are two important aspects of this model that have not been thoroughly investigated in the literature under the considered scenario. First, this setting involves jointly fitting two models with five possibly correlated random effects. Second, as we will explain later, the expression for the average indirect effect in this model includes an additional term involving the covariance of the random slopes. Therefore, the aim of this article is to examine in detail the impact of ignoring random slopes in the mediation model. More specifically, simulation studies based on the joint modeling approach described by Bauer et al. (2006) will be used to study the impact of omitting the random slopes. In the simulation study, we first assess the impact of model complexity on the convergence rate when using the default options in open source software. Second, given the mediation context, we focus on the bias, precision, efficiency, and coverage probability of the average indirect effect estimator.
The outline of this article is as follows. First, we provide a detailed overview of the joint modeling approach proposed by Bauer et al. (2006), and explain how the average indirect effect can be derived in the setting. Next, we describe the results from the simulation study. Finally, we summarize the key findings and provide some concluding remarks.
Joint Modeling Approach
In this section, we present a detailed overview of the joint modeling approach proposed by Bauer et al. (2006). We begin by outlining the mediation model and its underlying assumptions. Next, we discuss how to restructure the data and fit the model using free statistical software. Finally, we describe the closed-form expressions for the indirect and total effects.
Model and Assumptions
Consider the model presented in Equation 1. Bauer et al. (2006) make the following assumptions:
The predictors are uncorrelated with the random effects and the residuals, both within and across equations.
Both Level-1 residuals, $e_{M_{ij}}$ and $e_{Y_{ij}}$, are assumed to be uncorrelated2 and normally distributed with a mean of zero. The residuals in each equation are assumed to be independent and homoscedastic across individual observations ($i$) within clusters ($j$).
The random effects are normally distributed with means equal to their respective average population effects (or zero if deviation-form notation is used). The random effects may covary with each other.
The Level-1 residuals are uncorrelated with the random effects both within and across equations (i.e., $e_{M_{ij}}$ and $e_{Y_{ij}}$ are uncorrelated with the random intercepts $d_{M_j}$ and $d_{Y_j}$ and the random slopes $a_j$, $b_j$, and $c'_j$).
It follows from Assumptions 2 and 3 that the distributions of M and Y are conditionally normal. More specifically, M is normally distributed conditional on X; Y is normally distributed conditional on M and X.
Kenny et al. (2003) separately estimate the two Level-1 models presented in Equation 1. However, as pointed out by Bauer et al. (2006), this approach results in a loss of essential information. Indeed, when fitting both models separately, the correlation between the random effects is not modeled explicitly (cf. Assumption 3). It is important to understand the consequences of this. The random intercepts $d_{M_j}$ and $d_{Y_j}$ in Model (1) represent unmeasured Level-2 heterogeneity in the mediator and the outcome, respectively. In a standard multilevel model, random intercepts are assumed to be independent of the predictors. Hence, the unmeasured heterogeneity in Y is assumed to be independent of X and M, and cannot capture unmeasured upper-level common causes of M and Y. When assuming correlated random intercepts, though, we explicitly allow for unmeasured upper-level common causes of M and Y. To eliminate upper-level confounding, it is sometimes argued that within and between effects should be separated; by doing so, the cluster-mean centered predictors will be independent of any upper-level variable. While this is an interesting alternative, we will build further on the framework of Bauer et al. (2006) and capture any upper-level confounding of M and Y by allowing the random intercepts to be correlated. In the next section, we explain how those two models can be fitted jointly.
In this paper, we focus on the mediation model under the assumption that (unmeasured) upper-level confounding exists only for the mediator–outcome relationship. In other words, we do not account for (unmeasured) upper-level confounding of the exposure–mediator or exposure–outcome relationships. In our computer game example, the exposure is randomized, which satisfies this assumption. This condition is also commonly met in alternating treatment designs (Manolov et al. 2022), used in single-subject research (Shadish et al., 2013) for example, but it may be violated in observational settings where the exposure is not randomized. The primary aim of the paper is to show that ignoring random slopes in mediation models can lead to biased estimates of the average indirect effect. In the discussion, we outline possible extensions that relax this simplifying assumption by allowing for upper-level confounding beyond the mediator–outcome relationship.
Reorganizing the Data
Bauer et al. (2006) proposed a solution that involves formulating the model jointly in a single Level 1 equation using selection variables. This requires creating a new outcome variable Z by stacking Y and M for each observation $i$ within cluster $j$. To distinguish between M and Y, two selection variables ($S_{M_{ij}}$ and $S_{Y_{ij}}$) are created. If Z pertains to M, then $S_{M_{ij}} = 1$ and $S_{Y_{ij}} = 0$; otherwise, if Z pertains to Y, then $S_{M_{ij}} = 0$ and $S_{Y_{ij}} = 1$. $S_{M_{ij}}$ is thus associated with the model for the mediator M, while $S_{Y_{ij}}$ is associated with the model for the outcome Y. In the stacked data set, the predictor and mediator are retained, along with the new outcome variable and the two selection variables.
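The stacking step can be sketched with pandas; the column names below are illustrative, not taken from the original study's data:

```python
import pandas as pd

# Original (long) data: one row per observation i within cluster j.
long = pd.DataFrame({
    "cluster": [1, 1, 2, 2],
    "x":       [0, 1, 0, 1],
    "m":       [2.0, 3.5, 1.8, 2.9],
    "y":       [5.0, 6.2, 4.7, 5.9],
})

# Stack M and Y into a single outcome z, with selection variables
# s_m and s_y indicating which variable each stacked row refers to.
m_rows = long.assign(z=long["m"], s_m=1, s_y=0)
y_rows = long.assign(z=long["y"], s_m=0, s_y=1)
stacked = pd.concat([m_rows, y_rows], ignore_index=True)

# The stacked data keep the predictor and mediator alongside z, s_m, s_y.
print(stacked[["cluster", "x", "m", "z", "s_m", "s_y"]])
```

Each original row thus contributes two rows to the stacked data set, one carrying M and one carrying Y as the outcome.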
Fitting the Joint Model
By using the selection variables, the model (see Equation 1) can be specified in a single equation:
$$Z_{ij} = S_{M_{ij}}\left(d_{M_j} + a_j X_{ij} + e_{M_{ij}}\right) + S_{Y_{ij}}\left(d_{Y_j} + b_j M_{ij} + c'_j X_{ij} + e_{Y_{ij}}\right) \tag{2}$$
As explained by Bauer et al. (2006), the joint model for Z (see Equation 2) can be fitted by specifying a model for Z without a common fixed intercept, but with random intercepts for $S_{M_{ij}}$ and $S_{Y_{ij}}$ (denoted as $d_{M_j}$ and $d_{Y_j}$, respectively) and with random slopes for the product variables $S_{M_{ij}}X_{ij}$, $S_{Y_{ij}}M_{ij}$, and $S_{Y_{ij}}X_{ij}$ ($a_j$, $b_j$, and $c'_j$, respectively).3 Furthermore, the residual variance is allowed to differ depending on one of the selection variables (e.g., $S_{M_{ij}}$), so that M and Y retain separate Level-1 residual variances (Bauer et al., 2006). By fitting the joint model, the mediation models (see Equation 1) can be estimated simultaneously, and both the complete covariance matrix of the random effects and the complete covariance matrix of the fixed effects can be directly estimated (Bauer et al., 2006). As we will show in the next section, all the necessary information to estimate the average indirect and total effects becomes available with this stacked approach.
Estimating and Evaluating the Indirect and Total Effects
This section discusses the expressions for the average indirect and total effects as presented by Bauer et al. (2006) and Kenny et al. (2003), as well as their precision (i.e., the standard error) and expressions for their 95% confidence intervals.
The Average Indirect and Total Effects
While cluster-specific indirect and total effects can easily be obtained from Equation 1, expressions for the average indirect and total effects require more attention. The average indirect effect (Goodman, 1960, p. 712) can be derived as:
$$\text{Average indirect effect} = E(a_j b_j) = ab + \sigma_{a_j b_j} \tag{3}$$

and the average total effect as:

$$\text{Average total effect} = E(a_j b_j + c'_j) = ab + \sigma_{a_j b_j} + c' \tag{4}$$
Since $a_j$ and $b_j$ are not necessarily independent (cf. the third assumption), the average indirect and total effects do not only depend on the average effect of X on M ($a$) and the average effect of M on Y ($b$), but also on the covariance between the two random slopes, $\sigma_{a_j b_j}$ (Kenny et al., 2003).4 This implies that ignoring the presence of correlated random slopes may lead to biased estimators of the average indirect and total effects (Tofighi et al., 2013). In practice, sample estimates of the quantities in Equation 3 and Equation 4 are used as substitutes for their population counterparts.
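A quick simulation illustrates why the covariance term matters. All population values below are arbitrary illustrations, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative population values: average slopes a and b, random-slope
# variances of 0.09, and a positive random-slope covariance of 0.06.
a, b, sigma_ab = 0.4, 0.5, 0.06
cov = np.array([[0.09, sigma_ab],
                [sigma_ab, 0.09]])

# Draw cluster-specific slopes (a_j, b_j) from a bivariate normal.
a_j, b_j = rng.multivariate_normal([a, b], cov, size=200_000).T

# The average indirect effect is E[a_j * b_j] = a*b + sigma_ab = 0.26,
# not the naive product of the average slopes, a*b = 0.20.
avg_indirect = (a_j * b_j).mean()
```

Ignoring the covariance term here would understate the average indirect effect by 0.06, i.e., by 30% of the naive product.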
Precision of the Estimated Indirect and Total Effects
This section aims to quantify the precision of the estimated average indirect and total effects. As pointed out by Bauer et al. (2006), the precision of the estimated average effects should not be confused with the heterogeneity in the effects: precision refers to the variability of the estimators of the average effects, while heterogeneity pertains to the variability of the effects across Level-2 units. In this article, we focus on the precision of the estimators of the average effects. Bauer et al. (2006) showed that, assuming multivariate normality of the estimators, an expression for the variance of the estimated average indirect effect is given by
$$\mathrm{Var}\!\left(\hat{a}\hat{b} + \hat{\sigma}_{a_j b_j}\right) = \hat{b}^2\sigma^2_{\hat{a}} + \hat{a}^2\sigma^2_{\hat{b}} + 2\hat{a}\hat{b}\,\sigma_{\hat{a}\hat{b}} + \sigma^2_{\hat{a}}\sigma^2_{\hat{b}} + \sigma^2_{\hat{a}\hat{b}} + \sigma^2_{\hat{\sigma}_{a_j b_j}} \tag{5}$$

and, for the average total effect,

$$\mathrm{Var}\!\left(\hat{a}\hat{b} + \hat{\sigma}_{a_j b_j} + \hat{c}'\right) = \mathrm{Var}\!\left(\hat{a}\hat{b} + \hat{\sigma}_{a_j b_j}\right) + \sigma^2_{\hat{c}'} + 2\hat{b}\,\sigma_{\hat{a}\hat{c}'} + 2\hat{a}\,\sigma_{\hat{b}\hat{c}'} \tag{6}$$
The (co)variances in Equation 5 and Equation 6 represent the (co)variances of the fixed effect estimators $\hat{a}$, $\hat{b}$, and $\hat{c}'$, and of the covariance estimator for the random slopes. These (co)variances can be estimated by substituting the population values in Equation 5 and Equation 6 with their sample estimates (Raudenbush & Bryk, 2002). Again, estimating the covariance between fixed effect estimators residing in two different models, $\sigma_{\hat{a}\hat{b}}$, is only possible by simultaneously estimating the mediation model (Bauer et al., 2006). Additionally, it should be noted that Equation 5 and Equation 6 include the term $\sigma^2_{\hat{\sigma}_{a_j b_j}}$, the sampling variance of the estimated random slope covariance.
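The variance expression can be written as a small helper that mirrors Equation 5 term by term; the argument names and all numbers in the check below are illustrative:

```python
def var_indirect(a, b, var_a, var_b, cov_ab, var_sigma_ab):
    """Sampling variance of the estimated average indirect effect
    a*b + sigma_ab (Equation 5): the exact variance of the product of
    two bivariate-normal estimators plus the sampling variance of the
    estimated random-slope covariance."""
    return (b**2 * var_a + a**2 * var_b + 2 * a * b * cov_ab
            + var_a * var_b + cov_ab**2 + var_sigma_ab)
```

The square root of this quantity gives the standard error used in the Wald-type confidence intervals discussed in the next section.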
Confidence Interval of the Average Indirect and Total Effects
Bauer et al. (2006) also presented expressions for 95% confidence intervals (CIs) for the average indirect and total effects. These expressions assume that the sampling distributions of the estimators are normal. Under this normality assumption, the 95% CIs for the average indirect effect and the average total effect can be calculated as:
$$\hat{a}\hat{b} + \hat{\sigma}_{a_j b_j} \pm 1.96\,\sqrt{\mathrm{Var}\!\left(\hat{a}\hat{b} + \hat{\sigma}_{a_j b_j}\right)} \tag{7}$$

and

$$\hat{a}\hat{b} + \hat{\sigma}_{a_j b_j} + \hat{c}' \pm 1.96\,\sqrt{\mathrm{Var}\!\left(\hat{a}\hat{b} + \hat{\sigma}_{a_j b_j} + \hat{c}'\right)} \tag{8}$$
Furthermore, Bauer et al. (2006) noted that a Wald-based null hypothesis test could be performed similarly by dividing the estimate of the effect by its standard error. However, since $\hat{a}\hat{b}$ is a product of normally distributed estimators, and the product of two normally distributed variables typically does not follow a normal distribution (Aroian et al., 1978), the normality assumption required for hypothesis testing and for constructing the CIs may not hold exactly (Shrout & Bolger, 2002). Nevertheless, the deviation from normality may be small enough that the CIs or significance tests will still be reasonably accurate (Bauer et al., 2006). In the next paragraph, we present an alternative approach for constructing confidence intervals that was also suggested by Bauer et al. (2006).
Monte Carlo Method to Construct Confidence Intervals
A viable alternative for constructing CIs is the Monte Carlo (MC) method, which does not assume that the product follows a normal distribution. Instead, the method only relies on the assumption that the estimators $\hat{a}$ and $\hat{b}$ follow a bivariate normal distribution (MacKinnon et al., 2004). Other distribution-free alternatives, such as non-parametric and residual bootstrapping, are also available. However, there is no consensus on the best way to perform bootstrapping in the multilevel context, and multilevel modeling software does not always facilitate its application (Preacher et al., 2010; Preacher & Selig, 2012). We therefore opted to present the MC method here, since it is computationally faster and has been shown to perform comparably to other approaches (Preacher & Selig, 2012). The MC method for a mediation model is implemented in a free online R calculator (Preacher & Selig, 2010).
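A minimal sketch of the MC approach for the average indirect effect in this model, extended with a draw for the estimated random-slope covariance; every numeric input below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)
R = 20_000  # number of Monte Carlo draws

# Hypothetical estimates from a fitted joint model.
a_hat, b_hat = 0.4, 0.5                 # fixed effects of the a and b paths
acov = np.array([[0.0025, 0.0010],      # sampling (co)variances of (a_hat, b_hat)
                 [0.0010, 0.0036]])
sigma_ab_hat, se_sigma_ab = 0.06, 0.02  # random-slope covariance and its SE

# Draw (a, b) from a bivariate normal, sigma_ab from a univariate normal,
# and form the indirect effect for each draw.
ab = rng.multivariate_normal([a_hat, b_hat], acov, size=R)
indirect = ab[:, 0] * ab[:, 1] + rng.normal(sigma_ab_hat, se_sigma_ab, size=R)

# The 2.5th and 97.5th percentiles give a (possibly asymmetric) 95% CI.
ci_low, ci_high = np.percentile(indirect, [2.5, 97.5])
```

Because the interval is read off the empirical distribution of the product, it can be asymmetric around the point estimate, unlike the Wald interval in Equation 7.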
Simulation Study
Simulation Model and Conditions
We examined the impact of ignoring random slopes in the mediation model in the presence of unmeasured upper-level mediator-outcome confounding. We focused on conditions commonly encountered in studies involving multilevel (mediation) models, using varying cluster sizes and numbers of clusters, along with different values for the intraclass correlation coefficient (ICC) to represent different clustering effects. Three scenarios were considered regarding the ICC, involving both equal and unequal values for the ICC of X, M, and Y. Specifically, we examined (i) a lower ICC for the predictor, (ii) equal ICC values, and (iii) a higher ICC for the predictor. The ICC was computed by dividing the cluster-level random intercept variance of X, M, and Y by the total variance of each variable (i.e., the sum of the Level 1 residual variance and the cluster-level intercept variance), following the approach of Zigler and Ye (2019). These values for the number of clusters, cluster sizes, and ICC values have been widely used in the literature and cover a range of conditions used in previous work (see e.g., Baird & Maxwell, 2016; Hox & Maas, 2001; Kim & Hong, 2020; Preacher et al., 2011; Zhang et al., 2009; Zigler & Ye, 2019).
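The ICC computation described above reduces to a one-line function (variable names illustrative):

```python
def icc(cluster_var, residual_var):
    """Intraclass correlation: cluster-level intercept variance divided by
    the total variance (Level 1 residual variance plus cluster-level
    intercept variance), as in Zigler and Ye (2019)."""
    return cluster_var / (cluster_var + residual_var)
```

For example, a cluster-level variance of 0.3 against a residual variance of 0.7 yields an ICC of 0.3, meaning 30% of the total variance sits between clusters.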
The data generating model for the simulation study is represented by Model (1), with various population values for the fixed effects and the random effects (co-)variances. The fixed effects of the $a$ and $b$ paths were assumed to be equal and were both set to one of two values. Additionally, we considered a scenario in which the fixed effects were unequal ($a \neq b$). The population value of the fixed direct effect $c'$ was held constant across conditions. The variances of the random slopes of the $a$ and $b$ paths ($\sigma^2_{a_j}$ and $\sigma^2_{b_j}$) were both chosen to be 0.01, 0.15, and 0.5. Additionally, we examined a scenario in which the variances of the random slopes were unequal ($\sigma^2_{a_j} \neq \sigma^2_{b_j}$). To evaluate the performance of the approaches in a model without random slopes, we included a condition with minimal non-zero variances for $\sigma^2_{a_j}$ and $\sigma^2_{b_j}$ (i.e., 0.01), which we refer to as the 'no random slopes' condition. The covariance between the random slopes of both paths ($\sigma_{a_j b_j}$) was set to a negative, a zero, and a positive value.5 However, when the data generating model contained no random slopes for the $a$ and $b$ paths, $\sigma_{a_j b_j}$ was set to zero. For simplicity, all other random effects were assumed to be uncorrelated, while the variances of the other random effects were based on the values used by Bauer et al. (2006). An overview of all parameters in the data generation models can be found in Figure 2.
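The data generating scheme of Model (1) can be sketched as follows; all population values are placeholders rather than the paper's exact settings (those are listed in Figure 2), and the direct effect is kept fixed across clusters here for brevity:

```python
import numpy as np

rng = np.random.default_rng(2025)

# Illustrative sketch of the data generating model in Equation 1.
n_clusters, cluster_size = 100, 8
a, b, c_prime = 0.4, 0.5, 0.3            # average effects (placeholders)
var_a, var_b, cov_ab = 0.15, 0.15, 0.1   # random-slope (co)variances

rows = []
for j in range(n_clusters):
    # Cluster-specific random intercepts and correlated random slopes.
    d_m, d_y = rng.normal(0, 1, size=2)
    a_j, b_j = rng.multivariate_normal([a, b], [[var_a, cov_ab],
                                                [cov_ab, var_b]])
    for _ in range(cluster_size):
        x = rng.normal()
        m = d_m + a_j * x + rng.normal()                 # mediator equation
        y = d_y + b_j * m + c_prime * x + rng.normal()   # outcome equation
        rows.append((j, x, m, y))
```

Each replication of the simulation generates such a data set, which is then stacked and analyzed with and without random slopes.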
Figure 2
Simulation Model and Covariance Matrices of Level 1 Residual Variances (left) and Random Effects (right)
Note. The random intercept and Level 1 residual variances were chosen based on the clustering effect. The random slope variances $\sigma^2_{a_j}$ and $\sigma^2_{b_j}$ were set to 0.01, 0.15, and 0.5.
For each of the 648 conditions (i.e., 2 cluster sizes × 3 numbers of clusters × 3 ICC scenarios × 3 scenarios for the fixed effects $a$ and $b$ × 4 scenarios for the random slope variances × 3 values for the random slope covariance), 1000 samples were generated and analyzed using the statistical software program R (Version 4.2.1; R Core Team, 2022) and the lme4 package (Bates, Mächler et al., 2015). To examine the impact of ignoring random slopes, we fitted a joint model with random slopes (further abbreviated RS) and without random slopes (further abbreviated RI). To assess the convergence rate, we initially used the default optimizer and the default number of iterations from lme4 for both the RI and RS approaches. However, the RS approach exhibited high non-convergence rates under various conditions when using the default optimizer. To address this, we tested the bobyqa optimizer, which demonstrated substantially higher convergence rates.6 The results presented in the manuscript are therefore based on the bobyqa optimizer.
Additionally, we identified singular solutions for the RS approach. Below, we present results for all 1000 samples (hereafter referred to as RS all) and for the subset excluding these singular fits (hereafter referred to as RS). As explained by Bates, Kliegl et al. (2015) and Bates, Mächler et al. (2015), singular fits are common with complex covariance models and occur when a random effect variance is estimated near zero.7 As robust standard errors have been recommended to overcome model misspecification in multilevel models (McNeish, 2023), we considered both the regular (hereafter referred to as RI) and robust standard errors (hereafter referred to as RI robust) for the RI approach. The robust standard errors were obtained using the clubSandwich package (Pustejovsky, 2023). An illustration of the examined estimation approaches is provided in Appendix A.
Performance Criteria
Several performance criteria were employed to evaluate the impact of ignoring random slopes in the mediation model. The first criterion was non-convergence, which occurs when the model-fitting algorithm fails to reach a solution within the predefined number of iterations or stopping criteria (Park et al., 2020). Convergence issues were expected for the RS approach, especially in conditions with a smaller cluster size and/or number of clusters. The convergence rate (CR) for each condition is calculated as:
$$\mathrm{CR} = \frac{\text{number of converged replications}}{\text{total number of generated samples}} \tag{9}$$
Second, we assessed the bias of the estimated average indirect effect. Since boxplots of the estimated average indirect effect (see Appendix B) revealed several right-skewed distributions and numerous outliers, the median bias was computed. The median bias (MB) is expressed as the difference between the median of the parameter estimates and the true parameter value:
$$\mathrm{MB} = \mathrm{median}_r\!\left(\hat{\theta}_r\right) - \theta \tag{10}$$

where $\hat{\theta}_r$ denotes the estimated average indirect effect in replication $r$ and $\theta$ its true value.
Third, we evaluated the median bias of the standard error of the average indirect effect, computed analogously by comparing the median of the estimated standard errors with the empirical standard deviation of the estimates across replications.
Fourth, the efficiency of the estimated average indirect effect was evaluated. Typically, the root mean squared error is used to examine the bias-variance trade-off; however, we utilized its robust counterpart, the median absolute error (MAE):
$$\mathrm{MAE} = \mathrm{median}_r\!\left(\left|\hat{\theta}_r - \theta\right|\right) \tag{11}$$
Fifth, the coverage probability (CP) of the 95% confidence intervals was inspected. The coverage probability is the proportion of 95% confidence intervals (CIs) containing the true parameter value:
$$\mathrm{CP} = \frac{1}{R}\sum_{r=1}^{R} I\!\left(\theta \in \mathrm{CI}_r\right) \tag{12}$$

where $R$ is the number of converged replications and $I(\cdot)$ is the indicator function.
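The criteria in Equations 9 to 12 can be computed jointly from the converged replications of one condition; a sketch with illustrative inputs:

```python
import numpy as np

def performance(estimates, ses, theta, n_samples, z=1.96):
    """Convergence rate, median bias, median absolute error, and coverage
    probability, computed over the converged replications of one condition."""
    estimates, ses = np.asarray(estimates), np.asarray(ses)
    cr = len(estimates) / n_samples                     # Equation 9
    mb = np.median(estimates) - theta                   # Equation 10
    mae = np.median(np.abs(estimates - theta))          # Equation 11
    lo, hi = estimates - z * ses, estimates + z * ses   # Wald 95% CIs
    cp = np.mean((lo <= theta) & (theta <= hi))         # Equation 12
    return cr, mb, mae, cp
```

Here `n_samples` is the total number of generated samples (1000 per condition in the study), while `estimates` and `ses` contain only the converged fits.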
Finally, we examined the use of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) for model comparison and selection (West et al., 2022). The AIC and BIC are derived from the log-likelihood (LL) statistic and can be computed as follows:
$$\mathrm{AIC} = -2\,\mathrm{LL} + 2p, \qquad \mathrm{BIC} = -2\,\mathrm{LL} + p\,\ln(N) \tag{13}$$

where $p$ is the number of estimated parameters and $N$ the sample size.
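Equation 13 translates directly into code; note that the choice of $N$ in the BIC penalty (total Level 1 observations, as assumed here, versus number of clusters) is a convention that varies across multilevel software:

```python
import math

def aic_bic(log_lik, n_params, n_obs):
    """AIC and BIC from a fitted model's log-likelihood (Equation 13)."""
    aic = -2 * log_lik + 2 * n_params
    bic = -2 * log_lik + n_params * math.log(n_obs)
    return aic, bic
```

Lower values indicate a better trade-off between fit and complexity; because the BIC penalty grows with $\ln(N)$, it punishes the extra random effect parameters of the RS model more heavily than the AIC does.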
Hypotheses
Based on the available multilevel literature, we anticipated several findings. First, we expected lower convergence rates for the RS approach compared to the RI approach, especially for a small number of clusters and/or in the 'no random slopes' condition (i.e., when there were essentially no random slopes in the data generating model). Second, we anticipated that the RI approach would show downward or upward biased estimates of the average indirect effect, depending on the sign of $\sigma_{a_j b_j}$, and would underestimate the standard errors, except when $\sigma_{a_j b_j} = 0$. Finally, we expected a poor coverage probability for the RI approach in most conditions, which may be improved when a robust estimator is used for the standard error (i.e., RI robust). For transparency and reproducibility, the code to perform the analysis and all the results are available at Bogaert et al. (2025).
Results
To compare the performance of the estimation models in terms of convergence rate, median bias (of both the indirect effect and its standard error), median absolute error, coverage probability, and model selection rate, an ANOVA was conducted for each of those criteria. The independent variables included the employed estimation models and the simulation conditions: cluster size, number of clusters, value of the indirect effect, variance of the random slopes, and ICC. All two-way interactions between the independent variables were included in the analysis. The results, presented in Table C1 in Appendix C, indicated that cluster size, number of clusters, value of the indirect effect, random slope variance, and estimation model had a statistically significant effect on the performance criteria. In contrast, neither the ICC nor its interaction terms exhibited a significant effect, suggesting a limited impact on the outcomes under the tested conditions. Therefore, we focus on the conditions with balanced ICC values. Although various values for the random slope covariance were examined, we report only the results for the condition with a positive random slope covariance, as the results showed mostly similar patterns across the different covariance values. However, when the covariance was negative, the bias in the average indirect effect was positive (instead of negative). When $\sigma_{a_j b_j} = 0$, there was no bias in the average indirect effect, but the standard error was underestimated, which led to unacceptably low coverage probabilities. A full overview of the results can be found at Bogaert et al. (2025).
Convergence Rate
As expected, the RI approach outperformed the RS approach, with convergence rates consistently at or above 0.97. The RS approach exhibited poor performance in conditions with a small cluster size and a small number of clusters, particularly when no random slopes were included in the data generating model, yielding convergence rates as low as 0.10. However, as both the cluster size and the number of clusters increased, the RS approach achieved convergence rates as high as 1.00. The RS all approach demonstrated perfect convergence rates, suggesting the occurrence of numerous singular solutions in the case of a small sample size and/or when no random slopes were present in the data generating model. Overall, convergence rates remained largely similar across the different values of the indirect effect. An overview of the convergence rates is provided in Figure 3.
Figure 3
Convergence Rates for the Considered Estimation Models
Median Bias of Average Indirect Effect Estimates
The RS approach performed well, even when the cluster size and number of clusters were small. Furthermore, only negligible differences were found between the RS and RS all approaches, indicating that the singular solutions still yielded reasonable estimates of the average indirect effect. The RI approach demonstrated substantially biased estimates of the average indirect effect in all conditions except the 'no random slopes' condition, in which there were essentially no random slopes and the covariance between the random slopes was therefore zero (i.e., $\sigma_{a_j b_j} = 0$). The RI approach showed the same absolute bias but a smaller relative bias when the fixed indirect effect was larger. Finally, the cluster size, number of clusters, and ICC did not influence the overall performance regarding bias. An overview of the median bias can be found in Figure 4.
Figure 4
Median Bias of the Average Indirect Effect Estimates for the Considered Estimation Models
Median Bias of Standard Error
The RS approach was overall the best performing estimation model, showing only minor underestimation of the standard errors when the cluster size and number of clusters were small. Furthermore, only negligible differences could be found between the RS and RS all approaches, except in the condition with the smallest cluster size and number of clusters, in which case the singular solutions resulted in somewhat underestimated standard errors.
The RI approach showed considerably underestimated standard errors across all conditions, except when the data-generating model did not include random slopes. The underestimation increased substantially as the fixed indirect effect and the variance of the random slopes became larger, and as the cluster size and number of clusters decreased. However, the underestimation largely diminished when the RI approach was combined with a robust estimator, indicating that the RI robust approach performed comparably to RS in most conditions. Finally, the ICC did not influence the bias in the standard errors. An overview of the median bias can be found in Figure 5.
Figure 5
Median Bias of the Standard Error of the Average Indirect Effect Estimates for the Considered Estimation Models
Bias-Variance Trade-Off: Median Absolute Error
The findings indicated that the RS (all) approach was the most efficient, exhibiting the lowest median absolute error in all conditions except the 'no random slopes' condition. No considerable differences in median absolute error were observed between the RS and RS all approaches, even when the number of clusters and the cluster size were small. The efficiency of all estimation models improved as the number of clusters increased and the random slope variance decreased. The RI approach also demonstrated a lower median absolute error when the fixed indirect effect was larger, in which case the differences between the estimation models became smaller. Additionally, the ICC and cluster size did not substantially influence efficiency. An overview of the median absolute error is provided in Figure 6.
Figure 6
Median Absolute Error of the Average Indirect Effect Estimates for the Considered Estimation Models
Coverage Probability
Overall, the RI approach demonstrated low coverage probabilities. It performed best when no random slopes were present in the data generating model, with coverage probabilities ranging from 0.84 to 0.92; the lowest coverage was observed with small cluster sizes. However, when random slopes were present, the RI approach exhibited coverage rates as low as 0.00 in multiple conditions. The RI robust approach showed better coverage than RI in most conditions but still performed poorly when random slopes were present in the data-generating model (coverage probabilities ranging from 0.00 to 0.85). The RI robust approach performed best when $\sigma_{a_j b_j} = 0$, consistently achieving coverage probabilities near the nominal level of 0.95. Importantly, the performance of the RI (robust) approach deteriorated as the cluster size and number of clusters increased. Although the RI robust approach exhibited improved coverage rates when the fixed indirect effect was large, its overall performance remained unsatisfactory.
The RS (all) approach demonstrated overall acceptable coverage probabilities. However, the coverage rate decreased when the cluster size and number of clusters were low and when the fixed indirect effect was small. In these conditions, the RS all approach displayed somewhat lower coverage probabilities than RS, implying that singular solutions may also lead to lower coverage. Notably, no large differences in coverage probability were observed in the 'no random slopes' condition, where most singular solutions occurred. As for the RI (robust) approach, the ICC did not affect the coverage probability. An overview of the coverage probabilities is provided in Figure 7.
Figure 7
Coverage Probability of Considered Estimation Models for the 95% CI of the Average Indirect Effect
Model Selection Rate
When random slopes were included in the data generating model, both the AIC and BIC favored the RS all approach over the RI approach. When no random slopes were included, the RS all approach was increasingly favored over the RI approach as the cluster size, number of clusters, and magnitude of the indirect effect increased. Selection rates also varied depending on the information criterion used. Specifically, the BIC tended to favor the RI approach more frequently than the AIC, which favored the RI approach only in scenarios with small cluster sizes and a low number of clusters. Additionally, the ICC did not affect the model selection rate. An overview of the selection rates is provided in Figure 8.
Figure 8
Model Selection Rate of RS All Over RI Approach
Discussion
In this article, we investigated the impact of ignoring random slopes in a mediation model. We examined the convergence rate, bias, efficiency, and coverage probability of the average indirect effect for estimation models with and without random slopes. Various cluster sizes and numbers of clusters were used, along with different values for the intraclass correlation coefficient, the fixed effects, and the variances of the random slopes.
The results revealed clear patterns that align with prior literature. First, we examined the convergence rate. As expected, the RI approach displayed very high convergence rates in all conditions, whereas the model including random slopes showed substantially lower convergence rates (i.e., many singular solutions) when the cluster size and number of clusters were small, particularly when no random slopes were included in the data-generating model. These findings are consistent with those of Bauer et al. (2006), Bryan and Jenkins (2016), Park et al. (2020), and Schielzeth and Forstmeier (2009), who argued that the estimation of a multilevel model becomes more difficult as the number of random effects (and therefore the number of parameters to be estimated) increases. Importantly, when using the default optimizer of the lme4 package (instead of the bobyqa optimizer), the findings showed lower convergence rates for the RS approach. We therefore recommend using the bobyqa optimizer when fitting a mediation model.
The RS approach performed well in terms of bias of the estimated indirect effect, even when the data-generating model contained no random slopes and/or in the smallest considered cluster size and number of clusters. In contrast, the RI approach displayed substantially biased indirect effect estimates in all conditions, except when there were no random slopes in the data-generating model or the random slope covariance was zero. The bias decreased when the fixed indirect effect became larger, but still remained considerable. As discussed earlier, the RI approach did not provide unbiased estimates for the average indirect effect whenever correlated random slopes for the a and b paths were present in the data-generating model. This is a distinctive finding for the mediation model compared to a standalone multilevel model, where omitting random slopes usually does not result in biased estimators for the fixed effects (LaHuis et al., 2020; Schielzeth & Forstmeier, 2009).
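The source of this bias can be made concrete with a short Monte Carlo sketch. When the cluster-specific slopes a_j and b_j of the two paths forming the indirect effect covary across clusters, the average indirect effect equals the product of the fixed slopes plus the slope covariance, E[a_j b_j] = ab + cov(a_j, b_j), so a model that estimates only the product ab misses the covariance term. The numerical values below are purely illustrative and are not taken from the simulation study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed (average) slopes for the a-path (X -> M) and b-path (M -> Y),
# and the covariance between the cluster-specific random slopes
# (illustrative values, not the design of this study).
a, b, cov_ab = 0.4, 0.3, 0.2

# Draw correlated cluster-specific slopes (a_j, b_j) for many clusters.
n_clusters = 200_000
Sigma = np.array([[0.25, cov_ab],
                  [cov_ab, 0.25]])
slopes = rng.multivariate_normal([a, b], Sigma, size=n_clusters)
a_j, b_j = slopes[:, 0], slopes[:, 1]

# True average indirect effect: E[a_j * b_j] = a*b + cov(a_j, b_j).
avg_indirect = (a_j * b_j).mean()
print(f"average indirect effect : {avg_indirect:.3f}")  # ~ a*b + cov_ab = 0.32
print(f"product of fixed slopes : {a * b:.3f}")         # 0.12
print(f"difference (the bias)   : {avg_indirect - a * b:.3f}")
```

A random-intercept fit targets only the product of the fixed slopes, so its indirect effect estimate is off by roughly cov(a_j, b_j), exactly the pattern reported above: the bias disappears only when the random slope covariance is zero.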
A similar trend was observed for the estimated standard errors. The RS approach yielded appropriate standard errors, whereas the RI approach systematically underestimated them. This result coincides with the findings of Barr et al. (2013), Bell et al. (2019), Heisig et al. (2017), and LaHuis et al. (2020), who investigated the impact of omitting random slopes in the standalone multilevel model and likewise reported underestimated standard errors for the fixed effects. A better performance was observed for the RI approach when a robust estimator for the standard error was used, which is in line with previous research findings (Bell et al., 2019; King & Roberts, 2015).
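Why the model-based standard errors are too small can be illustrated outside the mixed-model machinery: when the data contain a cluster-specific slope that the fitted model ignores, a cluster-robust (sandwich) standard error is markedly larger than the naive model-based one. A minimal numpy sketch under illustrative assumptions (a plain OLS fit rather than the RI model of this study, and made-up parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)

# Clustered data with a random slope: y_ij = (beta + u_j) * x_ij + e_ij.
n_clusters, n_per = 100, 20
beta, slope_sd, noise_sd = 0.5, 0.6, 1.0

cluster = np.repeat(np.arange(n_clusters), n_per)
x = rng.normal(size=n_clusters * n_per)
u = rng.normal(0, slope_sd, size=n_clusters)  # cluster-specific slope deviations
y = (beta + u[cluster]) * x + rng.normal(0, noise_sd, size=x.size)

# OLS fit of y on x (intercept + slope), ignoring the slope heterogeneity.
X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ y
resid = y - X @ coef

# Model-based (naive) standard error of the slope.
sigma2 = resid @ resid / (x.size - 2)
se_naive = np.sqrt(sigma2 * XtX_inv[1, 1])

# Cluster-robust sandwich (CR0) standard error of the slope.
meat = np.zeros((2, 2))
for j in range(n_clusters):
    idx = cluster == j
    score = X[idx].T @ resid[idx]  # per-cluster score contribution
    meat += np.outer(score, score)
vcov_robust = XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(vcov_robust[1, 1])

print(f"naive SE : {se_naive:.3f}")
print(f"robust SE: {se_robust:.3f}")  # noticeably larger than the naive SE
```

The gap between the two standard errors is driven by the ignored slope variance, which is also what makes the model-based/robust comparison useful as a misspecification diagnostic.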
The RS approach resulted in acceptable coverage probabilities for the average indirect effect in most conditions. In contrast, the RI approach performed extremely poorly. Its performance was best when no random slopes were included in the data-generating model, yet even then its coverage probabilities fell below the nominal level. These findings align with the results of Barr et al. (2013), Bell et al. (2019), LaHuis et al. (2020), and Schielzeth and Forstmeier (2009), who concluded that falsely assuming fixed slopes can be detrimental for statistical inference (i.e., too narrow confidence intervals). Here, however, the low coverage probabilities result from both the bias in the estimated indirect effect and the underestimated standard error, which respectively shift the confidence intervals away from the true average indirect effect and narrow the 95% CIs.
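How bias and an underestimated standard error jointly destroy coverage can be shown in a few lines: the bias shifts the interval away from the true value while the understated standard error narrows it. All numbers below are illustrative placeholders, not estimates from the simulation study:

```python
import numpy as np

rng = np.random.default_rng(3)

# True average indirect effect, the sampling sd of the estimator, and the
# two distortions discussed in the text: a biased point estimate and an
# underestimated standard error (illustrative values).
theta, sd = 0.30, 0.05
bias, se_understated = 0.10, 0.03

n_sims = 100_000

# Distorted case: biased estimates with CIs built from the understated SE.
est = rng.normal(theta + bias, sd, size=n_sims)
lo = est - 1.96 * se_understated
hi = est + 1.96 * se_understated
coverage = np.mean((lo <= theta) & (theta <= hi))

# Well-behaved case: unbiased estimates with CIs built from the correct SE.
est_ok = rng.normal(theta, sd, size=n_sims)
coverage_ok = np.mean(np.abs(est_ok - theta) <= 1.96 * sd)

print(f"coverage, biased + understated SE: {coverage:.3f}")   # far below 0.95
print(f"coverage, unbiased + correct SE  : {coverage_ok:.3f}")
```

Either distortion alone lowers coverage; together they can push it toward the near-zero rates observed for the RI approach in some conditions.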
Finally, the RI and RS all approaches were compared in terms of their performance regarding the bias-variance trade-off and model selection based on information criteria. The RS all approach exhibited smaller median absolute error values than the RI approach, except when , and is therefore considered more efficient. Both information criteria showed high model selection rates, indicating that the RS all approach was generally favored over the RI approach, particularly when random slopes were included in the data-generating model. In the absence of random slopes, the RS all approach was increasingly preferred as the cluster size, the number of clusters, and the magnitude of the indirect effect increased. The BIC favored the RI approach more often than the AIC, but only in scenarios with small cluster sizes and a low number of clusters.
Overall, we make the following recommendations. When convergence or estimation issues arise and/or insufficient resources are available, a first recommendation is to try a different optimizer and/or to increase the number of iterations. A second recommendation is to fit the RI approach with robust standard errors. This approach showed better, but still unsatisfactory, performance (due to biased indirect effect estimates and low coverage probabilities). However, it can be used as a diagnostic procedure, as suggested by King and Roberts (2015): a disparity between model-based and robust standard errors implies the presence of model misspecification, which could include the omission of random slopes from the model (Bell et al., 2019). A third recommendation is to use a piecemeal approach to potentially leave out random slopes (e.g., for the direct effect) and/or correlations between the random effects. This may involve evaluating the magnitude of the estimated variance components and comparing models using likelihood ratio tests (for example, with the varTestnlme package in R) or information criteria (Bates, Kliegl et al., 2015; Bell et al., 2019; LaHuis et al., 2020; Matuschek et al., 2017; Park et al., 2020). Finally, it is also possible to test the parameter using a Wald test.
Like all simulation studies, this study was not without limitations. A first limitation is that we restricted upper-level confounding to the mediator-outcome relationship. Under the data-generating mechanism considered in our simulation study, the within- and between-effects of the exposure are equal. Although this assumption is only likely to hold in settings such as alternating treatment designs where the exposure is randomized, it is also unlikely to matter for the main conclusions in this paper. A second limitation is that the simulation study included only a limited number of conditions. Although the conditions considered here may be commonly found in the literature, different settings may lead to different results. A third limitation is that we have not considered the bootstrap or Monte Carlo approach for estimating the precision of the indirect effect, since these approaches are computationally very intensive. A fourth limitation is that we have not considered cross-level interactions (i.e., a Level 2 variable that moderates the relationship between two Level 1 variables). Previous research findings indicate that the negative consequences of falsely omitting random slopes may be aggravated in this condition (Heisig et al., 2017). A fifth limitation in the presentation of the simulation results is that we only compared the performance of the simpler RI model with that of the more complex RS model. In practice, a researcher may use model selection based on the AIC. Once a model is selected by the AIC, it could be interesting to examine whether it actually gives good estimates of the parameters of interest.
In this work, we have employed the joint modeling approach of Bauer et al. (2006), as it was the most convenient approach for examining the consequences of ignoring random slopes in a mediation model within the traditional multilevel framework. However, there are other methods for multilevel mediation analysis, such as MSEM (Preacher, 2011). The MSEM approach to multilevel mediation modeling has the advantage that it can easily handle multiple multilevel models simultaneously, but it often requires a larger sample size than the traditional multilevel approaches (McNeish, 2017; Preacher et al., 2011; Zigler & Ye, 2019). Indeed, the latent decomposition in MSEM can be quite demanding. Because MSEM is more complex (modeling latent components, more parameters) and has greater sample-size and convergence demands, we stuck with the joint multilevel approach. The latter is easier to implement in standard mixed-model software and can perform adequately when the focus is on within effects. Future research could directly compare both approaches.
As explained, we opted not to separate within and between effects using a manifest decomposition (Zhang et al., 2009). This decomposition is not needed to eliminate upper-level confounding when allowing correlated random slopes. The downside of the approach presented in this paper is that it only allows for unmeasured upper-level confounding of the mediator-outcome relationship. In non-randomized studies there may also be exposure-mediator or exposure-outcome confounding. In principle, one could add a third model to the stacked approach of Bauer et al. (2006) and model the exposure. By allowing the random intercepts of all three models to be correlated, one allows for all three types of upper-level confounding. Obviously, such an approach would further increase the dimensionality of the covariance matrix of the random effects. Alternatively, one may follow the suggestion of Bauer (2015) and separate the within-effect and between-effect of the exposure to avoid conflated effects. The advantage of the latter approach is that it does not require an additional model for the exposure.
In sum, we have shown that omitting random slopes in a mediation model results in biased average indirect effect estimates as well as underestimated standard errors, leading to invalid inference. Given the limited negative impact of unnecessarily allowing random slopes, we encourage researchers (when in doubt) to include random slopes in a mediation model. This advice is in line with conclusions from other scholars (Barr et al., 2013; Bell et al., 2019; LaHuis et al., 2020; Schielzeth & Forstmeier, 2009; Schmidt-Catran & Fairbrother, 2016). At the same time, it is important to acknowledge potential convergence issues for the RS approach due to the increased complexity of the model when all random slopes are included (Heisig et al., 2017; LaHuis et al., 2020; Matuschek et al., 2017).