Researchers often apply moderation analyses to examine whether the effects of an intervention differ conditional on individual or cluster moderator variables such as gender, pretest, or school size. This study develops formulas for power analyses to detect moderator effects in two-level cluster randomized trials (CRTs) using hierarchical linear models. We derive the formulas for estimating statistical power, the minimum detectable effect size difference (MDESD), and 95% confidence intervals for cluster- and individual-level moderators. Our framework accommodates binary or continuous moderators, designs with or without covariates, and effects of individual-level moderators that vary randomly or nonrandomly across clusters. A small Monte Carlo simulation confirms the accuracy of our formulas. We also compare power between main effect analysis and moderation analysis, discuss the effects of mis-specification of the moderator slope (randomly vs. nonrandomly varying), and conclude with directions for future research. We provide software for conducting a power analysis of moderator effects in CRTs.

A critical consideration in the evaluation of treatment programs is whether treatment effects are moderated by context or individual characteristics. As a result, an important consideration that emerges in the planning stage is how to design studies that have sufficient power to detect such moderation if it exists. Although there has been a steady pace of advancement in the design of moderation studies in cluster randomized trials (CRTs;

Similarly, the current multilevel literature offers limited guidance concerning statistical power when assessing the extent to which treatment effects vary across subgroups defined by an individual-level variable. More specifically, assessments of individual-level moderators are typically operationalized through cross-level interactions between the cluster-level treatment and individual-level moderators (e.g., a child's gender). The result is that the effect of the individual-level variable (i.e., as quantified through its coefficient) can be regarded as randomly or nonrandomly varying across clusters. The nonrandomly varying slope approach assumes that the gender achievement gap does not vary randomly across schools but rather only as an explicit function of cluster-level variables (e.g., the individual-level slope or coefficient for gender varies across clusters only as a function of treatment status). The randomly varying slope model addresses the same moderation question but allows for the possibility that the gender achievement slope varies randomly across schools even after accounting for the treatment effect (e.g., unexplained heterogeneity across schools in the relationship between gender and the outcome). The choice between these approaches ultimately depends on prior knowledge of the effects of the moderator variables and the theory underlying the intervention. It is nonetheless important that design frameworks consider both approaches and the implications of designing a study based on either one.

Our review of the literature identified only two methodological studies that have examined the power for the randomly varying slope model in moderation analysis (

A key prior contribution to the literature with regard to designing multilevel moderation studies was

The purpose of this study is to consolidate and extend the literature on power analyses for moderators by developing power formulas that accommodate categorical or continuous moderators, models with or without covariates, same- or cross-level moderator effects, and nonrandomly or randomly varying slopes in two-level CRTs. We then advance the practical application of these results by examining the effects on power when the slope is mis-specified (randomly varying vs. nonrandomly varying slope) to outline the sensitivity of power analysis to such mis-specifications. Because a team planning a CRT may be interested in the power for a moderator effect of a given magnitude or in the minimum detectable effect size difference (MDESD) given the sample sizes and desired power, we provide the power formulas as well as the MDESD calculations and their corresponding confidence intervals. We also created a Microsoft Excel-based function, an R function, and an R Shiny app to assist researchers conducting power analyses for various moderator effects^{1}

The software can be accessed from the website:

The paper is organized as follows. We present the formulas for statistical power and the MDESD and its confidence intervals for the moderator variable at level 2 and subsequently for a moderator at Level 1. In each case, we start with a continuous moderator and extend it to a binary moderator. We also conduct a small Monte Carlo simulation to assess the empirical validity of the formulas in finite sample sizes. We then compare the statistical power and MDESD for moderation effects under different design considerations followed by a comparison of the MDES for main treatment effects and the MDESD for the moderation effects. Finally, we discuss the implications of planning studies to detect moderator effects in two-level CRTs and consider directions for future work.

We present the key results of the formulas for statistical power and the MDESD and its confidence intervals for different moderator effects in the framework of a two-level hierarchical linear model (HLM;

We begin with a two-level design that randomly assigns groups/clusters (e.g., schools) to the treatment or control condition and conditions on a cluster-level covariate (e.g., the percentage of students eligible for free or reduced-price lunch) and probes a cluster-level moderator (e.g., school size). The data are generated using a two-level hierarchical linear model (

Level 1: $Y_{ij} = \beta_{0j} + e_{ij}$, where $e_{ij} \sim N(0, \sigma^2)$.

Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01}T_j + \gamma_{02}W_j + \gamma_{03}M_j + \gamma_{04}T_jM_j + u_{0j}$, where $u_{0j} \sim N(0, \tau^2)$.

Here $T_j$ is the treatment indicator, $W_j$ is the cluster-level covariate, $M_j$ is the cluster-level moderator, and $\gamma_{04}$, the coefficient on the treatment-by-moderator interaction, is the moderator effect of interest.

We assume that the data are balanced such that each cluster has the same number of observations ($n_j = n$ for all clusters $j$).

We can test the null hypothesis that the treatment-by-moderator interaction coefficient is zero with a $t$ test, where the test statistic is the estimated coefficient divided by its standard error.

The statistical power for a two-sided test is (note^{2} that $t_{1-\alpha/2}(v)$ is the critical value of the $t$ distribution with $v$ degrees of freedom and $H[x; v, \lambda]$ is the cumulative distribution function of the noncentral $t$ distribution with $v$ degrees of freedom and noncentrality parameter $\lambda$):

$$P = 1 - H\left[t_{1-\alpha/2}(v); v, \lambda\right] + H\left[-t_{1-\alpha/2}(v); v, \lambda\right].$$
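As a quick computational sketch (ours, not the paper's software), the two-sided power can be computed from the noncentrality parameter using a normal approximation in place of the noncentral $t$ distribution; the approximation is reasonable unless the degrees of freedom are small:

```python
from statistics import NormalDist

def power_two_sided(lmbda: float, alpha: float = 0.05) -> float:
    """Approximate two-sided power for a test with noncentrality lmbda.

    Normal approximation: power = P(Z > z_crit - lmbda) + P(Z < -z_crit - lmbda),
    where z_crit = z_{1 - alpha/2}.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return (1 - nd.cdf(z_crit - lmbda)) + nd.cdf(-z_crit - lmbda)

# A moderator effect with lambda near 2.8 is detected with roughly 80% power
# at alpha = .05; at lambda = 0 the "power" reduces to the Type I error rate.
```
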

Generally,

The MDESD for the standardized coefficient is:

$$\mathrm{MDESD} = M_v \times SE(\hat{\delta}),$$

where, for a two-tailed test, $M_v = t_{1-\alpha/2}(v) + t_{1-\beta}(v)$ is the sum of the critical $t$ value and the quantile corresponding to the desired power $1-\beta$, and $SE(\hat{\delta})$ is the standard error of the standardized moderator coefficient $\hat{\delta}$.

The 100*(1−α)% confidence interval for the standardized moderator effect is the point estimate plus or minus the critical $t$ value, $t_{1-\alpha/2}(v)$, times its standard error.
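The MDESD multiplier and the confidence interval can be sketched in a few lines (our illustration, again substituting normal quantiles for the $t$ quantiles, which is close for moderate degrees of freedom):

```python
from statistics import NormalDist

def mdesd(se: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable effect size difference: the two-tailed multiplier
    M = z_{1-alpha/2} + z_{power} times the standard error of the
    standardized moderator coefficient (normal approximation)."""
    nd = NormalDist()
    m = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return m * se

def confidence_interval(estimate: float, se: float, alpha: float = 0.05):
    """100*(1 - alpha)% confidence interval for the moderator effect."""
    half = NormalDist().inv_cdf(1 - alpha / 2) * se
    return (estimate - half, estimate + half)

# With alpha = .05 and 80% power the multiplier is about 2.80, so an SE of
# 0.10 yields an MDESD of roughly 0.28.
```
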

When the moderator is binary, the same expressions apply with the variance of the moderator replaced by $Q(1-Q)$, where $Q$ is the proportion of clusters in one of the two moderator subgroups.

[Table: equations for each model. Columns: CRT2-1N (Level 1 moderator, nonrandomly varying slope), CRT2-1R (Level 1 moderator, randomly varying slope), and CRT2-2 (Level 2 moderator). Rows: the HLM specification at Levels 1 and 2; the standardized noncentrality parameter (λ), the MDESD, and the 100*(1−α)% confidence interval, each for binary and continuous moderators; and the degrees of freedom (v).]

Under the same design, we next consider individual-level moderators allowing for two different specifications: 1) the randomly varying slope model, which assumes that the effect of the Level 1 moderator varies by the treatment status and varies randomly across the Level 2 units, and 2) the nonrandomly varying slope model, which assumes that the effect of the Level 1 moderator varies by the treatment status but does not vary further across the Level 2 units.

The randomly varying slope hierarchical linear model, including one treatment variable and one individual-level moderator whose slope varies across clusters, is specified as follows.

Level 1: $Y_{ij} = \beta_{0j} + \beta_{1j}M_{ij} + e_{ij}$, where $e_{ij} \sim N(0, \sigma^2)$.

Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01}T_j + u_{0j}$ and $\beta_{1j} = \gamma_{10} + \gamma_{11}T_j + u_{1j}$, where $M_{ij}$ is the individual-level moderator, $T_j$ is the treatment indicator, and $\gamma_{11}$, the coefficient on the cross-level treatment-by-moderator interaction, is the moderator effect of interest.

The Level 2 residuals for the intercept, $u_{0j}$, and the moderator slope, $u_{1j}$, are assumed to follow a bivariate normal distribution with means of zero, variances $\tau_{00}$ and $\tau_{11}$, and covariance $\tau_{01}$.
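As an illustration of this data-generating process (a minimal sketch of ours; the function and parameter names, and the specific parameter values, are hypothetical rather than taken from the paper), setting the slope variance to zero recovers the nonrandomly varying slope specification:

```python
import random

def simulate_crt(J=40, n=20, gamma_treat=0.3, gamma_mod=0.2, gamma_txm=0.2,
                 tau0=0.2, tau1=0.05, sigma2=0.8, seed=1):
    """Generate one two-level CRT data set with an individual-level binary
    moderator whose slope varies randomly across clusters when tau1 > 0.
    Setting tau1 = 0 gives the nonrandomly varying slope specification."""
    rng = random.Random(seed)
    rows = []
    for j in range(J):
        t = 1 if j < J // 2 else 0                    # half the clusters treated
        u0 = rng.gauss(0, tau0 ** 0.5)                # random intercept residual
        u1 = rng.gauss(0, tau1 ** 0.5) if tau1 > 0 else 0.0  # random slope residual
        for _ in range(n):
            m = rng.choice([0, 1])                    # binary Level 1 moderator
            e = rng.gauss(0, sigma2 ** 0.5)           # Level 1 residual
            y = gamma_treat * t + (gamma_mod + gamma_txm * t + u1) * m + u0 + e
            rows.append((j, t, m, y))                 # cluster, treatment, moderator, outcome
    return rows
```
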

We test the moderator effect (the coefficient on the cross-level interaction between the treatment and the Level 1 moderator) with a $t$ test of the estimated coefficient against its standard error.

The statistical power for a two-sided test is computed as before from the noncentral $t$ distribution, with the noncentrality parameter and degrees of freedom now corresponding to the randomly varying slope model.

The MDESD for the standardized coefficient is again the two-tailed multiplier, $M_v = t_{1-\alpha/2}(v) + t_{1-\beta}(v)$, times the standard error of the standardized moderator coefficient, where the standard error now incorporates the variance of the randomly varying slope.

The 100*(1−α)% confidence interval for the standardized moderator effect is, as before, the point estimate plus or minus the critical $t$ value times its standard error.

In the nonrandomly varying slope model, the Level 1 model is the same as that in the randomly varying slope model, but the Level 2 equation for the moderator slope omits the random residual (i.e., $u_{1j}$ is dropped).

The standardized noncentrality parameter is:

The degrees of freedom^{3}

Generally,

When the Level 1 moderator,

and

where

The standardized noncentrality parameters, the MDESD for the standardized regression coefficient, and the 100*(1−α)% confidence interval for all three models are summarized in the table above.

To validate the standard error and power formulas we derived, we conducted a small Monte Carlo simulation. The results provided initial, albeit limited, evidence of close correspondence between our formulas and the empirical simulation distribution for both the standard error and the power (or Type I error) when the analytic model was correctly specified. The detailed procedures and results are presented in

We note one particular finding that emerges from the results of the simulation. For a Level 1 moderator, we varied the effect heterogeneity (ω) of the Level 1 moderator across Level 2 units from 0 to 0.8. For each dataset, we used both the randomly varying slope model and the nonrandomly varying slope model to estimate the moderator effects. When ω is 0, the nonrandomly varying slope model is the correctly specified analytic model and the randomly varying slope model is mis-specified. In these simulations, the randomly varying slope model tended to slightly overestimate the standard error, but its 95% CI coverage rate was as good as that of the nonrandomly varying slope model. Compared with the nonrandomly varying slope model, the randomly varying slope model produced slightly lower power. When ω is 0.2, 0.4, 0.6, or 0.8, the nonrandomly varying slope model is mis-specified and the randomly varying slope model is correctly specified (see Tables S1-S24 in
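To illustrate the flavor of such a validation without fitting a full mixed model (this is our simplified sketch, not the paper's simulation: ω = 0, a balanced binary Level 1 moderator, and the interaction estimated as a difference-in-differences of subgroup means, which cancels the cluster random intercept):

```python
import random, statistics

def one_estimate(J=40, n=20, effect=0.2, sigma=1.0, tau=0.3, rng=None):
    """Estimate the treatment-by-moderator interaction by a
    difference-in-differences of within-cluster subgroup means
    (the omega = 0 case: the moderator slope is constant across clusters)."""
    rng = rng or random.Random()
    diffs = {0: [], 1: []}           # within-cluster subgroup mean differences
    for j in range(J):
        t = j % 2                     # alternate clusters into control/treatment
        u0 = rng.gauss(0, tau)        # cluster random intercept (cancels below)
        m1 = [u0 + (effect if t else 0.0) + rng.gauss(0, sigma) for _ in range(n // 2)]
        m0 = [u0 + rng.gauss(0, sigma) for _ in range(n // 2)]
        diffs[t].append(statistics.mean(m1) - statistics.mean(m0))
    return statistics.mean(diffs[1]) - statistics.mean(diffs[0])

rng = random.Random(7)
estimates = [one_estimate(rng=rng) for _ in range(500)]
# The mean of the estimates recovers the true interaction effect, and their
# standard deviation approximates the standard error the formulas target.
```
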

As in the power analysis of the main treatment effect, the power of the moderator effect in two-level CRTs is associated with the noncentrality parameter (λ) and the critical $t$ value, which in turn depends on the Type I error rate (α) and the degrees of freedom.

If the moderator is a binary variable, the power is also associated with the proportion of the sample in each of the two moderator subgroups; other things being equal, power is greatest when the subgroups are balanced.
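A small check of this pattern (illustrative only; it isolates the $\sqrt{Q(1-Q)}$ factor that the noncentrality parameter carries for a binary moderator with subgroup proportion $Q$, holding everything else fixed):

```python
def relative_lambda(q: float) -> float:
    """Relative size of the noncentrality parameter for a binary moderator
    with subgroup proportion q; only the sqrt(q*(1-q)) factor varies here."""
    return (q * (1 - q)) ** 0.5

# The factor, and hence power, peaks at a balanced split (q = 0.5).
best = max([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], key=relative_lambda)
```
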

If the moderator is at Level 1 with a randomly varying slope, the power is also associated with the effect heterogeneity (ω) of the Level 1 moderator across Level 2 units. The MDESD increases and power decreases as ω increases. The formulas for the nonrandomly varying slope model for the Level 1 moderator do not contain the factor related to ω. The degrees of freedom also differ between the two specifications: for the randomly varying slope model they are a function of the number of clusters, whereas for the nonrandomly varying slope model they are a function of the total number of individuals.

Using a mis-specified analytic model at the design stage will result in either overestimating or underestimating the power. Specifically, if the randomly varying slope model is used to design studies where ω = 0, the power will be underestimated; if the nonrandomly varying slope model is used to design studies where ω > 0, the power will be overestimated. The bias in power estimates due to model mis-specification decreases when the sample size for the clusters (J) increases.

To make these comparisons more concrete, we compare the MDESD and power among three moderation designs using several examples. Suppose a team of researchers is designing a two-level CRT to test the efficacy of a school-based intervention on student achievement. They are interested in both student-level and school-level moderator effects. They approach the moderator power analyses from two perspectives: 1) what is the MDESD given power of 0.80, and 2) what is the power for a moderation effect size of 0.20? Based on the literature (

| Level of moderator | Slope of lower-level moderator | MDESD, binary (no covariates) | MDESD, continuous (no covariates) | MDESD, binary (with covariates) | MDESD, continuous (with covariates) | Power, binary (no covariates) | Power, continuous (no covariates) | Power, binary (with covariates) | Power, continuous (with covariates) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Nonrandomly varying | 0.11 | 0.08 | 0.06 | 0.04 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1 | Randomly varying | 0.26 | 0.18 | 0.25 | 0.17 | 0.56 | 0.86 | 0.63 | 0.91 |
| 2 | N/A | 0.67 | 0.45 | 0.34 | 0.23 | 0.13 | 0.24 | 0.39 | 0.70 |
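The MDESD and power columns are two views of the same noncentrality calculation. As a rough consistency check (our sketch, using a normal approximation and the conventional two-tailed multiplier for 80% power, so small rounding differences relative to the exact $t$-based values are expected), the reported power for an effect of 0.20 can be approximately recovered from the reported MDESD:

```python
from statistics import NormalDist

nd = NormalDist()

def power_from_mdesd(mdesd: float, effect: float = 0.20,
                     alpha: float = 0.05, target_power: float = 0.80) -> float:
    """Back out the implied SE from an MDESD reported at `target_power`,
    then compute the two-sided power for `effect` (normal approximation)."""
    m = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(target_power)  # multiplier ~2.80
    se = mdesd / m
    lam = effect / se                                          # noncentrality
    z = nd.inv_cdf(1 - alpha / 2)
    return (1 - nd.cdf(z - lam)) + nd.cdf(-z - lam)

# Level 2 moderator row: a binary-moderator MDESD of 0.67 implies power near
# 0.13 for an effect of 0.20, and a continuous-moderator MDESD of 0.45 implies
# power near 0.24, matching the tabled values.
```
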

The findings in

We examine the ratio of the MDESD for the moderator analysis to the minimum detectable effect size (MDES) for the main effect analysis. The MDES formula for a two-level cluster randomized design with a Level 1 and two Level 2 covariates is as follows (

where the multiplier

We use the MDESD formulas for binary moderators in

The result in

The situation is different for the analysis of the Level 1 moderator effect, which may have greater power than the main effect analysis. The MDES formula for the main effect in

The figure is based on the following assumptions: the intraclass correlation coefficient (

The main findings are summarized as follows. First, the effects on power and MDESD in two-level CRTs of the sample sizes at each level, the level at which the moderator is assessed, the slope of a Level 1 moderator (randomly vs. nonrandomly varying), the distribution of the moderator (binary vs. continuous), and the inclusion of covariates are consistent with those in three-level CRTs (

Second, when the estimation models are correctly specified for the real data, the model with a randomly varying moderator slope will yield less precise estimates than the model with a constant moderator slope. The differences in power and MDESD between the two models decrease when the number of clusters (

Lastly, the mismatch between the study design and the real data will result in either overestimating or underestimating the power. Specifically, if the randomly varying slope model is used to design studies where ω = 0, the power will be underestimated; if the nonrandomly varying slope model is used to design studies where ω > 0, the power will be overestimated. The bias in power estimates due to model mismatch decreases when the sample size for the clusters (J) increases.

This study focused on two-level CRTs. There are many important directions for further work. First, extending the work to other designs is necessary. This includes multisite randomized trials (MRTs), which are also common designs used to evaluate the effectiveness of programs (

This project has been funded by the National Science Foundation [1437679, 1437692, 1437745, 1913563, 1552535, 1760884]. The opinions expressed herein are those of the authors and not the funding agency.

The authors have declared that no competing interests exist.

The authors have no additional (i.e., non-financial) support to report.