Cronbach’s alpha (Cronbach, 1951) stands as one of the most widely used coefficients reflecting the interrelatedness of items (Sijtsma, 2009; Sijtsma & Pfadt, 2021). Despite considerable debate surrounding its use (Cho & Kim, 2015; Cortina, 1993; Green & Yang, 2009; Kelley & Pornprasertmanit, 2016; McNeish, 2018; Raykov & Marcoulides, 2019; Schmitt, 1996; Sijtsma & Pfadt, 2021), the estimation and testing of the alpha coefficient have gained critical attention in applied settings. Ensuring the reliability of a measure remains crucial for correctly interpreting the effects of experimental variables. However, evidence shows that most reliabilities fall short of the standards needed for precise, confident decisions (Charter, 2003a). Furthermore, as investigative tasks rely heavily on sample information, determining sample size in the initial stage of research design is pivotal in reducing sampling errors. While systematic studies have discussed sample size calculation, particularly in Intraclass Correlation Coefficient (ICC) studies (Donner & Eliasziw, 1987; Shieh, 2014a, 2014b; Shoukri et al., 2004; Walter et al., 1998), the discussion regarding Cronbach’s alpha remains relatively scarce.
Estimating the number of participants needed to yield meaningful results proves challenging (Charter, 1999; Cicchetti, 1999; Flight & Julious, 2016; Peterson & Kim, 2013). On one hand, the number of subjects might be too small to produce sufficiently precise reliability coefficients or enough statistical power for hypothesis tests (Charter, 2003b; Heo et al., 2015; Yurdugül, 2008). On the other hand, the number of measurements (items, raters) might be so large that the design is no longer cost-effective (Hsu, 1994; Overall & Dalal, 1965). Note that the magnitude of coefficient alpha depends on the number of items, with a curvilinear relationship (Komorita & Graham, 1965). This aspect necessitates further investigation into subject/item size determination, specifically with respect to constructing confidence intervals and assessing cost-effectiveness in reliability estimation.
The conventional practice of reporting alpha coefficients as point estimates impedes interpretation and replication (Terry & Kelley, 2012) since the alpha estimate is influenced by variance sources and contains sampling error of unknown direction. The recommendation to report confidence intervals (CIs) aims to enhance the trustworthiness of reliability estimates (American Psychological Association [APA], 2001; Bonett & Wright, 2015; Fan & Thompson, 2001; Iacobucci & Duhachek, 2003; Kelley et al., 2003) and to convey information related to precision and reproducibility, especially in cases of very large or small sample sizes (Mendoza & Stafford, 2001). However, the mere use of CIs does not inherently enhance statistical practice (Cumming, 2014; Morey et al., 2016) without proper sample size planning (Liu, 2009). Liu (2012) also noted that current sample size planning typically aims to achieve the power of a statistical test under specified alternative hypotheses, rather than to construct precise confidence intervals. Notably, sample size influences CI width (Charter, 1999). The process of planning sample sizes to obtain CI precision has some parallels to planning for statistical power but often results in markedly different sample size requirements (Borenstein et al., 2001; Goodman & Berlin, 1994). Moreover, researchers lack well-established criteria for determining CI widths (Smithson, 2003) and may overlook the stochastic nature of interval width: a CI width is a random variable, and in repeated sampling the computed CI width exceeds the desired width approximately half the time (Terry & Kelley, 2012), so required sample sizes may be underestimated (Liu, 2009).
In light of the various applications that integrate hypothesis testing and confidence intervals to obtain the needed sample sizes, the present study employed the concepts of event rejection, validity, and width (Jiroutek et al., 2003), defined as follows: an event rejection (R) is said to occur if the null value of Cronbach’s alpha is rejected; an event validity (V) is said to occur if Cronbach’s alpha is contained between the upper and lower CI limits; and an event width (W) is said to occur if the width of the CI is no larger than the desired width w. The probabilities of these events are denoted as P(R), P(V), and P(W), respectively. These events can be combined for various scenarios. Consider a scenario where a researcher aims to limit the width of a CI, conditional upon Cronbach’s alpha falling between the lower and upper CI bounds. Determining the necessary sample size in this case ensures that P(W | V) reaches a desired level. Another instance arises when, alongside rejecting the null hypothesis, there is a desire to construct a confidence interval within a specified width. Here, the required sample size for reporting Cronbach’s alpha is calculated so that P(W ∩ R) achieves a desired probability. This integrative approach to sample size planning, addressing multiple conditional probabilities, remains infrequently explored in the literature, apart from the work by Terry and Kelley (2012), who focused on CI width for composite reliability coefficients. Other studies have discussed various conditional probabilities (Beal, 1989; Jiroutek et al., 2003; Liu, 2012). Our study addresses a total of nine unconditional/conditional probabilities (cases), namely:
1. P(R).
2. P(R ∩ V).
3. P(W).
4. P(W | V).
5. P(W ∩ V).
6. P(W | R).
7. P(W ∩ R).
8. P(W ∩ R | V).
9. P(W ∩ V ∩ R).
Yet another crucial consideration within reliability studies involves balancing the cost of data acquisition against the precision/accuracy of estimates. Surprisingly, a cost-efficient design, rooted in health economics, often remains neglected (Rezagholi & Mathiassen, 2010). In practical applications, the acquisition of raters or certain measurements entails a considerable expense. When prioritizing budget constraints, obtaining substantial information at minimal cost necessitates optimizing the configuration of both the number of measurements (i.e., items, raters) and subjects (or observations) (Shoukri et al., 2003). To address these pivotal concerns, this study proposes the development of a framework that integrates hypothesis testing and confidence intervals. The aim is to determine the optimal number of measurements/subjects, guided by two key objectives: (a) minimizing the total cost for a desired probability, and (b) maximizing the probability of interest within a predefined total cost. Additionally, to enhance accessibility for researchers, this proposed procedure has been translated into two R Shiny apps (Diedenhofen & Musch, 2016; R Development Core Team, 2020). Leveraging rapid advancements in computing technology, there is a renewed opportunity to stimulate interest in sample size planning, specifically exploring various conditional probabilities.
The subsequent sections of this study are organized as follows. The Measurement Model section explains Cronbach’s alpha using a measurement model. In the Method for Acquiring Number of Measurements and Subjects section, we detail the methodology for acquiring pairs of measurements/subjects for P(R), P(V), and P(W), respectively. The Proposed Apps section showcases the functionality of the proposed apps for objectives (a) and (b) with an illustrative example. In the Tables and Simulations section, we present three tables and simulation results. Finally, the Discussion and Conclusion section closes the study with some best-practice suggestions.
The Measurement Model
Cronbach’s alpha describes the reliability of a sum (or average) of m measurements (test items, raters, occasions, or alternative forms). To evaluate Cronbach’s alpha coefficient, a model for the parallel-measurements score $X_{ij}$ is given as:
$$X_{ij} = t_i + e_{ij}, \tag{1}$$
where $t_i$ is the true score of subject $i$ and $e_{ij}$ is the error of measurement $j$ for subject $i$, $i = 1, \ldots, n$ (subjects); $j = 1, \ldots, m$ (measurements). We also assume that {$t_i$} are normally and identically distributed with mean 0 and variance $\sigma_t^2$, {$e_{ij}$} are normally and identically distributed with mean 0 and variance $\sigma_e^2$; and {$t_i$} and {$e_{ij}$} are independent. That is, the random vector $(X_{i1}, \ldots, X_{im})'$ is distributed as a multi-normal distribution with mean 0 and a covariance of $\sigma_t^2 \mathbf{1}\mathbf{1}' + \sigma_e^2 \mathbf{I}$. Based on Feldt (1965, 1969) and Kraemer (1981), the estimated Cronbach’s alpha coefficient can be expressed as
$$\hat{\alpha} = 1 - \frac{MS_{M \times S}}{MS_S}, \tag{2}$$
where $MS_{M \times S}$ is the mean square for measurement by subject, $MS_S$ is for subjects, and $\hat{\alpha}$ estimates the population value of Cronbach’s alpha ($\alpha$) as
$$\alpha = \frac{m\sigma_t^2}{m\sigma_t^2 + \sigma_e^2}. \tag{3}$$
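Based on Yurdugül (2008) and Heo et al. (2015), Equations (2) and (3) can be re-expressed as
$$\hat{\alpha} = \frac{m}{m-1}\left[1 - \frac{\operatorname{tr}(\mathbf{S})}{\mathbf{1}'\mathbf{S}\mathbf{1}}\right] \quad \text{and} \quad \alpha = \frac{m}{m-1}\left[1 - \frac{\operatorname{tr}(\boldsymbol{\Sigma})}{\mathbf{1}'\boldsymbol{\Sigma}\mathbf{1}}\right], \tag{4}$$
where $\mathbf{S}$ is the unbiased sample covariance matrix; $\mathbf{1}'$ is the transpose of the column vector $\mathbf{1}$ with m unit elements; $\boldsymbol{\Sigma}$ is the variance-covariance matrix of the population; and $\operatorname{tr}(\cdot)$ is the sum of the diagonal elements of a square matrix. Note that under the assumption of parallel measures (two measures have identical true scores and equal error variances), we can obtain $\boldsymbol{\Sigma} = \sigma_t^2 \mathbf{1}\mathbf{1}' + \sigma_e^2 \mathbf{I}$, where $\mathbf{I}$ is an identity matrix; hence, Equation (4) is identical to Equations (2) and (3). Also note that coefficient alpha is satisfactory if the less restrictive essentially tau-equivalent assumption (i.e., unequal variances but equal covariances) holds (Sijtsma & Pfadt, 2021) in the case of approximate unidimensionality.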
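The equivalence between the mean-square form and the covariance-matrix form can be checked numerically. The following is a minimal sketch in plain Python, not the authors' code; the function names and the small score matrix are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def alpha_anova(scores):
    """Alpha as 1 - MS(measurement x subject) / MS(subjects)."""
    n, m = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(m)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subj = m * sum((r - grand) ** 2 for r in row_means)
    ss_meas = n * sum((c - grand) ** 2 for c in col_means)
    ms_subj = ss_subj / (n - 1)
    ms_resid = (ss_total - ss_subj - ss_meas) / ((n - 1) * (m - 1))
    return 1.0 - ms_resid / ms_subj


def alpha_cov(scores):
    """Alpha as m/(m-1) * (1 - trace(S) / sum of all elements of S)."""
    n, m = len(scores), len(scores[0])
    col_means = [fmean(row[j] for row in scores) for j in range(m)]

    def cov(j, k):
        # Unbiased sample covariance between measurements j and k.
        return sum((row[j] - col_means[j]) * (row[k] - col_means[k])
                   for row in scores) / (n - 1)

    trace = sum(cov(j, j) for j in range(m))
    total = sum(cov(j, k) for j in range(m) for k in range(m))
    return m / (m - 1) * (1.0 - trace / total)


# Hypothetical data: 5 subjects (rows) by 3 measurements (columns).
scores = [[3, 4, 3], [5, 5, 4], [2, 3, 3], [4, 5, 5], [1, 2, 2]]
print(alpha_anova(scores), alpha_cov(scores))
```

Both functions should print the same value up to floating-point rounding, because the two expressions are algebraically identical for any complete subjects-by-measurements matrix.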
From Kristof (1963) and Feldt (1965), we know that is distributed as a central F-distribution with ( ) and degrees of freedom. Therefore, we define the test statistic
5
which is distributed as an F-distribution with and degrees of freedom. In the present study, to find the number of measurements/subjects, we applied distribution F, based on the distributional theory derived by Feldt (1965) and Feldt et al. (1987), described in the Method for Acquiring Number of Measurements and Subjects section.
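As a reading aid, the distributional result can be written out under the parallel, normal model. The notation below is assumed here and may differ from the original typesetting: $\sigma_t^2$ and $\sigma_e^2$ denote the true-score and error variances, and $MS_S$ and $MS_{M\times S}$ denote the subject and measurement-by-subject mean squares.

```latex
\begin{aligned}
\frac{MS_S}{m\sigma_t^2+\sigma_e^2} &\sim \frac{\chi^2_{\,n-1}}{n-1},
\qquad
\frac{MS_{M\times S}}{\sigma_e^2} \sim \frac{\chi^2_{\,(n-1)(m-1)}}{(n-1)(m-1)}
\quad \text{(independent)},\\[4pt]
\frac{1-\alpha}{1-\hat{\alpha}}
&= \frac{MS_S/(m\sigma_t^2+\sigma_e^2)}{MS_{M\times S}/\sigma_e^2}
\sim F_{\,n-1,\;(n-1)(m-1)} .
\end{aligned}
```

The second line uses $1-\alpha = \sigma_e^2/(m\sigma_t^2+\sigma_e^2)$ and $1-\hat{\alpha} = MS_{M\times S}/MS_S$, so the ratio is a pivot whose distribution depends only on n and m.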
Method for Acquiring Number of Measurements and Subjects
In this section, we consider the events rejection, validity, and width and their corresponding probabilities P(R), P(V), and P(W), respectively, to acquire pairs of measurements/subjects by using the F distribution. First, to enhance the clinical interpretation of testing Cronbach’s alpha, the null hypothesis is tested against the right-tailed alternative hypothesis in the following manner:
6
Note that the hypothesis testing described here does not invoke the nil null hypothesis that the score reliability is 0 (Fan & Thompson, 2001); instead, the null value is a gold standard or a particular criterion (Kuijpers et al., 2013; Nunnally & Bernstein, 1994). For a given significance level, to test the null hypothesis, we have
7
where and is the quantile of distribution F with and degrees of freedom. Hence, based on , can be rejected when
8
which can be defined as an event rejection (R). For the alternative hypothesis with a specified value , the power function will coincide with P(R), the probability of the event rejection, as
9
where . To achieve the desired power we must set
10
Then, we can find various pairs of measurement m with its corresponding number of subjects n to satisfy Equation (10), that is, to satisfy .
Second, for , various numbers of measurements/subjects for constructing two-sided and one-sided CIs of coefficient with a desired probability are described. A confidence level is set to (i.e., the probability of the event validity, ). To form a two-sided CI, from Equation (5) and by Feldt et al. (1987), it can be shown that
where . The lower confidence limit (LCL) and the upper confidence limit (UCL) are denoted as and , respectively. Hence, a ( ) 100% two-sided CI is [LCL, UCL], for which the width is
11
an increasing function of . In some contexts, there is rationale in acquiring a one-sided CI, structured as [LCL, 1]. It can be shown that
Here, and . The width of the one-sided CI is defined as
12
From Equations (11) and (12), it is known that the width of a CI is an increasing function of given , , and . Hence, the width is a random variable.
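One way to write the limits and widths is sketched below. It assumes the pivot $(1-\alpha)/(1-\hat{\alpha}) \sim F_{n-1,(n-1)(m-1)}$; the symbols $a$ (significance level) and $F_p$ (the $p$-th quantile of that F distribution) are notational assumptions made here and are not necessarily the original symbols.

```latex
\begin{aligned}
\text{Two-sided:}\quad
& LCL = 1-(1-\hat{\alpha})\,F_{1-a/2}, \qquad
  UCL = 1-(1-\hat{\alpha})\,F_{a/2}, \\
& UCL - LCL = (1-\hat{\alpha})\,\bigl(F_{1-a/2}-F_{a/2}\bigr); \\[4pt]
\text{One-sided:}\quad
& LCL = 1-(1-\hat{\alpha})\,F_{1-a}, \qquad
  1 - LCL = (1-\hat{\alpha})\,F_{1-a}.
\end{aligned}
```

In both cases the width is proportional to $1-\hat{\alpha}$ for fixed m, n, and confidence level, which is why it varies from sample to sample.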
Third, we define the event width ( ) as for a two-sided CI, or for a one-sided CI, where w is the desired width chosen as sensibly as . The probability of the event width is
13
where ; and = for a two-sided CI; and and = for a one-sided CI, respectively. Then, to achieve , we must set
14
by replacing with a planning value obtained from expert opinion or prior research (Bonett, 2002). Then, pairs of (m, n) can be obtained to satisfy Equation (14). For other unconditional/conditional probabilities, the pairs can be obtained by using the proposed apps demonstrated in The Proposed Apps section.
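The search for a required number of subjects can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not the code behind the apps: it assumes the pivot (1 - alpha)/(1 - alpha-hat) follows an F distribution with n - 1 and (n - 1)(m - 1) degrees of freedom, evaluates the F distribution through a standard continued-fraction routine for the incomplete beta function, and uses hypothetical function and argument names throughout.

```python
from math import exp, lgamma, log, log1p


def _betacf(a, b, x, max_iter=1000, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        even = k * (b - k) * x / ((a + 2 * k - 1) * (a + 2 * k))
        odd = -(a + k) * (a + b + k) * x / ((a + 2 * k) * (a + 2 * k + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            if abs(c) < tiny:
                c = tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the F distribution with d1 and d2 degrees of freedom."""
    return 0.0 if f <= 0 else betainc(d1 / 2, d2 / 2, d1 * f / (d1 * f + d2))


def f_ppf(p, d1, d2):
    """Quantile of the F distribution, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < p else (lo, mid)
    return (lo + hi) / 2


def prob_reject(m, n, alpha0, alpha1, sig=0.05):
    """P(R): power of the right-tailed test of alpha0 when alpha1 is true."""
    d1, d2 = n - 1, (n - 1) * (m - 1)
    crit = f_ppf(1 - sig, d1, d2)
    return 1.0 - f_cdf(crit * (1 - alpha1) / (1 - alpha0), d1, d2)


def prob_width(m, n, alpha_plan, w, sig=0.05, two_sided=True):
    """P(W): probability that the CI width does not exceed w."""
    d1, d2 = n - 1, (n - 1) * (m - 1)
    if two_sided:
        spread = f_ppf(1 - sig / 2, d1, d2) - f_ppf(sig / 2, d1, d2)
    else:
        spread = f_ppf(1 - sig, d1, d2)
    return 1.0 - f_cdf(spread * (1 - alpha_plan) / w, d1, d2)


def min_subjects(prob, target, n_max=2000):
    """Smallest n (from 3 upward) whose probability reaches the target."""
    return next((n for n in range(3, n_max + 1) if prob(n) >= target), None)


# Planning value .7, four measurements, two-sided 95% CI, desired width 0.2.
print(min_subjects(lambda n: prob_width(4, n, 0.7, 0.2), 0.80))
```

The printed value can be compared with the number of subjects that App (I) reports for the same setting. The exhaustive loop over n mirrors the search idea described in the text; extending it to a grid over m is a matter of one more loop.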
The Proposed Apps
In the framework of integration of hypothesis testing and CIs for nine unconditional/conditional cases, when the budget is the primary concern, we derived an optimal pair (m, n) under a cost constraint. Let $c_1$ represent the cost per measurement, $c_2$ represent the cost per subject, and $c_3$ represent the cost per observation. The total cost can be expressed as
$$C = c_1 m + c_2 n + c_3 mn \tag{15}$$
when a pair of (m, n) is given (Eliasziw & Donner, 1987). To be ethically and economically feasible for objectives (a) and (b) outlined in the introduction, we have developed two R Shiny apps, described as follows, to either minimize the total cost for a desired probability or to maximize the probability of interest within a predefined total cost.
For objective (a), App (I) (see Luh, 2024a) is designed based on the section Method for Acquiring Number of Measurements and Subjects, employing an exhaustive search method. To use App (I), researchers start by selecting the specific event of interest (case), setting a significance level, a desired probability, and a planning value of alpha under the alternative hypothesis, and determining the number of measurements up to which all outcomes will be printed. For cases related to the event (R), researchers input a null hypothesis value, which should be smaller than the planning value. Regarding cases linked to the event (W), users specify the desired width of the CI and whether it is one- or two-sided. Additionally, if there is a cost constraint, the unit costs of a measurement, a subject, and an observation can be specified (refer to Figure 1). Upon entering these values, clicking “Submit” executes App (I), displaying a list of measurement-subject pairs that satisfy the desired probability. Among the pairs with the minimal total cost, an optimal pair with the maximal probability for objective (a) is highlighted at the bottom of the output.
Figure 1
For objective (b), the probability of interest is maximized for a given total cost (C). From Equation (15), we have $n = (C - c_1 m)/(c_2 + c_3 m)$. Thus, for a given number of measurements $m$, the corresponding number of subjects follows from this expression. The optimal pair that has a maximal probability can be derived by using an exhaustive algorithm. We offer App (II) (see Luh, 2024b) for a user-friendly application. Researchers need to specify the event of interest and the corresponding parameters. Additionally, the fixed total cost (C), as well as the unit costs of a measurement, a subject, and an observation, are required. The output presents the optimal measurement-subject pair along with its corresponding total cost and the maximum attainable probability within this total cost.
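The budget-constrained search for objective (b) can be sketched as follows. This is a minimal illustration rather than the implementation of App (II): the function names are hypothetical, the number of subjects is rounded down so that the budget is not exceeded (an assumption about how non-integer values are handled), and `prob` stands for any user-supplied function that returns the probability of interest for a pair (m, n).

```python
from math import floor


def total_cost(m, n, c_meas, c_subj, c_obs):
    """Total cost: per-measurement, per-subject, and per-observation parts."""
    return c_meas * m + c_subj * n + c_obs * m * n


def subjects_within_budget(budget, m, c_meas, c_subj, c_obs):
    """Largest whole number of subjects affordable for m measurements."""
    return floor((budget - c_meas * m) / (c_subj + c_obs * m))


def best_pair_for_budget(budget, prob, c_meas, c_subj, c_obs, m_max=50):
    """Exhaustive search over m; returns (m, n, probability) or None."""
    best = None
    for m in range(2, m_max + 1):
        n = subjects_within_budget(budget, m, c_meas, c_subj, c_obs)
        if n < 3:  # too few subjects to estimate anything
            continue
        p = prob(m, n)
        if best is None or p > best[2]:
            best = (m, n, p)
    return best


# Unit costs of $1 per measurement, $1 per subject, $0 per observation.
print(total_cost(11, 101, 1, 1, 0))
print(subjects_within_budget(112, 11, 1, 1, 0))
```

With these unit costs, a pair of 11 measurements and 101 subjects costs 112, and a budget of 112 with 11 measurements affords 101 subjects. Passing a probability function such as one for P(W) to `best_pair_for_budget` then returns the pair with the largest probability among those that fit the budget.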
In the following, we use an example from Bonett (2002) to demonstrate the functionality of the proposed apps. For objective (a), for the right-tailed test with a desired probability (power) of .9 and a given number of measurements, App (I) indicated a required number of subjects close to Bonett’s 173. Additionally, aiming for a desired precision, Bonett (2002) set a planning value and a desired absolute precision of 0.2 with 95% confidence for a two-sided CI, which yielded a total of 99 subjects. However, our simulation revealed an empirical probability of .5429, roughly equal to a probability of 1/2. Using the statpsych R package, the sample size was determined as 95 by the command size.ci.cronbach(.05, .7, 4, 0.2), still falling short of the desired probability of .8. By contrast, based on App (I), the required number of subjects was 123, resulting in an empirical probability of .8004 from our 10,000 simulations.
Subsequently, suppose that the primary concern is the total cost and that the costs of obtaining a single measurement and a single subject are both $1, while the cost of a single observation is $0. App (I) derived the optimal pair (11, 101) with a minimal cost of $112 for objective (a) in the case of P(W). Finally, for objective (b), with a fixed cost of $112, employing App (II) for a two-sided CI, we obtained the optimal pair of measurements and subjects, with a maximal probability of .804072.
Tables and Simulations
Tables
To aid applied researchers, three tables generated by running App (I) are presented, showing how the input parameters relate to the number of measurements/subjects. Table 1 shows the configuration of the desired width (w) and the planning value of alpha across the nine probability cases and offers several insights. First, as anticipated, while keeping other factors constant, a wider desired width, as well as a larger planning value (refer to Equations (11) and (12)), generally necessitates fewer measurements/subjects, excluding Cases 1 and 2. Second, Cases 1 and 2, which do not involve the event width, are contingent upon the difference between the planning value and the null value; the larger the difference, the fewer measurements/subjects required (refer to Equation (10)). These outcomes align with the pattern demonstrated in Table 2 of Jiroutek et al. (2003). Third, for cases solely involving the event width, note that because P(W ∩ V) ≤ P(W) and P(W ∩ V) ≤ P(W | V), the required number of measurements/subjects in Case 5 is slightly higher than or equal to that in Case 3 and Case 4. Note that the resulting numbers are similar in Cases 1 and 2, and in Cases 3, 4, and 5, because P(V) is as high as .95. Fourth, for cases involving both events of width and rejection, note that P(W ∩ R) ≤ P(W | R) and P(W ∩ V ∩ R) ≤ P(W ∩ R | V). Thus, Case 7 and Case 9 needed relatively more measurements/subjects than Case 6 and Case 8, respectively. Finally, among all cases, Case 9 necessitates the largest number of measurements/subjects due to the inclusion of all three events. Put simply, the higher the probability value of a case, the fewer measurements/subjects are required to achieve the desired probability. Generally, the conditional probabilities are larger than or equal to the corresponding probabilities of the intersected events, leading to a slightly reduced number of required measurements/subjects.
Furthermore, due to negligible differences, the following Tables 2 and 3 do not display these conditional probabilities (Cases 4, 6, and 8).
Table 1
Planning value | Case | Desired width^a = 0.1 (m, n) | Desired width^a = 0.2 (m, n) |
---|---|---|---|
0.8 | 1 | 10, 83 | 10, 83 |
0.8 | 2 | 10, 89 | 10, 89 |
0.8 | 3 | 13, 165 | 8, 54 |
0.8 | 4 | 13, 163 | 8, 53 |
0.8 | 5 | 14, 167 | 8, 55 |
0.8 | 6 | 13, 162 | 6, 34 |
0.8 | 7 | 13, 165 | 10, 83 |
0.8 | 8 | 13, 163 | 10, 79 |
0.8 | 9 | 14, 167 | 10, 89 |
0.85 | 1 | 6, 32 | 6, 32 |
0.85 | 2 | 6, 34 | 6, 34 |
0.85 | 3 | 11, 101 | 7, 35 |
0.85 | 4 | 11, 100 | 7, 34 |
0.85 | 5 | 11, 104 | 7, 37 |
0.85 | 6 | 11, 101 | 6, 28 |
0.85 | 7 | 11, 101 | 7, 35 |
0.85 | 8 | 11, 100 | 7, 34 |
0.85 | 9 | 11, 104 | 7, 37 |
0.9 | 1 | 4, 15 | 4, 15 |
0.9 | 2 | 5, 15 | 5, 15 |
0.9 | 3 | 8, 54 | 5, 21 |
0.9 | 4 | 8, 53 | 5, 21 |
0.9 | 5 | 8, 55 | 5, 22 |
0.9 | 6 | 8, 54 | 5, 18 |
0.9 | 7 | 8, 54 | 5, 21 |
0.9 | 8 | 8, 53 | 5, 21 |
0.9 | 9 | 8, 55 | 5, 22 |
Note. Setting , , , , , and .
aTwo-sided CIs.
Table 2
Case | m = 10 | m = 15 | m = 20 | m = 25 |
---|---|---|---|---|
1 | 83 | 80 | 78 | 77 |
2 | 89 | 86 | 85 | 84 |
3 | 169 | 163 | 160 | 159 |
5 | 173 | 166 | 163 | 162 |
7 | 169 | 163 | 160 | 159 |
9 | 173 | 166 | 163 | 162 |
Note. Setting , , , , w = 0.1 (two-sided CIs), , , and .
Table 3
Case | (1)a | (2)b | (3)c | (4)d |
---|---|---|---|---|
1 | 10, 83 | 5, 94 | 20, 78 | 7, 87 |
2 | 10, 89 | 6, 97 | 21, 84 | 7, 94 |
3 | 10, 84 | 5, 94 | 19, 79 | 6, 90 |
5 | 10, 86 | 5, 97 | 19, 81 | 7, 90 |
7 | 10, 84 | 5, 94 | 19, 79 | 6, 90 |
9 | 10, 89 | 6, 97 | 21, 84 | 7, 94 |
Note. Setting , , , , and w = 0.15 (two-sided CIs).
a , , . b , , . c , , . d , , .
To delve deeper into the relationship between the number of measurements and subjects, Table 2 presents the required subject sizes (n) while holding the number of measurements (m) constant. Considering test reliability and test time, the number of measurements ranged from m = 10 to 25. Using the proposed App (I), it is evident that as the number of measurements increases, the required subject sizes decrease, albeit only slightly. Likewise, the more events are considered, the larger the number of subjects needed.
Table 3 highlights the trilateral relationship among the number of measurements, subjects, and costs. It delineates four conditions based on various cost scenarios. A comparison of Columns 1 and 2 reveals that an increased unit cost for a measurement results in a decreased number of required measurements. Similarly, a higher cost for a subject (see Column 3) leads to a reduced number of necessary subjects to achieve objective (a) at a minimal cost. Finally, when each observation incurs a cost (see Column 4), the multiplicative effect of measurements and subjects matters for minimizing the total cost. In other words, the total number of observations (m × n) is decreased. For example, where the optimal pair changes from (19, 79) to (6, 90), the total number of observations is reduced from 1,501 (= 19 × 79) to 540 (= 6 × 90). Moreover, Figure 2 compares the resulting total costs under the cost condition of Column 4. It can be observed that the lowest total cost is $420 with m = 6, n = 90. The cost can increase to $640.4 as the number of measurements reduces to 2, and up to $532 as the number of measurements increases to 25. The corresponding cost increases are 52.5% (= (640.4 - 420)/420) and 26.7%, respectively. From a practical point of view, there is much to gain from cost optimization.
Figure 2
Simulations
In simulations, two criteria were used to validate the proposed apps: empirical probability and coverage rates. We first executed App (I) to obtain the optimal pair (m, n) for Case 4, P(W | V), as (10, 85) and (8, 53) for a one- and two-sided CI, respectively. Then, given m, the planning value of alpha, and the true-score variance, we obtained the error variance based on Equation (3). To conduct simulation experiments, we generated a true score for each subject from a normal distribution with mean 0 and the true-score variance by using the R rnorm function. Next, for each subject, we generated the m errors of measurement from a normal distribution with mean 0 and the error variance. Finally, the observed score was derived by adding the true score and the error, meeting the additivity condition. In the subsequent simulation, we set the true-score variance to 1 to obtain an error variance of 2.5 for one-sided CIs and 2 for two-sided CIs. Two simulations were conducted with 10,000 replications each for one- and two-sided CIs to report the empirical probabilities across nine cases. The simulation outcomes reveal nearly identical empirical probabilities to the corresponding theoretical probabilities (refer to Figure 3 for two-sided CIs) and demonstrate excellent coverage rates. Detailed results are reported below.
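The data-generating steps above can be sketched in plain Python. This is an illustration only, not the authors' R code: it uses Python's `random.gauss` in place of `rnorm`, the function names are hypothetical, and the number of replications is kept small for speed.

```python
import random
from statistics import fmean


def error_variance(alpha, m, var_true=1.0):
    """Solve alpha = m*var_true / (m*var_true + var_error) for var_error."""
    return m * var_true * (1.0 - alpha) / alpha


def simulate_scores(n, m, var_true, var_error, rng):
    """Observed scores: one true score per subject plus m independent errors."""
    sd_t, sd_e = var_true ** 0.5, var_error ** 0.5
    scores = []
    for _ in range(n):
        t = rng.gauss(0.0, sd_t)
        scores.append([t + rng.gauss(0.0, sd_e) for _ in range(m)])
    return scores


def alpha_hat(scores):
    """Sample alpha as 1 - MS(measurement x subject) / MS(subjects)."""
    n, m = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subj = m * sum((fmean(row) - grand) ** 2 for row in scores)
    ss_meas = n * sum((fmean(row[j] for row in scores) - grand) ** 2
                      for j in range(m))
    ms_subj = ss_subj / (n - 1)
    ms_resid = (ss_total - ss_subj - ss_meas) / ((n - 1) * (m - 1))
    return 1.0 - ms_resid / ms_subj


def mean_alpha_hat(n, m, alpha, reps=500, seed=1):
    """Average sample alpha over repeated simulated data sets."""
    rng = random.Random(seed)
    var_e = error_variance(alpha, m)
    return fmean(alpha_hat(simulate_scores(n, m, 1.0, var_e, rng))
                 for _ in range(reps))


# Error variances implied by a planning alpha of .8 and unit true-score variance.
print(error_variance(0.8, 10), error_variance(0.8, 8))
# Average estimate for the two-sided design (m = 8, n = 53).
print(mean_alpha_hat(53, 8, 0.8))
```

The first line of output corresponds to the error variances used for the one- and two-sided designs. The average estimate is expected to fall slightly below the planning value, in line with the left-skewed sampling distribution of the alpha estimates.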
Figure 3
For two-sided CIs, Figure 3 illustrates that Case 4 attains the theoretical probability at the desired level of .80. However, Cases 1, 2, 5, and 7 to 9 fall short because the measurements/subjects are insufficient for them, whereas Case 6 exceeds the desired level because the given values of (m, n) = (8, 53) are notably larger than the required values of (6, 34), as shown in Table 1. For researchers seeking to calculate the theoretical probability, we include R code in the Appendix. We also observed that the empirical distribution of the alpha estimates was left-skewed, with a mean of 0.7922 (SD = 0.0449), a median of 0.7978, and a coverage rate of 95.18% (i.e., the empirical P(V)). Moreover, among the confidence intervals encompassing the planning alpha value, the average width was 0.1704, close to w = 0.2.
For one-sided CIs, we observed that the empirical distribution of the alpha estimates was also left-skewed, with a mean of 0.7956 (SD = 0.0336), a median of 0.7993, and a coverage rate of 94.94%. Moreover, for those confidence intervals containing the planning alpha value, the average width was 0.2669, which is close to the desired width of w = 0.3. The empirical probabilities for the nine cases were as follows:
1. .
2. .
3. .
4. .
5. .
6. .
7. .
8. .
9. .
Discussion and Conclusion
The determination of sample size stands as a crucial and integral part of study planning, vital to achieving robust statistical power and precise estimation—a cornerstone of sound statistical practice. In addressing the need for enhanced clinical interpretation and cost-effectiveness, our study aimed to contribute to this evolving field by establishing the required number of measurements/subjects for evaluating Cronbach’s alpha within a comprehensive framework encompassing hypothesis testing and confidence intervals. The introduction of our proposed apps represents a novel advancement, enabling researchers to identify optimal configurations of measurements and corresponding subjects across various events of width, validity, and rejection, crucial for achieving desired probabilities. Our empirical findings underscore the accuracy of the obtained optimal numbers, exhibiting excellent coverage rates and near-identical empirical probabilities to the desired ones. Significantly, our study illuminates the intricate interplay among the number of measurements, subjects, and costs.
It is important to note that the calculations presented here are tailored for parallel measurements and normal distributions. We acknowledge prior research (Liu et al., 2010; Olvera Astivia et al., 2020) highlighting the consequences of violating distributional assumptions and of the presence of outliers. However, studies by Raykov (1997), Yuan and Bentler (2002), and Osburn (2000), together with our observations from simulations, indicate that under specific yet verifiable conditions, coefficient alpha remains minimally affected by population deviations from scale reliability. Hence, our work stands as an initial guide for sample size planning and lays a foundation for future investigations. Subsequent studies might expand to encompass intraclass correlation coefficient (ICC) cases, facilitating adaptability across a spectrum of research designs.