The present research examined the distributional properties of construct reliability indices and model fit metrics, explored relationships between and among the indices and metrics, and investigated variables influencing the relative magnitudes of the indices and metrics in structural equation measurement models. A broad-based meta-analysis of reported construct reliability indices and selected model fit metrics revealed modest relationships among reliability indices, minimal relationships among model fit metrics, and a virtual absence of relationships between reliability indices and model fit metrics. Differences in the magnitudes of selected reliability indices and model fit metrics were found to be primarily a function of the (total) number of items employed in a measurement model. The findings suggest that the current practice of indiscriminately computing and reporting reliability indices and model fit metrics based only on arbitrary heuristics should be abolished and replaced by the reporting of theoretically justified indices and metrics.
In a seminal article,
Perusal of the empirical literature reveals that most evaluations of measurement models now begin by assessing the reliability of the constructs being investigated in conjunction with the evaluation of the measurement model itself. If the measures mapping into constructs or latent variables are deemed to possess “adequate” reliability, usually by comparing calculated reliability indices to some heuristically determined criteria, the measurement model itself is then assessed to determine whether it is “acceptable” (
It is generally asserted that the reliability of a measurement model’s constructs and the fit of the model itself are distinct and therefore have to be separately satisfied to validate the measurement model (e.g.,
Given that several different reliability indices and model fit metrics are typically calculated and reported when evaluating a measurement model, the purpose of the present research was to address the question, “Is there an empirical relationship between construct reliability indices and model fit metrics in measurement models?” Further, when addressing this question, the present research also addressed two prefatory questions: “Are there empirical relationships among common construct reliability indices in a measurement model?” and “Are there empirical relationships among common model fit metrics in a measurement model?” A meta-analytic approach based on (non-simulated) data harvested from reported measurement models was employed to answer the three questions. Based on the answers, suggestions as to how to improve the application of measurement models are discussed. As such, findings from the research have broad implications for evaluating, interpreting, and reporting measurement models (e.g.,
Three indices are commonly used to measure the reliability of constructs or latent variables in structural equation measurement models: coefficient alpha, composite reliability (CR), and average variance extracted (AVE). Commonly recommended acceptance thresholds are .70 for alpha and CR, and .50 for AVE (
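As context for these thresholds, the three indices can be computed as follows. This is a minimal illustrative sketch using the standard formulas (coefficient alpha from raw item scores; CR and AVE from standardized factor loadings of a single reflective construct), not code from any of the studies reviewed; the function names are our own.

```python
# Standard formulas (sketch): alpha from item-level scores; CR and AVE
# from standardized factor loadings of one reflective construct.
from statistics import pvariance

def coefficient_alpha(items):
    """items: one list of scores per item, all lists of equal length."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)  # error variances, standardized solution
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    """AVE = mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)
```

With loadings of .80, .70, and .75, for instance, CR is roughly .79 while AVE is roughly .56: both above their thresholds, though AVE sits much closer to its cutoff, consistent with the pattern discussed next.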
However, “large” alpha and CR values do not always guarantee “large” AVE values. For example,
Evaluating the fit of a structural equation measurement model involves a statistical comparison of the model-implied population covariance matrix and the observed covariance matrix, adjusting for sample size, number of constructs, and/or degrees of freedom. (Interested readers can contact the authors for the results of the other model fit metrics (e.g., AGFI, NFI). Each of these other fit metrics had a correlation of at least .75 with one of the investigated model fit metrics and was consequently dropped from the analyses.)
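Several of the fit metrics examined below are functions of the model χ² statistic. As a minimal sketch of the standard formulas (not code from this study; `chi2_null` denotes the baseline/independence model):

```python
# Standard chi-square-based fit formulas (sketch).
from math import sqrt

def chi_square_ratio(chi2, df):
    """Normed chi-square: model chi-square divided by its degrees of freedom."""
    return chi2 / df

def rmsea(chi2, df, n):
    """Root mean square error of approximation for sample size n."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_null, df_null):
    """Comparative fit index relative to the baseline (null) model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    return 1.0 - d_model / d_null if d_null > 0 else 1.0
```

GFI and SRMR, by contrast, are computed from the fitted and observed covariance matrices themselves rather than from χ², which is one reason the metrics need not move together.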
As recommended by
The possible relationship or lack thereof between reliability indices and model fit metrics remains an intriguing and relatively under-discussed topic. To illustrate, whereas
Articles reporting an empirical reflective structural equation modeling analysis application and containing one or more alpha, CR, AVE, and/or model fit metric values served as the data source for the present research. Terms such as “measurement model,” “confirmatory factor analysis,” “structural equation,” “(coefficient) alpha,” “composite reliability,” “average variance extracted,” “model fit,” and a combination of these terms were searched in Business Source Premier, Communication & Mass Media Complete, JSTOR,
Then, to provide a broad representation of measurement models, an issue-by-issue search was made using the same terms in several prominent journals in psychology, marketing, business, education, information systems, and management. The search covered the period 1996 through 2017. Five hundred fifty-seven articles were initially identified that contained potentially usable measurement model data. Studies applying a partial least squares (PLS) technique were excluded because PLS is not a covariance-based modeling technique; one hundred fifty articles were eliminated under this exclusion criterion.
The final database consisted of 312 articles reporting 332 studies drawn from 93 journals. The total sample size was 243,700 individuals, with a mean of 734 individuals per study. On average, there were 6.39 constructs per study. All harvested reliability indices and model fit metrics were retained for analysis, even if they were possible outliers. Construct reliability indices were based solely on reflective constructs in a measurement model. On average, 2.1 reliability indices were reported per construct and 5.3 fit metrics were reported per model. Based on the harvested data, measures of central tendency, measures of variability, and pairwise correlations among and between the reliability indices and model fit metrics were computed.
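The central-tendency and variability measures reported below (including skewness and kurtosis) can be obtained for any harvested column of values along the following lines. This is an illustrative sketch, not the authors' analysis code; the sample-adjusted moment formulas shown are the conventional ones.

```python
# Summary statistics for one column of harvested values (sketch).
from statistics import mean, median, stdev

def summarize(values):
    m, s = mean(values), stdev(values)
    n = len(values)
    # Sample-adjusted skewness.
    skew = sum(((v - m) / s) ** 3 for v in values) * n / ((n - 1) * (n - 2))
    # Sample-adjusted excess kurtosis (0 for a normal distribution).
    kurt = (sum(((v - m) / s) ** 4 for v in values)
            * n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"n": n, "mean": m, "median": median(values), "sd": s,
            "min": min(values), "max": max(values),
            "range": max(values) - min(values),
            "skewness": skew, "kurtosis": kurt}
```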
In addition to these measures, differences among reliability indices (
| Characteristic | α | CR | AVE | χ2/df | CFI | RMSEA | GFI | SRMR |
|---|---|---|---|---|---|---|---|---|
| n | 1,115 | 1,552 | 1,522 | 246 | 253 | 227 | 143 | 72 |
| Mean | .85 | .86 | .69 | 4.01 | .95 | .06 | .91 | .05 |
| Median | .86 | .88 | .69 | 2.01 | .96 | .06 | .91 | .05 |
| SD | .08 | .07 | .12 | 17.23 | .03 | .02 | .04 | .02 |
| Minimum | .38 | .53 | .31 | .89 | .77 | .00 | .73 | .02 |
| Maximum | .99 | .99 | .99 | 249.97 | 1.00 | .13 | .99 | .15 |
| Range | .61 | .46 | .68 | 249.08 | .23 | .13 | .26 | .13 |
| Skewness | -1.02 | -.80 | -.02 | 12.73 | -1.47 | .10 | -.63 | 2.07 |
| Kurtosis | 1.74 | .48 | -.60 | 174.66 | 4.76 | 1.67 | 1.31 | 9.26 |
*
A majority of the reported values for alpha and composite reliability were above the recommended (heuristic) thresholds. For alpha, 77% of the reported values exceeded .80, 96% exceeded .70, and 99% exceeded .60. For CR, 83% of the reported values exceeded .80, 98% exceeded .70, and 99% exceeded .60. Ninety-six percent of the reported AVE values were greater than the recommended threshold of .50. In studies where all three reliability index values were reported, approximately 93% of the alpha, composite reliability, and average variance extracted values jointly met the heuristic criteria of .70 for alpha and CR and .50 for AVE. Hence, based on the heuristic criteria, the magnitudes of the construct reliability indices in reported structural equation measurement models were generally adequate. In only two percent of the cases did alpha and CR values meet the heuristic criteria while the corresponding AVE value did not.
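The threshold screening summarized above amounts to simple exceedance counting. A sketch with made-up values (the study's raw data are not reproduced here):

```python
# Percentage of reported values exceeding a heuristic cutoff (sketch).
def pct_above(values, cutoff):
    return 100.0 * sum(v > cutoff for v in values) / len(values)

def jointly_adequate(alpha, cr, ave):
    """Joint heuristic criterion: .70 for alpha and CR, .50 for AVE."""
    return alpha > 0.70 and cr > 0.70 and ave > 0.50
```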
With respect to the model fit metrics, the mean χ2/
A majority of the reported model fit metric values met the recommended (heuristic) thresholds. For χ2/
Three sets of pairwise correlations were computed based on reported construct reliability indices and model fit metrics: (i) within-set correlations between comparable reliability indices, (ii) within-set correlations between comparable model fit metrics, and (iii) cross-set correlations between comparable reliability indices and model fit metrics.
| Metric | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1. α | - | | | | | | |
| 2. CR | .78** (697) | - | | | | | |
| 3. AVE | .62** (786) | .71** (1,388) | - | | | | |
| 4. χ2/df | .02 (785) | .09** (1,242) | .06 (1,208) | - | | | |
| 5. CFI | .08* (816) | .05 (1,272) | .13** (1,245) | -.08 (232) | - | | |
| 6. RMSEA | -.02 (782) | .00 (1,147) | .01 (1,130) | .23** (211) | -.43** (218) | - | |
| 7. GFI | -.09* (471) | -.14** (748) | -.06 (707) | .06 (132) | .50** (129) | -.36** (123) | - |
| 8. SRMR | -.09 (241) | -.11* (353) | -.11* (373) | -.06 (64) | -.48** (69) | .22 (64) | -.17 (27) |
*
Although five of the 10 correlations among the model fit metrics were statistically significant at the .05 significance level, they were relatively small in magnitude. In particular, they ranged from |.06| to |.50|, with a median value of |.23|. Thus, the model fit metrics had pairwise shared variances ranging from essentially zero to 25%, with a median of 5%.
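The shared-variance figures just quoted follow directly from the correlations in the table above, since shared variance is simply the squared correlation. Using the ten reported fit-metric correlations:

```python
# Squared correlations = pairwise shared variances
# (values taken from the correlation table above).
from statistics import median

fit_rs = [-.08, .23, -.43, .06, .50, -.36, -.06, -.48, .22, -.17]

abs_rs = sorted(abs(r) for r in fit_rs)   # |.06| ... |.50|
shared = [r ** 2 for r in fit_rs]         # shared variance = r squared
median_abs = median(abs_rs)               # about .23 after rounding
median_shared = median_abs ** 2           # about .05, i.e. 5%
```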
The 15 pairwise correlations between construct reliability indices and model fit metrics were minimal to nonexistent. As can be seen in
In those instances wherein reported alpha, CR, and AVE values all exceeded the recommended reliability threshold criteria, 4.5%, 41.3%, 10.6%, 41.2%, and 1.4% of the studies respectively reporting χ2/
Because there were minimal or nonexistent relationships among the measurement model fit metrics, regression analyses were first carried out to determine whether there was a relationship between the magnitude of a fit metric and the size of the sample used, the total number of items linked to the constructs, and the number of constructs in a measurement model. Only the number of items had a significant, but not substantial, effect on χ2/
Similarly, because there were at best moderate relationships between the construct reliability indices and minimal or nonexistent relationships between the measurement model fit metrics, analyses were undertaken to compare the respective magnitudes of the reliability indices and model fit metrics. Since the heuristic for χ2/
| Characteristic | CR – α | CR – AVE | α – AVE | CFI – RMSEA | CFI – GFI | CFI – SRMR | GFI – RMSEA | GFI – SRMR | RMSEA – SRMR |
|---|---|---|---|---|---|---|---|---|---|
| n | 697 | 1,388 | 786 | 218 | 129 | 69 | 123 | 27 | 64 |
| Mean | .01** | .18** | .18** | .02** | .05** | .00 | -.04** | -.05** | .01* |
| Median | .00 | .18 | .18 | .02 | .05 | .01 | -.04 | -.04 | .01 |
| SD | .05 | .09 | .09 | .03 | .04 | .03 | .04 | .05 | .03 |
| Minimum | -.20 | -.07 | -.13 | -.17 | -.07 | -.11 | -.23 | -.14 | -.08 |
| Maximum | .29 | .53 | .50 | .10 | .18 | .09 | .05 | .05 | .09 |
| Range | .49 | .60 | .63 | .27 | .25 | .20 | .28 | .19 | .17 |
| Skewness | .57 | .20 | -.19 | -1.54 | .59 | -.77 | -.95 | .06 | .02 |
| Kurtosis | 5.92 | .06 | .23 | 7.99 | 2.21 | 3.82 | 3.31 | -.13 | 3.02 |
*
| Pair | Sample size: β | Sample size: % var | Number of items: β | Number of items: % var | Number of constructs: β | Number of constructs: % var |
|---|---|---|---|---|---|---|
| CR – α | .10** | 1.0 | -.02 | 0.0 | -.17** | 2.7 |
| CR – AVE | .02 | 0.0 | .47** | 22.5 | -.18** | 3.4 |
| α – AVE | -.07* | 0.6 | .39** | 15.3 | -.10** | 0.9 |
| CFI – RMSEA | .04 | 0.2 | -.17** | 4.1 | -.08 | 0.7 |
| CFI – GFI | -.13 | 1.8 | .37** | 14.8 | .30** | 9.0 |
| CFI – SRMR | -.16 | 2.5 | -.26* | 6.7 | .08 | 0.6 |
| GFI – RMSEA | .10 | 1.0 | -.43** | 21.8 | -.32** | 10.3 |
| GFI – SRMR | .14 | 2.1 | -.51** | 25.6 | -.23 | 5.3 |
| RMSEA – SRMR | .31* | 9.7 | -.08 | 0.8 | -.18 | 3.3 |
*
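The standardized coefficients in the table above come from regressing each pairwise difference on the design characteristics. A minimal pure-Python sketch of how such betas can be obtained via the normal equations (illustrative only; the authors' software is not specified):

```python
# Standardized OLS betas via the normal equations (sketch).
from statistics import mean, stdev

def standardize(xs):
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def standardized_betas(y, xs):
    """Regress outcome y (e.g., CR - AVE differences) on predictors xs
    (e.g., sample size, number of items, number of constructs)."""
    zy, zxs = standardize(y), [standardize(x) for x in xs]
    n = len(y)
    rxx = [[sum(p * q for p, q in zip(u, v)) / (n - 1) for v in zxs] for u in zxs]
    rxy = [sum(p * q for p, q in zip(u, zy)) / (n - 1) for u in zxs]
    return solve(rxx, rxy)
```

Because the betas are computed from the predictor correlation matrix, they are directly comparable across predictors measured on very different scales, which is what the table exploits.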
The present research systematically examined and analyzed actual construct reliability and model fit metric data from 332 studies reporting the results of applying measurement models. The majority of construct reliability indices and model fit metrics examined exceeded the respective heuristic thresholds that have been developed and refined across decades of construct measurement and structural equation modeling. This is to be expected given the typical reporting practices of researchers and the evaluation protocols of journal reviewers and editors when deciding whether a measurement model is acceptable or valid. In particular, based on traditional heuristic criteria, most measurement models reported in journal articles are assumed to be acceptable or valid in that construct reliabilities and model fits are both satisfied. If they were not acceptable, to either the researcher(s) submitting them for publication consideration or the reviewer(s) and editor(s) evaluating them, the measurement model would most likely not appear in a journal article.
Properly assessing and reporting construct reliability indices and model fit metrics are imperative for evaluating, interpreting, and communicating the validity of a measurement model. Indeed, researchers are increasingly using construct reliability indices and model fit metrics as complementary measures when gauging the extent to which a measurement model is valid. However, until this study, there has been no documentation or comparison of the distributional properties of the reported construct reliability indices alpha, CR, or AVE and model fit metrics χ2/
Because the findings reported in this manuscript are based on a broad range of research domains and a diversity of measurement model applications, they possess generality. Consequently, the findings can be employed actuarially to both complement and supplement currently recommended threshold heuristics, yielding a more nuanced evaluation of the “adequacy” of a measurement model. Such a use is consistent with the call for flexible threshold criteria when evaluating measurement models (e.g.,
The present research answered the initial three questions guiding the research in that it revealed that empirical relationships among the three construct reliability indices were modest at best, minimal among the five model fit metrics, and practically nonexistent between construct reliability indices and model fit metrics. In brief, the reliabilities of constructs in a measurement model and the associated model fit metrics examined were effectively independent and unrelated. As such, the research empirically corroborates conventional wisdom that construct reliability and measurement model fit are essentially distinct concepts.
The present research complements the research of
The present research likewise empirically demonstrated that while it is possible to obtain acceptable reliabilities but unacceptable model fit metrics, or acceptable model fit metrics but unacceptable reliabilities (e.g.,
Another contribution of the present research is that it documented factors that contribute to differences in the magnitudes of construct reliability indices and model fit metrics, even those created to purportedly measure the same measurement model characteristic. Reliability indices and model fit metrics respectively quantify different aspects of reliability and model fit, and are in many instances based on different assumptions and have been created for different purposes (e.g.,
In addition to demonstrating that the model fit metrics examined were measuring model fit from different perspectives, the present research provided evidence that the fit metric values reported for a particular measurement model tended to differ significantly, even though similar cutoff criteria were frequently applied to them. Importantly, the present research showed empirically that, to a substantial extent, differences in the magnitudes of model fit metric values were due to the total number of items incorporated in a measurement model. For example, the larger the number of items incorporated in a measurement model, the smaller the value of GFI relative to the value of SRMR, and the larger the value of CR relative to the value of AVE. Hence, for a comprehensive assessment of a measurement model, it is necessary to go beyond simply calculating and reporting goodness-of-fit measures and take into consideration research design characteristics. That the relative magnitudes of several of the indices and metrics investigated appeared to be sensitive to the (total) number of items incorporated in a measurement model requires replication and further investigation.
There is virtual consensus that measurement models must be based on theory and not merely be a consequence of “fishing,” “dustbowl empiricism,” HARKing, or “
Stated somewhat differently, assuming there is sound theory, although a full complement of reliability indices and model fit metrics are typically computed when constructing a measurement model, given the present results, only those that can be justified by the underlying theory should be assessed and reported. Indiscriminately reporting all measurement statistics provided by standard computer software programs (e.g., LISREL, AMOS, EQS, Mplus) without linking or justifying the use of individual indices or fit metrics to the theory underlying the measurement model is unwarranted.
Results from the construct reliability analysis revealed pairwise shared variances of 61% (alpha and CR), 38% (alpha and AVE), and 50% (CR and AVE). Hence there was substantial unshared variance, attributable both to other causes (e.g., measurement error) and to the fact that the three indices measure different things (e.g.,
Similarly, model fit metrics had pairwise shared variances ranging from none to 25%. Given the large unshared variances among the model fit metrics, different measurement model characteristics are being assessed by different fit metrics. Hence, discretion should be followed when selecting model fit metrics to compute, evaluate, and report on the appropriateness of a measurement model. Depending on the theory underlying the model and research characteristics such as sample size, number of items, data type, and model testing stage, certain model fit metrics are more appropriate than other model fit metrics (e.g.,
The rhetorical question raised by the present research relates to the broad issue of constructing and evaluating a measurement model. If there are relatively weak relationships between reliability indices and between model fit metrics, and virtually no relationship between the reliability of a measurement model (i.e., its constructs or latent variables) and the fit of the model, what and how should construct reliability indices and model fit metrics be assessed? What does it mean if the various reliability indices and model fit metrics are essentially unrelated?
This study is not without limitations. As with any meta-analysis, despite the best intentions, it is possible that relevant studies were missed. And, despite the attempt to capture and quantitatively analyze measurement model data from original research articles, there is the possibility of the publication bias that is inherent in quantitative reviews. This bias could be potentially significant given that, as mentioned previously, measurement models found to be “unacceptable” are not likely to appear in the literature. Such possible “truncation” may have influenced the relationships observed in the present research. Moreover, because the analyses were based on widely differing, and in some instances rather small, sample sizes, a more comprehensive multivariate analysis was not possible. Such an analysis could have produced more insights into the relationships between and among the construct reliability indices and model fit metrics, as well as between those indices and metrics and the research design characteristics investigated. Consequently, future research efforts should address these possible limitations.
Future research should also strive to better understand the relationship between the reliability of the constructs in a measurement model and the associated measurement model fit metrics. Too often it seems that reliability indices possess no decisional utility. They are simply calculated and reported as an afterthought with no assessment. For example, perusal of the literatures that produced the data for this study revealed an abundance of perfunctory reporting of reliability indices with no commentary on their assessment or application when evaluating the appropriateness of a measurement model.
Three possible extensions of the present study are (i) a further meta-analytic assessment of relationships between and among reliabilities and model fits that also includes factor loadings and selected research design characteristics other than sample size and number of items or constructs and takes into account additional disciplines; (ii) simulations incorporating different levels of measurement error and different test conditions (e.g., dropping items, re-specifying the measurement model) to compare their joint effects on construct reliability indices and model fit metrics; and/or (iii) one or more large-scale empirical studies specifically designed to take into account a variety of variables, constructs, measurement models, research design characteristics, and reliability indices and model fit metrics under controlled conditions. For instance, the impact that the number of items in a measurement model has on the relative magnitudes of the reliability indices and fit metrics requires further study to determine its origin or mechanism.
Moreover, analytical research should be undertaken to demonstrate and understand why construct reliability indices are not related to model fit metrics. The present research and prior simulation research have documented the relationships and discussed the conceptual reasons for them. Analytical demonstrations would seem necessary to provide a transparent and comprehensive perspective to obtain closure on the question.
In conclusion, the present results reinforce the need for additional (global) statistics that take into account and simultaneously incorporate both construct reliability and model fit when assessing a measurement model (e.g.,
The authors have no funding to report.
The authors have declared that no competing interests exist.
The authors have no support to report.