
Within large-scale international studies, the utility of survey scores to yield meaningful comparative data hinges on the degree to which their item parameters demonstrate measurement invariance (MI) across compared groups (e.g., culture). To date, methodological challenges have restricted the ability to test the measurement invariance of item parameters of these instruments in the presence of many groups (e.g., countries). This study compares multigroup confirmatory factor analysis (MGCFA) and the alignment method to investigate the MI of the schoolwork-related anxiety survey across gender groups within the 35 Organisation for Economic Co-operation and Development (OECD) countries (gender × country) of the Programme for International Student Assessment 2015 study. Subsequently, the predictive validity of MGCFA and alignment-based factor scores for mathematics achievement is examined. Considerations related to invariance testing of noncognitive instruments with many groups are discussed.

Within large-scale international studies, surveys represent an important approach to the collection of comparative data for understanding individuals’ attributes within and across countries. For instance, the Programme for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study yield cross-national data on students’ academic achievement. Across such studies, surveys play a vital role in the operationalization of theoretically meaningful constructs with scores used to test and develop theory, conduct cross-national comparisons, and inform policy. Irrespective of the construct or study context, two key conditions related to score validity are the accuracy of the adaptation of item content and their statistical equivalency across groups (e.g., culture;

MI indicates that an instrument’s item parameters are equivalent across groups and, thus, a prerequisite to comparisons based on mean score differences (

Multigroup confirmatory factor analysis (MGCFA) is the most commonly used approach to testing item parameter invariance. MGCFA requires the sequential comparison of nested models that differ in terms of the item parameters constrained equal across groups to identify (non)invariant parameters. Typically, comparisons are made across a small number of groups (e.g., ≤ 3) with the aim of establishing at minimum

The alignment method (

In response, MGCFA and the alignment method are used to test item parameter invariance and estimate group-specific factor means across gender groups within and across countries, resulting in the analysis of 70 groups (i.e., 35 countries × 2 genders). Consideration of gender within countries serves to examine the comparability of these methods for invariance testing and identify factors that may explain noninvariance of international survey data beyond the country level. Further, this study examines the predictive validity of MGCFA and alignment-based factor scores for students’ mathematics achievement.

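The following multiple-group factor analysis model identifies the measurement parameters of focus in invariance studies:

$$y_{ipg} = \nu_{pg} + \lambda_{pg}\,\eta_{ig} + \varepsilon_{ipg}$$

where $y_{ipg}$ is the response of individual $i$ to item $p$ in group $g$, $\nu_{pg}$ and $\lambda_{pg}$ are the group-specific item intercept and factor loading, $\eta_{ig}$ is the individual's factor score, and $\varepsilon_{ipg}$ is the residual.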

Within MGCFA, invariance testing proceeds through the comparison of nested models that differ according to the equality constraints imposed on model parameters to determine their across group equivalence. An instrument’s factor structure can demonstrate three types of invariance:

MGCFA begins with a baseline model from which invariant item parameters are identified. One approach begins with the configural model, followed by the specification of models with equality constraints imposed on specific parameters to test metric and, subsequently, scalar invariance (see

Concerns have been raised regarding the impracticality of MGCFA for testing MI with many groups (

The alignment method is an approach to estimate group-specific factor means and variances in the presence of measurement noninvariance (

Similar to MGCFA, the alignment method begins with the specification of a statistically well-fitting configural model, referred to as model M0 (

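Alignment optimization is the second step and focuses on the estimation of the group-specific factor means and variances that reduce the amount of measurement noninvariance of the item parameters across groups. Specifically, following the standard formulation of the alignment method, the factor means and variances are chosen to minimize the total loss function

$$F = \sum_{p}\sum_{g_1<g_2} w_{g_1,g_2}\, f\!\left(\lambda_{pg_1}-\lambda_{pg_2}\right) + \sum_{p}\sum_{g_1<g_2} w_{g_1,g_2}\, f\!\left(\nu_{pg_1}-\nu_{pg_2}\right)$$

where $g_1$ and $g_2$ index each pair of groups, $\lambda_{pg_1}$ and $\lambda_{pg_2}$ are the factor loadings and $\nu_{pg_1}$ and $\nu_{pg_2}$ the intercepts of item $p$ in the two groups, and $f$ is the component loss function

$$f(x) = \sqrt{\sqrt{x^{2}+\epsilon}}$$

with $\epsilon$ a small positive constant.

The corresponding weight factor, $w_{g_1,g_2}$, reflects group size:

$$w_{g_1,g_2} = \sqrt{N_{g_1}N_{g_2}}$$

where $N_{g_1}$ and $N_{g_2}$ are the sample sizes of the two groups.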
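To make the optimization target concrete, the following minimal Python sketch evaluates an alignment-type total loss for a fixed set of group-specific loadings and intercepts. It assumes the commonly cited formulation of the method: a component loss of the form sqrt(sqrt(x^2 + eps)) applied to every pairwise difference in loadings and intercepts, weighted by the square root of the product of the two group sizes. The function names, the value eps = 0.01, and the toy inputs are assumptions for illustration; the sketch only evaluates the loss and does not perform the search over factor means and variances that alignment software carries out.

```python
import math
from itertools import combinations


def component_loss(x, eps=0.01):
    """Component loss: fourth root of (x^2 + eps); eps is an assumed small constant."""
    return math.sqrt(math.sqrt(x * x + eps))


def total_loss(loadings, intercepts, sizes, eps=0.01):
    """Sum weighted component losses over all group pairs and items.

    loadings[g][p] and intercepts[g][p] hold the parameters of item p in group g;
    sizes[g] is the sample size of group g.
    """
    n_items = len(loadings[0])
    total = 0.0
    for g1, g2 in combinations(range(len(sizes)), 2):
        weight = math.sqrt(sizes[g1] * sizes[g2])  # pair weight from group sizes
        for p in range(n_items):
            total += weight * component_loss(loadings[g1][p] - loadings[g2][p], eps)
            total += weight * component_loss(intercepts[g1][p] - intercepts[g2][p], eps)
    return total


# Toy example (hypothetical values): two groups, two items.
invariant = total_loss([[1.0, 0.9], [1.0, 0.9]], [[0.0, 0.2], [0.0, 0.2]], [100, 100])
noninvariant = total_loss([[1.0, 0.9], [1.3, 0.9]], [[0.0, 0.2], [0.0, 0.6]], [100, 100])
print(invariant < noninvariant)
```

Because the component loss grows slowly for large differences, the total is smallest when most parameters are nearly equal and a few differ substantially, which is the pattern the method favors.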

There are two approaches to alignment optimization. The first is the

Beyond estimating group-specific factor means, the procedure (as conducted in ^{2} reported for each item parameter reports the amount of across group parameter variation in the configural (M0) model accounted for by the variation in the factor means and variances across groups, with values ranging between 0 and 1 (values closer to 1 indicative of higher invariance). Collectively, the information can serve as a guide to subsequent decisions related to item functioning and development.

There are several attractive features of the alignment method for invariance testing. First, the procedure is automatic and, thus, reduces the tediousness associated with inspecting modification indices to conduct numerous statistically driven hypothesis tests. Second, model-data fit after alignment optimization is the same as the configural model. Third, it is applicable to measurement instruments comprised of a small number of items and data collected in a complex sampling framework. Furthermore, the method does not require that the data be normally distributed, an uncommon attribute of item-level data. In addition to information regarding the degree of invariance of item parameters, group-specific factor means and variances are obtained with a factor structure that does not meet scalar invariance, and the method is applicable to between 2 and 100 groups (

Recent methodological and applied studies shed light on the contributions of the alignment method for invariance testing of large-scale international surveys. Specifically, simulation studies have supported the method’s recovery accuracy of item parameters and factor means under various conditions (e.g.,

This study extends the literature on the use of MGCFA and alignment optimization in large-scale, international educational survey research through its application to the schoolwork-related anxiety measure administered within PISA 2015 for country by gender, with 70 compared groups. For alignment optimization, a Monte Carlo study is used to examine the accuracy of factor means. The present study builds on the existing literature (e.g.,

Data were based on nationally representative samples of 15-year-old students (

The schoolwork-related anxiety scale includes five selected-response items designed to operationalize “[T]he anxiety related to school tasks and tests, along with the pressure to get higher marks and the concern about receiving poor grades” (

As a first step, MGCFA was used to determine the level of invariance (e.g., metric) among the item parameters. Criteria for metric and scalar invariance included a nonsignificant chi-square difference statistic (

For the alignment analysis, the free option was selected with parameter estimation based on the robust maximum likelihood (MLR) estimator using

Subsequently, the predictive validity of factor scores for mathematics achievement was examined using a random intercepts two-level multilevel model (MLM). As PISA reports 10 plausible values (PVs) of mathematics performance, separate analyses were conducted for each PV in which regression coefficients and standard errors were averaged across analyses.
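The pooling step described above can be sketched in a few lines of Python. The sketch assumes that one multilevel model has already been fitted per plausible value and that each fit yields a regression coefficient and a standard error; the function name and the numeric inputs are hypothetical and are not PISA results.

```python
from statistics import mean


def pool_plausible_values(coefficients, std_errors):
    """Average per-PV coefficients and standard errors, as described in the text."""
    if not coefficients or len(coefficients) != len(std_errors):
        raise ValueError("need one coefficient and one standard error per plausible value")
    return mean(coefficients), mean(std_errors)


# Hypothetical estimates from three plausible-value analyses.
pooled_coef, pooled_se = pool_plausible_values([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
print(pooled_coef, pooled_se)
```

Simple averaging of standard errors does not reflect the variability of the coefficient across plausible values; combining rules that add a between-PV variance component are a common alternative when that variability matters.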

A single-factor model reported acceptable model-data fit (see χ^{2}(764) = 5,795.14, RMSEA = 0.043 (90% CI [0.042, 0.044]), CFI = 0.98, SRMR = 0.033.

Model | Number of free parameters | χ^{2} | df | RMSEA | RMSEA 90% CI | CFI | SRMR | ΔRMSEA | ΔCFI | ΔSRMR
---|---|---|---|---|---|---|---|---|---|---
Configural | 1,050 | 1,888.58* | 350 | 0.035 | 0.034-0.037 | 0.994 | 0.011 | | |
Metric | 774 | 2,755.57* | 626 | 0.031 | 0.030-0.032 | 0.992 | 0.023 | 0.004 | 0.002 | 0.012
Scalar | 498 | 8,099.41* | 902 | 0.047 | 0.046-0.048 | 0.971 | 0.039 | 0.016 | -0.021 | 0.016
Partial MI | 636 | 5,610.34 | 764 | 0.042 | 0.041-0.043 | 0.981 | 0.032 | 0.011 | -0.011 | 0.009

*
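The change-in-fit columns of the table above can be recomputed from the reported fit indices. The Python sketch below does so as signed differences (constrained model minus comparison model); the dictionary layout and function name are illustrative assumptions. The reported values for the scalar and partial MI rows are consistent with the metric model serving as the comparison model for both, so that pairing is assumed here.

```python
# Fit indices as reported in the model comparison table.
FIT = {
    "configural": {"rmsea": 0.035, "cfi": 0.994, "srmr": 0.011},
    "metric": {"rmsea": 0.031, "cfi": 0.992, "srmr": 0.023},
    "scalar": {"rmsea": 0.047, "cfi": 0.971, "srmr": 0.039},
    "partial": {"rmsea": 0.042, "cfi": 0.981, "srmr": 0.032},
}


def fit_change(constrained, comparison):
    """Signed change in each fit index: constrained model minus comparison model."""
    return {
        index: round(FIT[constrained][index] - FIT[comparison][index], 3)
        for index in ("rmsea", "cfi", "srmr")
    }


for constrained, comparison in [("metric", "configural"),
                                ("scalar", "metric"),
                                ("partial", "metric")]:
    print(constrained, "vs", comparison, fit_change(constrained, comparison))
```

Computed this way, the metric-versus-configural changes in RMSEA and CFI come out negative, whereas the table lists them without a sign.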

Alignment optimization using the free approach produced an error message that the model may be poorly identified. Because the Finland-Females group had the factor mean closest to 0, it was selected as the referent group in the subsequent analysis with the fixed approach.

Item parameter | Fit function contribution | R^{2} | Number of groups with approximate MI | | | Min estimate | Min group | Max estimate | Max group
---|---|---|---|---|---|---|---|---|---
Loading | | | | | | | | |
ST118Q01 | -918.74 | 0.42 | 47 | 1.00 | 0.09 | 0.77 | Mexico (F) | 1.28 | Finland (M)
ST118Q02 | -1,032.56 | 0.28 | 49 | 1.00 | 0.13 | 0.57 | Spain (F) | 1.33 | Finland (F)
ST118Q03 | -867.17 | 0.61 | 46 | 1.00 | 0.06 | 0.85 | Finland (F) | 1.19 | Spain (M)
ST118Q04 | -988.81 | 0.14 | 42 | 1.00 | 0.11 | 0.72 | Japan (M) | 1.21 | Iceland (F)
ST118Q05 | -922.82 | 0.51 | 58 | 1.00 | 0.08 | 0.77 | Japan (M) | 1.13 | Chile (M)
Sum | -4,730.10 | | | | | | | |
Intercept | | | | | | | | |
ST118Q01 | -1,186.58 | 0.73 | 16 | -0.34 | 0.20 | -0.83 | Greece (F) | 0.04 | Portugal (F)
ST118Q02 | -1,421.63 | 0.72 | 18 | -0.33 | 0.29 | -1.02 | Greece (F) | 0.44 | Spain (F)
ST118Q03 | -1,108.11 | 0.90 | 15 | -0.35 | 0.15 | -0.68 | Turkey (F) | 0.04 | Finland (M)
ST118Q04 | -1,112.41 | 0.83 | 29 | -0.38 | 0.19 | -1.11 | Austria (F) | -0.03 | Norway (F)
ST118Q05 | -1,259.18 | 0.85 | 14 | -0.36 | 0.21 | -0.70 | Japan (F) | 0.24 | Greece (F)
Sum | -6,087.91 | | | | | | | |
Total | -10,661.82 | | | | | | | |

Consequently, the parameters of each item demonstrated some degree of noninvariance, with no item parameter demonstrating approximate MI across all groups. Specifically, the number of country-by-gender groups with approximate MI among factor loadings ranged from 42 (Item 4) to 58 (Item 5), whereas among intercepts the number ranged from 14 (Item 5) to 29 (Item 4). Column 2 reports R^{2} for each item, which indicates the degree of parameter variation across the groups within the configural model (M0) that is explained by across-group factor mean and variance variation (values closer to 1.00 indicative of higher invariance). Overall, 30.86% and 73.71% of the factor loadings and intercepts, respectively, were noninvariant, which resulted in 52.29% of the parameters being noninvariant, exceeding
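The noninvariance percentages reported above follow directly from the tabled counts of groups with approximate MI. As a check, the Python sketch below recomputes them, taking 70 groups and the per-item counts from the alignment results table; the variable and function names are illustrative.

```python
N_GROUPS = 70  # 35 countries x 2 genders

# Number of groups with approximate MI per item, from the alignment results table.
loading_invariant = {"ST118Q01": 47, "ST118Q02": 49, "ST118Q03": 46,
                     "ST118Q04": 42, "ST118Q05": 58}
intercept_invariant = {"ST118Q01": 16, "ST118Q02": 18, "ST118Q03": 15,
                       "ST118Q04": 29, "ST118Q05": 14}


def percent_noninvariant(invariant_counts, n_groups=N_GROUPS):
    """Share of item-by-group parameters that are not approximately invariant."""
    total = len(invariant_counts) * n_groups
    return 100 * (total - sum(invariant_counts.values())) / total


loading_pct = percent_noninvariant(loading_invariant)
intercept_pct = percent_noninvariant(intercept_invariant)
# Loadings and intercepts contribute equal numbers of parameters, so the
# overall percentage is the simple mean of the two.
overall_pct = (loading_pct + intercept_pct) / 2

print(round(loading_pct, 2), round(intercept_pct, 2), round(overall_pct, 2))
```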

Columns 8 and 9 report the groups with the lowest and highest parameter values for each item. Specifically, Item 1 reported the weakest relationship to the schoolwork-related anxiety factor for Mexico-Females (loading = 0.77), whereas its relationship was the strongest for Finland-Males (loading = 1.28). For Item 3, the weakest relationship was for Finland-Females and the strongest for Spain-Males. Similarly, Items 4 and 5 reported the weakest relation to the anxiety factor among Japan-Males, whereas Items 4 and 5 were most strongly related to the factor for Iceland-Females and Chile-Males, respectively. Among intercepts, Greece-Females had the lowest reported levels of schoolwork-related anxiety for both Items 1 and 2, whereas the highest anxiety levels for these items were among Portugal-Females and Spain-Females. Interestingly, whereas Greece-Females reported the lowest anxiety for Items 1 and 2, they reported the highest intercept for Item 5 that assessed anxiety related to getting nervous when not knowing how to solve a task at school.

Alignment-based factor means indicated that Portugal-Females (

Based on the Monte Carlo simulations, correlations between the alignment optimization population and estimated factor means were 0.99 across conditions,^{1} except for the group size of 250, for which the correlation was 0.98, thus meeting the criterion of 0.98 (

^{1} In each Monte Carlo simulation, the number of instances of nonconvergence for specific groups in each sample size condition was: 1 for 500 and 2,000; 3 for 250 and 3,000; and 5 for 1,000.

For the predictive validity of scores (intraclass correlation coefficient = 0.37), across the 10 mathematics achievement PVs, the average regression coefficient for the alignment-based factor score was -12.46 (

Only recently have methodological and applied studies emerged on the utility of the alignment method for invariance testing, including its comparison to alternative approaches with many groups. Within this literature, there is limited application of the method in large-scale, international educational studies, and there are fewer illustrations of its use for invariance testing of country by gender comparisons with more than 70 groups. Further, there is limited research regarding the relationship of partial invariance and alignment-based scores to external variables (e.g.,

Within PISA, cross-country scale comparability included strict translation procedures, whereas construct validity was based on within country reliability, and, for item parameter invariance, the root mean square deviance item-fit statistic (RMSD; i.e., difference between model-based and observed item characteristic curves) for country-by-language combinations (

Aligned with previous research, item factor loadings demonstrated higher amounts of invariance compared to intercepts. While the amount of measurement noninvariance exceeded

Within international studies, survey scores provide an important source of information to identify factors that may be associated with intended study outcomes. Whereas Monte Carlo simulation results indicated the trustworthiness of estimated factor means under varying sample sizes, a subsequent question of this study was their predictive validity. Compared to partial invariance scores, alignment-based scores reported a lower relationship to mathematics achievement. Along these lines,

Whereas the alignment procedure has largely been used as an exploratory method,

The author has no funding to report.

The author has declared that no competing interests exist.

The author has no support to report.