Measurement invariance analysis has become a widely used tool in many areas of applied research, e.g., psychology, sociology, and organizational research, and has received a lot of attention in methodological research, e.g., Davidov et al. (2018), Greiff and Scherer (2018), Millsap (2011), and van de Schoot et al. (2015); see Leitgöb et al. (2023) for a recent overview of measurement invariance analysis in the social sciences. The goal of measurement invariance (MI) analysis is to investigate whether measurement properties of a latent variable’s indicators are the same across two or more groups. MI analysis has many facets, e.g., which method is the best when looking for invariant indicators. However, each of these methods requires the application of a certain scaling method. Without scaling, the respective models could not be uniquely estimated. The focus of this paper is on this specific requirement of each MI investigation, i.e. how to set scaling restrictions appropriately when investigating the invariance of factor loadings.
There are several levels of MI that are investigated consecutively (e.g., Brown, 2015, pp. 242–243). The lowest level is configural MI, where the factor structure is the same in all groups. At the next level, metric MI, the invariance condition states additionally that the loadings are equal across groups. The third level is scalar MI, which means that the intercepts are invariant across groups.1 In this paper, we focus on the metric MI model and elaborate on the question of how to scale this model. In general, we examine whether the scaling method affects the discrepancy function, and in turn the resulting test statistic, of a model. In particular, we examine whether the selection of a particular referent indicator or, more generally, the selection of a certain scaling method affects the discrepancy function, and in turn the resulting test statistic, of the metric MI model.
The test statistics of both the configural and the metric MI model are assumed to be $\chi^2$-distributed, with the respective model's degrees of freedom, under the corresponding null hypothesis. The test statistic’s numeric value results from the estimation process, in which a discrepancy function F is minimized. According to the procedure in Vandenberg and Lance (2000, p. 56), testing for metric MI requires the following steps: In the first step, the configural model is tested. In the second step, the metric model is tested. If the test statistics indicate in both steps that the model fits the data, then a $\chi^2$-difference test is conducted in the third step. This last test evaluates whether the metric MI conditions hold. Thus, MI analysis draws on testing single models as well as on testing nested models.
In addition to the level of MI analysis, there is also a distinction between full and partial metric MI models. In the full MI model, all loadings are assumed to be invariant across groups, while in partial MI models, one or more loadings are free to vary between the groups.
As a metric MI model entails latent variables, it is well known that scaling restrictions must be imposed; otherwise, the model would not be identified. In this paper, we use the term scaling restriction to refer to the fixation of a certain parameter to a given value imposed by the application of a certain scaling method and use the term invariance condition to refer to the imposed equality of the loadings in a metric MI model. This distinction follows Wu and Estabrook (2016), who introduced a similar distinction between scaling restrictions and invariance conditions. Finally, we use the term constraints to refer to both, i.e., the combination of the invariance conditions and the scaling restriction in a model. The Fixed Marker (FM) scaling (cf. Kline, 2016, pp. 199–200) is the most commonly applied scaling method and the de facto standard in most structural equation software. According to Raykov et al. (2012, p. 968), the method restricts one of the estimated loadings of each latent variable to 1 in each group. The respective manifest variable is then called the referent indicator (RI). From that description, it is obvious that the FM scaling method has as many different variants as there are indicators for a latent variable. For instance, when there are 6 indicators, each of these indicators’ estimated loadings can be restricted to 1. In this paper, if the first indicator is the RI, we use the abbreviation $\mathrm{FM}_1$. If the second indicator is the RI, we write $\mathrm{FM}_2$, and so on.2 Usually, it is suggested that the indicator that best represents the latent variable should be the RI (e.g., Brown, 2015, pp. 242–243).
In the context of metric MI analysis, the use of the FM method has been critically discussed in textbooks.3 According to Brown (2015, p. 271; cf., Cheung and Rensvold, 1999, pp. 8–9), there may be difficulties in multiple group settings regarding the selection of an RI. For instance, Brown (2015) mentions that non-invariance of the RI may not be detected in a metric MI setting, because its loading is restricted to 1 in every group, which amounts to the implicit assumption that the RI is invariant across groups. Similarly, Kline (2016, p. 405) argues that the selection of an RI is arbitrary and that the fixation of the RI to 1 over the groups is tantamount to an assumption of invariance of the RI. Furthermore, Kline argues that the RI is excluded from the test of metric MI, but the RI choice should not affect the overall model fit most of the time. Raykov et al. (2020) provide an example in which a metric MI model is estimated using a non-invariant RI. In a Monte Carlo simulation, the metric MI model failed to reveal the non-invariance of the RI by means of the test statistic most of the time. However, the authors did not consider choosing an invariant RI.
Johnson et al. (2009, p. 654) found in a Monte Carlo simulation for the full metric MI model that when RIs are non-invariant, this non-invariance does not distort the results of metric MI tests. The authors state that differences between groups on the RIs were transferred to other indicators via the constraints on the RI to be equal to 1 in both groups, which in turn affected the estimated loadings of all other indicators by setting a scale. Furthermore, the authors summarize that, in general, the tests of the full metric MI model were accurate in detecting non-invariance when the RI differed across groups, regardless of whether or not other indicators also differed. Given these results, Johnson et al. (2009, p. 644) argue that all estimated loadings are rescaled relative to the RI by a magnitude of $1/\lambda$ (where $\lambda$ denotes the “true” loading of the RI); if the RI is non-invariant across groups, the parameter estimates are adjusted by different scaling constants and the loadings are expressed in two different metrics (Johnson et al., 2009, p. 644). This kind of rescaling is also mentioned by Raykov et al. (2012, Appendix A). These authors state that setting the RI to 1 yields a rescaling of the remaining loadings.
The concerns and findings mentioned above were in the context of the full metric MI model. For partial metric MI, Johnson et al. (2009) found simulation evidence that the choice of the RI affects the test results.4 Vandenberg and Lance (2000, pp. 46–47) state that in partial metric MI settings, problems may arise when an RI is in reality non-invariant because that yields different metrics for a latent variable over the groups (cf., Brown, 2015).
Up to here, we have only considered the choice of the RI in the FM scaling method, but there are two other methods of scaling latent variables in multiple group settings with metric MI conditions (cf. Kline, 2016, pp. 199–200). These are the Effects Coding (EC) method and the Reference Group (RG) method.
The EC method (Little et al., 2006) restricts the estimated loadings by enforcing that their sum equals the number of indicators. This restriction is equivalent to the restriction that the average of the estimated loadings of a latent variable equals 1.
The RG method (cf., Kline, 2016, p. 393, Lee et al., 2011, p. 60, Raykov et al., 2012, p. 961, Wu & Estabrook, 2016), which is the multiple group version of the Fixed Factor (FF) scaling, restricts the variances of the latent variables to 1 in the reference group only, while they are freely estimated in the remaining groups. As noted by Raykov et al. (2012, p. 969), the identification of this model is achieved by the imposed equality constraints of the estimated loadings in the remaining groups. Put differently, the metric MI conditions propagate the scaling to the other groups. This is in contrast to the FM method, in which an RI is set to 1 in each group (see above).
Obviously, due to the lack of an RI, neither EC nor RG contains any similar invariance assumption. Additionally, the FM scaling method provides estimated loadings for all indicators except the RI, whereas EC and RG provide estimated loadings for all indicators. Indeed, van de Schoot et al. (2012, p. 489) recommend using the method when the goal is to compare factor loadings across groups.
In a nutshell, what was said above provides an inconsistent picture of the issues involved in scaling (partial) metric MI models. These inconsistencies can be summed up in the following four questions:
- Does it matter which indicator serves as RI, or is the result of an MI analysis, in the sense of the model fit of the metric MI model, independent of the RI choice?
- More generally, does the choice of the scaling method affect the results of an MI analysis, or do all scaling methods provide the same result?
- Do the same considerations apply in full and partial metric MI models, or do we have to distinguish between these settings?
- Can the FM scaling method be applied by only restricting the RI in one group and propagating the scaling restriction via the MI condition, like the RG method?
The fundamental topic of the first two questions is whether the scaling method has any effect on the value of the discrepancy function F. All four questions are of interest not only for methodological considerations but also for applied researchers who wish to conduct an MI analysis. This paper aims to clarify these questions and thus to bring consistency into the literature concerning the choice of RIs in particular and the choice of a scaling method in general. The general approach of the paper is to present examples that empirically demonstrate some basic facts and ideas, and then to turn to formal considerations. However, the formal part is kept at the minimal required level.
In the following section, we provide the general setup and notations used in this paper. Then, we first present an example showing that neither the scaling method in general nor the choice of an RI in particular affects the discrepancy function of the full metric MI model. We also use the example to introduce the concept of change of scale, which we make use of in the subsequent section to provide a formal proof of the equivalence of all scaling methods for the full metric MI model. Additionally, we derive some corollaries about the resulting test statistic and other fit indices, showing that these are also unaffected by the scaling methods. Furthermore, we provide a corollary about the degrees of freedom for the $\chi^2$-difference test between the configural and the full metric MI model, which we then extend to partial metric MI models. As a consequence, we find that a partial metric MI model with only one invariance condition per latent variable is equivalent to the configural model. Thus, it is impossible to test the invariance of only one indicator (Steenkamp & Baumgartner, 1998). In the section after that, we provide a further example, in which we first empirically demonstrate these corollaries. Again, the example illustrates that neither the scaling method in general nor the choice of an RI in particular affects the discrepancy function of the partial metric MI model. Afterward, we provide a formal account for the partial MI model. In particular, we explain why the (non-)invariance of an RI, or more generally the data generating process, does not affect tests of either full or partial metric MI. Finally, in the conclusion we elaborate on the results obtained in the paper and provide the answers to the four questions. Supplementary materials containing additional information and R scripts for the examples are available in Klopp and Klößner (2022) and referred to in the appropriate places in the text.
Setup
In this section, we introduce the notation we use throughout this article. We consider measurement models or confirmatory factor analysis models in multiple group settings of the form
$$x^{(g)} = \Lambda^{(g)} \xi^{(g)} + \delta^{(g)}. \tag{1}$$
Let $p$ and $m$ be the number of manifest and latent variables, respectively, and let $g = 1, \dots, G$ be an index of the group membership. Then $x^{(g)}$ is a vector of manifest variables, $\Lambda^{(g)}$ is a loading matrix, $\xi^{(g)}$ is a vector of latent variables, and $\delta^{(g)}$ is a vector of measurement residuals. Additionally, $\lambda_{ij}^{(g)}$ represents the loading of the $i$-th indicator on the $j$-th latent variable in group $g$. The other model parameters are indexed analogously.
Under the standard assumptions for confirmatory factor analysis models (e.g., Bollen, 1989, pp. 233–234), this measurement model yields the following model-implied covariance matrix for $x^{(g)}$:
$$\Sigma^{(g)} = \Lambda^{(g)} \Phi^{(g)} \Lambda^{(g)\prime} + \Theta^{(g)}, \tag{2}$$
where $\Phi^{(g)}$ is the covariance matrix of the latent variables and $\Theta^{(g)}$ is the covariance matrix of the measurement residuals $\delta^{(g)}$, which is assumed to be a diagonal matrix containing the manifest residual variances. We collect the parameter matrices in the parameter vector $\theta$. To simplify our considerations, we assume a fully saturated model for the mean structure.
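As a minimal numerical sketch of the covariance structure described above (loading matrix times latent covariance matrix times transposed loading matrix, plus a diagonal residual covariance matrix), the following Python snippet computes the model-implied covariance matrix for a single group. The function name `implied_cov` and all parameter values are hypothetical and serve only as an illustration; they are not taken from the examples in this paper.

```python
def implied_cov(loadings, phi, resid_var):
    """Model-implied covariance: Lambda * Phi * Lambda' + diag(resid_var).

    loadings  : p x m list of lists (rows = indicators, columns = latent variables)
    phi       : m x m list of lists (covariance matrix of the latent variables)
    resid_var : list of p residual variances (diagonal of Theta)
    """
    p, m = len(loadings), len(phi)
    sigma = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            # Entry (a, b) of Lambda * Phi * Lambda'
            sigma[a][b] = sum(
                loadings[a][j] * phi[j][k] * loadings[b][k]
                for j in range(m)
                for k in range(m)
            )
        # Residual variances enter on the diagonal only
        sigma[a][a] += resid_var[a]
    return sigma


# Hypothetical one-factor model with three indicators (made-up values)
lam = [[1.0], [0.8], [1.2]]
phi = [[0.5]]
theta = [0.3, 0.4, 0.2]

sigma = implied_cov(lam, phi, theta)
for row in sigma:
    print([round(v, 3) for v in row])
```

In a multiple group setting, the same function would simply be called once per group with that group's parameter matrices.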
As is common in statistics, we distinguish between population models and estimated models (cf., Klopp & Klößner, 2021, pp. 185–187, Figure 4). In particular, we refer to a model in the form of Equation (1) as the population model that contains the data generating process (dgp). If the relation
$$\Lambda^{(1)} = \Lambda^{(2)} = \dots = \Lambda^{(G)} \tag{3}$$
holds within a dgp, then this dgp fulfills full metric invariance. In contrast, a dgp fulfills partial metric MI if
$$\lambda_{ij}^{(g)} = \lambda_{ij}^{(1)} \tag{4}$$
for all groups $g$ and for some combinations of indicator $i$ and latent variable $j$ for which the corresponding nonzero loadings are invariant across groups, while some loadings are allowed to be non-invariant. We call the dgp either a full or a partial metric MI dgp.
Following the above-mentioned distinction between population and estimated models, the estimated model is stipulated by a researcher. We assume that the estimated model is also a confirmatory factor analysis model of the same structure. The estimated model-implied covariance matrix is
$$\hat{\Sigma}^{(g)} = \hat{\Lambda}^{(g)} \hat{\Phi}^{(g)} \hat{\Lambda}^{(g)\prime} + \hat{\Theta}^{(g)}. \tag{5}$$
The parameter vector $\hat{\theta}$ collects the model matrices of the estimated model.
The invariance condition for full metric MI on the estimated loadings is
$$\hat{\Lambda}^{(1)} = \hat{\Lambda}^{(2)} = \dots = \hat{\Lambda}^{(G)}. \tag{6}$$
For partial metric MI, these invariance conditions are replaced by the less restrictive invariance conditions
$$\hat{\lambda}_{ij}^{(g)} = \hat{\lambda}_{ij}^{(1)} \tag{7}$$
for all groups $g$ and all combinations of indicator $i$ and latent variable $j$ for which the corresponding estimated nonzero loadings are assumed to be invariant across groups.
In the following, we will always explicitly use either the term full metric MI model or partial metric MI model. If we want to make assertions that refer to both types of models, we use the term metric MI model. Moreover, we call a model such as in Equation (5) in combination with invariance conditions as in Equation (6) or as in Equation (7) the unscaled model, cf. Klopp and Klößner (2021, pp. 185–187), as no scaling restrictions have been added yet.
To actually estimate the model, we have to apply a scaling method. It is important to note that the restrictions of the various scaling methods are applied to the estimated model (Klopp & Klößner, 2021, p. 185), but not to the dgp. We collect the scaling methods in the set
$$\mathcal{S} = \{\mathrm{FM}, \mathrm{EC}, \mathrm{RG}\}. \tag{8}$$
To refer to any of the elements of $\mathcal{S}$, we use the generic notation S. The estimated model-implied covariance matrix then is
$$\hat{\Sigma}^{S,(g)} = \hat{\Lambda}^{S,(g)} \hat{\Phi}^{S,(g)} \hat{\Lambda}^{S,(g)\prime} + \hat{\Theta}^{(g)}. \tag{9}$$
We call a model as in Equation (9) a scaled model. Please note that the estimated manifest residual variances in the scaled model do not have a superscript indicating the scaling method. This is because no scaling method needs to be applied in order to estimate residual variances, and estimated residual variances do not depend on the scaling method employed, i.e., the estimated values are identical for all scaling methods S (Klößner and Klopp, 2019, p. 148; Klopp and Klößner, 2021, p. 185).
In the following, when estimating a metric MI model, we assume that configural invariance is given. Furthermore, we assume that the scaled model is estimated by minimizing a discrepancy function $F$. The discrepancy function is based on the sample covariance matrix and the model-implied covariance matrix $\hat{\Sigma}^{S,(g)}$ of each group. To keep the notation simple we write $F(\hat{\theta}^{S})$ (cf., Bollen, 1989, p. 106, for the properties of a discrepancy function).
As the distinction between a dgp and unscaled and scaled estimated models is not common in the literature, we explain the relations between these models with the help of Figure 1 (cf., Klopp & Klößner, 2021, p. 186). Firstly, in the upper left corner, there is a dgp that operates in a population and from which a random sample of observations is drawn. The observations are collected in a data set that is later used in the estimation process, which is depicted on the right side of the figure. Secondly, in the lower left corner, there is a researcher who has a substantive theory about metric MI in the population and is interested in answering the substantive hypothesis “Does metric MI hold in the population?” Importantly, the dgp is unknown to the researcher. To answer the question, the researcher specifies a statistical model like the one given in Equation (5), with metric MI conditions of the form given in Equation (6) or Equation (7), i.e., the researcher specifies an unscaled model. This represents the researcher’s statistical hypothesis. To actually estimate the model, the researcher has to apply scaling restrictions, i.e., the researcher creates a scaled model. This can be done with any one of the available scaling methods, of which two possibilities are depicted in the figure. With the scaled model, the statistical hypothesis of whether the metric MI model fits the data is tested5 and afterward, the researcher arrives at a conclusion about the substantive hypothesis. The figure illustrates two issues: Firstly, the model used by the researcher is a distinct entity from the dgp. Secondly, the researcher starts with invariance conditions in the unscaled model, to which the scaling restrictions are then added. Thus, the scaling of the model is not related to the dgp. We will now move on to our first example.
Figure 1
Full Metric Measurement Invariance: An Example
In the following example, we present a simple two-group case which demonstrates that regardless of the scaling method, the same value of the discrepancy function results when a full metric MI model is estimated. The aim of this section is to provide an intuitive and heuristic understanding, both of the equivalence of the scaling methods for the full metric MI model and of the concept of change of scale. A change of scale allows one to convert the estimates obtained under a certain scaling method to those obtained under any other scaling method, without re-estimating the model. This concept is vital to the more formal account in the next section.
We consider a model with only one latent variable with six indicators in two groups, where $g = r$ indicates the reference group and $g = f$ indicates the focal group (cf., Holland & Thayer, 1988). The example’s dgp is given by (see also Figure 2 for a graphical display of the dgp):
10
Figure 2
This is a partial metric MI dgp. The loadings of the last two indicators are non-invariant. Notably, the two loadings differ between the groups to different extents. If the concerns about non-invariant RIs in the FM scaling method were justified, then the estimated values of the discrepancy function (on which the test statistic rests) would differ depending on the RI chosen.
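We now take the role of the researcher from Figure 1 and set up our unscaled model as in Equation (5) with the full metric invariance condition $\hat{\Lambda}^{(r)} = \hat{\Lambda}^{(f)}$ for the reference group $r$ and the focal group $f$. In detail, this invariance condition entails the following set of invariance conditions:
$$\hat{\lambda}_i^{(r)} = \hat{\lambda}_i^{(f)}, \quad i = 1, \dots, 6. \tag{11}$$
To scale the model, we apply each scaling method from the set $\mathcal{S}$ and start with the FM scaling method, using the first indicator as RI6, i.e., $\mathrm{FM}_1$. The scaling restriction is $\hat{\lambda}_1^{(r)} = 1$. Considering the other indicators as RI, we can express all variants of the FM scaling method as
$$\mathrm{FM}_i: \quad \hat{\lambda}_i^{(r)} = 1, \quad i = 1, \dots, 6. \tag{12}$$
Notably, the invariance condition in Equation (11) yields $\hat{\lambda}_i^{(f)} = \hat{\lambda}_i^{(r)} = 1$, and we see that the invariance condition propagates the scaling restriction from the reference to the focal group.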
For the EC scaling method, the scaling restriction we set in the reference group is
$$\sum_{i=1}^{6} \hat{\lambda}_i^{(r)} = 6. \tag{13}$$
As stated in the introduction, this scaling restriction is equivalent to the restriction that the average of the estimated loadings equals 1, i.e.,
$$\frac{1}{6} \sum_{i=1}^{6} \hat{\lambda}_i^{(r)} = 1. \tag{14}$$
Again, the invariance conditions cause the propagation of the scaling restriction to the focal group.
The most important observation from these scaling examples is that the scaling restrictions are applied in one group only. As we have seen, a common feature of all scaling methods is the propagation of the scaling restrictions into the focal group via the invariance conditions. Because of its importance, we state this rule in colloquial terms: “Apply the scaling restriction in one group only; the invariance conditions do the rest!”
Finally, for the RG scaling method, the scaling restriction in the reference group is
$$\hat{\phi}^{(r)} = 1. \tag{15}$$
In contrast to the previous scaling methods, this restriction does not directly propagate to the focal group. However, the restriction on the estimated variance in the reference group scales the estimated loadings in this group indirectly. Due to the invariance conditions, this restriction is then propagated to the focal group, thus overall acting as a scaling restriction.
We now estimate the full metric MI model for ten different generated samples, using all scaling methods, i.e., the six different variants of the FM scaling as well as the EC and RG scaling method. We use the ML estimator. The sample size per group is , thus in total. We chose this sample size because it reflects a number often achieved in real studies. All calculations for the data generation and model estimation were done in R (R Core Team, 2022), using the packages MASS (Venables and Ripley, 2002) and lavaan (Rosseel, 2012).
Regarding the degrees of freedom, there are $p(p+1)/2$ unique entries in the covariance matrix in each group, i.e., with $p = 6$ manifest variables there are $6 \cdot 7 / 2 = 21$ unique entries in each group and thus $2 \cdot 21 = 42$ unique pieces of information in total. In each group, we have to estimate 6 loadings, 6 manifest residual variances and 1 latent variance, i.e., 13 parameters per group and 26 parameters in total. Because we have a full metric MI model, there are 6 invariance conditions (see Equation 11). Additionally, there is 1 scaling restriction, regardless of the scaling method; see Equations (12), (13), and (15). Thus, there are 7 restrictions in total, which gives $26 - 7 = 19$ free parameters. This results in $42 - 19 = 23$ degrees of freedom.
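The counting argument above can be written down as a short Python sketch. The function `metric_mi_df` is a hypothetical helper, and its generalization beyond the example (more than two groups or more than one latent variable) rests on assumptions stated in the comments: every indicator loads on exactly one latent variable, latent variables covary freely, and the mean structure is saturated.

```python
def metric_mi_df(n_groups, n_indicators, n_latent=1):
    """Degrees of freedom of a full metric MI model (simple structure assumed)."""
    # Unique variances and covariances per group, summed over all groups
    information = n_groups * n_indicators * (n_indicators + 1) // 2
    # Per group: loadings, residual variances, latent variances and covariances
    per_group = 2 * n_indicators + n_latent * (n_latent + 1) // 2
    parameters = n_groups * per_group
    # Invariance conditions: each loading equals its value in the first group
    invariance = (n_groups - 1) * n_indicators
    # Scaling: one restriction per latent variable, imposed in one group only
    scaling = n_latent
    free_parameters = parameters - invariance - scaling
    return information - free_parameters


# The example from the text: two groups, six indicators, one latent variable
print(metric_mi_df(n_groups=2, n_indicators=6))  # 23
```

For the example, the intermediate quantities are 42 pieces of information, 26 parameters, 6 invariance conditions, and 1 scaling restriction, which reproduces the 19 free parameters and 23 degrees of freedom derived in the text.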
The results are presented in Tables 1 and 2. Table 1 shows the values of the ML discrepancy function resulting from the estimation, multiplied by the factor , while Table 2 shows the estimated loadings and latent variances for the first generated sample. In each sample, the value of the ML discrepancy function is the same regardless of the scaling methods. In summary, we can state as a first observation that neither the specific choice of an RI nor in general the choice of the scaling method affect the full metric MI model’s resulting value of the discrepancy function. Consequently, the test statistic and all fit indices derived from the test statistic are also identical.
Table 1
Sample | EC | RG | ||||||
---|---|---|---|---|---|---|---|---|
1 | ||||||||
2 | ||||||||
3 | ||||||||
4 | ||||||||
5 | ||||||||
6 | ||||||||
7 | ||||||||
8 | ||||||||
9 | ||||||||
10 |
Table 2
Loadings | EC | RG | ||||||
---|---|---|---|---|---|---|---|---|
This result is in line with Johnson et al. (2009) and demonstrates empirically that any concerns about truly non-invariant RIs are unwarranted, at least in this example. Considering the distinction between population and specified models explains this finding. The non-invariance is in the dgp, not in the specified, scaled model, in which an invariance condition is set by the researcher. This restriction represents an invariance assumption that applies to the estimates only; it is used to test whether this restriction mirrors the invariance properties of the dgp.
Now, we turn to a second observation: Parameters estimated under one scaling method can be converted to those estimated under another scaling method (Klopp and Klößner, 2021, Proposition 2; Newsom, 2015, p. 4). We illustrate this conversion by converting the parameters estimated under FM scaling to those obtained under RG and EC scaling for the first of the ten generated samples. The estimated parameters under FM scaling are:
16
and those estimated under RG scaling are:
17
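Now, we choose as constant $c = \sqrt{\hat{\phi}^{\mathrm{FM},(r)}}$, which is the square root of the latent variance $\hat{\phi}^{\mathrm{FM},(r)}$ estimated in the reference group under FM scaling, and apply the following conversion: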
18
The choice of the constant exemplifies the mechanism of the conversion: It yields an “estimated” latent variance in the reference group of $\hat{\phi}^{\mathrm{FM},(r)} / c^2 = 1$, which corresponds to the restriction prescribed by the RG scaling method.
To convert from FM to EC, we must bear in mind that the EC scaling restriction requires the average of the estimated loadings to equal 1. The loadings estimated under EC scaling are:
19
Denoting the average of the estimated loadings under the FM scaling as $\bar{\lambda}$ and choosing the constant $c = 1/\bar{\lambda}$ gives the conversion:7
20
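The two conversions just described (from FM to RG and from FM to EC) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `change_scale` is a hypothetical name, and the FM estimates used below are made-up numbers, not the estimates obtained for the generated samples.

```python
from math import sqrt


def change_scale(loadings, variances, c):
    """Multiply the (invariant) loadings by c and divide each group's
    latent variance by c squared."""
    new_loadings = [c * value for value in loadings]
    new_variances = {group: v / c**2 for group, v in variances.items()}
    return new_loadings, new_variances


# Hypothetical FM estimates with the first indicator as RI (made-up values);
# under full metric MI the loadings are identical in both groups.
fm_loadings = [1.0, 0.9, 1.1, 0.8, 1.3, 0.7]
fm_variances = {"reference": 0.64, "focal": 0.81}

# FM -> RG: the constant is the latent standard deviation in the reference group
c_rg = sqrt(fm_variances["reference"])
rg_loadings, rg_variances = change_scale(fm_loadings, fm_variances, c_rg)

# FM -> EC: the constant is the reciprocal of the average FM loading
c_ec = 1 / (sum(fm_loadings) / len(fm_loadings))
ec_loadings, ec_variances = change_scale(fm_loadings, fm_variances, c_ec)

print(rg_variances["reference"])            # latent variance in the reference group
print(sum(ec_loadings) / len(ec_loadings))  # average loading under EC scaling
```

Up to floating-point rounding, the converted values satisfy the RG restriction (latent variance of 1 in the reference group) and the EC restriction (average loading of 1), while products of the form loading times variance times loading are the same under all three scalings.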
Klopp and Klößner (2021, Proposition 2) provide a theoretical account that demonstrates how to convert the estimated parameters obtained under any scaling method to the estimates obtained under any other scaling method, without the need to re-estimate the model, given that the estimated model-implied covariance matrices under the two different scaling methods are identical. To start, we note that the scaling method that was actually used has no effect on the conversion; only the “target” scaling method is of relevance (Klopp & Klößner, 2021, p. 190). For our further considerations, let $S, T \in \mathcal{S}$ be two different scaling methods. The model was actually estimated with $S$ scaling and $T$ is the target scaling method. Additionally, let $c \neq 0$ be a constant. For our one latent variable example, the conversion equations according to Klopp and Klößner (2021) are
$$\hat{\lambda}_i^{T,(g)} = c \, \hat{\lambda}_i^{S,(g)}, \tag{21}$$
$$\hat{\phi}^{T,(g)} = \frac{1}{c^2} \, \hat{\phi}^{S,(g)}. \tag{22}$$
Obviously, for our example with one latent variable, a conversion according to Equation (21) and Equation (22) with a constant $c$ does not change the estimated model-implied covariance matrix, because
$$\hat{\Lambda}^{T,(g)} \hat{\phi}^{T,(g)} \hat{\Lambda}^{T,(g)\prime} + \hat{\Theta}^{(g)} = c \, \hat{\Lambda}^{S,(g)} \, \frac{\hat{\phi}^{S,(g)}}{c^2} \, c \, \hat{\Lambda}^{S,(g)\prime} + \hat{\Theta}^{(g)} = \hat{\Lambda}^{S,(g)} \hat{\phi}^{S,(g)} \hat{\Lambda}^{S,(g)\prime} + \hat{\Theta}^{(g)}. \tag{23}$$
The type of relation in Equation (23) is widely known in the SEM literature (e.g., Jöreskog, 1978; Mulaik, 2010, p. 443; Yoon and Millsap, 2007); Klopp and Klößner (2021, p. 185) call this relation change of scale. Note that this relation deserves special attention in the multiple group context. Because of the invariance conditions over the two groups, only one constant must be used simultaneously for both groups.
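A general version of a change of scale for the model in Equation (9), which also easily accommodates more than one latent variable, would use non-singular diagonal matrices $D$ containing the factors for each latent variable. This general form is
$$\hat{\Sigma}^{S,(g)} = \left(\hat{\Lambda}^{S,(g)} D\right) \left(D^{-1} \hat{\Phi}^{S,(g)} D^{-1}\right) \left(\hat{\Lambda}^{S,(g)} D\right)^{\prime} + \hat{\Theta}^{(g)} = \hat{\Lambda}^{S,(g)} \hat{\Phi}^{S,(g)} \hat{\Lambda}^{S,(g)\prime} + \hat{\Theta}^{(g)}. \tag{24}$$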
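A general change of scale with a diagonal matrix can be checked numerically. The following Python sketch assumes the convention that the loading matrix is post-multiplied by a non-singular diagonal matrix `D` while the latent covariance matrix is pre- and post-multiplied by its inverse; the two-factor parameter values and the scaling constants are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(column) for column in zip(*a)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def common_part(lam, phi):
    """Lambda * Phi * Lambda'; the residual matrix is unaffected by scaling."""
    return matmul(matmul(lam, phi), transpose(lam))


# Hypothetical two-factor model with three indicators per factor
lam = [[1.0, 0.0], [0.7, 0.0], [1.4, 0.0],
       [0.0, 1.0], [0.0, 0.9], [0.0, 1.2]]
phi = [[0.8, 0.3], [0.3, 0.5]]
d = [2.0, 0.5]  # arbitrary non-zero scaling constants, one per latent variable

lam_new = matmul(lam, diag(d))               # rescaled loadings
d_inv = diag([1 / value for value in d])
phi_new = matmul(matmul(d_inv, phi), d_inv)  # rescaled latent covariances

before = common_part(lam, phi)
after = common_part(lam_new, phi_new)
max_diff = max(abs(x - y)
               for row_b, row_a in zip(before, after)
               for x, y in zip(row_b, row_a))
print(max_diff < 1e-12)  # True
```

Because the residual covariance matrix is the same before and after the change of scale, equality of the common parts implies equality of the full model-implied covariance matrices.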
In the following section, we turn to the formal side of this exemplary consideration and provide a general proof that each scaling method results in the same value of the discrepancy function. As we will see, the idea of a change of scale and the idea of the propagation of the scaling restriction over the groups via the invariance conditions will be essential in proving the equivalence of the scaling methods.
Full Metric Measurement Invariance: Theory
The considerations laid out above lead to the following proposition.
Proposition 1
If the full metric MI model is estimated by minimizing a discrepancy function, then the resulting optimal values of the discrepancy function as well as the estimated model-implied covariance matrices do not depend on the particular method used for scaling the MI model.
The outline of the proof is as follows: First of all, we will explain why the minimum of the discrepancy function, taken over all parameters fulfilling full metric MI, coincides, for every scaling method, with the discrepancy function’s minimum taken over all parameters simultaneously fulfilling full metric MI as well as the restrictions stemming from the corresponding scaling method. After having established this fact, it will be obvious that the resulting optimal discrepancy value does not depend on the particular scaling method, as the optimal values for different scaling methods all take the same value (namely, the discrepancy function’s minimum taken over all parameters fulfilling full metric MI). As a by-product of proving the invariance of the optimal discrepancy value, the proof will also show that the estimated model-implied covariance matrices do not depend on the method used for scaling the MI model.
To follow the outline described above, we denote by $\theta^*$ an arbitrary parameter which fulfills full metric MI and minimizes the discrepancy function when no scaling restrictions are imposed. By definition, $F(\theta^*) \le F(\theta)$ for all parameters $\theta$ which fulfill full metric MI, with F denoting the discrepancy function. For all scaling methods, this in particular implies $F(\theta^*) \le F(\hat{\theta}^{S})$, with $\hat{\theta}^{S}$ denoting the estimate for the parameters for a given scaling method S. We will now show that the reverse relation holds, too, i.e., $F(\hat{\theta}^{S}) \le F(\theta^*)$. When this relation has been established, it is clear that we have equality, $F(\hat{\theta}^{S}) = F(\theta^*)$, implying that all scaling methods lead to the same value of the discrepancy function, $F(\theta^*)$, for the full metric MI model. To this end, we show that, for each scaling method S, there exists a change of scale which transforms the parameter $\theta^*$ to a new parameter $\tilde{\theta}^{S}$, such that the transformed parameter fulfills the constraints associated with the scaling method S as well as full metric MI.
- For the FM method, we apply to $\theta^*$ the change of scale which divides all loadings of indicators belonging to latent variable $j$ in group $g$ by $\lambda^*_{\mathrm{RI}_j}$, where $\mathrm{RI}_j$ denotes the RI used for scaling factor $j$, while multiplying latent covariances between latent variables $j$ and $k$ in group $g$ by $\lambda^*_{\mathrm{RI}_j} \lambda^*_{\mathrm{RI}_k}$.8 This transformation results in a new parameter for which the RI’s loading always equals 1 (due to $\lambda^*_{\mathrm{RI}_j} / \lambda^*_{\mathrm{RI}_j} = 1$) and all indicators are invariant (as the already invariant loadings of $\theta^*$ are divided by a quantity which is also invariant across groups). Thus, the transformed parameter fulfills the constraints associated with FM scaling as well as full metric MI.
- For the EC method, we apply to $\theta^*$ the change of scale which divides all loadings of indicators belonging to latent variable $j$ by $\bar{\lambda}^*_j$, the average of all the loadings of latent variable $j$ in group $g$, while multiplying latent covariances between latent variables $j$ and $k$ in group $g$ by $\bar{\lambda}^*_j \bar{\lambda}^*_k$. This transformation results in a new parameter for which the average loading of latent variable $j$’s indicators always equals 1 (due to $\bar{\lambda}^*_j / \bar{\lambda}^*_j = 1$) and all indicators are invariant (as the already invariant loadings of $\theta^*$ are divided by a quantity which is also invariant across groups). Thus, the transformed parameter fulfills the constraints associated with EC scaling as well as full metric MI.
- For the RG method, we apply to $\theta^*$ the change of scale which multiplies all loadings of indicators belonging to latent variable $j$ by $\sqrt{\phi^{*(r)}_{jj}}$, latent variable $j$’s standard deviation in the reference group $r$, while dividing latent covariances between latent variables $j$ and $k$ in group $g$ by $\sqrt{\phi^{*(r)}_{jj}} \sqrt{\phi^{*(r)}_{kk}}$. This transformation results in a new parameter for which the latent variable’s variance in group $r$ equals 1 (due to $\phi^{*(r)}_{jj} / (\sqrt{\phi^{*(r)}_{jj}} \sqrt{\phi^{*(r)}_{jj}}) = 1$) and all indicators are invariant (as the already invariant loadings of $\theta^*$ are multiplied by a quantity which is also invariant across groups). Thus, the transformed parameter fulfills the constraints associated with RG scaling as well as full metric MI.
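In all three cases, the change of scale amounts to post-multiplying each group’s loading matrix by the inverse of a diagonal matrix and pre- and post-multiplying each group’s latent covariance matrix by that same diagonal matrix, which is identical for all groups (its diagonal entries are the RI loadings for FM, the average loadings for EC, and the reciprocals of the reference-group standard deviations for RG scaling). In the product of loading matrix, latent covariance matrix, and transposed loading matrix, the diagonal matrix and its inverse cancel, while the residual covariance matrices are not affected by the change of scale (Equation 25). It follows that, in all groups, the estimated model-implied covariance matrices are identical for the unscaled minimizer and the transformed parameter. As the discrepancy function depends on the parameters only through the estimated model-implied covariance matrices, both parameters have the same discrepancy value. We can therefore conclude that the discrepancy value of the estimate under scaling method S is at most the unscaled minimum, as this estimate minimizes the discrepancy function among all parameters which fulfill both full metric MI and the restrictions coming with the scaling method S, a set to which the transformed parameter belongs.9 Additionally, Equation (25) shows that the estimated model-implied covariance matrices do not depend on the method used for scaling the full metric MI model.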
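The cancellation argument can be checked numerically. The following is a minimal sketch, assuming a one-factor model in a single group (which suffices here because the same rescaling is applied in every group); the parameter values, the choice of reference indicator, and the helper names `implied_cov` and `rescale` are hypothetical and serve illustration only.

```python
import math

def implied_cov(loadings, factor_var, resid_vars):
    """Model-implied covariance of a one-factor model: lambda * phi * lambda' + diag(theta)."""
    p = len(loadings)
    return [[loadings[i] * factor_var * loadings[j] + (resid_vars[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]

def rescale(loadings, factor_var, d):
    """Change of scale: divide all loadings by d, multiply the factor variance by d**2."""
    return [l / d for l in loadings], factor_var * d * d

# Hypothetical unscaled parameter values (not taken from the paper).
loadings = [0.8, 1.2, 0.5, 1.5]
factor_var = 2.25
resid_vars = [0.4, 0.3, 0.6, 0.5]

ri = 1  # index of the reference indicator (hypothetical choice)
d_by_method = {
    "FM": loadings[ri],                   # the RI's loading becomes 1
    "EC": sum(loadings) / len(loadings),  # the average loading becomes 1
    "RG": 1.0 / math.sqrt(factor_var),    # the factor variance becomes 1
}

baseline = implied_cov(loadings, factor_var, resid_vars)
rescaled = {m: rescale(loadings, factor_var, d) for m, d in d_by_method.items()}

# Largest absolute difference between the original and the rescaled implied covariances.
max_abs_diff = {}
for method, (new_loadings, new_var) in rescaled.items():
    cov = implied_cov(new_loadings, new_var, resid_vars)
    max_abs_diff[method] = max(abs(a - b)
                               for row_a, row_b in zip(cov, baseline)
                               for a, b in zip(row_a, row_b))
```

Each entry of `max_abs_diff` is zero up to floating-point error, so any discrepancy function that depends on the parameters only through the implied covariance matrix takes the same value for all three rescaled parameters.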
Corollary 1
If the full metric MI model is estimated by minimizing a discrepancy function, then its $\chi^2$ statistic, $p$ value, RMSEA, and other fit measures do not depend on the particular method used for scaling the model.
This immediately follows from Proposition 1, as all these quantities are calculated using the full metric MI model’s likelihood value, which does not depend on the particular method used for scaling the model.
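The logic of the corollary can be sketched as follows: the fit measures are functions of the minimized discrepancy value, the degrees of freedom, and the sample size only. The sketch below is an assumption-laden illustration, not the paper's computation: the multiplier that turns the discrepancy value into the test statistic is passed in as a parameter because conventions differ, the RMSEA formula is one common variant (multi-group software may apply further corrections), and all numeric inputs are hypothetical.

```python
import math

def fit_measures(f_min, df, multiplier):
    """Fit measures computed from the minimized discrepancy value alone.

    multiplier: sample-size factor applied to f_min (convention-dependent, assumed here).
    """
    chi2 = multiplier * f_min
    # One common RMSEA variant; other conventions exist.
    rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * multiplier)) if df > 0 else 0.0
    return {"chi2": chi2, "rmsea": rmsea}

# Hypothetical minimized discrepancy values under two scaling methods.
# By Proposition 1 they coincide, so the derived fit measures coincide as well.
f_min_fm = 0.05
f_min_rg = 0.05

fit_fm = fit_measures(f_min_fm, df=23, multiplier=999)
fit_rg = fit_measures(f_min_rg, df=23, multiplier=999)
```

Because the scaling method enters neither `f_min` nor `df`, `fit_fm` and `fit_rg` are identical.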
Corollary 2
The results of the $\chi^2$-difference test which compares the full metric MI model to the configural MI model do not depend on the scaling methods used for estimating the configural and full metric MI model. This in particular holds for the $\chi^2$-difference statistic, the $p$ value, RMSEA, and differences of other fit measures.
This immediately follows from the preceding corollary and the well-known fact that the corresponding quantities for the configural model do not depend on the particular method used for scaling the configural model, either.
Concerning the $\chi^2$-difference test comparing the configural to the full metric MI model, we would also like to shed some light on how to determine its degrees of freedom. Our explanations above show that scaling the full metric MI model consists of imposing exactly one restriction per latent variable, for instance by fixing all latent variables’ variances to 1 in one group, as is the case when scaling is done using the RG method. On the other hand, it is well known that scaling the configural model consists of imposing exactly one restriction per group and latent variable. Thus, the numbers of restrictions needed to scale the configural model and the full metric MI model differ by the number of latent variables times the number of groups minus one. This difference needs to be taken into account when calculating the degrees of freedom of the $\chi^2$-difference test. Intuitively, one would think that the degrees of freedom were determined by the number of restrictions due to the full metric MI model, i.e. by the number of conditions required to ensure all loadings’ invariance. This number is given by the number of indicators times the number of groups minus one, as, for every indicator, the loading in the first group must match the loadings in the remaining groups. However, this number has to be corrected to account for the difference induced by the scaling restrictions, i.e. it has to be reduced by the number of latent variables times the number of groups minus one. These considerations yield the following corollary:
Corollary 3
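The degrees of freedom of the $\chi^2$-difference test between the configural and full metric MI model are given by the number of groups minus one, multiplied by the difference between the number of indicators and the number of latent variables.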
Corollary 3 can easily be adapted to partial metric MI models. In this case, the relevant quantity is the overall number of indicators presumed to be invariant. To calculate the degrees of freedom for the $\chi^2$-difference test comparing the partial metric MI model to the configural one, we can apply exactly the same approach as above: the number of restrictions due to the partial metric MI model, intuitively given by the number of invariant indicators times the number of groups minus one, has to be corrected to account for the difference induced by the scaling restrictions, i.e. it has to be reduced by the number of latent variables times the number of groups minus one. This results in the following corollary:
Corollary 4
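The degrees of freedom of the $\chi^2$-difference test between the configural and a partial metric MI model are given by the number of groups minus one, multiplied by the difference between the number of indicators presumed to be invariant and the number of latent variables.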
The preceding corollary implies a special case which deserves particular attention: the case in which the $\chi^2$-difference test has zero degrees of freedom. We first need to investigate when this case arises: it requires that the number of indicators presumed to have invariant loadings equals the number of latent variables. This happens if exactly one indicator per latent variable is presumed to be invariant. Regarding the consequences, the fact that the $\chi^2$-difference test has zero degrees of freedom implies that these particular partial metric MI models have as many degrees of freedom as the configural model. Moreover, as partial metric MI models are covariance-nested within the configural model (cf., Bentler & Bonett, 1980, p. 592f), all these partial metric MI models are indeed equivalent to the configural model. Together, this leads to the following corollary:
Corollary 5
A partial metric MI model in which only one indicator per latent variable is presumed to be invariant is equivalent to the configural model.
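The counting argument behind Corollaries 3 to 5 can be written as a small helper. This is a minimal sketch under stated assumptions: the function name `df_difference` and its arguments are hypothetical, the formula is reconstructed from the verbal argument above (invariance conditions minus the saving in scaling restrictions), and every latent variable is assumed to have at least one indicator presumed invariant.

```python
def df_difference(n_invariant, n_latent, n_groups):
    """Degrees of freedom of the chi-square difference test against the configural model.

    n_invariant: number of indicators whose loadings are presumed invariant
                 (all indicators for the full metric MI model)
    n_latent:    number of latent variables
    n_groups:    number of groups
    """
    if n_invariant < n_latent:
        # Outside the scope of this sketch: some latent variable would have
        # no invariant indicator to propagate the scaling across groups.
        raise ValueError("need at least one invariant indicator per latent variable")
    invariance_conditions = (n_groups - 1) * n_invariant
    scaling_correction = (n_groups - 1) * n_latent
    return invariance_conditions - scaling_correction

# Illustrative cases for one latent variable with six indicators in two groups.
df_full = df_difference(n_invariant=6, n_latent=1, n_groups=2)       # full metric MI
df_partial = df_difference(n_invariant=4, n_latent=1, n_groups=2)    # four invariant loadings
df_single = df_difference(n_invariant=1, n_latent=1, n_groups=2)     # one invariant loading
```

The last case yields zero degrees of freedom, which is the situation covered by Corollary 5.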
Partial Metric Measurement Invariance: An Example
Up to this point, we have mainly considered full metric MI models. In this section, building on the corollaries developed in the preceding section, we present a further example, in which we look at partial metric MI models. The example consists of three scenarios, A, B, and C. Each of these scenarios is further divided into two settings. The dgp stays the same as in the previous example; please see the model given in Equation (10). As before, we use the ML discrepancy function and take the data from the first sample of the previous example’s simulation.
In the first scenario A, we want to look at the special case with an invariance condition on the estimated loading of only one indicator. In this case, there is and the model under consideration is equivalent to the configural model, which has degrees of freedom (please see the supporting information for how the degrees of freedom are calculated). Estimating the configural model10 results in a value of for the discrepancy function.
In the first setting of scenario A, we examine the case in which the estimated loading of the first indicator, which is actually invariant according to the dgp, is assumed to be invariant by the researcher. Thus, the unscaled model is identical to Equation (5) with and with the metric invariance condition . To scale this model, we start with the FM scaling method, see Table 3 for details.11 Once again, we consider every indicator as a potential RI, i.e., we look at . For the scaling, the invariance condition yields that estimated loadings equal in both groups. For the to scaling, the invariance condition regarding the first loading propagates the scaling from the reference to the focal group. Concerning the EC method, a first variant is to constrain the sum of only the invariant indicators’ estimated loadings, which is, for instance, the way the semTools package (Jorgensen et al., 2021) applies this scaling method in the context of partial metric MI models.12 In this case, this leads to restricting the estimated loading of the first indicator to ( ). Thus, coincides with the scaling in this case. A second possibility is to constrain the sum of the estimated loadings of the indicators which are not presumed to be invariant, such that their sum equals ( ). As a third possibility, we might also incorporate all estimated loadings, regardless of whether they are invariant or not, resulting in the condition that the sum of all estimated loadings equals ( ). Finally, there is the RG scaling method ( ).
Table 3. Scenario A: invariance condition and scaling restrictions in Settings 1 and 2 (ten scaling variants per setting).
The second setting of scenario A entails an invariance condition on the estimated loading of the fifth indicator, which is non-invariant in the dgp. The scaling options are basically the same as in the first setting; they differ only with respect to which indicators the researcher deems invariant or non-invariant, see Table 3 for details.
Estimating the models produces the results shown in Table 4. The table shows that in both settings, the value of the discrepancy function multiplied by the factor is . Thus, estimation leads to the same value of the discrepancy function, although there are different invariance conditions, and these partial models are indeed equivalent to the configural model. The scenario shows exemplarily that a partial metric MI condition on only one estimated loading is not feasible. Scrutinizing the various scaling options in Table 3 provides insight into this issue. For example, the scaling in the first setting as well as the scaling in the second setting correspond to the respective scaling of the configural model, where the RI’s loading is set to in both groups. The reason is the propagation of the scaling via the invariance condition. Interestingly, this way of looking at the configural model is not new: Reise et al. (1993) developed a scaling method for the configural model, in which they set the variance of the latent variable to 1 in one group and required one loading to be invariant across groups. Their scaling method for the configural model is therefore identical to the scaling method for a partial metric MI model with only one invariant indicator. Steenkamp and Baumgartner (1998, p. 81) also pointed out that testing the invariance of only one loading is not meaningful. Following from our results, it is impossible to test the invariance of only one indicator because such a model will always correspond to the configural model.
Table 4. Scenario A: discrepancy function values for Settings 1 and 2 under each scaling variant.
In order to get around the problem arising in Scenario A, one might be tempted to ignore the rule of applying scaling restrictions in only one group and apply them in both groups instead. In that case, there would be one degree of freedom for the $\chi^2$-difference test, and the partial metric MI model would no longer be equivalent to the configural model. In particular, such an approach would lead to a model with one more degree of freedom than the configural model, but at the cost of violating the rule to apply the scaling restriction in only one group.13
In the following Scenario B, we will study this situation. The unscaled models in the two settings are identical to those in Scenario A. The various scaling options are shown in Table 5. Please note that in the first setting, scaling coincides with the configural model, as restricting the first loading in each group is the same as imposing an invariance condition. An analogous assertion holds for scaling in the second setting. The resulting values for the ML discrepancy function are given in Table 6. Somewhat surprisingly, almost all scaling methods yield different values for the discrepancy function (for the exception scaling we will provide hints below). Thus, the model fit as well as the test statistics for the $\chi^2$-difference test are idiosyncratic to each scaling method, a phenomenon called constraint interaction (Klößner & Klopp, 2019; Steiger, 2002). The reason is that enforcing scaling restrictions in both groups makes the hypothesis about the indicators’ invariance dependent on the scaling method employed. The findings of Johnson et al. (2009) mentioned in the introduction are also driven by this effect, which we will now explore in more depth.
Table 5. Scenario B: invariance condition and scaling restrictions in Settings 1 and 2 (eight scaling variants per setting).
Table 6. Scenario B: discrepancy function values for Settings 1 and 2 under each scaling variant.
To do so, let us have a look at the situation in Setting 1 and the use of FM scaling with the second indicator as RI, applied in both groups. In this model, the first indicator’s estimated loading is invariant due to the invariance condition, while the second indicator’s loading indirectly becomes invariant due to the scaling restrictions, which fix it to 1 in both groups and thus obviously imply that it is equal across groups. Therefore, this model does not presume that only the first indicator’s loading is invariant, but it implicitly stipulates that the first and second indicators’ loadings are both invariant. Indeed, as the invariance condition on the first loading together with the two scaling restrictions is equivalent to invariance conditions on the first and second loadings together with a single scaling restriction in the first group, this model is actually equivalent to a partial metric MI model where the first two indicators’ loadings are presumed to be invariant and the same FM scaling is used in the first group only. Thus, the model scaled in this way in the first setting is a model with two, not one, invariance conditions, investigating whether the first two indicators’ loadings are invariant across groups.
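The bookkeeping behind this argument can be sketched in a few lines. The sketch below covers FM scaling only and assumes a single latent variable; the function names and the indicator numbering are hypothetical, and the degrees-of-freedom rule is the one reconstructed from the verbal argument preceding Corollary 3.

```python
def effective_invariant_set(presumed_invariant, ri, scale_in_all_groups):
    """Indicators whose loadings are actually constrained to be equal across groups.

    Fixing the RI's loading to 1 in every group makes it equal across groups,
    so the RI silently joins the set of invariant indicators.
    """
    effective = set(presumed_invariant)
    if scale_in_all_groups:
        effective.add(ri)
    return effective

def df_vs_configural(effective, n_latent=1, n_groups=2):
    """Degrees of freedom of the difference test implied by the effective invariance set."""
    return (n_groups - 1) * (len(effective) - n_latent)

presumed = {1}  # the researcher presumes only the first loading to be invariant

# RI = indicator 2, restriction imposed in both groups: indicator 2 becomes invariant too.
both_groups_ri2 = effective_invariant_set(presumed, ri=2, scale_in_all_groups=True)
# RI = indicator 1, restriction imposed in both groups: nothing is added.
both_groups_ri1 = effective_invariant_set(presumed, ri=1, scale_in_all_groups=True)
# RI = indicator 2, restriction imposed in one group only: hypothesis unchanged.
one_group_ri2 = effective_invariant_set(presumed, ri=2, scale_in_all_groups=False)

df_by_variant = {
    "RI 2, both groups": df_vs_configural(both_groups_ri2),
    "RI 1, both groups": df_vs_configural(both_groups_ri1),
    "RI 2, one group": df_vs_configural(one_group_ri2),
}
```

Different choices of RI thus lead to different tested hypotheses when the restriction is imposed in both groups, whereas the hypothesis stays the same when it is imposed in one group only.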
Scenario B demonstrates another aspect, too. There may be conditions under which identical values of the discrepancy function emerge even though the model is not scaled according to the rule of setting scaling restrictions in one group only, as was the case for the scaling variants in this scenario.14
We now consider the last scenario, Scenario C, in which we want to showcase the scaling of a partial metric MI model with four indicators presumed to be invariant across groups, while obeying the rule of applying scaling restrictions in only one group. Consequently, this partial metric MI model has degrees of freedom. The first setting represents a situation in which the first four estimated loadings are investigated with respect to invariance across groups, i.e., the invariance conditions stipulated by the researcher correspond exactly to the invariant indicators in the dgp. Again, the unscaled model is as given in Equation (5) with , and with the metric invariance conditions for .
For scaling, we use all possible versions of the FM, EC, and RG scaling methods. For the FM scaling method, these are six different versions, i.e., . For indicators with invariance conditions, we see that if we set the scaling restriction in the reference group, the invariance conditions automatically propagate the corresponding condition to the focal group. For instance, for the scaling, the restriction is propagated to the focal group via the invariance conditions . This reasoning applies to , , and scaling, too. For and scaling, where the reasoning given above no longer applies, the partial metric invariance conditions on the first four loadings nevertheless propagate the scaling from the reference to the focal group.
For the EC scaling method, there are again several ways to set the restriction. The first way ( , see Table 7) consists in restricting the invariant loadings such that their sum equals , while the second way leads to restricting the two loadings without invariance conditions ( ). Finally, it is also possible to apply the scaling restrictions such that the sum of all loadings equals ( ). As above, these restrictions are applied in the first group only, following our rule that the scaling restrictions must not be applied in both groups.
Table 7. Scenario C: invariance conditions (four per setting) and scaling restrictions in Settings 1 and 2 (ten scaling variants per setting).
Finally, for the RG scaling method, we set the variance of the latent variable in the reference group to 1. In contrast to the FM and EC scaling methods, it is obvious for the RG scaling method that the scaling restriction is only applied in one group, as this feature is part of the description of this scaling method, as outlined in the introduction. The invariance conditions placed on the first four indicators propagate the scaling to the focal group for the FM and EC scaling as well.
In the second setting, there are again four indicators with invariance conditions on the estimated loadings. However, in this setting, the invariance conditions are placed on the estimated loadings of the first and second indicator, which are invariant in the dgp, and on the estimated loadings of the fifth and sixth indicator, which are non-invariant in the dgp. The scaling options are basically the same as in the first setting; they differ only with respect to which indicators the researcher deems invariant or non-invariant, see Table 7 for details. We would like to draw the reader’s attention to the fact that the scaling methods follow the researcher’s choices of invariant and non-invariant indicators, irrespective of the indicators’ invariance or non-invariance in the dgp, which is unknown to the researcher.
The results are presented in Table 8, which shows the values of the ML discrepancy function multiplied by the factor . Within each setting, the value of the discrepancy function, and therefore the test statistic and all other fit indices, are identical irrespective of the scaling method. Additionally, the example demonstrates that the invariance or non-invariance of certain loadings in the dgp does not interact with the specific scaling method the researcher chooses for scaling the partial metric MI model.
Table 8. Scenario C: discrepancy function values for Settings 1 and 2 under each scaling variant.
In the following section, we will provide the formal proof that for partial metric MI models, the scaling method does not affect the value of the discrepancy function, as long as scaling restrictions are applied in one group only.
Partial Metric Measurement Invariance: Theory
The following proposition formalizes the results of the previous section.
Proposition 2
If a partial metric MI model is estimated by minimizing a discrepancy function, then the resulting optimal values of the discrepancy function as well as the estimated model-implied covariance matrices do not depend on the particular method used for scaling the partial metric MI model, as long as the scaling method is applied in only one group.
The proof is essentially the same as the one for Proposition 1. The only detail that needs to be given precisely is how exactly to construct the changes of scale that transform a given parameter to a new parameter which fulfills the restrictions associated with the Scaling Method S.
- For the FM method, where the scaling restrictions consist of each latent variable’s (potentially non-invariant) reference indicator having a loading of 1 in the group in which the scaling restriction is imposed, we apply to the given parameter the change of scale which, in every group, divides all loadings of the indicators belonging to a latent variable by the loading of that latent variable’s reference indicator in the scaling group, while multiplying the latent covariance between any two latent variables by the product of the loadings of their respective reference indicators in the scaling group. This transformation results in a new parameter for which the reference indicator’s loading in the scaling group equals 1 (as it is divided by itself) and all indicators supposed to be invariant across groups stay invariant (as the already invariant loadings are divided by a quantity which is the same in all groups). Thus, the transformed parameter fulfills the constraints associated with FM scaling as well as those of the partial metric MI model.
- For the EC method, we apply to the given parameter the change of scale which, in every group, divides all loadings of the indicators belonging to a latent variable by the average of the corresponding loadings of that latent variable in the scaling group, while multiplying the latent covariance between any two latent variables by the product of their respective average loadings in the scaling group. This transformation results in a new parameter for which the average loading of the latent variable’s indicators in the scaling group equals 1 (as the average is divided by itself) and all indicators supposed to be invariant across groups stay invariant (as the already invariant loadings are divided by a quantity which is the same in all groups). Thus, the transformed parameter fulfills the constraints associated with EC scaling as well as those of the partial metric MI model.
- For the RG method, we apply to the given parameter the change of scale which, in every group, multiplies all loadings of the indicators belonging to a latent variable by that latent variable’s standard deviation in the reference group, while dividing the latent covariance between any two latent variables by the product of their respective standard deviations in the reference group. This transformation results in a new parameter for which the latent variable’s variance in the reference group equals 1 (as the variance is divided by itself) and all indicators supposed to be invariant across groups stay invariant (as the already invariant loadings are multiplied by a quantity which is the same in all groups). Thus, the transformed parameter fulfills the constraints associated with RG scaling as well as those of the partial metric MI model.
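The construction for partial metric MI models can be illustrated with a two-group sketch. It assumes one latent variable with six indicators, of which the first four are presumed invariant, and FM scaling with a non-invariant reference indicator in the first group only; all parameter values and helper names are hypothetical.

```python
def implied_cov(loadings, factor_var, resid_vars):
    """Model-implied covariance of a one-factor model: lambda * phi * lambda' + diag(theta)."""
    p = len(loadings)
    return [[loadings[i] * factor_var * loadings[j] + (resid_vars[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]

def rescale_group(group, d):
    """Divide all loadings by d and multiply the factor variance by d**2."""
    return {"loadings": [l / d for l in group["loadings"]],
            "factor_var": group["factor_var"] * d * d,
            "resid": list(group["resid"])}

def max_cov_diff(a, b):
    """Largest absolute difference between the implied covariances of two parameter sets."""
    cov_a = implied_cov(a["loadings"], a["factor_var"], a["resid"])
    cov_b = implied_cov(b["loadings"], b["factor_var"], b["resid"])
    return max(abs(x - y) for ra, rb in zip(cov_a, cov_b) for x, y in zip(ra, rb))

# Hypothetical parameter fulfilling partial metric MI: loadings 0..3 are equal across groups.
group1 = {"loadings": [0.8, 1.0, 1.2, 0.9, 0.7, 1.1], "factor_var": 1.4, "resid": [0.5] * 6}
group2 = {"loadings": [0.8, 1.0, 1.2, 0.9, 0.4, 1.6], "factor_var": 0.9, "resid": [0.6] * 6}
invariant = [0, 1, 2, 3]

ri = 4                        # non-invariant reference indicator (hypothetical choice)
d = group1["loadings"][ri]    # its loading in the group carrying the scaling restriction

# The same divisor is used in BOTH groups, so equal loadings remain equal.
new1 = rescale_group(group1, d)
new2 = rescale_group(group2, d)

ri_loading_group1 = new1["loadings"][ri]
ri_loading_group2 = new2["loadings"][ri]
still_invariant = all(new1["loadings"][i] == new2["loadings"][i] for i in invariant)
cov_diffs = (max_cov_diff(group1, new1), max_cov_diff(group2, new2))
```

After the transformation, the reference indicator's loading equals 1 in the first group but not in the second, the four presumed-invariant loadings are still equal across groups, and both groups' implied covariance matrices are unchanged up to floating-point error.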
Corollary 6
If a partial metric MI model is estimated by minimizing a discrepancy function, then its $\chi^2$ statistic, $p$ value, RMSEA, and other fit measures do not depend on the particular method used for scaling the model, as long as the scaling method is applied in only one group.
This immediately follows from Proposition 2, as all these quantities are calculated using the partial metric MI model’s likelihood value, which does not depend on the particular method used for scaling the model.
Corollary 7
The results of the $\chi^2$-difference test which compares a partial metric MI model to the configural MI model do not depend on the scaling methods used for estimating the configural and partial metric MI model, as long as the scaling method used for the partial metric MI model is applied in only one group. This in particular holds for the $\chi^2$-difference statistic, the $p$ value, RMSEA, and differences of other fit measures.
This immediately follows from the preceding corollary and the well-known fact that the corresponding quantities for the configural model do not depend on the particular method used for scaling the configural model, either.
Conclusions, Remarks, and Answers
In this paper, our goal was to clarify the impact of the various scaling methods on the estimation results for metric measurement invariance models. To this end, we addressed both full and partial metric MI models, and the results were laid out by means of worked examples as well as theoretical results with formal proofs. A first important insight of the paper is that scaling restrictions for metric MI models must be placed in one group only, which is of particular importance for partial metric MI models if the FM or EC methods are used and non-invariant loadings are involved. If the scaling restrictions are set in one group only, key quantities like optimal discrepancy values, $\chi^2$, RMSEA, and other fit measures depend neither on the choice of the scaling method in general nor on the choice of the RI when the FM method is used. Thus, all the concerns about the choice of RIs mentioned in the literature can be put aside, as long as scaling is done properly: in particular, it does not matter whether a chosen RI is truly invariant or not. We are now ready to answer the first two questions: all scaling methods provide the same results in the sense of the numerical value of the discrepancy function (and therefore all other fit indices), but the scaling must obey the rule “Apply the scaling restriction in one group only, the invariance conditions do the rest!” The inconsistencies found in the literature regarding the RI choice may result from a lack of distinction between the dgp, the scaled model, and the estimated model, which we introduced in the multiple-group context. Following this distinction, it is obvious that a non-invariant indicator in the dgp (in other words: a truly non-invariant indicator) does not affect the estimation of the scaled model. Irrespective of the nature of the dgp, the estimated model, which is stipulated by the researcher, provides the same value of the discrepancy function regardless of the scaling.
As an extreme example, the dgp could be completely different from a confirmatory factor analysis model, but an estimated confirmatory factor analysis model would still provide the same results regardless of the applied scaling method. However, in this paper, we adopted a realistic philosophy of science and made the more or less implicit assumption that the dgp matches the estimated model structurally (cf., Klopp & Klößner, 2021, pp. 191, 202).
For partial metric MI models, the issues discussed in the literature regarding scaling and in particular RI selection originate from scaling restrictions being set in all groups, instead of in one group only. For instance, when using the FM method in this way, one implicitly adds the RI to the set of indicators that are being tested empirically for loading invariance, even if one originally did not want to presume invariance of this indicator’s loadings.15 Thus, choosing different RIs leads to different sets of indicators whose loadings’ invariance is under examination. Consequently, the results differ, a phenomenon known as constraint interaction. These problems, however, can easily be avoided by setting scaling restrictions properly, i.e. by placing them in one group only. Apart from scaling in all groups instead of in only one, some of the concerns regarding the RI selection as well as concerns regarding other scaling methods, e.g., that the RG method implies an invariance assumption about the latent variances in the reference group (cf., Kline, 2016, p. 405), result from confounding the characteristics of the dgp with the scaling restrictions in the estimated model. As we have illustrated in Figure 1, these are distinct entities.
One of the surprising results of this paper is that it is impossible to test the invariance or non-invariance of one specific loading by restricting only this loading to be invariant across groups, because in this case a partial metric MI model with the correct number of degrees of freedom is equivalent to the configural model. However, concerning the scaling of (partial) metric MI models, our research also showed that it is not necessary to choose an invariant RI, as long as the scaling is done in accordance with the rule introduced in this paper. This provides the answer to the third question: in partial metric MI models, too, all scaling methods provide the same results in the sense of the numerical value of the discrepancy function (and therefore all other fit indices), but the scaling must obey the rule “Apply the scaling restriction in one group only, the invariance conditions do the rest!” However, in partial metric MI settings, the researcher has to bear in mind that it is not possible to test the invariance of only one indicator.
With regard to the number of degrees of freedom for full and partial metric MI models, we provided formulas for calculating these easily. Given the findings of Schroeders and Gnambs (2018) regarding published papers with discrepancies in the reported degrees of freedom, these formulas could prove very useful to applied researchers when they try to determine the correct number of degrees of freedom for their models. Observing our rule “Apply the scaling restriction in one group only, the invariance conditions do the rest!” in combination with the formulas for the degrees of freedom even provides a means for checking whether the scaling restrictions were set correctly.
The corollaries concerning the degrees of freedom also point at another issue. For the FM scaling method, Raykov et al. (2012, Appendix A) mentioned that the test of metric MI is incomplete because the MI model only tests the group equality of the scaled16 subset of indicators but not the RI. As the test statistic does not depend on the scaling method, the EC and RG scaling methods also provide an incomplete test. However, the term incomplete as introduced by Raykov et al. (2012) in the context of the FM scaling method cannot be applied literally to the other two scaling methods. The corollaries firstly explain why the test is incomplete and secondly explain what incomplete means in the context of the EC and RG scaling methods. As can be seen from the correction term in the corollaries about the degrees of freedom, one degree of freedom is lost due to the scaling restriction in the reference group regardless of the scaling method, and this restriction is then propagated to the other groups via the invariance conditions. Thus, all scaling methods have in common that they lose one degree of freedom due to the need to scale the metric MI model, and this represents a meaning of the term incomplete which is common to all scaling methods. In addition, as all scaling methods are equivalent, the RI is not somehow excluded from invariance testing, as it is treated in the same way as the other indicators that are presumed to be invariant. Finally, the underlying hypothesis being tested empirically does not depend on the scaling method a researcher decides to employ.
Concerning the fourth and last question, all scaling methods have the same mechanism: the scaling restriction is only applied in the reference group and propagated by means of the MI conditions to the other groups. Thus, there are again no differences between the various scaling methods in this regard. In particular, this mechanism is the foundation for the rule “Apply the scaling restriction in one group only, the invariance conditions do the rest!”, which is of utmost importance in the scaling of metric MI models.
To sum up, there are no potential issues concerning the choice of the RI when using the FM scaling method and, more generally, all scaling methods yield the same result. Consequently, any chapter concerning this topic can be deleted from the textbooks. Instead, a thorough explanation of the scaling methods and the rule “Apply the scaling restriction in one group only, the invariance conditions do the rest!” should be included. In order to set the focus on the scaling issues of the metric model, we did not address the scaling of scalar measurement invariance models, which in contrast to metric MI models additionally incorporate a mean structure. We expect that future research will produce results similar to those of the current paper, probably by using techniques resembling the ones used here.
At this point, we want to emphasize that our considerations refer primarily to the discrepancy function and, in turn, to the LR test in the form of the $\chi^2$-difference tests in the model testing sequence (see the Introduction section). The results should not be naively generalized to Wald or score tests, which are also sometimes used to investigate measurement invariance models. However, the developed scaling rule and the considerations about the degrees of freedom apply regardless of the respective statistical test.
We want to note that some parts of the results we presented are already present in the current literature. For instance, Wu and Estabrook (2016) provide a comprehensive account of the identification of confirmatory factor analysis models. However, they focus on the special case of models with ordinal indicators and scrutinize the identification of models with combinations of some invariance conditions. They also consider only the RG scaling method. In contrast, the approach in this article is to focus only on the scaling of the metric MI model, considering the most common scaling methods. As mentioned above, we did not address the scaling restrictions for the intercepts (or thresholds).
Finally, we want to note that we also did not turn our attention to factors that potentially have effects on metric MI tests, e.g., the size of the manifest variables’ residual variances. For instance, the example of Raykov et al. (2020) mentioned in the introduction is the result of excessively large manifest residual variances. As shown in Klopp and Klößner (2022), all possible scaling methods, i.e., all variants of the FM scaling as well as the EC and RG scaling methods, provide the same test statistic. The authors also demonstrated by means of a Monte Carlo simulation that lowering the size of the residual variances increased the chance of detecting the violation of the metric MI condition. Another restriction was to focus on the multi-group context, ignoring longitudinal ones. However, all our examples and theoretical results can easily be translated to models with dependent variables, in particular, longitudinal models, where time essentially takes the role that groups take in this paper. In a nutshell, the scaling method does not affect results for longitudinal MI models of full or partial metric MI, as long as latent variables measured repeatedly over time are scaled by imposing a scaling restriction at only one point in time.