Factor mixture modeling (FMM) has been increasingly used in the social, behavioral, and health sciences to examine unobserved population heterogeneity. It enables researchers to model both dimension and typology simultaneously by integrating the common factor model and latent class analysis, such that latent classes (i.e., unobserved subgroups) emerge to capture differences in the common factor model among individuals. FMM has been applied to behavioral and health outcomes to examine heterogeneity among psychological trauma victims based on posttraumatic stress disorder symptoms (Elhai et al., 2011), breast cancer patients who reported fatigue symptoms (Ho et al., 2014), and patients with eating disorders based on their emotion regulation profiles (Nordgren et al., 2022), to name a few.
In FMM applications, covariates (e.g., gender, race) play a critical role because they are essential to understanding the formation and characterization of latent classes. Specifically, covariates serve as predictors of latent class membership via multinomial logistic regression, in which the log odds of belonging to a given class as opposed to a reference class are predicted by the covariates. For example, Elhai et al. (2011) found that patients who experienced more traumas and female patients were more likely to be in a more severely symptomatic class than in the least symptomatic class.
Despite the prevalence of covariate inclusion, interaction effects among covariates have received considerably less attention. In the context of FMM, covariate interaction refers to the interplay between covariates in affecting latent class membership. In other words, the relationship between latent class membership and one covariate might depend on one or more other covariates. Take children’s executive function skills as a hypothetical example. From a developmental perspective, older children have more developed executive function skills than their younger counterparts and thus are more likely to be classified into a high executive function class than a low executive function class. However, this gap in classification between age groups might be smaller for children with severe traumatic brain injuries (TBIs), as the executive function skills of both age groups would be negatively affected by the injuries. Therefore, examining covariate interaction effects on latent class membership can offer a more accurate and nuanced understanding of population heterogeneity, as it is often the complex and multifaceted interplay among factors that impacts the outcome. In addition, the identification of covariate interactions can guide the development and implementation of tailored intervention programs that can improve individual outcomes more effectively. For instance, an intervention program to improve the executive function of children with TBIs can leverage the age-by-TBI-severity interaction and tailor its design and/or implementation accordingly.
Although it is critical to identify covariate interactions, our non-exhaustive review of fifty-nine FMM applications found that they have not been considered or tested in substantive research. This lack of investigation into covariate interactions in FMM stands in stark contrast to the common testing of interaction effects in other statistical models (e.g., regression) across applied research (Babikian et al., 2011; Ware et al., 2020; Yeates et al., 2010). The lack of attention to covariate interactions in FMM might be attributable to the fact that interaction effects cannot be identified in a straightforward fashion. That is, a major source of covariate selection has been the theories or substantive knowledge of researchers; however, it can be challenging for applied researchers to come up with hypotheses regarding potential covariate interactions given the unobserved nature of heterogeneity in FMM (Brandmaier et al., 2013; Jacobucci et al., 2017). On the other hand, if an exploratory approach is taken to test all possible interactions, the number of interactions (including higher-order interactions) will increase exponentially as the number of covariates increases, which leads to a complicated model that is difficult to fit and interpret (Moons et al., 2015).
To address this gap in the literature, this study demonstrates the utility of a machine learning approach to identifying covariate interactions that might explain the heterogeneity identified by FMM. Specifically, this study adopted structural equation model (SEM) trees, which were proposed by Brandmaier et al. (2013) as a model-based decision tree approach to finding covariates and covariate interactions that impact parameter estimates of the specified model. SEM trees, like other decision tree approaches, can automatically search for covariate interactions (Arnold et al., 2021; Jacobucci et al., 2017). Leveraging this capacity, this study presents a novel integration of SEM trees into FMM for the purpose of identifying potential covariate interactions that explain latent class membership in FMM. This approach was demonstrated using the Traumatic Brain Injury Model System National Database (TBIMS-NDB; April 2020 version), the country’s largest multi-center database tracking the rehabilitation trajectories of individuals at least 16 years old treated for inpatient TBI rehabilitation. Through this demonstration, this study aims to provide an exploratory tool for FMM users to identify potential covariate interactions, which offers a more nuanced and sophisticated interpretation of heterogeneity and furthers the understanding of intersectionality.
Factor Mixture Modeling
Factor mixture modeling (FMM) combines the common factor model and latent class analysis (LCA), allowing us to model unobserved heterogeneity in the parameters of the common factor model. The common factor model can be written as:
$$\mathbf{y}_{ik} = \boldsymbol{\tau}_k + \boldsymbol{\Lambda}_k \boldsymbol{\eta}_{ik} + \boldsymbol{\varepsilon}_{ik} \tag{1}$$

$\mathbf{y}_{ik}$ is a J × 1 vector of responses for an individual i that is assigned to class k (k = 1, 2, …, K), with J denoting the number of items; $\boldsymbol{\tau}_k$ is a J × 1 vector of item intercepts; $\boldsymbol{\Lambda}_k$ is a J × R matrix of factor loadings and R refers to the number of factors; $\boldsymbol{\eta}_{ik}$ is an R × 1 vector of factor scores; and $\boldsymbol{\varepsilon}_{ik}$ a J × 1 vector of item residuals that are assumed to be normally distributed with a mean of zero and variance of $\boldsymbol{\Theta}_k$. According to Equation (1), item response is a function of intercepts, factor loadings, factor scores, and residuals, as in a typical common factor model. However, the subscript k associated with the model parameters indicates that they are allowed to vary across latent classes except some constraints needed for model identification. That is, a commonly used identification strategy is to fix the first item loading to be one across classes and the factor mean of the last class is fixed to be zero. Factor scores are assumed to be normally distributed with $\boldsymbol{\alpha}_k$ representing the vector of factor means and $\boldsymbol{\Psi}_k$ the covariance matrix of factors. Thus, the class-specific mean vectors and class-specific variance-covariance matrices can be expressed as:

$$\boldsymbol{\mu}_k = \boldsymbol{\tau}_k + \boldsymbol{\Lambda}_k \boldsymbol{\alpha}_k \tag{2}$$

$$\boldsymbol{\Sigma}_k = \boldsymbol{\Lambda}_k \boldsymbol{\Psi}_k \boldsymbol{\Lambda}_k' + \boldsymbol{\Theta}_k \tag{3}$$
In FMM, the number of classes is often unknown a priori and needs to be determined by fitting models with varying numbers of classes and comparing model fit using information criteria (ICs), including the Akaike information criterion (AIC; Akaike, 1974), the Bayesian information criterion (BIC; Schwarz, 1978), and the sample size adjusted BIC (saBIC; Sclove, 1987). In addition to evaluating model fit, these ICs penalize model complexity by accounting for the number of parameters. Smaller IC values indicate a better trade-off between model fit and model complexity. Additionally, likelihood-based tests can be used in model selection, such as the Lo–Mendell–Rubin test (LMR; Lo et al., 2001), the adjusted LMR (aLMR; Lo et al., 2001), and the bootstrap likelihood ratio test (BLRT; McLachlan & Peel, 2000). These tests compare the fit of models with k and (k-1) classes, and a significant test result (e.g., p < .05) supports the k-class model over the (k-1)-class model.
In addition to the number of classes, measurement invariance (MI) is an important assumption for valid factor mean comparison across classes and needs to be tested (Clark et al., 2013; Kim et al., 2017; Lubke & Muthén, 2005; Wang et al., 2021). Models with different levels of equality constraints on measurement parameters can be constructed and compared, including configural invariance, which requires the same factor structure across classes while factor loadings and intercepts are freely estimated; metric invariance, which imposes equality constraints on factor loadings across classes; and scalar invariance, which adds equality constraints on intercepts. Note that scalar invariance is often considered a sufficient prerequisite for factor mean comparison in FMM and multiple-group analyses (Lubke & Muthén, 2005; Meredith, 1993). Beyond MI testing on measurement parameters, the equality of other model parameters (i.e., residual variances, factor variances and covariances) across classes can also be tested to facilitate the understanding and interpretation of latent classes and their differences (Clark et al., 2013).
Structural Equation Model (SEM) Trees
SEM trees integrate SEM into a model-based decision tree paradigm in which the data set is recursively partitioned into subsets based on the splitting of covariates so that differences in SEM parameter estimates are maximized across subsets (Brandmaier et al., 2013; Jacobucci et al., 2017). SEM trees are useful when researchers are interested in finding the influence of covariates and covariate interactions on the SEM model. SEM is a family of statistical procedures that has been widely adopted in social and behavioral sciences to model the relationships among multiple variables (Kline, 2015). One of the key features of SEM is its capacity to model latent constructs (or factors) that are measured by a set of items (or observed variables) and take into account measurement errors. Examples of commonly used SEM procedures include path analysis, the common factor model, structural equation modeling (relationships among multiple factors), and latent growth curve models. Built on the SEM model, SEM trees serve as a tool for exploratory discovery of influences and interactions of covariates on SEM model parameters via the decision tree paradigm.
The decision tree is a supervised machine learning algorithm for prediction and classification (Gupta, 2014; Song & Lu, 2015). It grows a tree structure via recursive partitioning of the covariate space so that individuals classified into the same subset are relatively homogeneous in terms of the outcome variable. Figure 1 presents an illustrative example with a scatterplot of a binary outcome variable, diagnosis of Alzheimer's disease (triangles for Alzheimer's and squares for non-Alzheimer's), on the left and the resultant tree structure on the right, using age and education level as the covariates. The tree structure can be interpreted as a set of “if-then” statements. For instance, if age ≤ 65 and education level ≤ 2, the predicted outcome is an Alzheimer’s diagnosis. The data set can be split based on multiple criteria, and the figure demonstrates a simple rule that constructs a decision tree with a minimal misclassification rate, which is also referred to as an incorrect prediction rate (Gupta, 2014).
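To make the recursive-partitioning idea concrete, the following minimal Python sketch searches a single covariate for the cut point that minimizes the misclassification rate. The data, the 0/1 coding of the outcome, and the function names (`misclassification_rate`, `best_split`) are hypothetical and are not taken from Figure 1; a full decision tree would repeat this search over all covariates and then again within each resulting subset.

```python
def misclassification_rate(labels):
    """Share of cases that differ from the majority label in a subset."""
    if not labels:
        return 0.0
    majority_count = max(labels.count(label) for label in set(labels))
    return (len(labels) - majority_count) / len(labels)


def best_split(values, labels):
    """Return (cut point, weighted misclassification rate) of the best
    binary split of the form value <= cut versus value > cut."""
    n = len(values)
    best = None
    # The largest observed value is skipped because it would leave one side empty.
    for cut in sorted(set(values))[:-1]:
        left = [lab for val, lab in zip(values, labels) if val <= cut]
        right = [lab for val, lab in zip(values, labels) if val > cut]
        error = (len(left) * misclassification_rate(left)
                 + len(right) * misclassification_rate(right)) / n
        if best is None or error < best[1]:
            best = (cut, error)
    return best


# Hypothetical toy data: age in years and a binary diagnosis (1 = yes, 0 = no).
ages = [58, 61, 64, 65, 70, 72, 75, 80]
diagnosis = [1, 1, 1, 1, 0, 0, 0, 0]

cut, error = best_split(ages, diagnosis)
print(f"best cut: age <= {cut}, misclassification rate = {error:.2f}")
```

In this toy example the two subsets are perfectly homogeneous after one split; with real data the search would continue within each subset until a stopping rule is met.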
Figure 1
Example of Decision Tree
Algorithms
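Integrating features of SEM and decision tree, Brandmaier et al. (2013) proposed SEM trees to partition the data set with respect to covariates to maximize difference in SEM parameters across subsets. SEM trees are performed in three steps. First, define a template SEM which is referred to as $M_T$, and fit $M_T$ to the data set. The following equation shows the minimization of a fit function with q degrees of freedom via maximum likelihood estimation (Arnold et al., 2021):

$$\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta}}{\arg\min}\; F(\mathbf{m}, \mathbf{S}; \boldsymbol{\theta}), \qquad F(\mathbf{m}, \mathbf{S}; \boldsymbol{\theta}) = p \ln(2\pi) + \ln\lvert\boldsymbol{\Sigma}(\boldsymbol{\theta})\rvert + \operatorname{tr}\!\left[\mathbf{S}\,\boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1}\right] + \left[\mathbf{m} - \boldsymbol{\mu}(\boldsymbol{\theta})\right]' \boldsymbol{\Sigma}(\boldsymbol{\theta})^{-1} \left[\mathbf{m} - \boldsymbol{\mu}(\boldsymbol{\theta})\right] \tag{4}$$

In this equation, $\mathbf{m}$ is a vector of observed means; $\mathbf{S}$ is the observed covariance matrix; $p$ indicates the number of observed variables in SEM; $\boldsymbol{\theta}$ is a vector of model parameter estimates; $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ is the model-implied covariance matrix; and $\boldsymbol{\mu}(\boldsymbol{\theta})$ is a vector of model-implied means.

Second, to evaluate a possible split based on a covariate, the full data is partitioned into $L$ subsets where $l = 1, \ldots, L$, and the template SEM model is fitted to each subset. Given that the subsets are non-overlapping, the fit of all SEMs across subsets is evaluated independently based on Equation (4) and these models are referred to as $M_l$. Then the fit of $M_T$ and $M_l$ is compared using the likelihood ratio test:

$$LR = N\, F(\mathbf{m}, \mathbf{S}; \hat{\boldsymbol{\theta}}_T) - \sum_{l=1}^{L} N_l\, F(\mathbf{m}_l, \mathbf{S}_l; \hat{\boldsymbol{\theta}}_l) \tag{5}$$

N and $N_l$ refer to the sample size for the full data set and the subset l, and quantities with the subscript l are computed within subset l. $LR$ follows the chi-square distribution with $(L - 1)q$ degrees of freedom. All possible splits are evaluated for each covariate, and the split with maximum increase in the LR is chosen.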
Lastly, the steps are repeated within each subset created by the chosen split to find further partitions that significantly improve model fit; if no partition improves model fit, further partitioning is terminated. Results of SEM trees can be visualized as a tree structure with nodes. The inner nodes (i.e., nodes that have successors) represent cut points with respect to a covariate, and each leaf node is associated with an SEM fitted to the subsample induced by the preceding splits (Brandmaier et al., 2013).
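The split evaluation described above can be sketched in a few lines of Python. As a loudly stated simplification, the "template model" here is not an SEM but a univariate normal model with a freely estimated mean and variance (two parameters), so that the example stays self-contained; the data, the function names (`neg2_loglik`, `evaluate_split`), and the cut point are hypothetical. For a binary split of a two-parameter model the likelihood ratio has 2 degrees of freedom, for which the chi-square upper-tail probability has the closed form exp(-LR / 2).

```python
import math


def neg2_loglik(y):
    """-2 log-likelihood of a univariate normal model evaluated at its
    maximum likelihood estimates (mean and variance with divisor n)."""
    n = len(y)
    mean = sum(y) / n
    variance = sum((v - mean) ** 2 for v in y) / n
    return n * (math.log(2 * math.pi) + math.log(variance) + 1.0)


def evaluate_split(y, covariate, cut):
    """Likelihood ratio of the pooled model versus separate models in the
    two subsets defined by covariate <= cut and covariate > cut."""
    left = [v for v, c in zip(y, covariate) if c <= cut]
    right = [v for v, c in zip(y, covariate) if c > cut]
    lr = neg2_loglik(y) - (neg2_loglik(left) + neg2_loglik(right))
    # Chi-square survival function for df = 2 (two parameters, binary split).
    p_value = math.exp(-lr / 2.0)
    return lr, p_value


# Hypothetical outcome scores and a binary covariate (0/1).
y = [1.0, 1.2, 0.8, 1.1, 0.9, 3.0, 3.2, 2.8, 3.1, 2.9]
group = [0] * 5 + [1] * 5

lr, p_value = evaluate_split(y, group, cut=0)
print(f"LR = {lr:.2f}, p = {p_value:.2g}")
```

An SEM tree applies the same logic with the template SEM in place of the normal model, evaluates every candidate split of every covariate, and recurses into the subsets of the winning split.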
Model Constraints
Similar to FMM, constraints on SEM model parameters can be imposed in SEM trees. Specifically, there are two types of restrictions in a tree: a global restriction and a local restriction. A global restriction can be imposed on any parameter(s) in the SEM model in which the value for the constrained parameter is estimated with the full data set and fixed across all subsequent models. A local restriction is imposed only for split evaluation such that the parameters are equal across all models that share the same inner node, but the resultant leaf nodes can have different values of the parameters. In other words, parameters are allowed to be different across models, but their differences do not contribute to the split evaluation.
Integrating SEM Trees Into FMM
Among a few applications of SEM trees that have been identified (Ammerman et al., 2019; de Mooij et al., 2018; Li et al., 2021; Sagan & Łapczyński, 2020), interaction among covariates was present. For instance, Li et al. (2021) included a total of 33 covariates to examine their associations with students’ attitudes towards collaboration, and found that student gender affected the CFA model parameters of students’ attitudes towards collaboration, but only for those with above-average home educational resources, which indicated an interaction effect between gender and home educational resources. Given the advantage of SEM trees in automatically searching for covariate interactions, this study proposes an integrated use of SEM trees and FMM such that covariate interactions that are identified by SEM trees might potentially explain heterogeneity in FMM.
The proposed integrated use consists of the following five steps:
- Identify constructs and items for the FMM analyses, as well as covariates that might potentially explain the distinction among latent classes. Constructs refer to the latent factors that are measured by a set of items, which is the basis of FMM analyses as shown in Equation (1).
- Conduct unconditional FMM analyses (without covariates) based on the identified constructs and items. Specifically, given that the number of classes and the class-varying parameters are unknown, a series of FMMs can be specified and fitted to the data, including 1-class, 2-class configural, metric, and scalar invariance models, 3-class configural, metric, and scalar invariance models, etc. The fitted models can be compared in terms of fit based on multiple ICs, such as AIC, BIC, and saBIC. The model with the smallest IC values can be chosen as the best-fitting model.
- Examine the substantive interpretability of the best-fitting model based on parameter estimates.
- Conduct SEM trees analyses to identify covariate interactions that could potentially explain latent class membership in FMM. To maximize the chance that covariate interactions selected by the SEM trees would explain latent class membership in FMM, we propose that the specification of parameter restrictions between these two approaches should be matched. That is, the level of invariance (i.e., configural, metric, or scalar) that is identified in FMM is also adopted in SEM trees via the global constraint function.
- Conduct multinomial logistic regression with the covariate interactions detected by the SEM trees, as well as all main effects, to examine correlates of latent classes. The three-step approach to covariate inclusion is adopted here, given that the identification of latent classes is done without the influence of covariates, and the impact of covariates and covariate interactions is examined while taking into account classification errors (Asparouhov & Muthén, 2014; Vermunt, 2010).
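As an illustration of the final step, the multinomial logistic regression with one two-way interaction can be written as below. The notation is an assumption introduced here for exposition: $c_i$ is the latent class of individual $i$, class $K$ is the reference class, $x_{1i}$ and $x_{2i}$ are two covariates, and the $\gamma$ terms are class-specific regression coefficients. The sketch ignores the classification-error correction that the three-step approach applies.

$$\ln\frac{P(c_i = k \mid x_{1i}, x_{2i})}{P(c_i = K \mid x_{1i}, x_{2i})} = \gamma_{0k} + \gamma_{1k}\, x_{1i} + \gamma_{2k}\, x_{2i} + \gamma_{3k}\, x_{1i} x_{2i}$$

Under this form, the effect of $x_{1i}$ on the log odds of belonging to class $k$ is $\gamma_{1k} + \gamma_{3k}\, x_{2i}$, so it depends on the value of $x_{2i}$ whenever $\gamma_{3k} \neq 0$.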
Demonstration
This demonstration serves as an example of the integrated use of FMM and SEM trees via the five steps proposed above. The sample came from the Traumatic Brain Injury Model System National Database (TBIMS-NDB), obtained as a public data set with a version date of April 2020. TBIMS-NDB was funded by the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR) as a prospective, longitudinal, multicenter database to examine the health outcomes of more than 17,000 individuals who experienced TBIs that required inpatient rehabilitation in the United States. All data were collected using surveys, with baseline data collected at the time of discharge from inpatient rehabilitation settings and follow-up data collected at 1, 2, 5, 10, 15, 20, 25, and 30 years post-injury. This demonstration used the 1-year post-injury data, which consisted of 9,741 individuals. A full description of the sociodemographic characteristics of the sample as well as other descriptive statistics of the variables is provided in Table 1. Annotated code for the following analyses is included in the electronic Supplementary Materials.
Table 1
Descriptive Statistics of Variables and Sample Sociodemographic Characteristics
| Variable/Characteristic | Statistic | ||
|---|---|---|---|
| Life Satisfaction | N | M | SD |
| 1. Ideal life | 9717 | 4.06 | 2.08 |
| 2. Excellent life conditions | 9728 | 4.06 | 2.08 |
| 3. Satisfaction with life | 9729 | 4.60 | 2.05 |
| 4. Important things in life | 9723 | 4.71 | 1.99 |
| 5. Life lived over | 9709 | 3.84 | 2.22 |
| Continuous Covariates | N | M | SD |
| TBI severity | 5529 | 11.21 | 4.06 |
| FIM Cognition | 9695 | 16.03 | 7.58 |
| Categorical Covariates | N | % | |
| Sex | |||
| Females | 2751 | 28.25 | |
| Males | 6988 | 71.75 | |
| Race | |||
| White | 6897 | 70.82 | |
| Black | 1596 | 16.39 | |
| Hispanic | 849 | 8.72 | |
| Others | 397 | 4.08 | |
| Age Group | |||
| AYAs | 2994 | 30.74 | |
| Adults | 5108 | 52.44 | |
| Older Adults | 1639 | 16.83 | |
| Pre-Injury Employment Status | |||
| Employed | 6389 | 66.12 | |
| Student | 706 | 7.31 | |
| Unemployed | 2568 | 26.58 | |
| Pre-Injury Impairment | |||
| Yes | 368 | 5.49 | |
| No | 6333 | 94.51 | |
| Pre-Injury Physical Limitation | |||
| Yes | 491 | 7.33 | |
| No | 6206 | 92.67 | |
Note. Ideal life = In most ways my life is close to my ideal; Excellent life conditions = The conditions of my life are excellent; Satisfaction with life = I am satisfied with my life; Important things in life = I have gotten important things I want in life; Life lived over = If I could live my life over, I would change almost nothing. AYAs = adolescents and young adults.
For Step 1, the 5-item Satisfaction with Life Scale (SWLS) was used as the outcome assessment for life satisfaction levels among individuals following TBI (Diener et al., 1985; Pavot & Diener, 1993). Each item is scored from 1 (lowest life satisfaction) to 7 (highest life satisfaction) and asks about a different aspect of a patient’s perception of his/her life conditions. A total of eight covariates were identified, including Functional Independence Measure (FIM) Cognitive on Admission (Linacre et al., 1994), pre-injury disability and pre-injury limitations (National Research Council, 2004), TBI severity (Teasdale & Jennett, 1976) as measured by patients’ total Glasgow Coma Scores, age at injury, biological sex, race, and pre-injury employment status. All covariates were collected at the baseline visit. Age at injury was recoded as a categorical variable: adolescents and young adults (AYAs; ≤ 25), adults (26–59), and older adults or seniors (≥ 60).
For Step 2, unconditional FMM analyses were conducted with life satisfaction in Mplus 8.4 (Muthén & Muthén, 1998-2017). Table 2 presents model fit comparisons of FMMs. All fitted models converged except the 4-class configural and scalar models. Among the converged models, AIC, BIC, and saBIC consistently showed that the 4-class metric model had the best fit.
Table 2
Model Fit Comparison of Factor Mixture Modeling
| Model | Parm | LL | AIC | BIC | saBIC | Entropy | Class Proportions |
|---|---|---|---|---|---|---|---|
| 1-class | 15 | -94483 | 188996 | 189104 | 189056 | ||
| 2-class conf | 31 | -88689 | 177440 | 177663 | 177565 | .90 | .72/.28 |
| 2-class metric | 27 | -88795 | 177644 | 177838 | 177753 | .90 | .73/.27 |
| 2-class scalar | 18 | -93401 | 186838 | 186967 | 186910 | .92 | .38/.62 |
| 3-class conf | 47 | -85263 | 170619 | 170957 | 170807 | .91 | .14/.58/.28 |
| 3-class metric | 39 | -85345 | 170769 | 171049 | 170925 | .91 | .14/.58/.28 |
| 3-class scalar | 21 | -93411 | 186863 | 187014 | 186947 | .65 | .40/.39/.21 |
| 4-class conf | Non-convergence | ||||||
| 4-class metric | 51 | -84430 | 168961 | 169328 | 169166 | .87 | .14/.25/.33/.28 |
| 4-class scalar | Non-convergence | ||||||
Note. conf = configural invariance; metric = metric invariance; scalar = scalar invariance; Parm = number of free parameters; LL = log-likelihood; AIC = Akaike information criterion; BIC = Bayesian information criterion; saBIC = sample size adjusted BIC.
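The information criteria in Table 2 can be approximately recomputed from the reported log-likelihoods and parameter counts. The sketch below uses the standard definitions (AIC = -2LL + 2p, BIC = -2LL + p ln N, and saBIC with N replaced by (N + 2) / 24) and assumes that the estimation sample size is N = 9,741; because the log-likelihoods in Table 2 are rounded to integers, the results match the table only up to rounding. The function and variable names are hypothetical.

```python
import math


def information_criteria(log_likelihood, n_parameters, n_observations):
    """Return (AIC, BIC, saBIC) from a log-likelihood and parameter count."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2 * n_parameters
    bic = deviance + n_parameters * math.log(n_observations)
    sabic = deviance + n_parameters * math.log((n_observations + 2) / 24)
    return aic, bic, sabic


N = 9741  # assumed estimation sample size (1-year post-injury sample)

# (log-likelihood, number of free parameters) for the converged models in Table 2.
models = {
    "1-class": (-94483, 15),
    "2-class conf": (-88689, 31),
    "2-class metric": (-88795, 27),
    "2-class scalar": (-93401, 18),
    "3-class conf": (-85263, 47),
    "3-class metric": (-85345, 39),
    "3-class scalar": (-93411, 21),
    "4-class metric": (-84430, 51),
}

criteria = {name: information_criteria(ll, p, N) for name, (ll, p) in models.items()}
best_by_bic = min(criteria, key=lambda name: criteria[name][1])

aic, bic, sabic = criteria["1-class"]
print(f"1-class: AIC = {aic:.0f}, BIC = {bic:.0f}, saBIC = {sabic:.0f}")
print("smallest BIC:", best_by_bic)
```

Selecting on AIC or saBIC instead only requires changing the index in the `min` key (0 for AIC, 2 for saBIC).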
For Step 3, the interpretability of the 4-class metric model was examined. Table 3 presents the parameter estimates of this model by latent class. While loadings were constrained to be equal across classes, intercepts, factor means, and factor variances were freely estimated. Factor means were estimated to be -4.61, -3.01, and -1.98 for Classes 1, 2, and 3, respectively, with Class 4 serving as the reference group (factor mean 0). Note that although factor mean comparison is not permitted with a metric invariance model, the factor means of Classes 1, 2, and 3 were statistically significantly different from zero. Class 3 had the largest proportion, .33, followed by Class 4 (.28), Class 2 (.25), and Class 1 (.14).
Table 3
Parameter Estimates of the Four-Class Metric Invariance FMM
| Item/Statistic | Loading | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|---|
| Item (class columns report intercepts) | | | | | |
| Ideal | 1.00 | 6.12 | 6.12 | 6.12 | 6.12 |
| Cond | 1.15 | 6.87 | 6.47 | 6.48 | 6.13 |
| Satisfied | 1.05 | 6.45 | 5.78 | 8.00 | 6.24 |
| Important | .94 | 6.90 | 6.79 | 6.74 | 6.21 |
| Live again | .88 | 5.57 | 6.16 | 5.64 | 5.23 |
| Statistic | | | | | |
| Factor mean | | -4.61 | -3.01 | -1.98 | 0 |
| Factor variance | | .23 | .43 | .34 | .32 |
| Class proportion | | .14 | .25 | .33 | .28 |
The distinction among the latent classes was further interpreted based on the life satisfaction item means by class, as illustrated in Figure 2. ANOVAs with Bonferroni adjustment were conducted to compare the item means across classes, and results showed statistically significant mean differences between any two classes. Class 4 had the highest mean across all items, followed by Class 3, Class 2, and Class 1. Of note is that Class 3 had a relatively high mean on the item “I am satisfied with my life”, which might correspond to the high item intercept in the 4-class metric invariance FMM.
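The ordering of the classes can also be checked against the model itself. In a factor model, the class-specific mean of an item equals its intercept plus its loading times the class factor mean. The sketch below applies this relation to the estimates in Table 3; the resulting values are model-implied means, not the observed item means plotted in Figure 2, and the variable names are hypothetical.

```python
# Estimates copied from Table 3 (loadings are equal across classes).
loadings = {"Ideal": 1.00, "Cond": 1.15, "Satisfied": 1.05,
            "Important": 0.94, "Live again": 0.88}
intercepts = {  # Classes 1 to 4
    "Ideal": [6.12, 6.12, 6.12, 6.12],
    "Cond": [6.87, 6.47, 6.48, 6.13],
    "Satisfied": [6.45, 5.78, 8.00, 6.24],
    "Important": [6.90, 6.79, 6.74, 6.21],
    "Live again": [5.57, 6.16, 5.64, 5.23],
}
factor_means = [-4.61, -3.01, -1.98, 0.0]  # Classes 1 to 4

# Model-implied item mean = intercept + loading * factor mean.
implied_means = {
    item: [round(intercepts[item][k] + loadings[item] * factor_means[k], 2)
           for k in range(4)]
    for item in loadings
}

for item, means in implied_means.items():
    print(f"{item:<11}", means)
```

Under these estimates the implied means increase from Class 1 to Class 4 on every item, and the high intercept of Class 3 on the "Satisfied" item translates into an implied mean close to that of Class 4.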
Figure 2
Life Satisfaction Item Mean by Latent Class
For Step 4, SEM trees were performed with the semtree package in R (Brandmaier et al., 2021; R Core Team, 2021). A CFA model of life satisfaction measured by five items was specified, and a total of 12 covariates were included (i.e., the continuous covariates plus dummy-coded indicators of the categorical covariates, as listed in Table 4). Given that a 4-class metric invariance model was supported in FMM, metric invariance was also established in SEM trees via the global constraints function such that factor structures and loadings were constrained to be equal across groups whereas intercepts, factor means, and residual variances were freely estimated. The resulting tree is displayed in Figure 3. There were four splits, among which the first two occurred on age and the other two on race. The first split divided the whole sample into two groups, older adults (n = 1639) versus the rest (n = 8102). The second split further divided those who were not older adults into two groups, adults (n = 5108) versus AYAs (n = 2994). Each of these two groups was split again on whether or not the patient was Black. Therefore, SEM trees produced a total of five groups: older adults, Black adults, adults who were not Black, Black AYAs, and AYAs who were not Black (n = 1639, 921, 4187, 502, and 2490, respectively).
Figure 3
Tree Plot of SEM Trees
Note. N refers to the sample size at each split; LR is the likelihood ratio statistic with the difference in degrees of freedom (df); ages and agem refer to older adults and adults, respectively; black refers to the race group of Black.
Given that splits occurred on whether or not the patient was Black for both adults and AYAs but not for older adults, an interaction effect was signified between the race category of Black and the age category of older adults. In other words, the impact of being Black on CFA model parameters was absent for older adults and present for the rest of the sample.
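The tree described above can be read as a set of if-then rules, which makes the interaction visible: race changes the leaf assignment only for participants who are not older adults. The following sketch encodes the five reported leaf groups; the function name and the string codes for age group are hypothetical.

```python
def sem_tree_leaf(age_group, is_black):
    """Map a participant to one of the five leaf groups reported for Figure 3."""
    if age_group == "older adult":
        return "older adults"  # no further split on race in this branch
    if age_group == "adult":
        return "Black adults" if is_black else "adults that were not Black"
    if age_group == "AYA":
        return "Black AYAs" if is_black else "AYAs that were not Black"
    raise ValueError(f"unknown age group: {age_group!r}")


for age_group in ("older adult", "adult", "AYA"):
    for is_black in (True, False):
        print(age_group, is_black, "->", sem_tree_leaf(age_group, is_black))
```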
For Step 5, the interaction effect between older adults and Black that was detected by SEM trees was included in the multinomial logistic regression in addition to all main effects. Results (see Table 4) showed that the interaction effect was significant for Class 2, B(SE) = -.88(.35), p = .013, which indicates that the impact of race on the likelihood of being assigned to Class 2, a somewhat satisfaction class, depended upon age group. That is, for individuals who were AYAs, the odds of being in Class 2 (versus Class 4, the reference group) for Black people were 2.24 times those for White people, controlling for all other covariates in the model. However, for older adults, Black individuals had 7% lower odds of being in Class 2 than White individuals. In other words, seniority was positively related to life satisfaction for Black individuals, and Black AYAs were at a higher risk for life dissatisfaction.
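The two odds ratios quoted above follow directly from the Class 2 coefficients in Table 4: the odds ratio for Black versus White AYAs is the exponentiated main effect, and the odds ratio for older adults is the exponentiated sum of the main effect and the interaction. The sketch below reproduces this arithmetic from the rounded coefficients, so the first value differs slightly from the 2.24 reported in the table, which was presumably computed from unrounded estimates; the variable names are hypothetical.

```python
import math

# Class 2 (versus Class 4) coefficients from Table 4, rounded to two decimals.
b_black = 0.81         # main effect of Black (reference: White)
b_interaction = -0.88  # Older Adults x Black interaction

or_black_ayas = math.exp(b_black)                   # Black vs. White among AYAs
or_black_older = math.exp(b_black + b_interaction)  # Black vs. White among older adults

print(f"OR among AYAs:         {or_black_ayas:.2f}")
print(f"OR among older adults: {or_black_older:.2f}")
print(f"Change in odds among older adults: {100 * (or_black_older - 1):.0f}%")
```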
Table 4
Results of Multinomial Logistic Regression via the Three-Step Approach
| Covariate | Class 1: Est (SE) | Class 1: OR | Class 2: Est (SE) | Class 2: OR | Class 3: Est (SE) | Class 3: OR |
|---|---|---|---|---|---|---|
| TBI severity | -.04 (.02) | 0.96* | -.01 (.01) | 0.99 | -.01 (.01) | 0.99 |
| FIM cognition | -.01 (.01) | 0.99 | -.02 (.01) | 0.98* | -.01 (.01) | 0.99 |
| Adults | .63 (.18) | 1.87*** | .51 (.14) | 1.66*** | -.21 (.13) | 0.81 |
| Older Adults | -.56 (.24) | 0.57* | -.06 (.18) | 0.94 | -.63 (.16) | 0.54*** |
| Female | .04 (.14) | 1.04 | .12 (.11) | 1.12 | .10 (.11) | 1.10 |
| Black | .72 (.18) | 2.06*** | .81 (.16) | 2.24*** | .54 (.16) | 1.71** |
| Hispanic | .05 (.20) | 1.05 | .22 (.16) | 1.24 | -.10 (.16) | 0.90 |
| OtherRace | -.58 (.39) | 0.56 | .37 (.22) | 1.44 | -.28 (.24) | 0.76 |
| Student | -.10 (.33) | 0.91 | .07 (.24) | 1.07 | .04 (.22) | 1.04 |
| Unemployed | .64 (.15) | 1.89*** | .28 (.12) | 1.32* | .29 (.11) | 1.34** |
| Pre-impairment | -.22 (.27) | 0.80 | -.002 (.20) | 1.00 | .02 (.19) | 1.02 |
| Pre-phylimit | .38 (.22) | 1.47 | .16 (.18) | 1.18 | .18 (.18) | 1.19 |
| Older Adults*Black | -.82 (.52) | 0.44 | -.88 (.35) | 0.42* | -.29 (.32) | 0.75 |
Note. Pre-impairment = pre-injury impairment; pre-phylimit = pre-injury physical limitation; the missing groups for categorical covariates are the reference groups (i.e., AYAs, Male, White, and Employed). Est (SE) = estimated regression coefficient (standard error); OR = odds ratio.
*p < .05. **p < .01. ***p < .001.
The interaction between age group and race is further illustrated in Table 5, which presents the composition of Classes 2 and 4 with regard to age group and race. Among the 435 Black people who were assigned to Class 2, the somewhat satisfaction class, only 7.59% were seniors, whereas 20.66% of Black people in Class 4, the high satisfaction class, were seniors. The discrepancy in percentages was not as substantial for the Black AYAs, the White seniors, or the White AYAs. In addition to the interaction effect, adults were more likely to be in Class 2 than AYAs, and individuals who were unemployed were more likely to be in Class 2 than those who were employed.
Table 5
Age Group by Race Interaction Effect
| Race and Age Group | Class 2 | Class 4 |
|---|---|---|
| Black | ||
| AYAs | 119 (27.36%) | 80 (29.52%) |
| Adults | 283 (65.06%) | 135 (49.82%) |
| Older Adults | 33 (7.59%) | 56 (20.66%) |
| Total | 435 (100.00%) | 271 (100.00%) |
| White | ||
| AYAs | 378 (23.46%) | 664 (31.77%) |
| Adults | 929 (57.67%) | 926 (44.31%) |
| Older Adults | 304 (18.87%) | 500 (23.92%) |
| Total | 1611 (100.00%) | 2090 (100.00%) |
Note. AYAs = adolescents and young adults.
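The column percentages in Table 5 can be recomputed from the counts, which makes the contrast for Black older adults easy to verify. The sketch below uses the Black rows of Table 5; the function and variable names are hypothetical.

```python
# Counts of Black participants by age group, copied from Table 5.
class2_black = {"AYAs": 119, "Adults": 283, "Older Adults": 33}
class4_black = {"AYAs": 80, "Adults": 135, "Older Adults": 56}


def column_percentages(counts):
    """Percentage of each age group within one class (column) of the table."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


pct_class2 = column_percentages(class2_black)
pct_class4 = column_percentages(class4_black)

print("Class 2:", pct_class2)
print("Class 4:", pct_class4)
```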
For the other classes (i.e., Classes 1 and 3), despite the absence of a significant interaction effect, age, race, and unemployment all had a significant impact on latent class membership. That is, adults were more likely than AYAs to be in Class 1, which was characterized by low life satisfaction. Older adults were less likely than AYAs to be in Classes 1 and 3, which were the low and moderate life satisfaction classes, respectively. Individuals who were Black were more likely than those who were White to be in Classes 1 and 3 than in Class 4. Individuals who were unemployed were more likely to be in Classes 1 and 3 than those who were employed.
Discussion
This study aimed to demonstrate the utility of a machine learning approach, SEM trees, for the identification of covariate interactions that potentially explain latent classes in FMM. Specifically, this study tapped into the advantage of SEM trees in automatically searching for covariate interactions and showed that a covariate interaction detected by SEM trees can be incorporated into FMM to explain the distinction among latent classes. As demonstrated, SEM trees revealed the interaction between race and age group, which provided a more nuanced understanding of how these factors interplayed to affect life satisfaction. That is, the impact of being Black on individuals’ likelihood of being assigned to a somewhat satisfaction versus a high satisfaction class depended on age group, which suggests that seniority serves as a protective factor against life dissatisfaction. In retrospect, this interaction effect aligns with the prior literature on life satisfaction and other psychological and health outcomes (Ajrouch et al., 2001; George et al., 1985; Phatak et al., 2013; Shaw et al., 2010). Overall, this demonstration provides an example of how intersectionality can be examined and understood with an integration of FMM and SEM trees.
Despite the utility of SEM trees in identifying covariate interactions, there is no guarantee that the interaction terms will turn out to be sources of heterogeneity in FMM. For example, the race by age group interaction was statistically significant for one latent class, but not for the other two classes. This possible discrepancy between FMM and SEM trees arises from the substantial differences between the two approaches in how heterogeneity is modeled (Jacobucci et al., 2017). That is, in FMM, latent classes are formed on the basis of the estimated model parameters (e.g., intercepts, loadings, factor means, factor variances), whereas splits of the sample in SEM trees depend upon covariates. Note that although a conditional FMM might be more comparable to SEM trees given that covariates are allowed to contribute to the formation of latent classes, we adopted an unconditional FMM in our study, which allows researchers to first examine heterogeneity based on the outcome of interest and subsequently explore the impact of covariates. This is aligned with the vast majority of FMM applications (e.g., Babusa et al., 2015; Bernstein et al., 2013; Elhai et al., 2011).
The possible discrepancy between FMM and SEM trees in identifying covariate interactions does not undermine the utility of SEM trees in suggesting potential interactions. Especially when intersectionality is of interest to applied researchers but substantive theories or knowledge regarding the form of interactions are lacking, SEM trees offer a data-driven and exploratory approach that can be adopted to identify possible interaction effects that explain latent classes in FMM. As demonstrated in the paper, an unconditional FMM can be conducted first to identify latent classes and the level of equality constraints on parameters across classes. Next, SEM trees can be conducted with a level of constraints comparable to that of the FMM (e.g., loadings are equal across classes), and the suggested covariate interactions can be added to the multinomial logistic regression in addition to the main effects via the three-step approach. Alternatively, if hypotheses regarding interaction effects are available, the two modeling approaches can be used concurrently, and SEM trees at least offer an alternative perspective on how heterogeneity is shaped by covariates.
While we highlight the utility of SEM trees in suggesting covariate interactions, a few caveats are worth mentioning. First, future Monte Carlo simulation studies are needed to systematically evaluate the efficacy of this approach of integrating SEM trees with FMM. For example, multiple splitting methods and options to control the growth of the tree are available in the implementation of the SEM trees approach, and simulation studies are needed to examine which method and option would be optimal under which data conditions (Jacobucci et al., 2017). Additional factors that can be considered in simulation studies include the number of latent classes, the degree of class separation, the number of covariates, and the form of interactions (e.g., two-way or higher-order interactions). Second, the SEM trees approach should not be considered a replacement for substantive theories or knowledge in identifying covariate interactions (Brandmaier et al., 2013). Covariate interactions suggested by SEM trees should be checked retrospectively against theories or the knowledge of researchers to ensure that they are meaningful and interpretable before they are added to the multinomial logistic regression. Third, this study demonstrated the utility of SEM trees for FMM, and future research is needed to examine the potential of this approach for other mixture models (e.g., growth mixture model, latent class analysis) via demonstrations and Monte Carlo simulations. Despite these caveats, we encourage FMM users to tap into the advantage of SEM trees in identifying potential covariate interactions that advance their understanding of intersectionality and heterogeneity.
This is an open access article distributed under the terms of the Creative Commons Attribution License.