Modeling the Influences of Social Mobility Net of Origin and Destination Based on the Front-Door Criterion: A Simulation Study

The consequences of social mobility have been a persistent theme on the research agenda of social scientists, but the estimation of the net mobility effect, controlling for both social origin and destination, confronts an identification problem. This research 1) highlights the mechanical identification approaches deployed by the conventional methods (the square additive model, the diamond model, and the diagonal reference model); 2) draws on directed acyclic graphs to present an identification framework that is based on intermediate variables; and 3) elaborates the specific identification strategies in typical research scenarios: independent mechanism, joint mechanism, partial mechanism, and intermediate confounded mechanism. The results of the Monte Carlo simulations suggest that the mechanism-based identification approach helps to obtain an unbiased estimate of the net mobility effect.

This task, however, can be challenging. Since the operationalization of social mobility is often a mathematical function of the origin and destination, the estimation of the net mobility effect could fail. This is essentially a problem of model identification, by which we mean that not all of the effects of origin, destination, and mobility can be uniquely ascertained (Blalock, 1966). This problem in mobility research was noted as early as the 1960s, when Blalock (1966) showed that not all coefficients of origin (O hereafter), destination (D hereafter), and their difference (O-D hereafter) can be uniquely identified in a linear or generalized linear model (depending on the properties of the outcome variable) that is used to explain some specific outcome of mobility.
In this case, the many models conventionally used to reveal the net mobility effect, namely the square additive model (SAM hereafter) proposed by Duncan (1966), the diamond model (DM hereafter) proposed by Hope (1971, 1975), and the diagonal reference model (DRM hereafter) proposed by Sobel (1981, 1985), can be seen as various efforts to resolve this identification problem. A common feature of these models, as shown below, is that they mathematically transform at least one operationalization of O, D, and mobility so as to ensure a full-rank design matrix for a linear or generalized linear model. This identification strategy is statistically workable, but one shortcoming is that it relies greatly on "mechanical" transformation of the measures of O, D, or mobility. As a result, the analytical results sometimes fall short of substantive and theoretical interpretability. This paper draws on the rising interest in mechanism-based model identification and presents an identification framework based on the front-door criterion in the causal inference literature (Winship & Harding, 2008). Instead of relying on omnibus mathematical operations, this approach is theory-directed: it specifies and controls for the meaningful mediators through which O, D, or social mobility acts on the outcome variable.
The mechanism-based identification strategy is not new to empirical researchers. The underlying rationale has much to do with the well-known proxy variable methods and has been proposed to handle other "multiple clock" problems (e.g., the age-period-cohort [APC] modeling; Bijlsma, Daniel, Janssen, & De Stavola, 2017; Winship & Harding, 2008). But to the best of our knowledge, this research is the first to systematically discuss the mechanism-based identification framework in social mobility research.

The Identification Problem and Existing Strategies
The identification problem concerns the question of "whether or not there are too many unknowns for solution" for a given model configuration (Blalock, 1966, p. 52). A frequently met unidentifiable case is the multicollinearity problem, where some predictors are functions of the others so that their independent effects are not estimable. As is well known, this problem arises because the design matrix of these covariates is not full-rank. In research on the net mobility effect, rank deficiency is exactly the cause of the identification challenge. Specifically, social mobility concerns two basic facets: mobility direction (upward, downward, or static) and mobility length (the number of steps moved on the status ladder). A handy approach to construct this measure is to compute the difference between O and D. For example, suppose both O and D refer to occupational categories with three levels (1 = high, 2 = middle, 3 = low); their difference O-D would then range from −2 to 2. Clearly, positive, negative, and zero values correspond respectively to upward, downward, and static (no) mobility. The numerical value stands for the number of steps moved; that is, a move from a low status to a high status spans two steps, which returns a value of positive two. Such steps can be understood as the difficulty of mobility (i.e., the number of barriers one has to pass through in order to move upward or downward).
Despite its theoretical relevance and simplicity, including O-D along with O and D in a linear or generalized linear model results in the failure of effect identification, because the rank of the design matrix is always one less than full column rank. To solve this problem, several models have been proposed. The first one is the SAM. Instead of using O-D, the SAM parameterizes the mobility effect as the product of O and D, that is, the interaction term O*D (Duncan, 1966). However, this approach has been criticized for not being able to capture the net mobility effect because the mobility effect O*D cannot be entirely separated from the main effects of O and D (Hope, 1971, 1975). To preserve the logical measure of O-D, Hope (1971, 1975) proposes the DM, where a common dimension of social status without distinction between origin and destination is used. This common status measure can be parameterized as the sum of O and D (House, 1978). This algebraic operation reduces the number of predictors to two (one for O+D and the other for O-D), so the design matrix is now full rank. However, an overall status dimension has been called into question because it conflicts with the multidimensional conceptualization of social status (e.g., Weber, 1978). Also, without controlling for each of O and D, as long as O and D have different effects on the outcome, the O-D term would always appear to have some effect even when no such effect exists (House, 1978).
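The rank argument above can be checked numerically. The following is a minimal sketch, assuming that O and D are entered as numeric scores (1, 2, 3) with an intercept; the variable names are illustrative and do not come from the original models. It compares the column rank of the generic design matrix with those of the SAM and DM parameterizations.

```python
import numpy as np

# All nine (O, D) combinations of a three-level status variable.
levels = np.array([1, 2, 3])
O, D = [a.ravel() for a in np.meshgrid(levels, levels, indexing="ij")]
intercept = np.ones_like(O)

# Generic model: intercept, O, D, and O-D. The last column is an exact
# linear combination of O and D, so one column is redundant.
X_generic = np.column_stack([intercept, O, D, O - D])
rank_generic = np.linalg.matrix_rank(X_generic)

# SAM: the mobility term O-D is replaced by the product O*D.
X_sam = np.column_stack([intercept, O, D, O * D])
rank_sam = np.linalg.matrix_rank(X_sam)

# DM: O-D is kept, but O and D are collapsed into the single term O+D.
X_dm = np.column_stack([intercept, O + D, O - D])
rank_dm = np.linalg.matrix_rank(X_dm)

for name, X, r in [("generic", X_generic, rank_generic),
                   ("SAM", X_sam, rank_sam),
                   ("DM", X_dm, rank_dm)]:
    print(f"{name}: {X.shape[1]} columns, rank {r}")
```

Under these assumptions, the generic matrix has four columns but rank three, whereas the SAM and DM matrices have as many independent columns as they have coefficients.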
Perhaps the most widely used model in the current literature on social mobility is the DRM (Sobel, 1981, 1985). A conventional interpretation of this model is that the net mobility effect is estimated with reference to the diagonal cells of a mobility table. Specifically, using Sobel's notation, the outcome Y of individual k with origin i and destination j is parameterized as $Y_{ijk} = p\mu_{ii} + (1-p)\mu_{jj} + \gamma(O-D)_{ijk} + \varepsilon_{ijk}$, where $\mu_{ii}$ and $\mu_{jj}$ are the population means in the (i, i)th and (j, j)th cells of the mobility table, $\varepsilon_{ijk}$ is the random error, and the mobility effect is captured by $\gamma$. The DRM makes sense because those who are immobile provide the status cues from which the mobile acculturate. To see how the DRM solves the identification problem, we turn to another DRM parameterization proposed by Sobel (Model 3.4 in Sobel, 1981, p. 898), referred to here as Model (1). Simply switching the order of p and $\mu_{ii}$ as well as the order of 1-p and $\mu_{jj}$, Model (1) is equal to Model (2). In this model, note that neither $\mu_{ii}$ nor $\mu_{jj}$ is treated as a data input. Instead, they are parameters that should be estimated from the data (but the sample estimators are not the sample means of the outcome in the diagonal cells of the mobility table, as noted by Sobel, 1981). Seeing $\mu_{ii}$ and $\mu_{jj}$ as unknown coefficients, it is immediately clear that, relative to the unidentifiable generic model with the predictors O, D, and O-D, the DRM introduces a new unknown coefficient p.
In summary, from the perspective of model identification, the three existing models estimating the net mobility effect provide different ways to ensure a full-rank design matrix. One common feature of the SAM, DM, and DRM is that a certain mathematical operation is imposed on either the main effects of O and D (the case of the DM and DRM) or the measure of mobility (the case of the SAM). This is a straightforward way of identifying statistical models, but it is mechanical and often falls short of theoretical reasoning and justification. In the following two sections, we present a mechanism-based identification framework that represents an entirely different approach from the three existing models. Before showing the details, it is necessary to familiarize readers with the directed acyclic graph (DAG).

The Directed Acyclic Graph: An Overview
The mechanism-based identification of the net mobility effect is mostly based on the so-called front-door criterion in the DAG literature (Pearl, 1995, 2009). The DAG is a graphical representational system that can be used to show the interrelationships between variables. Two fundamental criteria of effect identification have been proposed by Pearl (1995), named respectively the back-door criterion and the front-door criterion.
The back-door criterion, intuitively, requires controlling for all of the confounders C that determine the value of both predictor X and outcome Y. This is tantamount to cutting off all of the potential confounding paths between X and Y. For example, in Figure 1A, the causal effect of X on Y cannot be identified unless the confounding path X←C→Y is blocked by fixing C. To follow the conventional notation, we enclose a statistically controlled variable in a square, written here as [C], so the blocked path is X←[C]→Y. The front-door criterion shifts attention to the mediators that bridge X and Y. If all of the connections between X and Y go through the mediator M, then the causal effect of X on Y would be estimable as the product of the effect of X on M and the effect of M on Y. Of course, the X-M and M-Y effects should be estimated without confounding, which might call for the deployment of the back-door criterion. This identification strategy is also called the path-tracing rule. However, readers should note that this rule may not apply in the case of nonlinear modeling, where Monte Carlo simulation can be deployed (Bijlsma et al., 2017). In Figure 1B, supposing the link between X and Y is fully mediated by M, the causal effect of X on Y would be a*b. When estimating a, we do not need to control for any variable because the confounding path X←C→Y←M is automatically blocked by the collider Y. However, when estimating b, we may have to control for X or C to make sure the confounding path M←X←C→Y is no longer open.
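The path-tracing logic of Figure 1B can be illustrated with a small linear simulation. This is a sketch under assumed values: the coefficients `a_true` and `b_true`, the strength of the confounder, the sample size, and the helper `ols` are hypothetical and chosen only for illustration. The confounder C is generated but never used in estimation, mimicking an unobserved variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a_true, b_true = 0.5, 2.0          # hypothetical X->M and M->Y effects

C = rng.normal(size=n)             # confounder, treated as unobserved
X = 0.8 * C + rng.normal(size=n)   # C -> X
M = a_true * X + rng.normal(size=n)            # X -> M (full mediator)
Y = b_true * M + 1.5 * C + rng.normal(size=n)  # M -> Y and C -> Y

def ols(y, *cols):
    """Return OLS coefficients (intercept first) of y on the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

a_hat = ols(M, X)[1]        # no control needed: Y is a collider on X<-C->Y<-M
b_hat = ols(Y, M, X)[1]     # control for X to block M<-X<-C->Y
front_door = a_hat * b_hat  # path-tracing estimate of the X->Y effect
naive = ols(Y, X)[1]        # confounded by the open path X<-C->Y

print(f"true effect  = {a_true * b_true:.3f}")
print(f"front-door   = {front_door:.3f}")
print(f"naive Y on X = {naive:.3f}")
```

In this setup the product `a_hat * b_hat` should sit close to the true effect a*b, while the naive regression of Y on X absorbs the confounding path through C.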

Figure 1
Illustrations of the Back-Door and Front-Door Criteria
The DAG provides a very handy analytical tool to interrogate and fix the confounding paths, as in Figure 2. In Figure 2A, both O and D are related to D-O, so the mobility effect confronts three confounding paths: D-O←D→Y, D-O←O→Y, and D-O←D←O→Y. However, due to the multicollinearity problem, we cannot control for O and D simultaneously, so at least one confounding path would remain open. If we change the measure of mobility (e.g., the SAM), we can control for both O and D, as in Figure 2B. In this case, all confounding paths are blocked: D*O←D→Y, D*O←O→Y, and D*O←D←O→Y. In Figure 2C, the DM uses a common measure of status O+D, so after fixing this term, we can cut off the confounding path D-O←O+D→Y. This is also the case for the DRM in Figure 2D, where the three confounding paths are disabled through statistical control: D-O←(1-p)D→Y, D-O←pO→Y, and D-O←(1-p)D←pO→Y.

Figure 2
The Identification Problem and the Existing Solutions

Using the DAG, especially the front-door criterion, we present in the following section the mechanism-based identification framework for net mobility effect estimation.

Scenarios of the Mechanism-Based Identification
The mechanism-based identification rests on the introduction of mediators that fully bridge the link between a specific predictor and the outcome. This approach is desirable because it encourages a more nuanced reflection on "how" social mobility comes to have its effects. In a sense, mechanism-oriented research is not new to social scientists, who have long been interested in the causal chain from one variable to another (e.g., the structural equation modelling in Blau & Duncan, 1967; see also Kelley, 1973), and this is also true for the literature on the consequences of social mobility. For instance, an early review article on the association between social mobility and fertility by Kasarda and Billy (1985) already called for more scholarly attention to the "intermediate variables." As shown below, introducing the "intermediate variables" into the analysis not only enriches theoretical arguments, but also provides one workable way to identify the net mobility effect. Specifically, there are four research scenarios, as shown in Figure 3.

Independent Mechanism
By independent mechanism, we mean that at least one of the effects of O and D on Y is fully mediated. Figure

Joint Mechanism
Joint mechanism means that O and D share the full mediator $M_{od}$. That is to say, the origin and the destination of status work on the outcome variable through the same set of intermediate variables. This is shown in Figure 3B. In this scenario, the identification of the net mobility effect calls for blocking three potential confounding paths by virtue of controlling for $M_{od}$, as in D-O←O→$M_{od}$→Y, D-O←D→$M_{od}$→Y, and D-O←D←O→$M_{od}$→Y.

Partial Mechanism
Partial mechanism refers to the situation where there are some missing or unobserved mediators. This is a practical situation since scholars might not be able to get access to all of the mediators for a particular predictor. To illustrate this case, we in Figure 3C

Figure 3
The Mechanism-Based Identification Framework
This partial mechanism is problematic because it violates the front-door criterion that all of the mediators are taken into account. One possible way out is to introduce the mediators for the mobility measure. Indeed, the mediators for the variable of mobility should also be full, but it is still meaningful to check this approach, for at least two reasons. First, relative to the main effects of O and D, the process of status mobility is more specific and better defined, so it is relatively easier for researchers to identify its intermediate variables (e.g., Kasarda & Billy, 1985). Second, if the research objective is to estimate the net mobility effect, the discussions here suggest that as long as at least one predictor among O, D, and mobility can identify sufficient mediators, we could obtain an unbiased estimate for the net mobility effect. This would allow more leeway in the identification for empirical researchers. With

Intermediate Confounded Mechanism
The intermediate confounded mechanism captures the situation where there exist confounders that link the mediator and the outcome. This would make the estimation of the net mobility effect tricky, as illustrated in the left subfigure of Figure

A Simulation-Based Example
Setup
We use Monte Carlo simulations to illustrate the mechanism-based identification framework. Without loss of generality, both O and D are configured to have three categories (1 = low, 2 = middle, 3 = high). The multinomial distribution is used to generate the distribution of 10,000 cases among the three categories of O, with probabilities of 0.3, 0.6, and 0.1. The probabilities of entering the low, middle, and high statuses of D are respectively 0.6, 0.3, and 0.1 for those from the low status of O; 0.1, 0.8, and 0.1 for those from the middle status of O; and 0.1, 0.3, and 0.6 for those from the high status of O. The measure of mobility is the difference between O and D. One caveat is necessary. We configure O and D to be ordinal variables to be consistent with the current literature on mobility, where the characteristics of the two generations at issue often take the form of an ordinal gradient, such as the prestige of occupations or the quantiles of income. In this regard, their difference makes practical sense. Note that the measures of O and D should have comparable scales. Otherwise, some standardization has to be deployed. Of course, we may focus on continuous measures such as income, in which case the difference between O and D would be continuous.
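The sampling scheme for O and D described above can be sketched as follows. The random seed and variable names are assumptions made for illustration, and the code draws the origin counts from a multinomial distribution and then samples destinations within each origin group, which is one plausible reading of the setup rather than the original implementation.

```python
import numpy as np

rng = np.random.default_rng(2024)   # seed chosen arbitrarily
n = 10_000
levels = np.array([1, 2, 3])        # 1 = low, 2 = middle, 3 = high

p_origin = np.array([0.3, 0.6, 0.1])
# Row i gives P(D = low, middle, high | O = levels[i]).
P_dest = np.array([
    [0.6, 0.3, 0.1],   # origin low
    [0.1, 0.8, 0.1],   # origin middle
    [0.1, 0.3, 0.6],   # origin high
])

# Distribute the n cases among the three origin categories.
counts = rng.multinomial(n, p_origin)
O = np.repeat(levels, counts)

# Within each origin group, draw the destination from its own row.
D = np.concatenate([
    rng.choice(levels, size=c, p=row) for c, row in zip(counts, P_dest)
])

# Mobility as the difference O - D; with 1 = low and 3 = high, a positive
# value means the destination is lower than the origin.
mobility = O - D

# Empirical transition table, one row per origin, for a quick check.
table = np.array([[np.mean(D[O == o] == d) for d in levels] for o in levels])
print(counts)
print(table.round(3))
```

The printed table should be close to `P_dest` row by row, and `mobility` takes integer values between −2 and 2.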
For the case of independent mechanism, the data-generating process configures $M_o$ and $Y$ to be both continuous. The outcome is simulated as

$$Y = 100 + 0.9 M_o + 2D + 1(O-D) + \varepsilon, \quad \varepsilon \sim N(0, 10),$$

where $\varepsilon$ is independent of $\omega$, the random error of $M_o$ ($\omega \perp \varepsilon$). When estimating the net mobility effect, we fit an ordinary least squares (OLS) model in which O is simply replaced with $M_o$ compared with the unidentifiable model.

When O and D have the same mediator $M_{od}$, the data-generating process follows

$$M_{od} = 0.5 O + 0.6 D + \omega, \quad \omega \sim N(0, 1)$$

$$Y = 100 + 0.9 M_{od} + 1(O-D) + \varepsilon, \quad \varepsilon \sim N(0, 10), \quad \omega \perp \varepsilon.$$

Since the effects of O and D on Y are all mediated by $M_{od}$, the OLS model we use to estimate the net mobility effect controls for $M_{od}$ in place of O and D.

The partial mechanism scenario requires introducing an unobserved mediator $U_o$ and the mediator for the mobility variable, $M_{mobility}$, into the modelling process. To do so, we use the following data-generating rules:

$$M_o = 1.5 O + \omega_M, \quad \omega_M \sim N(0, 1)$$

$$M_{mobility} = 0.5 (O-D) + \xi, \quad \xi \sim N(0, 1)$$

$$Y = 100 + 0.9 M_o + 0.8 U_o + 2D + 2 M_{mobility} + \varepsilon, \quad \varepsilon \sim N(0, 10)$$

All random errors are mutually independent. In order to estimate the net mobility effect without considering $U_o$, we need to fit two OLS models. One is

$$M_{mobility} = \rho_0 + \rho_1 (O-D) + \pi, \quad \pi \sim N(0, 1),$$

and the other is a model for Y in which $\beta_1$ is the coefficient of $M_{mobility}$. The point estimate of the net mobility effect is then $\rho_1 \beta_1$.

Lastly, the scenario of intermediate confounded mechanism is simulated with the outcome equation

$$Y = 100 + 0.9 M_o + 0.8 U_o + 2D + 2 M_{mobility} + \varepsilon, \quad \varepsilon \sim N(0, 10).$$

Again, all random errors are mutually independent. Without considering $U_o$, the net mobility effect is estimated based on the models

$$M_{mobility} = \rho_0 + \rho_1 (O-D) + \pi, \quad \pi \sim N(0, 1)$$

$$Y = \beta_0 + \beta_1 M_{mobility} + \beta_2 (O-D) + \varepsilon, \quad \varepsilon \sim N(0, 1),$$

with the point estimate being $\rho_1 \beta_1$.

To simplify our expository simulations, we configure a continuous outcome variable Y, which justifies the OLS model.
This can be straightforwardly extended to discrete Ys, where different link functions are adopted in the generalized linear model framework (Faraway, 2016).
Standard errors are computed using the bootstrap method (iteration = 500).
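To make the procedure concrete, the joint mechanism scenario can be sketched end to end. Several choices here are assumptions rather than details from the setup: the second argument of N(0, 10) is read as a standard deviation, the seed is arbitrary, the helper `mobility_coef` is hypothetical, and the interval is a normal approximation around the point estimate.

```python
import numpy as np

rng = np.random.default_rng(7)      # arbitrary seed
n = 10_000
levels = np.array([1, 2, 3])        # 1 = low, 2 = middle, 3 = high
p_origin = [0.3, 0.6, 0.1]
P_dest = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.1, 0.3, 0.6]]

# Origin and destination, following the transition probabilities in the setup.
counts = rng.multinomial(n, p_origin)
O = np.repeat(levels, counts)
D = np.concatenate([rng.choice(levels, size=c, p=row)
                    for c, row in zip(counts, P_dest)])
mob = O - D

# Joint mechanism: O and D act on Y only through M_od; true mobility effect = 1.
omega = rng.normal(0.0, 1.0, n)
eps = rng.normal(0.0, 10.0, n)      # assumption: 10 is the standard deviation
M_od = 0.5 * O + 0.6 * D + omega
Y = 100 + 0.9 * M_od + 1.0 * mob + eps

def mobility_coef(y, m, d):
    """OLS of y on an intercept, the mediator m, and mobility d; returns d's slope."""
    design = np.column_stack([np.ones(len(y)), m, d])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[2]

estimate = mobility_coef(Y, M_od, mob)

# Nonparametric bootstrap with 500 resamples for the standard error.
n_boot = 500
boot = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    boot[b] = mobility_coef(Y[idx], M_od[idx], mob[idx])
se = boot.std(ddof=1)
ci = (estimate - 1.96 * se, estimate + 1.96 * se)

print(f"net mobility effect = {estimate:.3f}, SE = {se:.3f}, "
      f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Because $M_{od}$ replaces O and D, the design matrix is full rank, and the coefficient on O-D can be compared directly with the pre-set value of one.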

Results
The results of the simulations can be found in Figure 4. Across the four research scenarios, the mechanism-based identification approach helps to estimate the net mobility effect, with both the sample mean and the pre-set effect of one lying within the 95% confidence intervals (CIs). Therefore, the mechanism-based identification strategy adds a new tool for social scientists who are interested in the consequences of social mobility net of the influences of social origin and destination.
In a sense, the mechanism-based approach might perform better than the conventional modelling approaches. To see this, we use the simulated data of O, D, and O-D to fit the SAM, the DM, and the DRM. Unsurprisingly, the interaction term in the SAM (O*D) contrasts greatly with the effect of the difference measure O-D, with a coefficient of 0.147 (95% CI [−0.277, 0.571]). By comparison, the DM preserves the difference measure O-D, and the 95% CI of the coefficient of O-D includes the pre-set value of one (point estimate 0.865, 95% CI [0.556, 1.175]). Lastly, the result of the DRM is not statistically significant, and the point estimate is implausibly large in magnitude (−112.477). In light of these findings, it seems that, unlike the DM, neither the SAM nor the DRM estimates the net mobility effect in an unbiased fashion.

Concluding Remarks
Social scientists are familiar with mechanism-oriented research (Hedström & Ylikoski, 2010), but thus far, few have applied this line of thinking to the model identification issue in the analysis of the net mobility effect. This article revisits the previously used methods (the SAM, the DM, and the DRM) from the perspective of model identification. Moreover, we draw on the DAG, especially the front-door criterion, to present a mechanism-based identification framework. Four analytical scenarios are elaborated, and the Monte Carlo simulation analyses suggest that the mechanism-based approach works well to reveal the consequences of social mobility net of the influences of social origin and social destination.

Figure 4
Results of the Bootstrap Simulations
Note. The dotted lines refer to the sample mean and its 95% CI. The solid line is the pre-set population net mobility effect of one.
However, this mechanism-based identification framework is no panacea. The key to its success lies in whether or not one is able to collect the variables that fully mediate the relationship between one predictor and the outcome. Since the intergenerational transition of social status could bring about a wide range of changes in one's life, this is no easy task. For empirical researchers, the situation of the partial mechanism as illustrated earlier could be fairly common. In this case, one tool that could be helpful is mediation analysis (e.g., Imai, Keele, Tingley, & Yamamoto, 2011). If the mediators available to researchers fully mediate the link between a predictor and the outcome, the estimated direct effect in the mediation analysis (the "residual" effect between the predictor and the outcome net of the mediation) should not be statistically significant. Mediation analysis could also enable scholars to pin down the key mediators. For instance, among multiple mediators, there could be some crucial ones that essentially play the mediation role, such that the others are only proxies of the key ones. By conducting the mediation analysis, these key mediators can be identified and taken into account when performing the mobility analysis.
Another issue that deserves more discussion is the measurement of social status. Although sociologists traditionally gravitate toward an ordinal measure, more recent research has started to shift attention to continuous measures such as income. When examining intergenerational income mobility, the identification problem discussed in this article still exists; that is, the coefficients for parental income, children's income, and their difference cannot be uniquely estimated. Therefore, the mechanism-based approach proposed in this paper should be enlightening.
To conclude, we would like to emphasize that the mechanism-based approach matters not only for model identification, but also for the elaboration of theories. Without a good understanding of the underlying mechanisms, scholars cannot be sure whether and how social mobility causes a specific outcome, which might further call into question the analytical value of the whole status classification scheme (Weeden & Grusky, 2005).

Funding:
The author has no funding to report.