Research questions pointing to the risk of experiencing an event as well as respective data sets are frequently found in social science research, e.g., in family formation (Kingsley, 2018; Kurz et al., 2006), educational attainment (Ameri et al., 2016), recidivism (Skardhamar & Telle, 2012) or reemployment (Hägglund & Bächmann, 2017). They are commonly estimated with hazard models from the field of time-to-event analysis. These research questions, including the examples from above, often involve time-varying covariates (TVC) which allow to model the impact of covariates that change over time. The classical approach to include the TVCs in time-to-event models is based on assuming that the value does not change between observations and hence carrying the last observation forward until the next observation. We call this last value carried forward (LVCF) assumption. This allows to make individual-specific predictions for each time point during their individual observation period, yet relies on the questionable assumption, that the value does not change between observation times. We refer to this modelling strategy as TVC approach. Furthermore, this strategy only holds appropriate estimation results, when the TVC is exogenous, i.e., is independent of the time-to-event outcome (event happened vs. censored). The exogeneity assumption does not hold in cases of anticipatory effects of the event or when the trajectory of the TVC is highly correlated with the outcome (more on the definition of exogeneity and endogeneity in the section Types of Time-Varying Covariates). Both assumptions, LVCF and exogeneity, are particularly questionable for many frequently changing and self-reported covariates as they are common in social sciences when individual trajectories of TVC and person-related events are of interest.
In these cases a joint model for longitudinal and time-to-event data (Wulfsohn & Tsiatis, 1997) is an appropriate estimation routine. It combines a longitudinal estimation procedure for the TVC and a classical time-to-event model. Including the estimates of the longitudinal model in the time-to-event model links them to a joint model. The estimation of the regressions coefficients of the two linked submodels is carried out simultaneously. It therefore allows to model the relationship between an endogenous covariate and the risk of an event appropriately and represents a useful tool to investigate complex social research questions. Shifting the focus to the longitudinal part, a joint model can be used to model missing not at random dropout from a longitudinal study (see also the subsection A Different Perspective on Joint Models). Further the link between the two models can be adapted to more complicated relationships (more on this in the Summary & Conclusion section).
Joint models are standard tools in biostatistics, e.g., to investigate the relationship between the trajectory of a biomarker in blood cells on all-cause mortality (Nüñez et al., 2014) or on recurrence of cancer (Ferrer et al., 2016) but do not yet belong to the standard toolkit of social science researchers (Cremers et al., 2021). In order to increase the usage in social science applications a low threshold introduction to the method is needed. Therefore, this tutorial paper aims to guide the reader through a social science application of a joint model. As an illustrative application, we will model the marital satisfaction and the risk of marriage dissolution using a German panel data base (Huinink et al., 2011).
Approaches to model two processes simultaneously as well as decomposition of effects already exist in social science literature, e.g., simultaneous equation models (Lillard, 1993) sometimes referred to as multi-process models (Mikolai & Kulu, 2018) or structural equation models (Rosen-Grandon et al., 2004) which target related yet different problems (latent variables and model accuracy for structural equation models, common unobserved heterogeneity and reciprocal causality for simultaneous equation models and in multi-process models). Modelling the trajectory of a longitudinal variable together with its influence on a time-to-event outcome is best modelled by a joint model for longitudinal and time-to-event data.
The following section highlights the definition of endogeneity and exogeneity in order to lay the foundation to identify endogenous covariates properly. The Joint Models for Longitudinal and Time-to-Event Data section introduces the method of joint models for longitudinal and time-to-event data and its properties using the example of marital satisfaction and the risk of marriage dissolution. After a short review of the literature of marital satisfaction and dissolution, the Application: Marriage Satisfaction and Time to Marriage Dissolution section describes the data base, the model specification for the application and discusses the estimation results. In the Comparison to Other Modelling Approaches subsection, the results are compared to other modelling strategies (classical TVC approach, two-stage model). The tutorial concludes with a summary and a discussion of possible extensions as well as problems of the application. The example is executed using the Software R and can be reproduced using the web supplementary material of this tutorial (see Potts et al., 2026).1
Types of Time-Varying Covariates
Regarding the question whether a classical TVC approach models the data appropriately, the type of the time-varying covariate is crucial. The usage of the TVC approach in time-to-event models does not pose problems, when the modelled covariate is exogenous. In contrast, endogenous variables are problematic in the case of classical hazard models as they do result in biased estimates and one needs to exercise caution when making causal statements (Box-Steffensmeier & Jones, 2004). Thus, researchers should consider using a joint model when exogeneity is questionable. Therefore, we shortly review the definition of exogeneity and endogeneity in time-to-event models before describing the method.
Kalbfleisch and Prentice (2002) divide exogenous time-varying variables into two subcategories: defined and ancillary TVCs. Defined ones have a predetermined path in advance for all subjects of the study, e.g., historical period, cohort or age of the individual. In contrast, an ancillary TVC is the result of a stochastic process, which is external to the observation unit such as population level characteristics, e.g., unemployment rate in an economy (Yamaguchi, 1991). However, such exogenous covariates may be rather rare in micro-level analyses in social science research, since many time-varying covariates describe individual-specific changes.
Endogeneous time-varying variables are also categorized into two different sub-types (Kalbfleisch & Prentice, 2002): state dependent and rate dependent TVCs. The former comprises variables whose path is not independent of the state of the outcome variable. Consequently, they result in different paths of the TVC depending on the respective time-to-event outcome (e.g., marital satisfaction trajectories of still married vs. separated). ”In other words, the value of the time-dependent covariate carries information about the state of the dependent process.” (Blossfeld & Rohwer, 2001, p. 132)
Rate dependent covariates are directly correlated with the hazard rate of an event such that not only the trajectory of the TVC correlates with the outcome but the estimated risk of having an event influences the trajectory as well. This can be for example due to anticipation of the event, e.g., the effect of the anticipation of divorce on the working behaviour of women (Poortman, 2005). Other examples of possible endogenous covariates in social sciences include the number of failed/passed credits regarding the event of student drop-out or the amount of working hours and the timing of births.
Joint Models for Longitudinal and Time-to-Event Data
In this section, we start with an illustrative example and point out the shortcomings of classical time-to-event models when dealing with endogenous time-varying covariates. This is followed by an explanation of the method of joint models for longitudinal and time-to-event data and their advantages.
An Illustrative Example: Time-to-Event Model and Joint Model in Comparison
In the example we are interested in — the risk of marriage dissolution (event) and how covariates influence this risk — we may include covariates that are changing over time such as subjective marital satisfaction (TVC). Research interest lies in the influence of the trajectory of marital satisfaction and the risk of marriage dissolution. Assume marital satisfaction has been captured multiple times over the years and information on the start and possible end of the marriage are available. Figure 1 serves as an illustration for the fictional example using one single individual: the upper trajectory depicts his/her marital satisfaction measure, where the points correspond to the measurements in time. The measurement points are connected via the smooth trajectory function (solid line). The lower part represents the estimated hazard rate for marriage dissolution over time. In this example higher values of satisfaction (upper panel) go hand in hand with smaller estimated hazard rates, i.e., lower risk of ending the relationship (lower panel).
Figure 1
Scheme of Two Related Processes and Possible Modelling Strategies
Note. Upper panel: Measurements of the time-varying covariate (longitudinal process). Lower panel: Risk of having an event (time-to-event process). In a joint model the influence of the trajectory in the upper panel on the risk is estimated by an association parameter α and both processes can be modelled as functions of (shared) covariates.
In order to include the time-varying covariate marital satisfaction into a time-to-event model, one could use the classical TVC approach. This approach yields several problems since it assumes that the respective variable, (1) does not change between the observation times and, (2) is exogenous. These two assumptions are particularly questionable for volatile and self-reported variables. The first assumption would result in a step-function (dashed line in upper panel of Figure 1) between the measurements in the upper panel instead of the smooth path. Especially for infrequently measured variables with long periods between two measures this approach may model the trajectory inappropriately.
Additionally, as marital satisfaction does neither evolve from a stochastic process, which is external to the individual under study (ancillary TVC), nor can be calculated as a defined covariate, it can be called an endogenous TVC and therefore violates the second assumption. Figure 2 may be an indicator for state dependence of marital satisfaction showing the smoothed average trajectory of marital satisfaction clustered by marital status: (a) persons still in the relationship and (b) persons that ended their relationship to their married partner. The fact that the trajectories (both, for men and women) differ significantly between the marital status groups, is a strong indicator for a state-dependent TVC and thus a modelling technique able to appropriately adress this endogeneity has to be applied. Arguably rate dependence may also apply in this example, as Clark et al. (2008) found a strong anticipatory effect of the life event divorce regarding life satisfaction for men and women. It seems plausible that a similar effect exists for marital satisfaction. Besides the proper estimation in the presence of endogeneous time-varying covariates, an additional difference between the joint model and the classical time-to-event model is the inclusion of predictors for modelling the trajectory of the TVC. The basic approach of a TVC does not allow to investigate its predictors (Figure 1: Covariate 1, 2, 3 would not be taken into account for the upper panel).
Figure 2
Estimated Average Trajectory of Relationship Satisfaction of Persons by Event Status and Sex
Note. Non-linear smoother by sex (dark: men, light: women). Based on the illustration by Crowther et al. (2013).
Even though a two-stage approach can be applied to first model the TVC as a function of covariates in order to include their effects and further overcome the problem of LVCF, it has some unfavourable statistical properties mainly arising from the independent estimation of the two submodels. The subsequent time-to-event model treats the prediction of the firstly fitted longitudinal outcome, as if it was estimated without any uncertainty. This results in an underestimation of standard errors for the TVC and thus cannot be interpreted appropriately in terms of statistical inference. For an in-depth investigation on the consequences of using the aforementioned approaches instead of a joint model see Sweeting and Thompson (2011).
Compared to these two approaches (TVC approach, two-stage approach) the joint model for longitudinal and time-to-event data is advantageous as it:
Can handle endogenous TVC in time-to-event models.
Allows for proper inference in the presence of endogenous covariates due to simultaneous estimation routine.
Allows for changes in the TVC between observation points.
Deals with informative drop-out in longitudinal studies (see the A Different Perspective on Joint Models subsection).
A main drawback of joint models is the computational effort during the fitting procedure. In contrast, two-stage models are less computationally demanding. Therefore, some research focuses on combining the advantages of the two methods — the unbiased estimates from the joint model and the fast estimation of the two-stage approach (Leiva-Yamaguchi & Alvares, 2020).
Method
Joint models (Faucett & Thomas, 1996; Rizopoulos, 2012; Wulfsohn & Tsiatis, 1997) overcome the above mentioned shortcomings as they allow for joint modelling of a repeatedly measured outcome alongside the risk of having an event of interest. Rather than using the observations of the TVC, the joint model considers the repeatedly measured TVC as the result of a longitudinal process subject to its own model (marital satisfaction, Figure 1: upper part). This longitudinal process is combined with the related time-to-event process (risk of marriage dissolution, Figure 1: lower part).
The two submodels are described for the example in Figure 1 before merging them to the joint model. For a more general notation we refer the reader to Hickey et al. (2016).
First, the time-varying covariate is modeled via an appropriate model, in general a linear mixed model (LMM)2, allowing for intra-individual variance along the time axis captured by random intercepts (b0i) and possibly random slopes b1i (commonly used on time as a covariate). The model corresponding to Figure 1 can be written as
1
with εi(t) ∼ N (0, σ2) where i indicates the individual and t the time point of the measurement. In vectorized form the model can be rewritten as
2
where is a row vector with all covariate values and a leading 1 for the intercept for person i at time t and zi(t)′ = (1 t)′ holds the covariate values of the random effects, in this case a random intercept and a random slope on time t. Using this model allows to incorporate covariates as predictors of the estimated values of the TVC yi(t). The covariates may be time-constant or time-varying with β being the regression coefficient vector. As in a classical LMM, the random intercepts and random slopes are assumed to stem from a multivariate normal distribution bi ∼ (0, Q).
The second related process is a time-to-event model,3 which is used to model the risk of having an event over time. The general form is a proportional hazards model, which consists of a baseline hazard h0(t) scaled by a covariate part. The corresponding equation to Figure 1 is given by
3
It can be rewritten in vectorized form as
4
The modeled hazard function hi(t) states the instantaneous risk of person i of having an event at time t (i.e., have a marriage dissolution). This model also contains covariates (Figure 1: Covariates 3–5), which may be exogenous time-varying or time-constant and a vector of coefficients γ.
In order to take marital satisfaction as an endogeneous covariate into the model, a joint model links the estimated value of the TVC process mi(t) to the time-to-event model by incorporating it as a predictor and thus estimates their coefficients jointly:
5
The coefficient α is called the association parameter. In contrast to the two-stage approach all coefficients (β, b, γ, α) are estimated simultaneously such that all uncertainty is included in the estimation procedure. Estimating the value of marital satisfaction involves all available observations of the person, such that the hazard in a joint model at this time does implicitly also depend on the covariate history Mi(t).
By including a covariate in both submodels, e.g., xi3(t) in Equation (5), its direct and indirect effect on the risk of having an event can be separated. Thinking of a variable which has a strong influence on the TVC and further a smaller but significant impact on the survival: By estimating a unique coefficient as well as a coefficient and the association parameter we decompose the total effect via: . Hereby indicates the mediated effect of the covariate via the trajectory and represents the direct effect on the risk of having an event. This decomposition is especially helpful to understand the effect pathway of the respective covariate.
Estimation of the coefficients can be done using a Maximum Likelihood approach (Expectation-Maximization Algorithm) or Bayesian methods (Markov-Chain Monte-Carlo sampling). The different estimation strategies for joint models are presented and compared by Rappl et al. (2021). Joint models for longitudinal and time-to-event data are implemented in several R packages (Philipson et al., 2018; Rizopoulos, 2010) and are also available in STATA (Crowther, 2020; Crowther et al., 2013). Furthermore, joint models allow for individual-specific predictions as they control for individual characteristics via the random effects in the longitudinal model.
A Different Perspective on Joint Models
In the previous section, the joint model was presented with particular emphasis on the time-to-event submodel. However, there is also a body of literature focusing on the longitudinal submodel where the time-to-event submodel is used to model informative censoring from the longitudinal study (Asar et al., 2015; Hogan & Laird, 1997; Vonesh et al., 2005; Wu & Carroll, 1988). The utilisation of a joint model may be advantageous for example in situations involving group comparisons of trajectories in the presence of censoring or missing data mechanisms related to the trajectories. One approach to account for the informative censoring is to utilise an association structure based on the random effects of the longitudinal submodel to be included in the survival submodel. This strand of literature refers to joint models as shared parameter models and contributed to the strand of literature on misspecified regression models, e.g., ignored measurement error in covariates or ignored random effects (see e.g., Bound et al., 2001; Molenberghs & Verbeke, 2001).
Application: Marriage Satisfaction and Time to Marriage Dissolution
In order to demonstrate the use of joint models in social sciences, the relationship between satisfaction with the marriage and the time to marriage dissolution is investigated. There is a huge body of literature in the field of marital satisfaction, predictors of marriage dissolution/divorce and interrelations of the two. Some studies focus on questions of general development of marital satisfaction throughout the marriage (e.g., Lorber et al., 2014; Williamson & Lavner, 2019), others investigate predictors for marital satisfaction (e.g., Elmslie & Tebaldi, 2014; Huss & Pollmann-Schult, 2019). Some include marital satisfaction as a mediator between the risk of marriage dissolution and other effects in regression models, e.g., personality traits (Solomon & Jackson, 2014) or household work (Frisco & Williams, 2003). There are cross-sectional and longitudinal studies, with different degrees of exploitation of the longitudinal structure (two time-points vs. whole trajectory) of the data. Different methods were applied to investigate the effect of marital satisfaction on marriage dissolution. The latent growth curve approach of Lorber et al. (2014) for example indicates that the trajectory of marital satisfaction throughout the period of marriage should be modelled individual-specific. Most empirical studies fit separate models to male and female respondents, since the determinants of marriage dissolution and marital satisfaction differ between sexes. For a review of theoretical models regarding marital satisfaction evolution see Caughlin and Huston (2006). Following the large body of literature, we chose the most common covariates for marital satisfaction; our selection largely matches the findings of the meta analysis of Karney and Bradbury (1995).
We would like to highlight the paper of Frisco and Williams (2003) as it analyses the relationship between the two outcomes of interest in a regression. Their focus is to determine the influence on household work (in)equity on the odds of divorce and the possible mediating effect of marital satisfaction. Without considering individual trajectories, they find a small mediating effect of marital satisfaction but still state a significant direct positive effect of unfair high workload of household work on the odds of divorce eight years later for women. The study is based on measurements at two points in time.
To the best of our knowledge, so far no one has used a joint model for longitudinal and time-to-event analysis to investigate the relationship between marital satisfaction and time to marriage dissolution yet. As this model type allows to exploit the whole richness of data, i.e., the longitudinal character of the data as well as the information of timing of an event, we believe that it is highly suitable to generate more well-founded answers to the questions: “What determines marital satisfaction?”, “What determines marriage dissolution?” as well as “How does marital satisfaction mediate influences on the hazard of marriage dissolution?” with respect to the joint evolution of both processes over time.
Data Set
The analysis is based on the German pairfam data (Brüderl et al., 2023). Pairfam (“Panel Analysis of Intimate Relationships and Family Dynamics”) is a longitudinal study with 14 annual waves contributing to shed light to changes in family and relationship structures. It started in 2008 with over 12,000 respondents. Another sample of 1,489 East-German anchor persons (“DemoDiff”) is merged as a supplementary to the data base. A detailed description of the pairfam study can be found in Huinink et al. (2011). The relationship biographies of the respondents as well as the annual questionnaire about the satisfaction with the relationship can be used to build a joint model.
The final sample consists of all persons in the pairfam data, who were married in their first marriage over the course of at least three interviews. This restriction has been made due to two reasons: First, some measurements of the longitudinal variable on marriage satisfaction are needed for proper analysis. Second, using only the first marriage of a person is based on previous empirical findings that relationship stability differs between the first and following marriages and that a selection bias may be present (Jensen et al., 2016), which might also skew the results of the model. This leaves us with a final sample size of N = 3,616 persons/marriages of which 247 (≈ 7%) stated an end of this relationship during the observation period (number of events). We did not take the actual month of divorce as event time but the stated end of relationship (marriage dissolution).
Marital satisfaction is measured as the answer on an 11-point scale to the question “All in all, how satisfied are you with your relationship?” Some exemplary trajectories of marital satisfaction are depicted in Figure 3. Note that marriages which started before the first interview such as the person in Panel C in Figure 3, are not left censored for the time-to-event model since we know the start of their marriage. They just start at a different point in time with time-dependent information. In order to allow users to reproduce the analysis, a synthesized data set can be found in the web supplementary material at Potts et al. (2026).4
Figure 3
Example Trajectories of Relationship Satisfaction From Pairfam Participants
Note. Vertical dashed lines indicate the time of marriage dissolution.
Table 1 summarises the time-constant variables used from the pairfam data that are included in the models. Relationship duration at marriage, age at marriage and premarital cohabitation were included as time-constant covariates and years of education, personal net income, amount of household work, labor force status, children (presence of preschool child and number of children under 18 in the respondents household) and gender role attitudes are included as time-dependent exogenous variables. The variable on the division of household work is a weighted sum index resulting from five survey items. They are measured on a 5-point scale with endpoints indicating that the respondent does all the work (high values) or the partner does all household work (low values) and thus is a relative measure between the spouses. Further, gender role attitudes is also a sum index over three items. Note that all metric variables were z-score standardized for model estimation purposes and the scale of the time is changed from months to vary in the interval [0, 1] with 1 being the overall latest time-point measured in a marriage in the sample. Further note that for some of the covariates labelled as exogenous, the exogeneity assumption can also be called into question, as they may also respond to couple-separation (e.g., labor market status, number of children). However, for the sake of simplicity, we focus on one single endogenous covariate in this tutorial (for possible extensions see the Summary & Conclusion section).
Table 1
Description of the Final Sample
| Descriptive Statistics | ||
|---|---|---|
| Variable | Women | Men |
| Number of persons | 2,141 | 1,475 |
| Number of events | 161 (7.5%) | 86 (5.8%) |
| Number of observations per person | 7.6 (6) | 7.5 (7) |
| Age at marriage | 27.0 (26) | 29.2 (29) |
| Relationship duration at marriage in months | 64.2 (54) | 64.9 (54) |
| Premarital cohabitation | Yes: 1800 (84%) | Yes: 1235 (84%) |
| No: 341 (16%) | No: 240 (16%) | |
Note. For the covariates means (medians) of non-standardized time-constant variables are presented.
Model Specification
Based on the reviewed literature on marital satisfaction and marital dissolution, two joint models will be fitted separately for men and women. They are identical in terms of included covariates, method and association structure.
The longitudinal model on satisfaction with the marriage will be modelled by an LMM including a random intercept and a random slope term for the duration of marriage (t). We included a random slope since intercepts and slopes differ between individuals (see Figure 3 for some example trajectories). The above mentioned variables are taken into account as covariates to model the longitudinal variable properly. Hereby, i is an indicator for the person and j denotes the measurement at time-point j:
6
The time-to-event model on time to marriage dissolution will be modelled jointly with the longitudinal model from Equation (6), where the estimated values of satisfaction explain the risk of marriage dissolution. Specifically, the time-to-event model includes a B-Spline approximation of the baseline hazard h0(t).
Equation (7) shows the final representation of our main joint model on marital satisfaction and the risk of marriage dissolution.
7
Implementation
For implementation of the above model we use the package (Rizopoulos, 2010) for (Version 4.4.0) (R Core Team, 2024). combines two models that are built with their specific packages. The longitudinal model is constructed using from the package (Pinheiro et al., 2023) and therefore requires the usual data structure in long format, where each individual spans several rows corresponding to the observation time points, each holding the covariate value of the time point, respectively.
The time-to-event model is fitted using from (Therneau, 2024). The structure of the underlying data set is equivalent to the long format start-stop-event logic when further exogenous TVCs are used, i.e., several rows per individual, indicating the current measured values. In case of lack of other exogenous TVCs in the model, the data set for the time-to-event model reduces to one row per individual and supplements the long format data set for the longitudinal submodel (see Rizopoulos (2012)). Since our model includes other exogenous TVCs (e.g., labor force status) one single data set in the classical start-stop-event logic is used.
Both models serve as inputs for the final command. For further information on the (optional) arguments in the function, we refer to Rizopoulos (2010). -code and a synthesized data set to replicate the example can be found in the web appendix of this paper (see Potts et al., 2026).
# time-to-event model
modsurv_female <- coxph(Surv(time = t1, time2 = t2, event = status)~
yeduc + ageatm + preschoolchild + nchild + premarcohab +
sw_weight + incnet + relduratmar + lfs_rec + genderatt_s,
data = df_female, x = TRUE, model=T, cluster = id)
# longitudinal model
modlong_female <- lme(sat31 ~ t + I(t^2) + sw_weight + ageatm +
preschoolchild + nchild + premarcohab + yeduc + incnet +
relduratmar + lfs_rec + genderatt_s,
data = df_female, random = ~ t | id)
# joint model for longitudinal and time-to-event data
modjoint_female <- jointModel(modlong_female, modsurv_female,
timeVar = "t", method = "spline-PH-GH",
control = list(verbose=T, iter.EM=100))
Estimation Results
Separate models were fitted for men and women. This section starts with the estimation results of the joint model for women.
The longitudinal submodel estimates the relationship between the covariates and marital satisfaction (outcome) (Table 2, left side). The model results in a U-shaped effect of time. The number of years of education shows a positive, significant effect on marital satisfaction. Some other covariates show negative, linear effects on satisfaction with the relationship: higher age at marriage and higher personal net income are associated with a significantly lower marital satisfaction. Women with children (compared to childless women) also reveal lower values of satisfaction. There is even an additional negative effect if there are preschool children present in the household. The index for the division of household work reveals a negative and statistically significant effect, i.e., women who stated to do more household work are less satisfied with their relationship during marriage. The relationship duration at time of marriage and the gender role attitudes show insignificant coefficients in this model. Next, we examine the time-to-event submodel for women (Table 2, right side) including the properly modelled endogenous variable marital satisfaction. This model indicates which variables still have a direct effect on the risk of marital dissolution when controlling for marital satisfaction. Looking at the association parameter (last row in Table 2), we observe the expected strong negative effect of marital satisfaction on the risk of marriage dissolution: the higher the current value of satisfaction with the relationship, the lower the risk of marriage dissolution. Besides this relationship, there are only few significant effects in the time-to-event submodel. Higher educated women and women that were in a long relationship with their married partner before marriage have a lower risk of marriage dissolution. As an example for the decomposition of effects, we focus on the variable of relative household work in the following. In contrast to other research results (e.g., Frisco & Williams, 2003)5, the relative load of household work done by a person has no direct effect for women on their risk of marriage dissolution. There still remains the indirect effect via the mediator of marital satisfaction: a higher share of household work done by the female respondent results in a significantly lower marital satisfaction which results in significantly higher risk of marriage dissolution. The estimated total effect of the index on the divison of household work on the risk of marriage dissolution adds up to −0.5552 × −0.1369 + 0.1089 ≈ 0.1849.
Table 2
Model for Women: Regression Coefficients of the Joint Model for Longitudinal and Time-to-Event Data
| Longitudinal submodel | Time-to-event submodel | |||||
|---|---|---|---|---|---|---|
| (Marital satisfaction) | (Risk of marriage dissolution) | |||||
| Variable | Estimate | Std. err. | p-value | Estimate | Std. err. | p-value |
| (Intercept) | 8.7403 | 0.1054 | 0.0000 | |||
| Time | -3.0387 | 0.4023 | 0.0000 | |||
| Time2 | 2.0987 | 0.5022 | 0.0000 | |||
| Relative load of household work | -0.1369 | 0.0191 | 0.0000 | 0.1089 | 0.0793 | 0.1694 |
| Premarital cohabitationa : yes | -0.1036 | 0.0788 | 0.1885 | 0.1297 | 0.2225 | 0.5599 |
| Age at marriage | -0.1820 | 0.0361 | 0.0000 | -0.1546 | 0.1070 | 0.1483 |
| Preschool child(ren) in hha : yes | -0.0693 | 0.0412 | 0.0924 | -0.3286 | 0.2121 | 0.1213 |
| Number of children in hhb : 1 | -0.2520 | 0.0648 | 0.0001 | 0.0893 | 0.2969 | 0.7637 |
| Number of children in hhb : 2 | -0.2594 | 0.0729 | 0.0004 | 0.0508 | 0.3082 | 0.8691 |
| Number of children in hhb : more | -0.2182 | 0.0921 | 0.0178 | 0.0799 | 0.3649 | 0.8266 |
| Years of education | 0.0672 | 0.0310 | 0.0299 | -0.2228 | 0.0969 | 0.0214 |
| Personal net income | -0.0457 | 0.0248 | 0.0651 | -0.0230 | 0.1462 | 0.8750 |
| Relationship duration at marriage | 0.0183 | 0.0340 | 0.5910 | -0.2570 | 0.1047 | 0.0140 |
| Labor force statusc : not working | 0.0474 | 0.0625 | 0.4479 | -0.3514 | 0.3051 | 0.2494 |
| Labor force statusc : other | -0.0577 | 0.0677 | 0.3941 | -0.0238 | 0.2830 | 0.9329 |
| Labor force statusc : part-time employed | -0.0170 | 0.0555 | 0.7594 | -0.1381 | 0.2422 | 0.5685 |
| Gender role attitudes | -0.0082 | 0.0219 | 0.7069 | 0.0819 | 0.0896 | 0.3606 |
| Satisfaction () | -0.5552 | 0.0551 | 0.0000 | |||
Note. Reference categories: a no, b zero, c full-time employed.
For men, the results reveal interesting differences in both submodels (Table 3): Regarding the marital satisfaction (longitudinal model ), socio-economic variables such as education as well as personal net income do not show significant effects. Similarly to the model for women, children decrease the relationship satisfaction in the marriage compared to childless men and the effect size is larger than for women. In contrast to the model for women, there is no additional significant, negative effect of preschool children. Even though the labor force status shows an influence on the marital satisfaction for men, these results have to be interpreted with caution, as over 80% of the person periods for men indicate a full-time employment. The relative load of household work has a smaller effect on marital satisfaction and is also negative and statistically significant.
Table 3
Model for Men: Regression Coefficients of the Joint Model for Longitudinal and Time-to-Event Data
| Longitudinal Submodel | Time-to-Event Submodel | |||||
|---|---|---|---|---|---|---|
| (Marital Satisfaction) | (Risk of Marriage Dissolution) | |||||
| Variable | Estimate | Std. err. | p-value | Estimate | Std. err. | p-value |
| (Intercept) | 9.0393 | 0.1233 | 0.0000 | |||
| Time | -2.8529 | 0.4976 | 0.0000 | |||
| Time2 | 1.5018 | 0.7095 | 0.0343 | |||
| Relative load of household work | -0.0603 | 0.0267 | 0.0237 | 0.2413 | 0.1312 | 0.0660 |
| Premarital cohabitationa : yes | -0.2385 | 0.1052 | 0.0234 | -0.3357 | 0.3020 | 0.2663 |
| Age at marriage | -0.1686 | 0.0417 | 0.0001 | 0.1291 | 0.1383 | 0.3508 |
| Preschool child(ren) in hha : yes | 0.0249 | 0.0507 | 0.6238 | 0.0196 | 0.2806 | 0.9442 |
| Number of children in hhb : 1 | -0.3302 | 0.0790 | 0.0000 | -0.1158 | 0.4041 | 0.7744 |
| Number of children in hhb : 2 | -0.3910 | 0.0901 | 0.0000 | 0.3430 | 0.3825 | 0.3699 |
| Number of children in hhb : more | -0.3562 | 0.1152 | 0.0020 | 0.3927 | 0.4633 | 0.3967 |
| Years of education | 0.0341 | 0.0393 | 0.3864 | -0.1266 | 0.1232 | 0.3043 |
| Personal net income | 0.0017 | 0.0192 | 0.9307 | -0.0481 | 0.1647 | 0.7703 |
| Relationship duration at marriage | -0.0015 | 0.0420 | 0.9708 | -0.0709 | 0.1208 | 0.5573 |
| Labor force statusc : not working | -0.2574 | 0.0977 | 0.0085 | 0.0613 | 0.4931 | 0.9012 |
| Labor force statusc : other | -0.3296 | 0.0927 | 0.0004 | -0.0242 | 0.3599 | 0.9465 |
| Labor force statusc : part-time employed | -0.2288 | 0.1191 | 0.0548 | -1.1292 | 1.0137 | 0.2653 |
| Gender role attitudes | 0.0281 | 0.0263 | 0.2846 | 0.1294 | 0.1162 | 0.2654 |
| Satisfaction () | -0.4534 | 0.0702 | 0.0000 | |||
Note. Reference categories: a no, b zero, c full-time employed.
The estimated association parameter in the time-to-event submodel is also significant but the effect size is smaller compared to the model for women. In other words, the estimated marital satisfaction appears less predictive for the risk of marriage dissolution for men than for women. Another difference between the sexes in the submodel regarding the risk of marriage dissolution is the effect of the division of household work: This covariate has a positive and statistically significant effect on the risk of marriage dissolution in the model on men (p = 0.0660). In contrast to the model for women, the indirect mediated effect is supplemented by a direct effect of household work on the risk of marriage dissolution. The estimated total effect of this predictor variable on the risk of marriage dissolution can be derived as −0.4534 × −0.0603 + 0.2413 ≈ 0.2686.
Besides the decomposition of interpretable covariate effects, another strength of a joint model is the option to predict individual probabilities to still be in the relationship after the last measurement of a person. This may be useful for intervention planning in different research questions on the micro-level. In contrast to a classical TVC approach, these predictions are based on the whole estimated longitudinal trajectory of marriage satisfaction and do not only rely on the last measured value. The orange line in Figure 4 shows the predictions of a fictional part-time working woman, who lived together with her partner before marriage. She lives with two children, at least one being a preschool child and shows median values (for women) for the other covariates. The only part that varies between the four plots is her trajectory of marital satisfaction indicated by the crosses left to the dashed line showing the time of the last measurement. Looking at a certain point of time in the future (black solid line), the different trajectories result in different predicted probabilities to still be in the marriage, even though each trajectory ends with the same last measurement of a 4 on the
Figure 4
Predicted Probability of Still Being in the Marriage for a Person Varying Only the Marital Satisfaction Trajectory
Note. Covariate values: part-time working, 2 children, at least one preschool child, premarital cohabitation, median values (for women) for the other covariates.
11-point scale. This feature also allows to update the prediction with every newly obtained measurement on a person such that a researcher can trace the development dynamically (see Appendix A.4).
Comparison to Other Modelling Approaches
In this section, we compare the estimation results, i.e., coefficients, standard errors, predictive performance (Mean Squared Error), of the joint model with two other modelling approaches (TVC approach in a Cox-proportional hazards model and a two-stage model). Comparison tables of the estimation results can be found in Appendix A.1.
A classical TVC approach in a time-to-event model 6 would heavily underestimate the effect of marital satisfaction on the risk of marriage dissolution (e.g., women: TVC approach: −0.3083, joint model: −0.5552). This may result from the fact that the risk of marriage dissolution is not only dependent on the observed current value but due to the estimation routine on the whole trajectory of the marriage satisfaction until the point in time. Modelling marital satisfaction as an endogenous covariate results in some differences regarding the other covariates in the time-to-event submodel as well. For example, the effect of the division of household work for women on the risk of marriage dissolution is overestimated in a classical TVC model and the standard error is underestimated which leads to a smaller p-value. These differences highlight the importance of the correct model choice when dealing with endogenous covariates in time-to-event models.
In our example the differences between a two-stage model and a joint model are only small and the tendency to underestimate the uncertainty of the satisfaction variable can only be found in the model for women. The regression coefficients for marital satisfaction differ only slightly with −0.5347 in the two-stage model and −0.5552 in the joint model, and the standard error of 0.0530 in the two-stage model is smaller than the standard error of 0.0551 of the presented joint model (men: two-stage: −0.4240, s.e. = 0.0708; joint model: −0.4534, s.e. = 0.0702).
Focussing on the longitudinal submodel, we compare the standalone longitudinal model (mixed model fitted with and the longitudinal submodel of a joint model which controls for the non-random drop-out due to marriage dissolution. There are only small differences in coefficients and standard errors between the two models. However, in the joint model for women the effect of time of marriage has a more pronounced U-shape i.e., larger absolute coefficients for the linear and quadratic term. In the model for men, modelling the non-random drop-out by marriage dissolution leads to a change in the size and significance of the coefficient of premarital cohabitation. Note that the differences between the longitudinal model approaches may be larger in other applications when a larger share of events (higher number of drop-outs, i.e., more marriage dissolutions) is present.
We further evaluated the predictive performance of the three modelling approaches via predicting the event probability for several time points after the last individual measurement of marital satisfaction. Figure A.1 shows the mean squared error (MSE) of the models at specific points in time and highlights the advantages of a joint model regarding predictions in the longrun. While the MSE is smallest for the TVC approach when predicting the event outcome up until six months after the last measurement, the MSE for the joint model is smaller when looking at later times and outperforms the two competing models after ten months.
Summary & Conclusion
This tutorial paper aimed to introduce the method of a joint model for longitudinal and time-to-event data in the field of social science research. We demonstrate the suitability and added value of answering research questions with endogenous covariates in an application on marital satisfaction and marriage dissolution. Based on the pairfam data, our results indicate that the effect of marital satisfaction on the risk of marriage dissolution is larger than a Cox model with TVC suggests. The strength of the decomposition of effects has been demonstrated and shows for our sample, e.g., that the relative load of household work in a marriage has no direct effect on the risk of marital dissolution for women but for men and a strong indirect effect via marital satisfaction for both sexes. We believe that this model class is a useful tool in social science research and hope to contribute to its increasing usage.
For the sake of illustration, this tutorial paper kept the modelling structure as simple as possible such that several extensions of the same data example may arise. A first option is the consideration of a different linkage between the longitudinal and the time-to-event model, as there exist several options of association structures. Using the current value of the longitudinal model mi(t), as presented in our main joint model in Equation (5), associates the predicted value at each time-point with the hazard function at the same time-point. One can also think of the slope of the estimated trajectory of the TVC (marital satisfaction) to be important for the hazard function. An increase or decrease in marital satisfaction independent of the actual level may influence the risk of marriage dissolution. The options can be combined. Note that the effect size of the slope association structure depends on the units of time, whereas the current value does not. Thus, the estimates of the association structures cannot be compared in their effect size. Both options can be used in lagged versions as well. We therefore tested the joint model with different association structures (current slope, current value and current slope, cumulative effect, lagged effect, see Appendix A.3). Model choice criteria suggest to favor the current value association with a lag over the other association structures for men whereas the current value and current slope association is the favored model for women. Another common approach for the association structure is the usage of estimated random effects of the longitudinal submodel of each person and link them to their survival. The interpretation of the respective association parameter does not depend on time, since random intercept and random slope do not depend on time by default. For an overview of association structures see Cremers et al. (2021).
In terms of model set-up, the variable of marital satisfaction is not perfectly normally distributed and another outcome distribution modelled via a Generalized Linear Mixed Model may be a more appropriate choice for the longitudinal submodel. This can be included using the package (Rizopoulos et al., 2024) in R. Furthermore, the individual-specific effect of time in the LMM could be modelled non-linearly with semi-parametric methods via splines. Combining statistical modelling with machine learning methods for variable selection may be useful. A model-based boosting algorithm for joint models has been developed by Griesbach et al. (2023). Another extension is the use of a competing risks model in the survival part, which might be especially useful to account for different types of events or different reasons of dropout and is therefore appealing when focus is on the longitudinal submodel as well. For an extensive overview on recent developments in the joint modelling literature see Papageorgiou et al. (2019).
Regarding the content level, several extensions are conceivable: In order to exploit the data richness of pairfam even further, couple-level data analysis might give additional insights (Hickey et al., 2016; Ruppanner et al., 2017). This is a strong limitation of the performed analysis, since the occurrence of marriage dissolution and its timing are assumed to be influenced only by the satisfaction level of one partner in the marriage. The predictive performance might be improved using variables of the partner in the model as well. In addition, one could rethink the exogeneity assumption of the other time-varying covariates in the time-to-event model and also model them as endogeneous in a joint model, e.g., regarding the potential anticipation effect of divorce and its connection to working behaviour for women (Poortman, 2005).
This is an open access article distributed under the terms of the