Haupt et al. (2014) point out a gap between modern statistical methods and routine practice in applied psychological research. They argue that robust statistical methods are not widely used, and that analysis of variance (ANOVA) or ordinary least squares (OLS) regression are still employed frequently, even when the underlying parametric modelling assumptions are not met (e.g., in the presence of heteroscedasticity, outliers in the response, or other forms of non-normality like skewness or multimodality).
This is also the case when imputing missing values in a dataset, where creating multiple imputations (MI) of each missing value based on Rubin’s (1987) theory is one of the state-of-the-art approaches (cf. Schafer & Graham, 2002). Most of the currently available MI procedures involve strong parametric assumptions, which are often not met by empirical data in various fields of the social sciences: Haupt et al. (2014) discuss methods to account for outliers in the response and heteroskedasticity in their analysis of risk factors for corporal punishment of children by their fathers; Kleinke, Reinecke, and Weins (2020) analyse highly skewed longitudinal data on juvenile delinquency with missing values and compare modelling results based on normal model-based MI to results based on imputation methods that rely on more suitable distributional assumptions. Further examples are response times in psychological testing (e.g., Ranger & Wolgast, 2019) – where empirical distributions are typically not normal and often difficult to parameterise – or research on substance abuse and (behavioural) addiction (e.g., Chodkiewicz et al., 2020; Sugiyama et al., 2020), which commonly involves skewed empirical distributions. Although fully parametric normal model-based MI is still employed (e.g., Wayman, 2002), approaches beyond these methods are often more suitable. An argument frequently made in applied research is that MI is “robust” in general; see the discussion of the self-correcting property of MI (i.e., masking bias in the model of scientific interest, or analysis model, due to violated assumptions by overestimated standard errors) in Kleinke, Reinecke, Salfrán et al. (2020, Chapter 4.5). However, this is true only to a certain extent.
Unfortunately, even imputation methods that are believed to be robust against mild to moderate violations of parametric assumptions, such as predictive mean matching (PMM), tend to produce biased inferences when empirical data are highly skewed (Kleinke, 2017; Yu et al., 2007). Yu et al. (2007) recommend avoiding imputation methods with misspecified distributional assumptions when the given empirical data deviate too strongly from these assumptions. Alternatives to normal model-based MI are quantile regression (QR)-based MI (Geraci, 2016a, 2016b, 2017) and MI of missing values based on (semi-parametric) generalized additive models for location, scale, and shape (GAMLSS; see e.g., Rigby & Stasinopoulos, 2005; Stasinopoulos & Rigby, 2007; for an implementation in R, see Salfrán & Spiess, 2018).
The two approaches either provide flexibility in modelling parametric distributions (GAMLSS-based MI) – the location, scale, and shape of the distribution, to be more specific – or avoid parametric distributions altogether (QR-based MI) and focus on modelling conditional quantiles of interest directly (Koenker, 2005; Koenker & Hallock, 2001). Despite their obvious advantages over normal model-based MI, systematic evaluations of QR- and GAMLSS-based MI are still scarce. Notable exceptions are Bottai and Zhen (2013) and Geraci and McLain (2018). The former paper investigated different sample sizes and error distributions and found that QR-based MI was superior to standard imputation methods like PMM. The latter paper examined a QR-based MI approach for bounded data and found superior performance compared to PMM. Little is known yet about QR-based MI in the case of discrete data (see the discussion in Bottai & Zhen, 2013, regarding QR-based MI of count data) or quasi-continuous data: In the social sciences, scale scores are often derived from ordinal data (e.g., the mean of Likert scale items). Resulting scale scores are often treated as quasi-continuous and analysed by methods for continuous data.
This paper evaluates QR-based MI in a setting common in the social sciences, where empirical data are frequently not genuinely continuous, normal, and homoscedastic and often contain outliers in the response. Our contribution is twofold: First, we provide a Monte Carlo simulation in which we compare QR- and GAMLSS-based MI to normal model-based MI and PMM. Second, we replicate and extend the empirical analysis of the risk factors for corporal punishment of children by their fathers in Haupt et al. (2014) and Fritsch et al. (2019) with QR-based MI. In both papers, the results were based on the complete cases only and the response was derived from Likert scale items.
The paper is structured as follows: First, we briefly introduce the theoretical background of MI and QR. We then introduce the QR- and GAMLSS-based MI approaches. In Section 3, we present and discuss the results from our Monte Carlo simulation. In Section 4, we replicate the results in Haupt et al. (2014) and Fritsch et al. (2019) based on QR-based MI and contrast our results with the findings for the complete cases. A general discussion concludes.
Theoretical Background
Multiple Imputation
Multiple imputation based on Rubin’s (1987) theory is a three-step process: First, each missing value is imputed multiple times by plausible values, so that researchers end up with $M$ complete datasets. Plausible in this context refers to any value that leads to unbiased statistical inferences.^{1} Second, these $M$ datasets are analysed separately by any complete-data technique, such as OLS regression or QR, yielding $M$ sets of modelling results. In the final step, the $M$ sets of modelling results are combined into an overall result based on combination rules such as the large-sample formula given in Rubin (1987) or adaptations for small samples (Barnard & Rubin, 1999). For a detailed discussion, see Kleinke, Reinecke, Salfrán et al. (2020, Chapter 4). The combination formulae yield point estimates and standard errors for each model parameter ${\mathit{\theta}}_{p}$, $p=1,\dots ,P$. The combined point estimate is simply the average of the $M$ estimates. Corresponding standard errors are obtained by combining a within-imputation and a between-imputation variance component. The latter component reflects the amount of missing information with respect to ${\mathit{\theta}}_{p}$: Less information to predict the unobserved values leads to higher uncertainty in the imputations and a larger between-imputation variance. Note that if the missing values could be predicted without any uncertainty, the estimator ${\widehat{\mathit{\theta}}}_{p,m}$ obtained from the $m=1,\dots ,M$ imputed datasets would not vary over the datasets – resulting in a between-imputation variance of zero.
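The pooling step above can be sketched in a few lines. The following Python snippet is an illustrative stand-in for the large-sample combination rules in Rubin (1987), not the implementation used by any particular MI software; the example estimates and standard errors are made up for demonstration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine M point estimates and squared standard errors
    (within-imputation variances) for one parameter via Rubin's rules.

    Returns the pooled point estimate and the pooled standard error.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    M = estimates.shape[0]
    qbar = estimates.mean()               # pooled point estimate
    W = variances.mean()                  # within-imputation variance
    B = estimates.var(ddof=1)             # between-imputation variance
    T = W + (1.0 + 1.0 / M) * B           # total variance
    return qbar, np.sqrt(T)

# Hypothetical M = 5 imputed-data estimates of one regression coefficient
est = [0.52, 0.48, 0.55, 0.50, 0.45]
se2 = [0.010, 0.012, 0.011, 0.009, 0.013]
qbar, se = pool_rubin(est, se2)
```

If the missing values carried no uncertainty, all five estimates would coincide, $B$ would be zero, and the pooled standard error would reduce to the average complete-data standard error.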
More comprehensive introductions to MI and helpful resources for applied researchers are given in: Kleinke, Reinecke, Salfrán et al. (2020), who provide a general introduction and review recent developments and various applications of MI. Chapters 1, 4, and 5 in Little and Rubin (2020) summarize the problem of missing values, missing value patterns and mechanisms; additionally, a taxonomy of methods for imputing missing values and a discussion of the rationale of MI in comparison to other methods to account for uncertainty in parameter estimates due to missing values are given. Applications of MI in various contexts including linear and generalized linear models, categorical data analysis, survival analysis, structural equation modelling, longitudinal data analysis, and complex survey settings are discussed in Raghunathan et al. (2018).
Assumptions
Approaches for MI typically assume that the mechanism generating the missing values in a variable $y$ is ignorable (Rubin, 1976, 1987; see also Kleinke, Reinecke, Salfrán et al., 2020, Chapter 2): Observing or not observing values in $y$ can be seen as the result of a dichotomous process generating a binary response $r$, which takes the value ‘1’ if the value of the corresponding variable is observed and ‘0’ otherwise. Transitioning from the realized sample to the observed subsample can be interpreted as an additional sampling step. The underlying process, however, is usually unknown during data collection and assumed to be a random process. When the mechanism generating the missing values is ignorable, datasets can be analysed without the need to explicitly model the mechanism. Stated differently, an ignorable mechanism can be assumed when the missing values are generated at random given all observed information (i.e., the probability of not observing certain values of a variable depends on observed information only). To ensure the credibility of this assumption, applied researchers typically specify rich imputation models and include all variables related to patterns of missing values in the imputation model – even if these variables are omitted from the analysis model (cf. Collins et al., 2001).
How Imputations Are Generated
MI based on Rubin’s (1987) theory can be considered a Bayesian approach: Imputations are generated by drawing from the posterior predictive distribution of the variables whose values are missing given all observed values. From a mathematical point of view, the posterior predictive distribution is usually difficult to approximate, because in most datasets different types of variables are present, for example, continuous, truncated, categorical or count, and missing value patterns are usually not monotone.
A popular approach to handle these situations is the mice (multiple imputation by chained equations) framework (van Buuren, 2018; van Buuren & Groothuis-Oudshoorn, 2011; van Buuren et al., 2006), where users can specify separate conditional imputation models for each variable with missing values. The models can be specifically tailored to the distribution and type of each variable (e.g., methods norm for a standard linear regression model, logreg for a logistic regression model or polyreg for a polytomous regression model). The software then cycles iteratively through all variables with missing values and imputes them step by step using predictions from a prespecified regression model. Values of all other (observed and imputed) variables are used by the procedure. The approach is implemented in the R package mice (van Buuren & Groothuis-Oudshoorn, 2011). Extensions of the functionality of the package are available in other packages such as Qtools (Geraci, 2016a, 2016b) for QR-based MI and ImputeRobust (Salfrán & Spiess, 2018) for GAMLSS-based MI. The methodology underlying both approaches is described in the next two subsections.
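The cycling scheme can be illustrated with a deliberately minimal sketch. The Python code below imputes two continuous toy variables by iterating linear-model predictions plus residual noise; unlike a proper MI procedure (and unlike mice), it omits the posterior draw of the regression parameters, so it conveys only the chained-equations idea, not a valid imputation method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two correlated continuous variables with ~20% missing each
n = 300
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.5, size=n)
data = np.column_stack([x, y])
miss = rng.random((n, 2)) < 0.2
data_mis = data.copy()
data_mis[miss] = np.nan

def fcs_impute(data_mis, n_iter=10, rng=rng):
    """One chained-equations chain: cycle through columns, regress each
    on the others, and refill its missing entries by prediction + noise."""
    U = data_mis.copy()
    # start: fill missing entries with column means
    col_means = np.nanmean(U, axis=0)
    for q in range(U.shape[1]):
        U[np.isnan(U[:, q]), q] = col_means[q]
    for _ in range(n_iter):
        for q in range(U.shape[1]):
            obs = ~np.isnan(data_mis[:, q])
            X = np.delete(U, q, axis=1)
            X1 = np.column_stack([np.ones(len(U)), X])
            beta, *_ = np.linalg.lstsq(X1[obs], U[obs, q], rcond=None)
            resid = U[obs, q] - X1[obs] @ beta
            sigma = resid.std(ddof=X1.shape[1])
            U[~obs, q] = X1[~obs] @ beta + rng.normal(scale=sigma,
                                                      size=(~obs).sum())
    return U

imputed = fcs_impute(data_mis)
```

Running several independent chains of this kind (with parameter draws added) is what produces the $M$ imputed datasets that enter Rubin’s combination rules.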
QR and QR-Based MI of Missing Values
The QR model assumes a linear additive functional form for the $\tau$-quantiles of the response $y_i$ conditional on the $P$ predictors $\mathit{x}_i$, where $i=1,\dots,n$ are the sample observations. Define the conditional $\tau$-quantile as

$$Q_{y_i \mid \mathit{x}_i}(\tau) = \mathit{x}_i^T \mathit{\beta}_\tau\,. \tag{1}$$

The coefficient vector $\mathit{\beta}_\tau$ is estimated by minimizing the check-function loss

$$\sum_{i=1}^{n} \rho_\tau\!\left(y_i - \mathit{x}_i^T \mathit{\beta}_\tau\right), \quad \text{with} \quad \rho_\tau(u) = u \cdot \begin{cases} \tau, & \text{if } u \ge 0 \\ \tau - 1, & \text{otherwise}\,, \end{cases} \tag{2}$$

where $u = y_i - \mathit{x}_i^T \mathit{\beta}_\tau$ denotes the residual. The following data generating process illustrates the exact fit property of QR:

$$y_i = \mathit{x}_i^T \mathit{\beta}_\tau + \epsilon_{\tau,i}\,, \quad \text{with} \quad \epsilon_{\tau,i} \sim F_{\tau,i} \quad \text{and} \quad F_{\tau,i}^{-1}(\tau) = 0\,. \tag{3}$$

To illustrate QR-based MI of missing values, denote the response vector by $\mathit{y}$, the matrix of $P$ predictors by $\mathit{X}$, and assume that all variables contain missing values (or incomplete cases). Define the dataset with missing values by $\mathit{U} = (\mathit{y}, \mathit{X})$, where each of the $i=1,\dots,n$ lines of the dataset is at least partially observed. The columns of the dataset $\mathit{U}$ are referred to by $q=1,\dots,Q$, with $Q = P + 1$. Note that the total number of observations in the dataset is $N = n \cdot Q$. Denote the total number of missing values as $N_{mis}$ and the number of observed values (or complete cases) as $N_{obs} = N - N_{mis}$. The number of missing values in each column of the dataset is referred to as $n_{q,mis}$, such that $N_{mis} = n_{1,mis} + \dots + n_{Q,mis}$.
We aim to impute the missing values in $\mathit{U}$ and use them in the analysis model. The matrix of imputed values is represented by $\mathit{V}$. To simplify the exposition, assume that $\mathit{U}$ is sorted such that the lines with the observed values are given first and refer to the individual lines via $\mathit{u}_j$, $j=1,\dots,J$ ($J = \min(n_{1,obs},\dots,n_{Q,obs})$). The remaining lines with the missing values are denoted by $\mathit{u}_k$, $k=1,\dots,K$ ($K = n - J$). The definition of $\mathit{v}_j$ and $\mathit{v}_k$ is analogous.
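Before turning to the imputation algorithm, the check function in Equation 2 can be made concrete with a small numerical experiment: minimizing the check loss over a constant (an intercept-only QR) recovers the sample $\tau$-quantile. The Python sketch below uses a grid search purely for illustration; actual QR software solves a linear program instead.

```python
import numpy as np

def rho_tau(u, tau):
    """Check-function loss: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(42)
y = rng.exponential(scale=2.0, size=500)   # skewed toy response
tau = 0.8

# Intercept-only quantile regression: minimize the check loss over a grid
grid = np.linspace(y.min(), y.max(), 5001)
losses = [rho_tau(y - c, tau).sum() for c in grid]
c_hat = grid[np.argmin(losses)]            # close to the sample 0.8-quantile
```

The asymmetric weights $\tau$ and $\tau - 1$ penalize under- and over-prediction differently, which is exactly what pushes the minimizer to the $\tau$-quantile rather than the mean.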
In the following, we describe the function mice.impute.rq from package Qtools for QR-based MI in greater detail. Note that we do not consider approaches restricting regression quantiles to avoid quantile crossing. Based on the defined notation, function mice.impute.rq proceeds according to Algorithm 1 (Geraci, 2016a).
Algorithm 1
Data: $\mathit{U}=\left(\mathit{y},\mathit{X}\right)$, with $\mathit{X}=\left({\mathit{x}}_{1},\dots ,{\mathit{x}}_{P}\right)$, $Q=P+1$, and ${N}_{mis}$ missing values 
Result: Datasets with imputed values ${\mathit{V}}_{1},\dots ,{\mathit{V}}_{M}$ 
Initialization: Set number of iterations $C$ and number of bootstrap samples $M$; 
for $c=1,\dots ,C$ do 
for $m=1,\dots ,M$ do 
if c=1 then 
assign complete cases ${\mathit{u}}_{j}$ to temporary dataset ${\mathit{D}}_{m,c}$ 
else 
update temporary dataset ${\mathit{D}}_{m,c}$ with imputed dataset ${\mathit{V}}_{m,c-1}$ from previous iteration 
end 
for $q=1,\dots ,Q$ do 
(1) Draw bootstrap sample ${\mathit{B}}_{m,c}$ of size ${n}_{q,mis}$ with replacement from lines of ${\mathit{D}}_{m,c}$; 
(2) Partition ${\mathit{B}}_{m,c}$ into its $q$th column ${\mathit{b}}_{m,c,q}$ and the remaining columns ${\mathit{B}}_{m,c,-q}$; 
(3) Draw $K$ random values ${\tau}_{1},\dots ,{\tau}_{K}$ from the uniform distribution $\mathcal{U}\left(\epsilon ,1-\epsilon \right)$, with $0<\epsilon <0.5$; 
(4) Estimate the QR ${Q}_{{b}_{i,q}\mid {\mathit{b}}_{i,-q}}\left(\tau \right)={\mathit{b}}_{i,-q}^{T}\,{\mathit{\beta}}_{\tau}$, with $i\in {\mathit{B}}_{m,c}$; 
(5) Use ${\tilde{\mathit{\beta}}}_{\tau}$ to impute missing values via ${\tilde{u}}_{i,q}={\mathit{u}}_{i,-q}^{T}\,{\tilde{\mathit{\beta}}}_{\tau}$, with $i\in {\mathit{U}}_{K}$; 
end 
assign imputed values for all columns $q=1,\dots ,Q$ to ${\mathit{V}}_{m}$ 
end 
end 
return ${\mathit{V}}_{1},\dots ,{\mathit{V}}_{M}$ 
Algorithm 1 details the steps required for QR-based MI of missing values. First, choose the number of iterations $C$ and the number of bootstrap samples $M$. For bootstrap iteration $m=1,\dots,M$, proceed as follows: Define a temporary dataset $\mathit{D}_{m,c}$; in the first iteration, $c=1$, the lines of the dataset without missing values $\mathit{u}_j$ are used. In subsequent iterations $2,\dots,C$, the imputed values from the previous iteration $\mathit{V}_{m,c-1}$ are employed, and the following five steps are applied to all columns $q=1,\dots,Q$: In (1), draw a bootstrap sample $\mathit{B}_{m,c}$ of size $n_{q,mis}$ from the lines of $\mathit{D}_{m,c}$ with replacement. In (2), partition $\mathit{B}_{m,c}$ into its $q$th column $\mathit{b}_{m,c,q}$ and all other columns $\mathit{B}_{m,c,-q}$. In (3), draw $K$ random values $\tau_1,\dots,\tau_K$ from the uniform distribution $\mathcal{U}(\epsilon, 1-\epsilon)$, with $0 < \epsilon < 0.5$ (note that the function default is $\epsilon = 0.001$). In (4), fit the QR model $Q_{b_{i,q} \mid \mathit{b}_{i,-q}}(\tau) = \mathit{b}_{i,-q}^T \mathit{\beta}_\tau$ to the bootstrap sample $\mathit{B}_{m,c}$. In (5), use the parameter estimates $\tilde{\mathit{\beta}}_\tau$ from (4) to impute the missing values for each column of the dataset with missing values $\mathit{U}_K$. Collect the imputed values for all $q=1,\dots,Q$ columns and assign them to $\mathit{V}_m$. Then, iterate the five steps until the prespecified number of iterations $C$ is reached.^{3} The algorithm yields $M$ datasets with imputed values $\mathit{V}_1,\dots,\mathit{V}_M$.
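To convey the flavour of steps (1), (3), and (5), the following Python sketch applies a drastically simplified, intercept-only version of Algorithm 1 to a single skewed variable: with no covariates, the fitted $\tau$-quantile regression reduces to the sample $\tau$-quantile of the bootstrap sample. This is an assumption-laden toy, not the mice.impute.rq implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

# One variable with missing values and no covariates, so each fitted
# tau-quantile regression reduces to a sample tau-quantile (intercept only)
y_obs = rng.gamma(shape=2.0, scale=1.5, size=400)   # observed lines
n_mis = 100      # number of missing values to impute
M = 5            # bootstrap samples = number of imputed datasets
eps = 0.001      # truncation of the uniform tau draws

imputations = []
for m in range(M):
    # (1) draw a bootstrap sample of size n_mis with replacement
    boot = rng.choice(y_obs, size=n_mis, replace=True)
    # (3) draw one random tau per missing value from U(eps, 1 - eps)
    taus = rng.uniform(eps, 1 - eps, size=n_mis)
    # (4)-(5) "fit" the intercept-only QR and predict: the tau-quantile
    imputations.append(np.quantile(boot, taus))

V = np.column_stack(imputations)   # n_mis x M matrix of imputed values
```

Drawing a fresh random $\tau$ for each missing value is what lets the imputations cover the whole conditional distribution rather than a single quantile.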
GAMLSS and GAMLSS-Based MI of Missing Values
De Jong (2012) and de Jong et al. (2016) have proposed an MI procedure based on a class of semi-parametric GAMLSS (Rigby & Stasinopoulos, 2005; Stasinopoulos & Rigby, 2007), which is implemented in the R package ImputeRobust (Salfrán & Spiess, 2018) and extends the functionality of the mice package. Compared to standard normal model-based MI, which only considers modelling the location of the distribution, GAMLSS-based MI also allows accounting for the scale and shape of the distribution and can thus be better tailored to a given empirical dataset.
In the following, we slightly extend the notation from the previous subsection to illustrate GAMLSS-based MI briefly.^{4} Assume that the conditional distribution of the $q$th column $\mathit{u}_q$ of dataset $\mathit{U}$ given the observed values $\mathit{u}_{J,q}$ and the other columns $\mathit{U}_{-q}$ is

$$\mathit{u}_q \mid \mathit{u}_{J,q}, \mathit{U}_{-q} \sim \mathcal{D}\left(\mathit{\theta}^{(1)}, \mathit{\theta}^{(2)}, \mathit{\theta}^{(3)}, \dots, \mathit{\theta}^{(S)}\right). \tag{4}$$

Each parameter of the distribution $\mathcal{D}$ is related to the predictors via a link function $g_q(\cdot)$ and additive terms $h_{j,q}(\cdot)$:

$$g_q(\mathit{\theta}_q) = \mathit{U}_{-q}\,\mathit{\gamma}_q + \sum_{j=1}^{J} h_{j,q}\left(\mathit{u}_{J,q}\right)\,. \tag{5}$$

Since package gamlss provides different approaches to model $h(\cdot)$, such as cubic smoothing splines, penalized splines, and local regression, the following algorithm provides a general description of GAMLSS-based MI (Salfrán & Spiess, 2018).
Algorithm 2
Data: $\mathit{U}=\left(\mathit{y},\mathit{X}\right)$, with $\mathit{X}=\left({\mathit{x}}_{1},\dots ,{\mathit{x}}_{P}\right)$, $Q=P+1$, and ${N}_{mis}$ missing values; 
assume ${\mathit{u}}_{q}{\mathit{u}}_{J,q},{\mathit{U}}_{q}~\mathcal{D}\left({\widehat{\mathit{\theta}}}_{k,q}^{\left(1\right)},{\mathit{\theta}}_{k,q}^{\left(2\right)},{\mathit{\theta}}_{k,q}^{\left(3\right)},\dots ,{\mathit{\theta}}_{k,q}^{\left(S\right)}\right)$ 
Result: Datasets with imputed values ${\mathit{V}}_{1},\dots ,{\mathit{V}}_{M}$ 
Initialization: Choose number of bootstrap samples $M$ and set $m=1$; 
assign ${\mathit{B}}_{1}={\mathit{U}}_{J}$ and set starting values ${\mathit{\theta}}_{0,k,q}^{\left(1\right)},{\mathit{\theta}}_{0,k,q}^{\left(2\right)},{\mathit{\theta}}_{0,k,q}^{\left(3\right)},\dots ,{\mathit{\theta}}_{0,k,q}^{\left(S\right)}$ ; 
for $q=1,\dots ,Q$ do 
(1) Specify penalized likelihood function ${\mathit{\ell}}_{\text{p}\text{e}\text{n}}$ based on chosen $\mathcal{D}$; 
(2) Maximize ${\mathit{\ell}}_{\text{p}\text{e}\text{n}}$ based on ${\mathit{B}}_{1}$ to obtain parameter estimates 
${\widehat{\mathit{\theta}}}_{1,k,q}^{\left(1\right)},{\widehat{\mathit{\theta}}}_{1,k,q}^{\left(2\right)},{\widehat{\mathit{\theta}}}_{1,k,q}^{\left(3\right)},\dots ,{\widehat{\mathit{\theta}}}_{1,k,q}^{\left(S\right)}$ and fitted values ${\widehat{\mathit{u}}}_{J,q}$; 
for $m=2,\dots ,M$ do 
(3) Draw bootstrap sample ${\mathit{B}}_{m}$ of size ${n}_{q,mis}$ with replacement from $\left({\widehat{\mathit{u}}}_{m-1,q},{\mathit{U}}_{J,-q}\right)$; 
(4) Maximize ${\mathit{\ell}}_{\text{p}\text{e}\text{n}}$ based on ${\mathit{B}}_{m}$ and update parameter estimates 
${\widehat{\mathit{\theta}}}_{m,k,q}^{\left(1\right)},{\widehat{\mathit{\theta}}}_{m,k,q}^{\left(2\right)},{\widehat{\mathit{\theta}}}_{m,k,q}^{\left(3\right)},\dots ,{\widehat{\mathit{\theta}}}_{m,k,q}^{\left(S\right)}$ and fitted values ${\widehat{\mathit{u}}}_{m,q}$ 
end 
assign fitted values for all columns $q=1,\dots ,Q$ to ${\mathit{V}}_{m}$ ; 
end 
return ${\mathit{V}}_{1},\dots ,{\mathit{V}}_{M}$ 
Algorithm 2 details GAMLSS-based MI of missing values. First, choose the number of bootstrap samples $M$, set $m=1$, use the lines of the dataset without missing values $\mathit{U}_J$ as the initial bootstrap sample $\mathit{B}_1$, and choose starting values ${\mathit{\theta}}_{0,k,q}^{(1)},{\mathit{\theta}}_{0,k,q}^{(2)},\dots,{\mathit{\theta}}_{0,k,q}^{(S)}$ for the parameters of the distribution $\mathcal{D}$. In (1), the penalized likelihood function ${\mathit{\ell}}_{\text{pen}}$ is specified based on the distribution $\mathcal{D}$. In (2), maximise ${\mathit{\ell}}_{\text{pen}}$ based on $\mathit{B}_1$ to obtain parameter estimates ${\widehat{\mathit{\theta}}}_{1,k,q}^{(1)},{\widehat{\mathit{\theta}}}_{1,k,q}^{(2)},\dots,{\widehat{\mathit{\theta}}}_{1,k,q}^{(S)}$ and fitted values ${\widehat{\mathit{u}}}_{J,q}$ for column $q$ of the dataset. Next, set $m=2$. In (3), the bootstrap sample $\mathit{B}_m$ is updated by drawing $n_{q,mis}$ observations with replacement from the fitted values from the previous bootstrap iteration ${\widehat{\mathit{u}}}_{m-1,q}$ and the corresponding observed columns of the dataset besides the $q$th column ${\mathit{U}}_{J,-q}$. In (4), maximise ${\mathit{\ell}}_{\text{pen}}$ based on $\mathit{B}_m$ and update the parameter estimates ${\widehat{\mathit{\theta}}}_{m,k,q}^{(1)},{\widehat{\mathit{\theta}}}_{m,k,q}^{(2)},\dots,{\widehat{\mathit{\theta}}}_{m,k,q}^{(S)}$. Then, repeat (3) and (4) until $m=M$. The algorithm yields $M$ datasets with imputed values ${\mathit{V}}_{1},\dots ,{\mathit{V}}_{M}$.
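The idea of modelling location and scale separately, which GAMLSS-based MI exploits, can be illustrated with a minimal Python sketch: a normal distribution whose mean is linear in a predictor and whose scale follows a log link. This is a conceptual stand-in only – it uses plain least squares with a known bias correction rather than the penalized-likelihood machinery of gamlss.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy heteroscedastic data: y | x ~ N(1 + 2x, exp(0.5 + 0.6x)^2)
n = 500
x = rng.uniform(0, 2, size=n)
y = 1 + 2 * x + np.exp(0.5 + 0.6 * x) * rng.normal(size=n)
obs = rng.random(n) > 0.3                 # ~30% of y missing
X1 = np.column_stack([np.ones(n), x])

# Location model: least-squares fit of the conditional mean (observed lines)
beta, *_ = np.linalg.lstsq(X1[obs], y[obs], rcond=None)
resid = y[obs] - X1[obs] @ beta

# Scale model with log link: regress log(resid^2) on x; the intercept is
# corrected for E[log chi^2_1] = digamma(1/2) + log 2 = -1.2704
gamma, *_ = np.linalg.lstsq(X1[obs], np.log(resid**2 + 1e-12), rcond=None)
gamma[0] += 1.2704
sigma_hat = np.exp((X1 @ gamma) / 2.0)    # fitted conditional scale

# Impute by drawing from the fitted conditional distribution
y_imp = y.copy()
y_imp[~obs] = X1[~obs] @ beta + sigma_hat[~obs] * rng.normal(size=(~obs).sum())
```

A normal model-based imputation would use a single pooled $\sigma$, under-dispersing imputations where the data are noisy and over-dispersing them where they are not; modelling the scale avoids this.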
Imputation Based on PMM
Another approach that has been recommended for a wide range of scenarios is PMM (cf. van Buuren, 2018, Chapter 3): The method samples from a pool of observed cases that are ‘similar’ to the incomplete case. The observed value of this so-called donor is then used to fill in the corresponding missing value of the donee. The similarity of observations is defined via the predicted means of the observed and unobserved parts of the variable, based on a linear regression model. Details about the implementation in package mice (function mice.impute.pmm) are given in van Buuren (2018, Chapter 3).
PMM is considered a semi-parametric imputation procedure: The parametric linear model is used only as a means for matching the incomplete case to a potential donor. Then, actually observed values are imputed. This preserves many properties of the observed part of the data, which makes PMM robust against mild to moderate violations of the distributional assumptions of the normal linear regression model. This was systematically evaluated by Kleinke (2017) for different degrees of skewness, different sample sizes, and varying percentages of missing values. The results indicated that: 1) especially when sample size was small, small donor pools yielded better results in terms of unbiased parameter estimates and standard errors than large donor pools; 2) PMM generally worked well, unless the data were highly skewed and/or more than roughly 20% to 30% of the data were imputed; 3) PMM generally performed better when sample size was large.
Disadvantages of PMM are that the approach can neither extrapolate beyond the range of the observed data, nor interpolate when regions of the data are sparsely populated. Additionally, PMM fails when there are substantial violations of the assumptions of the normal linear regression model.
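The matching mechanism described above, and the fact that PMM can only impute actually observed values, can be seen in a compact sketch. The following Python code matches on point predictions only (without the parameter draw that proper PMM adds), so it illustrates the matching idea rather than the mice.impute.pmm implementation.

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy data: y depends linearly on x; ~25% of y is missing
n = 400
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(scale=0.8, size=n)
obs = rng.random(n) > 0.25

X1 = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X1[obs], y[obs], rcond=None)
pred = X1 @ beta                           # predicted means for all cases

def pmm_impute(y, pred, obs, k=5, rng=rng):
    """For each donee, find the k observed cases with the closest
    predicted mean and impute one of their OBSERVED y values."""
    y_imp = y.copy()
    donors_pred = pred[obs]
    donors_y = y[obs]
    for i in np.where(~obs)[0]:
        d = np.abs(donors_pred - pred[i])
        pool = np.argsort(d)[:k]               # k nearest donors
        y_imp[i] = donors_y[rng.choice(pool)]  # draw one donor value
    return y_imp

y_imp = pmm_impute(y, pred, obs)
```

Because every imputed value is a recycled observed value, the imputations can never leave the observed range – which is precisely the extrapolation limitation noted above.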
Monte Carlo Evaluation of QRBased MI
Overview
In this section, we evaluate QR-based MI by Monte Carlo simulation based on a typical setting in the social sciences: skewed and heteroskedastic data. The underlying data were taken from Haupt et al. (2014), who investigated risk factors for corporal punishment by fathers with a QR model. We proceed as follows: First, draw 1000 bootstrap samples with replacement from the complete cases and fit a QR model to each of the bootstrap samples. Second, use the average parameter estimates as a proxy for the unknown ‘true’ population parameters. Third, introduce missing values into the bootstrap samples by deleting a certain fraction of values according to a missing at random (MAR) mechanism. Finally, evaluate QR-based MI regarding bias in point estimates and coverage rates (CR) of the 95% confidence intervals of the parameters, and contrast the results with the complete-data estimates and results obtained from other established MI methods (standard normal model-based MI, PMM, and GAMLSS-based MI).
Hypotheses
The advantages and disadvantages of QR in comparison to GAMLSS are discussed in Kneib (2013) and Rigby et al. (2013). QR, for example, is distribution-free, while GAMLSS introduce model structure and are more parsimonious regarding the data requirements – while also bearing a higher risk of misspecification. To the best of our knowledge, QR- and GAMLSS-based MI have not yet been compared systematically in Monte Carlo simulations. We expect the preferred method to depend on the specific setting. In our simulation, we expect QR- and GAMLSS-based MI to yield comparable results. We assume that both approaches will outperform PMM and standard normal model-based MI – especially when quantiles farther away from the center of the distribution of the response are considered: PMM and normal model-based MI create imputations based on predictions for the conditional mean. Thus, both normal model-based MI and PMM should perform well in the vicinity of the center of the distribution.
Method
Design
Our simulation has a $2\times 2\times 4$ (percentage of missing values $\times$ missing value pattern $\times$ imputation method) design: From the bootstrap samples, we deleted 25% or 40% of values either from the response only, or from the response and one of the predictors, following a MAR pattern (for details, see Section “Procedure”). We imputed missing values by QR- and GAMLSS-based MI, PMM, and standard normal model-based MI.
Evaluation Criteria
We evaluated the approaches with respect to the goal of yielding unbiased statistical inference: Bias in a parameter estimate was defined as the difference between the true parameter $\mathit{\theta}$ and the average estimate $\hat{\mathit{\theta}}$ across the bootstrap samples. Bias was reported relative to the true population parameter as

$$\%\text{Bias} = \frac{\text{Bias}}{\mathit{\theta}} \cdot 100\%\,. \tag{6}$$

Coverage rate (CR) was defined as the percentage of 95% confidence intervals that included the true population parameter. Schafer and Graham (2002) consider a CR below 90% seriously low. Undercoverage may result from biased point estimates and/or an underestimated variance: The confidence interval can either be too narrow to include the true parameter, and/or it can be shifted too far to the left or right to include it. Together, bias and CR allow assessing the accuracy of statistical inferences.
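Both criteria are straightforward to compute from simulation output. The sketch below uses the sign convention ‘mean estimate minus true value’ for bias and Wald-type intervals for CR; the simulated estimates are artificial and serve only to show the computation.

```python
import numpy as np

def percent_bias(estimates, theta):
    """Relative bias of the mean estimate, in percent of the true value."""
    return (np.mean(estimates) - theta) / theta * 100.0

def coverage_rate(estimates, ses, theta, level_z=1.959963984540054):
    """Share (in percent) of Wald-type 95% CIs that contain theta."""
    est = np.asarray(estimates)
    ses = np.asarray(ses)
    lo = est - level_z * ses
    hi = est + level_z * ses
    return np.mean((lo <= theta) & (theta <= hi)) * 100.0

# Example: 1000 simulated estimates around a true value of 0.5 with
# correctly estimated standard errors -> CR should sit near 95%
rng = np.random.default_rng(5)
theta = 0.5
est = rng.normal(loc=theta, scale=0.1, size=1000)
ses = np.full(1000, 0.1)
pb = percent_bias(est, theta)
cr = coverage_rate(est, ses, theta)
```

When standard errors are underestimated, or when the point estimates are shifted, the same function immediately shows CR dropping below the nominal level.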
Sample and Measures
Our simulation is based on a sample taken from the longitudinal Erlangen-Nuremberg Development and Prevention Study. The data include 609 families (675 preschool children) from the North Bavarian cities of Erlangen and Nuremberg (for more details, see Lösel et al., 2009).
The response in our model, corporal punishment by fathers (CP), as well as two risk factors, inconsistent discipline (ID) and other discipline practices (OD), were assessed using a German version of the Alabama Parenting Questionnaire (APQ; Lösel et al., 2003). Items were rated on five-point scales ranging from 1 (“never”) to 5 (“always”). Scale scores were obtained by calculating the means of the respective items. The corporal punishment scale was based on three items such as “You spank your child with your hand when he/she has done something wrong”. The empirical distribution of the corporal punishment scale is summarized in Figure 1.^{5} Inconsistent discipline (six items) summarizes parenting behaviors like “The punishment you give your child depends on your mood”. Other discipline practices (seven items) refer to actions such as “You take away privileges or money from your child as a punishment”. A further risk factor of interest was socioeconomic status (SES). The computation of the SES index was based on Geißler (1994). Finally, our model included the child’s gender via the gender indicator (FEMALE).
Figure 1
Procedure
We simulated 1000 bootstrap samples from the original $N=485$ complete cases. Each bootstrap sample had a size corresponding to the original sample size of $N=675$. We then introduced MAR patterns (Rubin, 1976). Depending on the respective condition, we deleted either 25% or 40% of the values of the response CP, and/or of the predictor OD. Values were deleted using the following rule: For each observation $i$, we computed the probability of a missing value $Pr_{i}^{mis}$ with the logit model
$$Pr_{i}^{mis} = \text{invlogit}\left(5 - 2 \cdot \text{SES}_{i}\right) \tag{7}$$

Coefficients in Equation 7 were chosen to ensure probabilities of missing values ranging from about 0.2 to 0.8 – depending on the individual SES scores. In our simulation, participants from lower social classes were assigned larger probabilities of missing values.
Depending on the respective condition, we sampled the necessary number of cases with selection probabilities $P{r}_{i}^{mis}$ and deleted the respective observations from the datasets. An overview of the resulting missing value patterns is displayed in Table 1.
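The deletion rule can be sketched as follows. The SES scores below are hypothetical draws chosen so that Equation 7 yields probabilities in the stated 0.2 to 0.8 range; the response is a stand-in variable, since the purpose is only to illustrate the MAR mechanism.

```python
import numpy as np

rng = np.random.default_rng(9)

def invlogit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical SES scores on a scale where invlogit(5 - 2*SES) spans ~0.2-0.8
n = 675
ses = rng.uniform(1.8, 3.2, size=n)
pr_mis = invlogit(5 - 2 * ses)      # Equation 7: lower SES -> higher Pr(missing)

# Delete exactly 25% of the response values, sampling cases with
# probability proportional to pr_mis (MAR: depends on observed SES only)
n_del = int(0.25 * n)
idx = rng.choice(n, size=n_del, replace=False, p=pr_mis / pr_mis.sum())
cp = rng.gamma(2.0, 1.0, size=n)    # stand-in response variable
cp_mis = cp.copy()
cp_mis[idx] = np.nan
```

Because the deletion probability depends only on the fully observed SES variable, the resulting mechanism is MAR, and the deleted cases are systematically drawn from lower-SES families.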
Table 1
Percentage of each pattern | SES | FEMALE | ID | OD | CP

25% missing values in CP
74.96% | 1 | 1 | 1 | 1 | 1
25.04% | 1 | 1 | 1 | 1 | 0

25% missing values in CP and OD
56.54% | 1 | 1 | 1 | 1 | 1
18.42% | 1 | 1 | 1 | 1 | 0
18.42% | 1 | 1 | 1 | 0 | 1
6.61% | 1 | 1 | 1 | 0 | 0

40% missing values in CP
60.00% | 1 | 1 | 1 | 1 | 1
40.00% | 1 | 1 | 1 | 1 | 0

40% missing values in CP and OD
36.71% | 1 | 1 | 1 | 1 | 1
23.29% | 1 | 1 | 1 | 1 | 0
23.29% | 1 | 1 | 1 | 0 | 1
16.71% | 1 | 1 | 1 | 0 | 0

Note. Observed values are denoted by ‘1’, missing values by ‘0’, for the variables: SES = socioeconomic status; FEMALE = gender indicator; ID = inconsistent discipline; OD = other discipline practices; CP = corporal punishment. Missing values were MAR and based on individuals’ SES scores.
Missing values were imputed with QR- and GAMLSS-based MI, PMM, and normal model-based MI using the functions implemented in packages mice, Qtools, and ImputeRobust. Unless stated otherwise, we used the default settings of the functions. All other columns of the dataset were used to predict missing values in a particular column. As is common practice (e.g., Kleinke, 2018), the number of imputations was set to the percentage of missing values.
We then compared parameter estimates based on the imputed datasets with the means of the 1000 complete-data estimates. We obtained coefficient estimates from a QR model in which we regressed the response CP on SES, FEMALE, ID, and OD for the quantiles $\tau \in \left\{.2,.3,.4,.5,.6,.7,.8\right\}$. Heteroscedasticity-robust standard errors were computed as in Haupt et al. (2014).
Results
Figure 2 summarizes the results on bias in the point estimates and displays the average amount of relative absolute bias across all model parameters. Following Forero and Maydeu-Olivares (2009), bias was considered substantial if it exceeded 10% of the true population parameter. Overall, bias generally increased (i) when predictor values were missing in addition to the response, and (ii) with the number of missing values.
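The bias criterion can be made concrete as follows; the numbers below are hypothetical and only demonstrate the 10% threshold of Forero and Maydeu-Olivares (2009), and the function name is our own.

```python
def relative_absolute_bias(estimates, true_value):
    """Relative absolute bias of the average of repeated estimates.

    Bias is flagged as substantial when it exceeds 10% of the true
    population parameter.
    """
    mean_est = sum(estimates) / len(estimates)
    return abs(mean_est - true_value) / abs(true_value)


# hypothetical Monte Carlo replications of a single coefficient estimate
ests = [0.21, 0.19, 0.18, 0.22, 0.20]
bias = relative_absolute_bias(ests, true_value=0.25)
assert bias > 0.10  # average 0.20 vs true 0.25 -> 20% relative bias: substantial
```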
When considering the different approaches for MI, both PMM and normal model-based MI showed only a small bias in the central part of the response distribution (i.e., $\tau \in \left\{.4,.5,.6\right\}$), on average. As expected, bias increased for both approaches when moving towards the tails of the distribution. QR- and GAMLSS-based MI produced largely unbiased results across all quantiles and Monte Carlo designs.
Figure 3 displays the corresponding CR. All approaches produced sufficiently wide confidence intervals (some even slightly above the nominal $(1-\alpha)$ percent level), such that bias in the point estimates was alleviated by a wider interval. CR was sufficiently large across all parameter estimates and quantiles, even when the percentage of missing values was 40%.
Figure 2
Figure 3
Discussion
Normal model-based MI produced unbiased inferences in the center region of the response. While point estimates in the off-center regions were biased, CR were nevertheless above 90%. The observed behaviour of normal model-based MI is due to the so-called self-correcting property of MI: A mild to moderate bias in the parameter estimate, resulting from misspecification of the imputation model by assuming normality and homoscedasticity of the error distribution, is alleviated by an increase in the corresponding variance estimate. This also led to the accurate CR produced by PMM. This result should be of considerable interest to applied researchers facing datasets with incomplete cases: In practice, valid inference in off-center regions of the response is often of paramount interest – for example, when investigating risk factors for corporal punishment. Typically, applied researchers are interested not only in a suitable CR, but also in obtaining accurate point estimates regarding the tails of the distribution. To achieve this, both the imputation and the analysis model need to be able to capture the underlying properties of the data generating process. Our simulations show that this is largely the case for QR- and GAMLSS-based MI: Valid inference can be obtained from both approaches under different missing value patterns and percentages. Following previous recommendations (cf. Kleinke, Reinecke, Salfrán et al., 2020) that the imputation method and the subsequent analysis model should coincide, we recommend using QR-based MI when the analysis model is a QR model.
Application
In this section, we compare the findings of the two empirical applications by Haupt et al. (2014) and Fritsch et al. (2019), which used the complete cases only, with results obtained by QR-based MI.
Substantive Models
Haupt et al. (2014) considered further predictors in their QR model that we did not include in our simulations above. These were 1) fathers’ aggressiveness (AG), measured by the aggressiveness scale of the FPI-R personality questionnaire, 2) parental distress (PD), measuring the perceived stress of being a parent, 3) dysfunctional interaction (DI), assessing the parents’ perception that the child does not meet their expectations or that they are manipulated by the child, and 4) the age of the father (AGE). For further details on the predictors, see Haupt et al. (2014). CP was regressed on the aforementioned predictors, and heteroscedasticity-robust standard errors were computed (Powell, 1991). Regression coefficients were estimated for the quantiles $\tau \in \left\{.2,.3,.4,.5,.6,.7,.8\right\}$.
In addition to Haupt et al. (2014), Fritsch et al. (2019) explored the relevance of further predictors (for an overview and a detailed description, see Table 1 and Section 3 in Fritsch et al., 2019). These variables were the age difference of the parents (AGE.DIFF), categorical indicators of whether the mother or the father, respectively, was older than the other parent, parental involvement, positive parenting, monitoring and supervision by the parents, a measure of the ‘difficulty’ of the child, and fathers’ life satisfaction, social orientation, performance orientation, excitability, inhibition, levels of stress, health concerns and physical complaints, candor, extraversion, and emotionality. Fritsch et al. (2019) focused on a prediction-centered approach and modelled the conditional mean of CP. We considered changes in the predictive performance when using the datasets obtained from QR-based MI.
Missing Values
Complete case analysis can be expected to yield unbiased statistical inferences in certain settings only, for example, when only values in the response are missing or when the missing values are missing completely at random (MCAR) in the sense of Rubin (1976). MCAR means that the probability of a missing value in a variable depends neither on observed information in the dataset (e.g., on SES) nor on unobserved information (e.g., on the unobserved CP values). The MCAR assumption is violated, for example, when variables in the dataset have nonzero correlations with the missing value indicators of incompletely observed variables. We computed the correlations of each variable with the missing value indicators of the other variables in the dataset and found only small correlations between –.16 and .15. Effects of MAR should therefore be rather minor, and we expect only marginal differences from the results reported in Haupt et al. (2014).
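The screening step above amounts to computing, for each variable, a point-biserial correlation with the 0/1 missingness indicators of the other variables. A minimal sketch, with purely hypothetical data and an independently drawn (MCAR-like) indicator:

```python
import math
import random


def corr(x, y):
    """Pearson correlation; with a 0/1 missingness indicator as y,
    this is the point-biserial correlation used for MCAR screening."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)


rng = random.Random(7)
ses = [rng.gauss(0.0, 1.0) for _ in range(500)]
# indicator: 1 if CP is missing; here drawn independently of SES (MCAR-like)
r_cp_mis = [1 if rng.random() < 0.25 else 0 for _ in range(500)]
r = corr(ses, r_cp_mis)
# under an MCAR-like mechanism the correlation stays small, comparable
# to the -.16 to .15 range reported in the text
assert abs(r) < 0.2
```

Sizable correlations in such a screen would argue against MCAR and in favour of an MAR-aware method such as MI; small correlations, as found here, suggest that complete case results and MI results should differ only marginally.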
For the present analysis, we included data from 575 children from families where both parents were present at the beginning of the study and for whom we had complete information regarding AGE, AGE.DIFF, and SES. Missing value patterns (e.g., van Buuren & Groothuis-Oudshoorn, 2011) are displayed in Table 2. About 82% of all cases are complete cases. No answers about CP were available from 79 fathers (about 14%). As many of these fathers also did not answer the other questionnaires (e.g., about parental distress or their aggressiveness), we could mainly use sociodemographic information (e.g., SES and FEMALE) to impute the missing values in CP. For some cases, we could also make use of observed information from the Freiburg Personality Inventory.
Table 2
Frequency   474   48   31   9

FEMALE       1     1    1    1
SES          1     1    1    1
AGE          1     1    1    1
AGE.DIFF     1     1    1    1
OLDER.M      1     1    1    1
OLDER.F      1     1    1    1
FPI          1     1    0    0
APQ          1     0    0    1
PSI          1     0    0    1
Note. Observed values are denoted by ’1’; ’0’ indicates a missing value. FEMALE = indicator for child’s gender; SES = socioeconomic status; AGE = fathers’ age; AGE.DIFF = age difference of the parents; OLDER.M and OLDER.F = indicators of whether the mother or the father, respectively, is older than the other parent; FPI = Freiburg Personality Inventory; APQ = Alabama Parenting Questionnaire; PSI = Parenting Stress Index. The column header gives the observed frequency of each pattern. Only patterns with an observed frequency larger than 5 are shown.
We imputed missing values using the package mice. Missing values in the response were imputed by the function mice.impute.rq from the R package Qtools (Geraci, 2016a, 2016b). All other variables were imputed using mice’s default settings, with the imputation models generated automatically (for details, see van Buuren, 2018; van Buuren & Groothuis-Oudshoorn, 2011). We then ran 20 iterations of mice’s Gibbs sampler and created a total of $m=100$ imputations. We fitted the same QR model as in Haupt et al. (2014) to the imputed data and used standard combination rules (Barnard & Rubin, 1999; Rubin, 1987) to obtain the multiple imputation inferences.
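The standard combination rules referred to above pool the $m$ completed-data analyses per parameter. The sketch below implements Rubin’s (1987) rules generically (the function name is ours, and the Barnard–Rubin degrees-of-freedom adjustment is omitted); in R, mice’s pool() performs this pooling internally.

```python
def pool_rubin(estimates, variances):
    """Rubin's (1987) rules for combining m completed-data analyses.

    estimates: the m point estimates of one parameter
    variances: the m squared standard errors (within-imputation variances)
    Returns the pooled estimate Q-bar and its total variance
    T = W + (1 + 1/m) * B.
    """
    m = len(estimates)
    qbar = sum(estimates) / m                       # pooled point estimate
    w = sum(variances) / m                          # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between variance
    t = w + (1.0 + 1.0 / m) * b                     # total variance
    return qbar, t


# hypothetical coefficient estimates and squared SEs from m = 5 imputations
qbar, t = pool_rubin([0.24, 0.26, 0.25, 0.23, 0.27],
                     [0.004, 0.004, 0.004, 0.004, 0.004])
assert abs(qbar - 0.25) < 1e-9
assert t > 0.004  # between-imputation variance inflates the total variance
```

The inflation of $T$ by the between-imputation variance $B$ is precisely the self-correcting property discussed earlier: extra uncertainty from imputing missing values widens the pooled confidence intervals.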
MI Results
The top part of Table 3 displays point estimates and corresponding standard errors; the bottom part shows deviations from the complete case results reported in Haupt et al. (2014). Coefficient estimates and standard errors are similar, and the qualitative results across the two analyses are identical: The predictors SES, FEMALE, and PD have negative coefficient estimates, which are close to zero for PD for small values of $\tau $. The strength of the absolute effects relative to the standard errors is highest for SES and smaller for FEMALE. For the latter predictor, the relative strength of the absolute effect decreases with increasing $\tau $. For PD, the relative strength of the effect increases with increasing $\tau $ until $\tau =0.5$ and slightly decreases for larger $\tau $. Positive coefficient estimates are obtained for all other predictors. The largest coefficient estimates relative to the standard errors are obtained for DI, ID, and OD. For AG and AGE, coefficient estimates are generally smaller or only slightly larger than the standard errors. This confirms the risk factors for CP identified by Haupt et al. (2014) for the dataset imputed by QR-based MI – low SES, DI, ID, and OD – while the limitations of the cross-sectional analysis and the bidirectional design noted by Haupt et al. (2014) remain unchanged.
Table 3
Parameter   $\tau =0.2$  $\tau =0.3$  $\tau =0.4$  $\tau =0.5$  $\tau =0.6$  $\tau =0.7$  $\tau =0.8$

MI results
Intercept    0.804    0.532    0.290    0.076    0.157    0.417    0.711
            (0.056)  (0.055)  (0.055)  (0.055)  (0.056)  (0.060)  (0.062)
AGE          0.014    0.032    0.041    0.025    0.065    0.076    0.085
            (0.060)  (0.062)  (0.061)  (0.059)  (0.066)  (0.068)  (0.068)
SES          0.137    0.207    0.239    0.228    0.220    0.233    0.249
            (0.065)  (0.062)  (0.061)  (0.058)  (0.061)  (0.065)  (0.070)
FEMALE       0.120    0.143    0.125    0.129    0.111    0.115    0.071
            (0.057)  (0.058)  (0.058)  (0.058)  (0.058)  (0.062)  (0.064)
AG           0.061    0.059    0.051    0.044    0.037    0.051    0.066
            (0.055)  (0.057)  (0.056)  (0.060)  (0.059)  (0.063)  (0.062)
PD           0.017    0.031    0.103    0.117    0.153    0.148    0.127
            (0.055)  (0.057)  (0.056)  (0.060)  (0.059)  (0.063)  (0.062)
DI           0.169    0.202    0.273    0.297    0.326    0.353    0.351
            (0.068)  (0.070)  (0.070)  (0.067)  (0.066)  (0.078)  (0.073)
ID           0.155    0.164    0.156    0.154    0.173    0.172    0.191
            (0.076)  (0.076)  (0.072)  (0.075)  (0.077)  (0.087)  (0.089)
OD           0.055    0.095    0.135    0.168    0.198    0.199    0.187
            (0.055)  (0.057)  (0.057)  (0.058)  (0.059)  (0.065)  (0.066)

Deviations from complete case results
Intercept    0.009    0.009    0.013    0.011    0.007    0.029    0.008
AGE          0.002    0.008    0.008    0.018    0.054    0.065    0.022
SES          0.017    0.005    0.001    0.021    0.008    0.005    0.022
FEMALE       0.009    0.006    0.017    0.005    0.013    0.002    0.003
AG           0.001    0.009    0.014    0.015    0.016    0.016    0.001
PD           0.018    0.034    0.008    0.006    0.005    0.006    0.014
DI           0.026    0.032    0.009    0.012    0.020    0.001    0.010
ID           0.017    0.021    0.004    0.012    0.007    0.027    0.008
OD           0.007    0.010    0.006    0.008    0.007    0.014    0.012
Note. The top part of the table shows MI point estimates and heteroscedasticityrobust standard errors (in parentheses). All model variables were standardized. The bottom part of the table displays deviations from complete case results reported in Haupt et al. (2014) reproduced based on standardized variables.
Additionally, we reproduced the results of the prediction-centered modelling approach in Fritsch et al. (2019). The paper models the conditional expectation of the response CP with multiple linear regression, regression trees, and random forests.^{6} An out-of-sample evaluation of the three approaches is carried out based on the root mean square error, mean square error, and mean absolute error. The paper reports superior predictive performance of random forest-based modelling. We reproduced the analysis with the 100 datasets obtained from QR-based MI as follows: First, we split the datasets into training and test data; for the test data, we set aside the exact same observations as in the original analysis across all imputed datasets. Second, we fitted a multiple linear regression model, a regression tree, and a random forest to each of the 100 imputed datasets. Note that for the regression trees and the random forests, we tuned the meta-parameters of the algorithms for each of the 100 datasets. Third, we predicted the response and evaluated the predictive performance of all methods by the three prediction error criteria for each of the 100 datasets. Overall, 1) there were only minor differences in predictive performance across the 100 datasets; 2) the predictive performance of modelling based on random forests and multiple linear regression improved slightly; 3) the predictive performance of modelling based on regression trees decreased slightly. We do not report more detailed results here, as there were only minor differences compared to the results reported in Fritsch et al. (2019).
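The evaluation scheme above – the same held-out observations in every imputed dataset, with error criteria computed per dataset and then summarised – can be sketched generically. The data, function names, and summary keys below are invented for illustration and do not reproduce the actual models of Fritsch et al. (2019).

```python
import math


def rmse(y_true, y_pred):
    """Root mean square prediction error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_across_imputations(test_sets, predictions):
    """Compute error criteria per imputed dataset, then summarise.

    test_sets: list of m test responses (the same held-out observations
    in each imputed dataset, as in the reproduction described above).
    predictions: the corresponding model predictions per dataset.
    """
    rmses = [rmse(y, p) for y, p in zip(test_sets, predictions)]
    maes = [mae(y, p) for y, p in zip(test_sets, predictions)]
    return {
        "rmse_mean": sum(rmses) / len(rmses),
        "rmse_range": max(rmses) - min(rmses),
        "mae_mean": sum(maes) / len(maes),
    }


# two hypothetical imputed datasets sharing the same test observations
tests = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
preds = [[1.1, 1.9, 3.2], [0.9, 2.1, 2.8]]
summary = evaluate_across_imputations(tests, preds)
# a small rmse_range indicates stable performance across imputations
assert summary["rmse_range"] < 0.1
```

A small spread of the criteria across imputed datasets, as found in the reproduction, indicates that the imputation step did not destabilise the predictive comparison.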
Discussion
Note that in this application, the true distribution of the observed and missing values, as well as the mechanism generating the missing values, is unknown. This is typical for empirical datasets, where applied researchers are generally unaware of the ‘true’ impact of missing values on the obtained inferences. We compared estimates based on the complete cases – valid under the assumption of MCAR – with estimates obtained by QR-based MI under the assumption of MAR. Although we cannot rule out MNAR mechanisms, our analyses provide empirical evidence that MAR effects were negligible and did not substantially affect the modelling results. Given that the MAR assumption holds – at least approximately – and that both the substantive model and the imputation model are correctly specified (an untestable assumption in the analysis of empirical datasets), the reported MI inferences can be considered unbiased.
General discussion
We have evaluated QR-based MI in a scenario that is typical for the social sciences and imputed incomplete scale scores that were obtained from ordinal Likert scale items. Typically, bias in statistical inferences depends on multiple factors: the amount of missing values in the dataset, the mechanism generating the missing values and the resulting missing value patterns, the interrelationships in the given dataset, and the selected imputation method and model (Raghunathan, 2016; Schafer, 1997). In our simulation, we held the percentage of missing values and the missing value pattern constant within a given design, so that differences in bias and CR can be attributed to the employed imputation methods. Overall, QR- and GAMLSS-based MI yielded the most accurate inferences in comparison to the established MI approaches, normal model-based MI and PMM. While the two latter methods yielded unbiased parameter estimates only in the center parts of the distributions, QR- and GAMLSS-based MI also performed well in the off-center regions. Only in the more extreme scenarios (a missing value percentage of 40%) were some small biases regarding the lower quantiles found. Based on our findings, we conclude that if researchers are interested in modelling effects beyond the conditional mean, imputation methods beyond standard normal model-based MI are promising. For example, when considering conditional quantiles, QR-based MI is particularly useful. Overall, our findings support the recommendation that the imputation model and the analysis model should coincide.
We see the following directions for future research: First, as QR typically requires a sufficiently large sample to estimate all model parameters at all quantiles of interest sufficiently well, poor performance of QR-based MI could be a byproduct of the relatively small employed sample size of $N=675$. It might be instructive to replicate our simulation using a larger sample size to estimate all quantiles of interest of the conditional distribution of the response. Practical guidelines on the sample size required to obtain unbiased statistical inferences should be established by future research – ideally taking model complexity into account. Second, QR generally assumes a continuous response. The corporal punishment variable employed in this paper, for example, was obtained by computing the mean of discrete 5-point Likert scale items. Obviously, this does not yield a continuous variable, but creates point masses around certain values (see also Figure 1). Treating scale scores based on discrete items as (quasi-)continuous is a common practice in the social sciences. To the best of our knowledge, systematic evaluations of the behaviour of QR when the response is not (perfectly) continuous do not exist. Our results suggest that QR-based MI works sufficiently well for missing values that are based on 5-point Likert scale items, given an adequately sized sample. In future research, the present study could be replicated and the results compared to settings where the response is continuous. Additionally, it may be helpful for practitioners to establish guidelines on the number of levels a discrete response (e.g., derived from Likert scale items) requires in order to be considered (quasi-)continuous in a QR model. Third, QR-based MI should be evaluated more broadly, and more complex settings, such as models including nonlinear terms and interaction effects, should be considered.
Finally, it will be of interest to applied researchers to compare QR-based MI with other imputation methods implemented in ImputeRobust. Further systematic evaluations and comparisons will help to single out settings in which to use QR-based MI, and when to use other semi- or even fully parametric approaches.