Approaches for dealing with item omission include incorrect scoring, ignoring missing values, and approaches for nonignorable missing values. These approaches have only been evaluated for certain forms of nonignorability. In this paper, we investigate their performance under various conditions of nonignorability, that is, when the missing response depends on i) the item response, ii) a latent missing propensity, or iii) both. No approach results in unbiased parameter estimates of the Rasch model under all missing data mechanisms. Incorrect scoring only results in unbiased estimates under very specific data constellations of missing mechanisms i) and iii). The approach for nonignorable missing values only results in unbiased estimates under condition ii). Ignoring results in slightly more biased estimates than the approach for nonignorable missing values, while the latter also indicates the presence of nonignorability under all simulated conditions. We illustrate the results in an empirical example on PISA data.
Test data of low stakes large-scale assessments usually contain a substantial proportion of missing responses on test items. In this paper, we focus on missing values due to item omission. Omitted items are usually nonignorable (see, e.g.,
There are different approaches to dealing with missing values (for an overview see, e.g., While multiple imputation is a sophisticated and often used approach for missing values in single indicator variables (e.g.,
Incorrect scoring: Missing responses may be scored as incorrect responses, assuming that the subject did not know the answer. There are different views on the properties of this approach. Assuming that there is a true unobserved response (e.g.,
Ignoring: Another approach is ignoring missing values and, thus, treating them as if they were not administered. This approach assumes that missing responses are MAR, given the observed responses on the items in the test (and other covariates in the background model).
Nonignoring: Based on work of
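The first two treatments described above can be illustrated with a minimal sketch. It uses the proportion of correct responses as a simple stand-in for an ability estimate, not the Rasch-based estimation used in this paper. The function names and the example response vector are hypothetical and serve only to show how the two treatments handle the same omitted items.

```python
from typing import List, Optional

# Coding used in this sketch (an assumption, not the paper's data format):
# 1 = correct, 0 = incorrect, None = omitted.
Response = Optional[int]


def score_incorrect(responses: List[Response]) -> List[int]:
    """Incorrect scoring: every omitted item is recoded as a wrong answer."""
    return [0 if r is None else r for r in responses]


def proportion_correct_incorrect_scoring(responses: List[Response]) -> float:
    """Proportion correct when omissions count as incorrect responses."""
    scored = score_incorrect(responses)
    return sum(scored) / len(scored)


def proportion_correct_ignoring(responses: List[Response]) -> float:
    """Proportion correct when omitted items are treated as not administered."""
    observed = [r for r in responses if r is not None]
    if not observed:
        raise ValueError("no observed responses")
    return sum(observed) / len(observed)


# Hypothetical response vector: 4 correct, 2 incorrect, 2 omitted.
responses = [1, None, 0, 1, None, 1, 0, 1]
print(proportion_correct_incorrect_scoring(responses))  # 4 of 8 items
print(proportion_correct_ignoring(responses))           # 4 of 6 observed items
```

The same omissions lower the score under incorrect scoring and leave it untouched under ignoring, which is why the two treatments can diverge strongly when many items are omitted.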
Studies investigating the performance of the different missing data approaches show that, when the assumptions of the models are met, the respective approaches recover the true parameters. The approach of incorrect scoring results in unbiased parameter estimates when only incorrect responses are missing ( Note that some of the studies generated missing values based on (categorized) sum scores of observed items (e.g.,
While the different approaches have been evaluated extensively under MCAR or MAR assumptions, they have only been evaluated for very specific forms of nonignorability (
with
The mechanism in
In this paper, we want to challenge the three missing data approaches (incorrect scoring, ignoring, nonignoring) to evaluate under which kind of ignorable and nonignorable missing data mechanisms they perform well and where their limitations are. We performed three simulation studies, each using one of the three nonignorable missing data mechanisms. Note that the performance of the three approaches for the missing mechanism in
In line with typical LSAs, we generated data for
We varied the proportion of omitted items as well as the dependency of missing values on the item responses. For omission rates, we chose a proportion of 10% of all item responses to be missing, as this reflects typical applications, as well as a proportion of 30% in order to investigate the effect of the proportion of missing values on parameter estimates. For example, for a randomly selected replication in the 30% omission proportion and strong MNAR condition, this procedure resulted in omission rates at the item level ranging from 10% to 49% of responses missing per item. Note that the mean of the item parameters and the mean of the person parameters are zero in the simulation study, resulting on average in an equal number of correct and incorrect responses in the data.
Conditional probability | 10% omitted, MCAR | 10% omitted, medium MNAR | 10% omitted, strong MNAR | 30% omitted, MCAR | 30% omitted, medium MNAR | 30% omitted, strong MNAR |
---|---|---|---|---|---|---|
Omission given a correct response | 0.1 | 0.05 | 0 | 0.3 | 0.15 | 0 |
Omission given an incorrect response | 0.1 | 0.15 | 0.2 | 0.3 | 0.45 | 0.6 |
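The missing mechanism of simulation 1 can be sketched as follows. This is a minimal illustration and not the code used in the study. The sample size, the number of items, the evenly spaced item difficulties, and all function names are assumptions made for the example. Only the conditional omission probabilities (0 for correct and 0.6 for incorrect responses in the strong MNAR condition with 30% omission) are taken from the text. Because correct and incorrect responses are on average equally frequent, the expected overall omission rate is 0.5 × 0 + 0.5 × 0.6 = 0.3.

```python
import math
import random


def rasch_probability(theta: float, beta: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def simulate_responses(n_persons: int, n_items: int, rng: random.Random):
    """Complete 0/1 response matrix; abilities ~ N(0, 1), difficulties centered at 0."""
    # Hypothetical, symmetric item difficulties between -2 and 2 (mean zero).
    betas = [-2.0 + 4.0 * j / (n_items - 1) for j in range(n_items)]
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    return [
        [1 if rng.random() < rasch_probability(t, b) else 0 for b in betas]
        for t in thetas
    ]


def impose_omissions(data, p_miss_correct: float, p_miss_incorrect: float,
                     rng: random.Random):
    """Set responses to None with a probability that depends only on the response."""
    observed = []
    for row in data:
        new_row = []
        for x in row:
            p_miss = p_miss_correct if x == 1 else p_miss_incorrect
            new_row.append(None if rng.random() < p_miss else x)
        observed.append(new_row)
    return observed


def omission_rate(data) -> float:
    """Share of omitted cells in the whole response matrix."""
    cells = [x for row in data for x in row]
    return sum(x is None for x in cells) / len(cells)


rng = random.Random(2024)                                  # fixed seed for reproducibility
complete = simulate_responses(2000, 20, rng)               # assumed sizes
observed = impose_omissions(complete, 0.0, 0.6, rng)       # strong MNAR, 30% omission
rate = omission_rate(observed)
print(round(rate, 3))                                      # should be close to 0.3
```

Under this mechanism only incorrect responses are ever omitted, so harder items, which attract more incorrect responses, end up with higher item-level omission rates.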
The data were analyzed using the R-package TAM (
Approach | 10% omitted, MCAR | 10% omitted, medium MNAR | 10% omitted, strong MNAR | 30% omitted, MCAR | 30% omitted, medium MNAR | 30% omitted, strong MNAR |
---|---|---|---|---|---|---|
Incorrect | -0.263 (0.017) | -0.136 (0.019) | 0.000 (0.019) | -0.737 (0.015) | -0.384 (0.017) | 0.000 (0.019) |
Ignore | -0.001 (0.020) | 0.114 (0.020) | 0.230 (0.020) | -0.001 (0.020) | 0.453 (0.021) | 0.957 (0.021) |
Nonignoring | -0.001 (0.020) | 0.114 (0.020) | 0.227 (0.020) | -0.001 (0.020) | 0.439 (0.022) | 0.931 (0.021) |
Given that the average of the item parameters was fixed to zero, that is to the true values, in the estimation, the approach of ignoring and the approach of nonignoring yielded unbiased item parameter estimates in all conditions (
In order to investigate whether the approach for nonignorable missing values reflected the (non-)ignorability of the data generating mechanism, we evaluated the estimated correlation of the item difficulties and item parameters for the missing indicators
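Estimated parameter | 10% omitted, MCAR | 10% omitted, medium MNAR | 10% omitted, strong MNAR | 30% omitted, MCAR | 30% omitted, medium MNAR | 30% omitted, strong MNAR |
---|---|---|---|---|---|---|
Correlation of item difficulties and item parameters for the missing indicators | -0.004 | -0.951 | -0.966 | 0.003 | -0.984 | -0.985 |
Variance of the missing propensity | 0.016 | 0.018 | 0.238 | 0.015 | 0.113 | 0.396 |
Correlation of ability and missing propensity | 0.000 | -0.092 | -0.868 | 0.000 | -0.731 | -0.915 |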
We simulated data according to the approach for nonignorable missing values (
While in simulation 1, the probability of a missing value was fixed for a given item response (being 0 for correct responses and 0.6 for incorrect ones in the considered condition), in the data generating mechanism of simulation 2, it varied across persons and items. The average missing probability for incorrect responses across all persons and items was .406 with a
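The difference between the two data generating mechanisms can be sketched as follows. This is a minimal illustration under stated assumptions: the missing indicators follow a Rasch model in a latent missing propensity ξ that is correlated with ability θ. The sign convention `logistic(xi - gamma)`, the correlation of -0.5, the unit variance of ξ, and the item parameters `gammas` are hypothetical choices for the example and are not the values used in the study.

```python
import math
import random


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def draw_person(rho: float, sd_xi: float, rng: random.Random):
    """Draw (theta, xi) from a bivariate normal with correlation rho; Var(theta) = 1."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    theta = z1
    xi = sd_xi * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return theta, xi


def omission_probability(xi: float, gamma: float) -> float:
    """Rasch-type model for the missing indicator (assumed sign convention)."""
    return logistic(xi - gamma)


def pearson(xs, ys) -> float:
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
# Hypothetical values: negative correlation means less able persons omit more.
persons = [draw_person(rho=-0.5, sd_xi=1.0, rng=rng) for _ in range(5000)]
gammas = [1.0, 2.0, 3.0]  # hypothetical item parameters of the missing indicators

# The omission probability now differs for every person-item combination.
probs = [[omission_probability(xi, g) for g in gammas] for _, xi in persons]

sample_rho = pearson([t for t, _ in persons], [x for _, x in persons])
print(round(sample_rho, 2))
```

In this sketch the omission probability does not depend on the item response itself. It depends on the response only indirectly, through the correlation between ability and missing propensity.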
The approach for nonignorable missing values was able to retrieve the true parameters. There was no bias in any of the parameters estimated in this model (see
Approach | |||||
---|---|---|---|---|---|
Incorrect | -0.9251 | -0.1326 | NA | NA | NA |
Ignore | 0.0389 | -0.0243 | NA | NA | NA |
Nonignoring | 0.0011 | 0.0024 | 0.0043 | 0.0044 | 0.007 |
Note that the differences in the estimated average ability between the three approaches are similar in simulation 1 and 2. Thus, we cannot infer the underlying missing mechanism from the difference in estimates between the different approaches, as this does not differ between the different mechanisms.
In order to look for possible indicators in the data that may help to distinguish the missing process in simulation 1 from that in simulation 2, we investigated model fit indices when using the approach for nonignorable missing values (
Simulation | BIC | ||||
---|---|---|---|---|---|
Sim 1 | 176169.3 | 1.0004 | 0.01530 | 1.00375 | 0.03459 |
Sim 2 | 176316.9 | 1.0001 | 0.01507 | 1.00008 | 0.03344 |
We generated missing data according to the following formula for the probability of a missing response:
with ξ denoting the latent missing propensity as defined by Note that for a missing probability of zero, we set
Conditional probability | Low amount, no impact | Low amount, medium impact | Low amount, strong impact | High amount, no impact | High amount, medium impact | High amount, strong impact |
---|---|---|---|---|---|---|
Omission given an incorrect response | 0.166 (0.096) | 0.234 (0.121) | 0.295 (0.138) | 0.405 (0.157) | 0.548 (0.162) | 0.677 (0.147) |
Omission given a correct response | 0.085 (0.059) | 0.043 (0.033) | 0.000 (0.000) | 0.246 (0.130) | 0.126 (0.081) | 0.000 (0.000) |
Note that in simulation 3 the proportion of missing values is slightly larger than 10 and 30 percent. This resulted in missing rates on item level ranging from 6 to 66% for a randomly selected replication in the condition with about 30% omitted responses and strong MNAR.
The data were analyzed in the same way as in simulations 1 and 2.
The bias was similar in size across all levels of ability (see Figure S3 in the
Approach | Low amount, no impact | Low amount, medium impact | Low amount, strong impact | High amount, no impact | High amount, medium impact | High amount, strong impact |
---|---|---|---|---|---|---|
Incorrect | -0.235 (0.008) | -0.119 (0.008) | 0.000 (0.008) | -0.692 (0.008) | -0.349 (0.008) | 0.000 (0.008) |
Ignore | 0.012 (0.008) | 0.148 (0.008) | 0.285 (0.008) | 0.041 (0.010) | 0.525 (0.009) | 1.060 (0.010) |
Nonignoring | 0.000 (0.008) | 0.132 (0.008) | 0.265 (0.008) | 0.000 (0.010) | 0.462 (0.009) | 0.981 (0.010) |
This is because in these data generating conditions, the impact of the item response on the probability of a missing response was very strong, while it was much lower for the missing propensity. This can be seen in
The approach for nonignorable missing values reflected not only the nonignorability of the data due to the missing propensity, but also due to the impact of the item response (
Estimated parameter | Low amount, no impact | Low amount, medium impact | Low amount, strong impact | High amount, no impact | High amount, medium impact | High amount, strong impact |
---|---|---|---|---|---|---|
Correlation of item difficulties and item parameters for the missing indicators | -0.979 | -0.983 | -0.982 | -0.981 | -0.985 | -0.983 |
Variance of the missing propensity | 0.403 | 0.688 | 0.892 | 0.401 | 0.740 | 1.053 |
Correlation of ability and missing propensity | -0.906 | -0.938 | -0.947 | -0.908 | -0.939 | -0.924 |
In the following empirical example, we illustrate what can be inferred from analysis results in practice. We reanalyzed the Italian PISA 2012 data on the assessment of math competence, which are publicly available (see
Applying the approach for nonignorable missing values resulted in an estimated correlation between the item difficulties and item parameters for the missing indicators of -0.664, indicating that more difficult items are more often omitted. The estimated variance of the omission propensity was large
We compared the ability estimates using each of the three approaches. Fixing the average item difficulty in all three analyses to zero, we found considerable differences in estimated mean ability: Incorrect scoring results in much lower average ability estimates
The non-zero correlation between ability and missing propensity indicates that there is some form of nonignorability in the data. From the results we cannot, however, infer the specific kind of nonignorable mechanism. Any of the nonignorable mechanisms described in
Approaches for nonignorable missing values have often been over-interpreted by users as being able to deal with all kinds of nonignorable missing data. So far, the performance of the different approaches for missing values has only been investigated on a limited set of nonignorable missing data mechanisms. We investigated the performance of incorrect scoring, ignoring missing values, and using the approach for nonignorable missing values on three different kinds of nonignorable missing data mechanisms. The three mechanisms differ in whether the probability of a missing value is a) a function of the item response, b) a function of a unidimensional missing propensity (and the respective model parameters), or c) a function of both item response and missing propensity. The results found in this study are in line with previous studies. Similar to
In addition to previous studies, we also investigated the performance of the approaches under other nonignorable missing data mechanisms. There is no approach that can deal with all mechanisms of nonignorable missing values. Regarding nonignorable missing data mechanisms, the approach for nonignorable missing values is only appropriate under a missing data mechanism in which the probability of a missing value is solely a function of the missing propensity. For any of the other mechanisms, it results in biased parameter estimates. Ignoring missing values yields estimates similar to those of the approach for nonignorable missing values, with only slightly larger bias. Incorrect scoring only results in unbiased parameter estimates when missing values occur solely on incorrect responses. Note that none of the approaches can deal with missing data mechanisms in which the missing value depends on the item response and missing values also occur on correct responses.
The simulation studies in this paper have some limitations. First, we considered only Rasch models as measurement models for the item responses as well as for the missing indicators. In practice, other models are often used, such as the 2PL or 3PL model or the (generalized) partial credit model. While the current study did not investigate this, the results will most likely also hold for these types of measurement models as previous research (
Even though the nonignoring approach cannot account for all types of missing mechanisms, it can be useful for investigating the mechanism of missing values as it does reflect the existence of some form of nonignorability of the missing process in the data. The results show that one needs to be cautious when interpreting the results of analyses using any of the proposed approaches. When choosing an approach for dealing with missing values, we must evaluate the plausibility of each underlying mechanism by making use of pilot studies, empirical analyses, and theory. We must discuss our assumptions on the underlying mechanism to make our arguments available to readers who may then judge whether these are plausible. If a nonignorable mechanism as described in
Scatterplots of EAP-estimates of person ability for Simulations 1 to 3 for a single replication are available via the PsychArchives repository (for access see
Data for the empirical example (Italian PISA 2012 data) is freely available from the OECD (for access see
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) within the Priority Programme 1646: Education as a Lifelong Process (Grant No. PO1655/2-1).
The authors have declared that no competing interests exist.
We thank two anonymous reviewers for helpful comments on our manuscript.