Rapid guessing is a test taking strategy that consists of responding to items fast and without attempting to solve the item properly (Wise, 2017). This strategy enables the completion of a set of items within a very short time span. This behavior is motivated by testing situations that impose a time limit, as is common with many achievement tests. In such a situation, rapid guessing enables the completion of all so-far not-reached items shortly before the end of testing time. The use of this test taking strategy in combination with a time limit in testing avoids item response omissions that can impair validity of measurement (Lu & Sireci, 2007). Therefore, it may appear that rapid guessing contributes to valid measurements. However, the presence of rapid guessing response behavior may actually introduce variance not related to the trait being measured, namely variance due to the participants’ intention to respond at random in order to complete the test. This irrelevant variance may be substantial enough to alter the technical quality of the items and the resulting scores.
Another possible consequence is that the irrelevant variance manifests itself as an additional factor in the latent structure of the test. Models for structural investigations (e.g., factor analysis, dimensionality assessment) mostly assume that there is only one latent source of responding that leads to systematic and relevant variation, which is captured by the latent variable included in the measurement model (Graham, 2006). The enlargement of such a model by integrating another latent variable for capturing systematic irrelevant variation due to processing speed as assumed source of omissions because of a time limit can provide an account of speeded data in the absence of rapid guessing (Schweizer, Troche et al., 2019). But it remains unclear whether such an enlarged model can account for data if responses due to participants’ intent to respond at random replace omissions. This study reports on an investigation as to whether the influence of speeded data can be detected despite the replacement of omissions by random data. Such data can be expected in speeded testing as compared to power testing, and may even originate from power testing with an ample time limit that is yet insufficient for a subset of participants.
Rapid Guessing for Preventing Omissions
Although participants taking a test are expected to spend as much time as necessary on each item and to provide the best possible response, they may deviate from such behavior for various reasons. For example, there are situations where test scores have major consequences (e.g., employment, education opportunities) that may lead participants to use inappropriate test taking strategies when completing items in order to increase the chance of reaching a high score (Stenlund et al., 2018). Furthermore, social desirability may play a role in responding where 1) a participant works to complete all items to be a “good” participant (Vogt & Johnson, 2015) or 2) a participant behaves according to the stereotype of a smart person by completing all items, even if guessing. Moreover, test coaching courses may instruct or recommend the use of rapid guessing in test taking. Finally, participants may use rapid guessing in assessment environments where the consequences of the scores are low (i.e., low-stakes testing) or for other reasons.
The advantage promised by rapid guessing is that a random response can be correct. If there are several response options and only one is correct, the probability that the random response is correct is one divided by the number of response options. That is, if there are four options and the participant guesses at random, there is a 25% chance of a correct response compared to not responding at all and ensuring a 0% chance of a correct response. Smaller numbers of response options are associated with larger probabilities of a correct response and larger numbers of response options with smaller probabilities. This strategy could be an advantage for the examinee if the number of correctly completed items serves as measure of performance.
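The arithmetic in this paragraph can be illustrated with a short sketch. The function names and the example of six not-reached items are hypothetical and serve only as an illustration; the only rule taken from the text is that a random response is correct with probability one divided by the number of response options.

```python
from fractions import Fraction


def guess_probability(n_options):
    """Probability that a purely random response is correct (one keyed option assumed)."""
    if n_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    return Fraction(1, n_options)


def expected_correct_by_guessing(n_unreached, n_options):
    """Expected number of correct responses gained by guessing on not-reached items."""
    return n_unreached * guess_probability(n_options)


if __name__ == "__main__":
    # Hypothetical example: six items were not reached before the time limit.
    for n_options in (2, 4, 6, 8):
        gain = expected_correct_by_guessing(6, n_options)
        print(n_options, guess_probability(n_options), float(gain))
```

An omission contributes nothing to a number-correct score, so any positive expected gain makes guessing attractive under such a scoring rule.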
Structural Investigation of Speeded Data
A popular way of investigating the internal structure of a scale to support a scoring inference for validity is with confirmatory factor analysis (CFA). A common assumption in item response theory is unidimensionality, which can, in part, be demonstrated by a one-factor CFA model (Graham, 2006). Such a demonstration confirms that the data are due to one latent source of systematic relevant variation. In this case, a latent variable specified in the CFA measurement model captures the systematic relevant variation due to the latent trait or ability measured by the test or assessment. These measurement models in a CFA framework can be specified with different types of indicators or items (e.g., continuous or dichotomous variables) of the trait measured. For didactical reasons we separate the discussion of the factor structure from the discussion of modeling different data types. For convenience, the discussion of structure in this section assumes continuous data.
A measurement model specifies the influences that are assumed to determine the participants’ responses to a given item. A one-factor CFA model assumes one latent source of systematic responding that is reflected by latent variable ξ. The contribution of ξ to completing the ith item (i = 1, …, p) is quantified by factor loading λi. Additionally, assumed contributions are those of random influences that are represented by δi without further specification (e.g., no correlated residuals). Such a model relates the p×1 vector of manifest variables x to the sum composed of the product of the p×1 vector of factor loadings λ of the manifest variables on the latent variable and latent variable ξ on one hand and of the p×1 vector δ of random variables on the other hand:
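Written out from the definitions just given (a reconstruction from the verbal description that uses the same symbols; this is the model later referred to as Equation 1), the one-factor model reads:

```latex
\mathbf{x} = \boldsymbol{\lambda}\,\xi + \boldsymbol{\delta}
```

Here x, λ and δ are p×1 vectors and ξ is a scalar latent variable, so each item satisfies x_i = λ_i ξ + δ_i.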
A scale is said to show structural validity if this model accounts for the item covariance matrix. However, this is not general or all validity but validity restricted to major characteristics of the circumstances of data collection. One major characteristic is the time span for completing the items of the scale, as time limits in testing can alter the validity of the data (Lu & Sireci, 2007). In the case of a time limit in testing that prevents participants from completing all items, the data are not only due to the latent source and random influences but also due to latent processing speed (Partchev et al., 2013). The influence of latent speed even appears to increase with increasing age including adulthood (Borter et al., 2020). A lack of latent processing speed can lead to omissions, as is demonstrated by comparing the outcomes for models representing different latent sources of responding in investigating reasoning data (Schweizer, Reiß et al., 2019).
A modified CFA model of Equation 1 is necessary in order to account for systematic variation that is due to latent processing speed. Since latent processing speed is to be considered as another latent source, the variation due to latent processing speed needs to be captured by another latent variable. The necessity to consider a second latent variable creates circumstances comparable to the circumstances leading to multitrait-multimethod models (Byrne, 2016), and this irrelevant variance may be seen as a method effect. While in the standard CFA measurement model a manifest variable shows one factor loading only, manifest variables of multitrait-multimethod models have cross-loadings. The modified model is a two-factor model with two latent variables, which we refer to as the primary latent variable ξprimary and the additional latent variable ξadditional, respectively. Together they explain the manifest variables of the p×1 vector x:
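Following the same notation, and assuming that the loading vectors λprimary and λadditional used later in the text belong to the primary and the additional latent variable (written here as ξprimary and ξadditional), the two-factor model can be sketched as:

```latex
\mathbf{x} = \boldsymbol{\lambda}_{\mathrm{primary}}\,\xi_{\mathrm{primary}}
           + \boldsymbol{\lambda}_{\mathrm{additional}}\,\xi_{\mathrm{additional}}
           + \boldsymbol{\delta}
```

This is a reconstruction from the verbal description rather than a quotation; every item receives a loading from both latent variables unless an entry of a loading vector is fixed to zero.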
There are two different types of two-factor models. The first type combines two one-factor models into a whole. The major characteristic of this type is that each manifest variable (e.g., item) loads on one latent variable only so that there are no cross-loadings (Kline, 2005, p. 175). The other type of a two-factor model allows the manifest variables (e.g., items) to cross-load. One version of this model is a bifactor model that comprises a general latent variable and a specific latent variable (Reise, 2012). The specific latent variable differs from the general latent variable in that it receives factor loadings from a subset of items only, and general and specific latent variables are orthogonal. This model enables the capturing of the systematic variation due to a general source and a specific source. The other version of this type of model is the multitrait-multimethod model (Byrne, 2016). This version has been proposed for investigating data that were collected according to a multitrait-multimethod design.
It is the first version of the second type of model that is suitable for data originating from two latent sources that simultaneously contribute to at least a few items. More specifically, since one source can be assumed to be active in completing all items whereas the second source is only active in some items, it is a bifactor model that is required for investigating data collected with a time limit in testing. This means that all entries of λprimary are either free for estimation or constrained to correspond to expected values whereas some entries of λadditional are fixed to zero. These are the entries regarding items that are not influenced by processing speed, that is, show no omissions:
Free factor loadings and factor loadings fixed to correspond to expected values have been shown to perform virtually equally well in simulated data if the expected values are adapted to the assumed latent source and the number of participants selecting the strategy (Schweizer et al., 2020; Schweizer, Troche et al., 2019). Using fixed factor loadings, it is necessary to free the associated variance parameter φadditional for estimation. These types of factor loadings have different properties. Free factor loadings can accommodate all kinds of effects so that there is hardly any impairment in model fit. This means that the factor loadings on the latent variable account for the systematic variation due to the intended latent source and in addition to some degree accommodate systematic variation due to other sources. In contrast, fixed factor loadings can only account for the systematic variation due to the intended latent source. If there is further systematic variation that may be due to a method effect, this leads to model misfit. The greater probability of model misfit may be considered as a downside of fixed factor loadings but there is also an advantage: good model fit indicates that the model captures exactly what it is expected to capture and nothing else.
Values for serving as factor loadings in order to capture systematic variation due to processing speed can be obtained by the cumulative normal distribution function that is approximated by the logistic function. The cumulative normal distribution function is obtained from the normal distribution function that is assumed to characterize the density distribution of latent processing speed. Using the logistic function, the factor loading of the ith item (i = 1, …, p) on the latent variable representing latent processing speed λi is defined as follows:
Figure 1 illustrates how a time limit in testing modifies the probability of responding correctly.
The curve printed as a solid line illustrates the assumed probabilities of a correct response if there is no time limit in testing. This curve suggests that the items are arranged according to their difficulty levels. The curve printed as a dashed line represents the assumed probabilities of a correct response originating from testing with a time limit. The assumed gradual drop-off of participants causes an increasing degree of deviation toward the end of the sequence of items.
Effect of Rapid Guessing on Investigating the Latent Structure
Rapid guessing may have consequences that differ from those of leaving the not-reached items as omissions. The description of these consequences begins with the expected distribution of omissions. This distribution needs to be modified to take into consideration that correct and incorrect responses at random replace omissions. For this purpose, a clearly defined expected probability of a correct response at random that is independent of the difficulty level of the item is necessary. We assumed that the data were collected with items showing a multiple-choice response format in order to have a basis for such probabilities. In this case, the expected probability of a correct response at random solely depends on the number of response options. The multiple-choice response format is the most popular response format (Johnson & Morgan, 2016). Correct responses at random obtained by this response format are not only an issue with respect to rapid guessing, but also a general issue of assessment (Drasgow & Mattern, 2006).
Since the logistic function varies between zero and one and is assumed to provide values corresponding to the expected frequency of omissions divided by the upper limit for the frequency of omissions, it can be perceived as probability. Accordingly, in the following discussion we use probabilities for combining the description of the effect of a time limit with the description of the effect of rapid guessing. The expected probability E[Pr(Xi ∈ Co)] for Xi (i = 1, …, p) to be an omission is described with respect to the set of omissions Co. To keep this section connected to the previous discussion, we start from Equation 4:
Next, the influence of rapid guessing needs to be quantified. Rapid guessing means that Xi (i = 1, …, p) can be perceived as taken either from the set of correct responses Cc or the set of false (= incorrect) responses Cf. The expected probability depends on the number of response options. If we assume that this number is f, the expected probability of a correct response due to rapid guessing is given by
The majority of correct responses can be assumed to originate from the primary source of responding whereas omissions turned into incorrect responses are more likely than omissions turned into correct responses at random. This suggests that the focus has to be on the incorrect responses in quantifying the effect of rapid guessing on the detection of speededness. Accordingly, the expected probability of an incorrect response due to latent processing speed in combination with rapid guessing is given by
Figure 2 provides a graphical representation of how rapid guessing modifies the probability of a correct response.
This figure includes curves depicting the probability of responding correctly if participants use rapid guessing in combination with response formats including two, four, six and eight options. The curves suggest that eight, six and even four response options only cause minor deviations from the curve for no rapid guessing.
Equations 8 and 9 enable a first evaluation of the consequences of rapid guessing for the detection of the effect of speeded testing. The comparison of Equation 9 and Equation 4 suggests that there is a decrease of the systematic variation due to latent processing speed. The decrease of systematic variation also means a reduced probability of detecting it. Further, the comparison of Equation 8 and Equation 5 reveals that there is a decrease in the probability of detecting the effect of speeded testing. The decrease depends on the number of response options. Therefore, we state the following hypothesis for the empirical investigation: the larger the number of response options, the larger the probability of the detection of rapid guessing.
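The dependence on the number of response options can be made concrete with a small sketch. It assumes, in line with the verbal description, that with f response options a fraction 1/f of the replaced omissions turns into correct responses, so that the share (f − 1)/f remains visible as incorrect responses. The function names and the example omission probability of 1/2 are hypothetical.

```python
from fractions import Fraction


def incorrect_multiplier(n_options):
    """Share of replaced omissions that end up as incorrect random responses."""
    return Fraction(n_options - 1, n_options)


def expected_incorrect(p_omission, n_options):
    """Expected probability of an incorrect response caused by speededness plus guessing.

    p_omission is the expected probability of an omission without rapid guessing.
    """
    return p_omission * incorrect_multiplier(n_options)


if __name__ == "__main__":
    p_omission = Fraction(1, 2)  # hypothetical value for one late item
    for n_options in (8, 6, 4, 2):
        print(n_options, incorrect_multiplier(n_options),
              expected_incorrect(p_omission, n_options))
```

Under this assumption the multipliers are 7/8, 5/6, 3/4 and 1/2 for eight, six, four and two options, so the speed-related signal shrinks most with two options.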
Despite the indicated impairment of the detectability of the effect of a time limit in testing, there is also positive news: there is still some chance to detect this effect. Further, the larger the number of response options, the larger the probability of detecting it. Concerns about the effect of the number of response options lead Johnson and Morgan (2016) to recommend three or more response options in constructing multiple-choice items. Three response options are reported to be most common in applications (Rodriguez, 2005).
Confirmatory factor analysis attempts to estimate a model that can reproduce the covariance matrix. This involves comparing the model-implied p × p covariance matrix Σ with the p × p empirical covariance matrix S by means of a discrepancy function. Good model fit is considered as confirmation of the specified measurement model (Graham, 2006) that gives rise to Σ, and model misfit as its rejection. Since factor analysis is mostly conducted according to a measurement model that includes continuous and normally distributed variables, the appropriateness for investigating dichotomous data that are considered in the present work may be called into question (Kubinger, 2003).
Mathematics offers several solutions for relating different types of data to each other. There are link transformations as part of generalized linear models (McCullagh & Nelder, 1985; Skrondal & Rabe-Hesketh, 2004), variance stabilizing methods (Guan, 2009; Morgenthaler & Staudte, 2012) and methods that are specific for factor analysis. Methods specific for factor analysis include the transformation of dichotomous data into frequencies or probabilities that are considered as continuous. This important step turns binary information into continuous information. Further transformations lead to tetrachoric correlations or probability-based covariances that serve as input to factor analysis (Schweizer et al., 2015). Whereas the computation of tetrachoric correlations includes the computation and use of thresholds that eliminate the effect of splitting data according to probability level p in dichotomization, probability-based covariances still include the effect of p, which may be considered as a reason for observing spurious factors (Kubinger, 2003).
The model-implied p × p covariance matrix Σ is defined as
Using this model in combination with dichotomous data requires adaptation that is two-fold in the approach characterizing this work. First, there is adaptation of the scale level of data that occurs in computing probability-based covariances that changes from binary to continuous (see the paragraph preceding the previous paragraph). We symbolize this adaptation by transformation T of x so that the manifest part of the model becomes Tprobability-based covariance(x). But there is still dependency on p used in dichotomization. Second, this dependency is removed by an additional transformation that is realized as weighting. The item-specific weight wi (i=1,…,p) is defined as
The corresponding model-implied p × p covariance matrix Σ is a matrix that is prepared for probability-based covariances as input:
The correctness of the model-implied p × p covariance matrix Σ specified according to Equation 15 for reproducing the p × p empirical covariance matrix S can be investigated by the maximum likelihood estimation method. This method maximizes the likelihood of the parameters of the model with respect to the data. For this purpose function F is minimized:
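A commonly used form of the maximum likelihood discrepancy function in covariance structure analysis is shown here as an assumption about what F denotes; it is the standard textbook expression and not a quotation of the authors' equation:

```latex
F = \ln\lvert\boldsymbol{\Sigma}\rvert
  + \operatorname{tr}\!\left(\mathbf{S}\,\boldsymbol{\Sigma}^{-1}\right)
  - \ln\lvert\mathbf{S}\rvert - p
```

In this form F equals zero when Σ reproduces S exactly and grows as the two p × p matrices diverge.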
The preconditions for making use of function F are continuous data, invertibility of Σ and positive definiteness, whereas there is no precondition regarding distribution. Yet, in the application of F for comparing S and Σ there is a restriction regarding the distribution of data that originates from Σ. The variables of this model are assumed to follow the normal distribution and are treated as such. Skewness is a deviation from normality that has been demonstrated to lead to model misfit (Lai, 2018). Correction methods and estimation methods have been developed that aim at overcoming such deviation of data from normality.
Our approach differs from the available data-focused approaches in that it seeks to modify the model in such a way that model and data correspond to each other according to major distributional properties. This means that it makes use of the characteristic of the maximum likelihood estimation function of no restriction regarding the distribution. The factor loadings are modified by multiplication with weights in such a way that the effect of splitting continuous data according to probability level p in dichotomization is compensated. This is an important precondition for the correct reproduction of the entries of S computed from dichotomous data. In the following we prefer to refer to our approach as ML-MA (model-adapted ML).
The main objective of the empirical investigation was to examine if the effect of a time limit in testing was detectable in data despite participants’ rapid guessing. The use of this guessing strategy was an important issue as its strict application would result in the complete disappearance of omissions. Complete disappearance of omissions meant that the effect of a time limit in testing was no longer apparent in descriptive statistics.
The simulated data for this investigation had to show 1) the characteristics of data originating from a time limit situation in testing leading to omissions, 2) the use of multiple-choice response formats, and 3) rapid guessing. The selected time limit was assumed to allow all participants to complete approximately two-thirds of the 20-item set before they would gradually stop responding properly. Furthermore, the data had to show the consequence of the participants’ rapid guessing. For this purpose, the simulated omissions due to the testing time limit were replaced by simulated random responses.
Data matrices composed of 500 rows and 20 columns were generated by means of three 20 × 20 relational patterns (Jöreskog & Sörbom, 2001). These patterns showed off-diagonal entries that could be reproduced by factor loadings of 0.325, 0.375 and 0.425 of a one-factor model; we referred to them as low, medium, and high levels. As they reflected the influence of the primary latent source of responding to be used for controlling effects, we refer to them as levels of source influence. The diagonal elements of the relational patterns were set equal to one. Each one of the three relational patterns served the generation of 500 matrices of continuous and normally distributed random data [N(0,1)]. In the following, we refer to rows of the matrices as simulated participants and to the columns as simulated items.
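One way to obtain continuous data with such a relational pattern is to draw from the one-factor model directly. The sketch below is an illustrative alternative and not the generation procedure of the cited software; the function names are hypothetical. With a common loading λ and residual variance 1 − λ², every column has unit variance and every pair of columns has an expected correlation of λ².

```python
import math
import random


def simulate_one_factor(n_persons, n_items, loading, rng):
    """Draw continuous N(0,1) item scores that share one latent source."""
    residual_sd = math.sqrt(1.0 - loading ** 2)  # keeps each item variance at 1
    data = []
    for _ in range(n_persons):
        xi = rng.gauss(0.0, 1.0)  # latent score of this simulated participant
        row = [loading * xi + residual_sd * rng.gauss(0.0, 1.0) for _ in range(n_items)]
        data.append(row)
    return data


def correlation(data, a, b):
    """Pearson correlation between columns a and b of a list-of-rows matrix."""
    n = len(data)
    mean_a = sum(row[a] for row in data) / n
    mean_b = sum(row[b] for row in data) / n
    cov = sum((row[a] - mean_a) * (row[b] - mean_b) for row in data)
    var_a = sum((row[a] - mean_a) ** 2 for row in data)
    var_b = sum((row[b] - mean_b) ** 2 for row in data)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    rng = random.Random(2020)
    for loading in (0.325, 0.375, 0.425):  # low, medium, high source influence
        matrix = simulate_one_factor(500, 20, loading, rng)
        print(loading, round(loading ** 2, 4), round(correlation(matrix, 0, 1), 4))
```

The printed sample correlations scatter around λ² because each matrix contains only 500 simulated participants.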
Establishment of Data Characteristics
The continuous data were dichotomized so that the first simulated item showed a simulated probability of a correct response of .95 and the last simulated item of .50. The simulated probabilities of the simulated items in-between linearly decreased. Furthermore, omissions were integrated into the data matrices using the logistic function. That is, for each simulated item (= column) the percentage of simulated participants (= rows) who were expected to be unable to respond within the available time span was determined by the logistic function. After the selection of a simulated participant the entries to this and all following simulated items were turned into omissions. The turning point that marks the switch from the increase in steepness to the decrease of steepness of the logistic function was set to item 18 (Schweizer, Wang et al., 2020).
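The two steps described above can be sketched as follows. The turning point at item 18 and the probabilities .95 to .50 come from the text; the slope of the logistic curve, the `max_share` scaling parameter, the random drop-out order and all function names are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist


def dichotomize(data, p_first=0.95, p_last=0.50):
    """Code continuous N(0,1) scores as 1/0 with linearly decreasing P(correct)."""
    n_items = len(data[0])
    normal = NormalDist()
    thresholds = []
    for i in range(n_items):
        p_i = p_first + (p_last - p_first) * i / (n_items - 1)
        thresholds.append(normal.inv_cdf(1.0 - p_i))  # P(x > threshold) = p_i
    return [[1 if x > t else 0 for x, t in zip(row, thresholds)] for row in data]


def omission_shares(n_items, turning_point=18, slope=1.0, max_share=1.0):
    """Logistic share of participants expected not to reach item i (1-based)."""
    return [max_share / (1.0 + math.exp(-slope * (i - turning_point)))
            for i in range(1, n_items + 1)]


def insert_omissions(binary, shares, rng):
    """Mark not-reached entries as None; a dropped-out participant stays out."""
    n_persons = len(binary)
    order = list(range(n_persons))
    rng.shuffle(order)  # assumed: drop-out order is unrelated to ability
    result = [row[:] for row in binary]
    for item, share in enumerate(shares):
        n_out = round(share * n_persons)
        # shares never decrease, so these participants are also selected later
        for person in order[:n_out]:
            result[person][item] = None
    return result


if __name__ == "__main__":
    rng = random.Random(18)
    scores = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(500)]
    speeded = insert_omissions(dichotomize(scores), omission_shares(20), rng)
    print([sum(row[j] is None for row in speeded) for j in range(20)])
```

Because the shares are non-decreasing across items and the drop-out order is fixed, every simulated participant shows omissions only in one uninterrupted block at the end of the item sequence.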
Simulation of Response Format
The omissions were replaced by random data (correct responses or incorrect responses at random), as could be expected because of rapid guessing. Because of the crucial influence of the number of response options, different multiple-choice response formats were considered. Eight, six, four and two response options were selected for this study. The corresponding probabilities of a correct response at random were 1/8, 1/6, 1/4 and 1/2, respectively. They served the investigation of the hypothesis regarding the number of response options (see end of the theoretical section). Furthermore, no replacement of omissions, that is, no rapid guessing, was also considered in order to have a comparison level. Altogether, there were 500 × 3 (source influence levels) × 5 (response option levels) matrices.
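The replacement step can be sketched in a few lines. The representation of omissions as `None` and the function name are hypothetical; the rule that a replaced entry is correct with probability one divided by the number of response options follows the text.

```python
import random


def replace_omissions(responses, n_options, rng):
    """Replace each omission (None) by a random response, correct with probability 1/f."""
    p_correct = 1.0 / n_options
    filled = []
    for row in responses:
        new_row = []
        for entry in row:
            if entry is None:
                new_row.append(1 if rng.random() < p_correct else 0)
            else:
                new_row.append(entry)  # proper responses stay untouched
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    rng = random.Random(4)
    speeded = [[1, 1, 0, None, None], [1, 0, 1, 1, None]]
    for n_options in (8, 6, 4, 2):
        print(n_options, replace_omissions(speeded, n_options, rng))
```

The comparison level without rapid guessing corresponds to skipping this step and keeping the omissions.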
The confirmatory factor models included either one or two latent variables (= factors). One of them was designed to capture systematic variation due to the primary source of responding and the other one to capture systematic variation due to the additional source that was assumed to be latent processing speed. The latent variables were not allowed to correlate with each other. Furthermore, there were 20 manifest variables. The constraints for the factor loadings on the latent variable representing the effect of the time limit in testing were obtained according to Equation 4 or Equation 9. The variance parameter of the latent variable with fixed loadings was set free. The factor loadings on the substantive latent variable were freely estimated while the corresponding variance parameter was fixed to one. The input to confirmatory factor analysis was obtained by computing probability-based covariances.
The study included the type of model (one-factor vs. two-factor models) as the main independent variable, the response formats as a minor independent variable and the levels of source influence as a control variable. The dependent variable was model fit measured by CFI.
The statistical investigation was conducted using the ML-MA version of maximum likelihood estimation (Jöreskog & Sörbom, 2006). Only the comparative fit index (CFI; see DiStefano, 2016; Peterson et al., 2020) was considered for model evaluation in this study since this index was found to be especially sensitive to the effect of a time limit in testing (Schweizer, Troche et al., 2019). Models were compared by means of the CFI and χ2 differences, where a difference of .01 could be considered as substantial regarding the CFI (Cheung & Rensvold, 2002) and a difference of 3.84 regarding the χ2 (df = 1). Since each condition required analyzing 500 matrices, there were 500 estimates of model fit for each condition. The mean fit results were evaluated and compared.
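The comparison rules can be expressed as a small sketch. The CFI formula is the standard definition based on the χ2 values and degrees of freedom of the target and baseline models; the function names and the example fit values are hypothetical and do not reproduce results of this study.

```python
CFI_DIFFERENCE_CUTOFF = 0.01  # Cheung & Rensvold (2002)
CHI2_CRITICAL_DF1 = 3.84      # chi-square critical value for df = 1, alpha = .05


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index from model and baseline chi-square statistics."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model, 0.0)
    if d_baseline == 0.0:
        return 1.0  # neither model shows any noncentrality
    return 1.0 - d_model / d_baseline


def substantial_cfi_gain(cfi_one_factor, cfi_two_factor):
    """True if the two-factor model improves CFI by at least the cutoff."""
    return cfi_two_factor - cfi_one_factor >= CFI_DIFFERENCE_CUTOFF


def significant_chi2_gain(chi2_one_factor, chi2_two_factor):
    """True if the chi-square drop exceeds the critical value for one extra parameter."""
    return chi2_one_factor - chi2_two_factor > CHI2_CRITICAL_DF1


if __name__ == "__main__":
    # Hypothetical fit statistics for one simulated matrix.
    one = cfi(chi2_model=250.0, df_model=170, chi2_baseline=1770.0, df_baseline=190)
    two = cfi(chi2_model=200.0, df_model=169, chi2_baseline=1770.0, df_baseline=190)
    print(round(one, 3), round(two, 3), substantial_cfi_gain(one, two),
          significant_chi2_gain(250.0, 200.0))
```

The single degree of freedom reflects that the two-factor model with fixed loadings adds only the variance parameter of the additional latent variable.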
The mean CFI results observed for the one-factor and two-factor CFA models are presented as bars in Figure 3. The first set of three bars provides the results for data showing no effect of a time limit in testing, that is, the data were complete and without rapid guessing. The mean CFIs depicted by these bars were larger than or close to .95, which indicated good or marginally good model fit by standard criteria for fit. The following five sets of three bars picture the results obtained by the one-factor confirmatory factor model for different probabilities of a correct response at random. There was a monotonic CFI increase that was associated with an increase in the probability of a correct response at random, that is, a reduction of the number of response options. Whereas model misfit was indicated for no random responses, good model fit was signified for the probability of .5 (that means two response options) in two levels of source influence. The sets of bars on the right side of the figure depict the results observed by the two-factor confirmatory factor model. As suggested by the sizes of the bars, the model fit was always good for two levels of source influence and the results for the third level were close to .95. In each set of three neighboring bars, the CFIs showed dependency on the levels of source influence. The highest level always led to the largest CFI and the lowest level to the smallest CFI.
Table 1 comprises the CFI differences between the CFI results of the corresponding one-factor and two-factor confirmatory factor models.
|Source influence level||CFI differences for the following probabilities of a correct response
a Comparison level. There was no replacement of omissions. b The probability of a correct response due to chance (instead of an omission).
*p < .05 (according to Cheung & Rensvold, 2002).
The columns of the table refer to the probability levels (numbers of response options) and the rows to the source influence levels. All differences of the first to third columns were larger than or equal to .01. These results signified a substantial improvement in model fit from the one-factor to the two-factor models for the probabilities of zero, 1/8 and 1/6; that is, for no replacement of omissions and response formats with eight and six response options. In the fourth column there was only one other substantial difference for the lowest source influence level. Not one of the differences reported in the last column reached the level of statistical significance.
The χ2 differences for the corresponding one-factor and two-factor confirmatory factor models are included in Table 2.
|Source influence level||χ2 differences for the following probabilities of a correct response
a Comparison level. There was no replacement of omissions due to rapid guessing. b The probability of a correct response due to chance (instead of an omission). c Since the two-factor model with free factor loadings on the first factor led in a large number of datasets to estimation problems in this condition, the factor loadings on this factor were fixed to one.
*p < .05.
This table shows the same structure as Table 1. The χ2 results were in line with the CFI results with two exceptions. The first exception was the result for the combination of the source level of .375 and the probability of a correct response of 1/4 and the second exception the result for the combination of the source level of .425 and the same probability of a correct response. In both cases the χ2 difference signified a substantial difference whereas the CFI difference did not.
In sum, an effect of a time limit in testing was detected if there were no fewer than either six response options (CFI difference) or four response options (χ2 difference), or expressed in a different way, if the probability of a correct response at random was not larger than .167 (or .25) with one exception.
Accurate data are the precondition for the achievement of new insights in science; the control of sources that potentially impair data is an important part of scientific research. A long known issue regarding measurement validity is a time limit in testing (Lu & Sireci, 2007; Wise, 2017). A time limit is likely to prevent some or even all participants from completing all items so that omissions are inevitable. This means that the validity of measurement not only depends on the structural validity of the scale but also on the time limit in testing. Data collected with a time limit in testing are likely to lead to model misfit in an investigation of the structural validity of the scale by standard confirmatory factor analysis. Model misfit is normally interpreted as indication of the lack of unidimensionality or some specified structure, although in this case the true reason of the failure is the time limit in testing.
A time limit in testing creates a special precondition for the statistical investigation of internal structure that requires adaptation of the factor model. The special precondition is that two sources of responding need to be considered instead of only one (Partchev et al., 2013). There are further preconditions that are of importance but need not be considered as long as conventions hold. An important convention is that the participants behave as expected. They are expected to thoroughly process the items of the scale successively. Scale instruction and item text play a key role in invoking this convention (Johnson & Morgan, 2016). Statistical investigations by a two-factor measurement model can be expected to yield accurate results despite a time limit in testing as long as the participants’ behavior is guided by this convention.
Rapid guessing means a violation of this convention. Various reasons can lead to its violation including test preparation courses that advise participants to respond to all items even if there is not enough time for completing them appropriately. Following this advice leads to complete data that may be regarded as desirable because the missing data problem is avoided (Little & Rubin, 2019) in the statistical investigation. However, this solution to the missing data problem does not take into consideration that responses at random, as data at random, do not harmonize with data originating from the latent source of responding that is measured by the scale. The lack of harmonization means that rapid guessing cannot prevent the impairment of measurement validity due to a time limit in testing (Lu & Sireci, 2007).
As is demonstrated in our results, there remains the possibility to capture systematic variation that is due to latent processing speed despite rapid guessing. It is not even necessary to modify the model for the investigation of speeded data because of rapid guessing and also not necessary to measure processing times (Wise & Kong, 2005). The expected values obtained by the logistic function with and without the consideration of rapid guessing only differ by a constant multiplier. The results suggest that despite the use of rapid guessing in completing the items, it is possible to detect latent processing speed as one source of responding.
Although the use of rapid guessing does not prevent the detection of latent speed as one source of responding, it is not without a negative consequence for structural investigations. The integration of the probability of a correct response at random into the formal representation of the expected effect of the time limit in testing suggests an impairment of the probability of detecting this effect in structural investigations. This impairment is demonstrated to depend on the number of response options. Our results support the hypothesis suggesting such impairment.
A limitation of the present study is the assumption that all participants perform rapid guessing and that omissions completely disappear. Another limitation is the assumption of independence of ability and rapid guessing. Further limitations are the consideration of a single test length, the arrangement of items, the absence of omissions due to other sources, the constancy of sample size, and the independence of factors. Moreover, free factor loadings (Estrada et al., 2017) are not considered besides fixed factor loadings. The assumptions may not hold in real test taking situations. The behavior observed in real test taking situations appears to show a large degree of variability (McNulty et al., 2007) instead of the homogeneity assumed for the reported study and to depend on the participants’ cognitive ability (Lindner et al., 2019). There may also be participants who behave as expected besides others who behave inconsistently. But, given that we investigated a kind of worst-case scenario, our results are promising regarding the detection of latent processing speed as source of responding.