In the world of factor analysis (FA), Guttman’s (1956) image theory is viewed today, at best, as a mathematical curiosity that reflects the spirit of logical positivism of the time in which it was proposed (Ferrando, 2021). However, some basic results of image theory provide useful diagnostic tools and indices for common exploratory FA (EFA). At present, most of these tools have fallen almost totally into disuse, which is to be expected for at least three reasons. First, the theory on which they are based is considered obsolete. Second, the terminology they use is obscure and potentially confusing: it is derived from models of psychometric inference (domain-sampling theory) that are no longer in use (Ferrando, 2021). Finally, they survive as relics in old programs that are still active, but they are not generally implemented in modern EFA software.
In this article we review an index of the type discussed above: Kaiser’s (1970; Kaiser & Rice, 1974) measure of sampling adequacy (MSA) at the single-variable level (a quite unfortunate name). Furthermore, our review is carried out within a specific scenario: the preliminary selection of items as part of the item analysis process, when this analysis is based on the EFA model. Within this framework we have several aims. The first is to present the index using modern FA terminology so that the interested practitioner can understand its underlying rationale and why it is of interest for the intended purpose. The second is to improve the use of the index by embedding it within a robust procedure that will efficiently flag the most inappropriate items. The final aim is instrumental and practical: the procedure as developed here is implemented in different statistical programs so that researchers can use the one that is best suited to their purposes.
We shall now discuss the choice of scenario and the potential role that MSA can play. To start with, we consider EFA to be the most appropriate model for item analysis, especially in the initial stages in which the grossly inappropriate items are discarded (Muñiz & Fonseca-Pedrero, 2019). Now, as we discuss in detail below, the inappropriate items that can be flagged with MSA are the types that most frequently give rise to problems when solutions with a different number of factors are tested on the calibration data. So, our interest in using MSA is clear: it can be used to discard the most inappropriate items before the FA extraction stage begins (and so before the number of factors is even specified). This initial cleaning can greatly simplify the subsequent phases of the analysis.
Reviewing MSA: A Modern View
Consider the well-known correlational structure of the EFA model:
$$R=\Lambda \Phi \Lambda'+\Psi^{2} \tag{1}$$
where R is the m × m inter-item correlation matrix, Λ is the m × r pattern matrix, Φ is the r × r inter-factor correlation matrix, and Ψ is the m × m diagonal matrix containing the item residual standard deviations. The main result which serves as a basis for MSA is the following (Guttman, 1956): if the set of items under scrutiny behaves according to Model 1, then the inverse of the inter-item correlation matrix should be near diagonal, and should approach a fully diagonal matrix as the number of items per factor increases and the number of common factors remains constant. This limiting result (the number of items indicating a factor increases without bound) is the origin of the unfortunate name “sampling adequacy”: in fact, the items under study are viewed as a sample of a potential universe of items that could measure this factor.
Consider now the following transformation of the inverse of R:
$$\begin{aligned} S^{2} &= \left[\operatorname{diag}(R^{-1})\right]^{-1} \\ P &= 2I - S R^{-1} S \end{aligned} \tag{2}$$
Clearly, if $R^{-1}$ is near diagonal, then the P matrix in Equation 2 will have to be, too. However, the transformed matrix P has a far clearer interpretation than $R^{-1}$. P is the partial inter-item correlation matrix. So, its diagonal elements are 1’s and its nondiagonal elements are the correlations between the corresponding pairs of items after the influence of the remaining m − 2 items has been partialled out. Furthermore, P is a (finite) estimate of the correlation matrix between the unique factors, which should be an identity matrix according to Model 1 (see Ferrando et al., 2021). So, to summarize: if the EFA Model 1 holds, then the partial correlations between any pair of items once the influence of the remaining items has been partialled out (i.e., the nondiagonal elements of P) should approach zero.
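Equation 2 can be illustrated numerically with a minimal sketch. It is written in Python with NumPy purely for illustration; the function names and the one-factor example (all loadings equal to .70) are assumptions introduced here and are not taken from any existing program.

```python
import numpy as np

def partial_correlation_matrix(R):
    """Equation 2: P = 2I - S R^{-1} S, with S^2 = [diag(R^{-1})]^{-1}."""
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    S = np.diag(1.0 / np.sqrt(np.diag(R_inv)))   # diagonal scaling matrix S
    return 2.0 * np.eye(R.shape[0]) - S @ R_inv @ S

def one_factor_R(m, loading=0.7):
    """Hypothetical population matrix: m items, one factor, equal loadings."""
    R = np.full((m, m), loading ** 2)            # off-diagonals equal loading^2
    np.fill_diagonal(R, 1.0)
    return R

# Partial correlation between items 1 and 2 for a short and a long test.
for m in (6, 30):
    P = partial_correlation_matrix(one_factor_R(m))
    print(m, round(P[0, 1], 4))
```

In this equal-loading case every partial correlation equals ρ/(1 + (m − 2)ρ), with ρ = .49, so it is about .166 for 6 items and about .033 for 30 items. This is the shrinkage toward a diagonal P that Guttman’s result predicts as the number of items per factor grows.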
The most popular (or the least forgotten) index derived from the results just described is the Kaiser-Meyer-Olkin (KMO) measure of overall adequacy (Kaiser, 1970; Kaiser & Rice, 1974). The KMO is still used in some item factor analysis (IFA) applications as a test for deciding whether the inter-item correlation matrix is suitable for being factored (Ferrando & Lorenzo-Seva, 2017). It is obtained as:
$$KMO=\frac{\sum_{j\ne k}^{m}\sum_{k}^{m} r_{jk}^{2}}{\sum_{j\ne k}^{m}\sum_{k}^{m} r_{jk}^{2}+\sum_{j\ne k}^{m}\sum_{k}^{m} p_{jk}^{2}} \tag{3}$$
where j and k are indices referring to individual items, the $r_{jk}$ are the first-order correlations between pairs of items (i.e., nondiagonal elements of R), and the $p_{jk}$ are the corresponding partial correlations (i.e., nondiagonal elements of P).
Equation 3 is a relative measure, bounded between 0 and 1, and intended to reflect higher suitability as its value approaches unity. It can be seen from Equation 3 that KMO values are higher as the first-order inter-item correlations become larger and the corresponding partial correlations approach zero. Conceptually then, KMO values are larger when 1) the items are strongly correlated with one another (i.e., high internal consistency) and 2) these correlations do not reflect shared specificity (i.e., there are no correlated unique factors). Minimum cutoff values for considering the correlation matrix acceptable for FA purposes were proposed by Kaiser (1974): .9 (Very good), .8 (Good), .7 (Fair), .6 (Mediocre), .5 (Bad), and lower than .5 (Unacceptable). Finally, simulation studies (Cerny & Kaiser, 1977; Meyer et al., 1977; Shirkey & Dziuban, 1976) suggest that KMO increases as the overall inter-item correlation (internal consistency) increases, the number of items increases, and the number of factors decreases. Note that the last two determinants can be expected from Guttman’s (1956) asymptotic image results.
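The behavior of Equation 3 can be checked with a short sketch. The code below is an illustrative Python/NumPy version, not a reference implementation; `kmo` and `one_factor_R` are hypothetical names, and the equal-loading one-factor matrices are assumed examples.

```python
import numpy as np

def kmo(R):
    """Equation 3: overall KMO computed from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    s = 1.0 / np.sqrt(np.diag(R_inv))
    P = -s[:, None] * R_inv * s[None, :]      # off-diagonals are the partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting the j != k elements
    r2 = np.sum(R[off] ** 2)                  # sum of squared first-order correlations
    p2 = np.sum(P[off] ** 2)                  # sum of squared partial correlations
    return r2 / (r2 + p2)

def one_factor_R(m, loading):
    """Hypothetical population matrix: m items, one factor, equal loadings."""
    R = np.full((m, m), loading ** 2)
    np.fill_diagonal(R, 1.0)
    return R

print("6 items, loadings .7 :", round(kmo(one_factor_R(6, 0.7)), 3))
print("12 items, loadings .7:", round(kmo(one_factor_R(12, 0.7)), 3))
print("6 items, loadings .4 :", round(kmo(one_factor_R(6, 0.4)), 3))
```

In these assumed examples the KMO rises when items are added at a fixed loading and falls when the loadings (and hence the inter-item correlations) are lowered, in line with the determinants listed above.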
We turn now to the index considered in this paper. Like KMO, individual-item MSA is a relative index that compares the magnitude of the partial correlations in which the item under study is involved to the corresponding first-order correlations. The final version considered here (Kaiser & Rice, 1974) is
$$MSA_{j}=\frac{\sum_{k\ne j}^{m} r_{jk}^{2}}{\sum_{k\ne j}^{m} r_{jk}^{2}+\sum_{k\ne j}^{m} p_{jk}^{2}} \tag{4}$$
Like KMO, Equation 4 is a normed index bounded between 0 and 1, and the closer it is to 1, the more appropriate the item is for factor analysis. Kaiser (1970), however, was not too explicit or objective when defining what this index measures exactly. So, we are told that low MSA values flag those items that do not “belong to the same family as the other items”, or that “do not sample the same content domains measured by the remaining items”. In our view, these statements are too vague.
A close scrutiny of Equation 4 suggests that MSA is expected to flag two main types of poor items: first, and above all, “noisy” items that behave almost at random and, therefore, lack discriminating power (see Ferrando, 2012); and second, “redundant” items that share specific content with other items in the pool. As far as the first type is concerned, consider that the expected values of both the first-order and the partial correlations for a random item are zero. For both types of correlation, then, all the observed departures from zero reflect only sampling error, and the two sums of squares in Equation 4 are expected to be of similar size. It follows that the expected value of the MSA in this case is about .50. As for the second type, the partial correlations between items that share specific content are expected to increase faster than the corresponding first-order correlations (Ferrando et al., 2021), so the MSA for an item of this type is expected to decrease.
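The argument for “noisy” items can be explored with a small simulation. The sketch below is only illustrative: the design (eight items loading .70 on one factor plus one pure-noise item, 300 respondents, 200 replications), the seed, and the function name `item_msa` are all assumptions, and the “redundant” item case is not covered.

```python
import numpy as np

def item_msa(R):
    """Equation 4: one MSA value per item, from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    R_inv = np.linalg.inv(R)
    s = 1.0 / np.sqrt(np.diag(R_inv))
    P = -s[:, None] * R_inv * s[None, :]          # off-diagonal partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    a = np.where(off, R ** 2, 0.0).sum(axis=1)    # squared first-order correlations of item j
    b = np.where(off, P ** 2, 0.0).sum(axis=1)    # squared partial correlations of item j
    return a / (a + b)

rng = np.random.default_rng(2024)                 # arbitrary seed
n, m_good, loading, n_rep = 300, 8, 0.7, 200      # assumed simulation design
msa_runs = np.empty((n_rep, m_good + 1))
for rep in range(n_rep):
    factor = rng.standard_normal((n, 1))
    unique = rng.standard_normal((n, m_good))
    good = loading * factor + np.sqrt(1 - loading ** 2) * unique
    noise = rng.standard_normal((n, 1))           # item unrelated to the common factor
    X = np.hstack([good, noise])
    msa_runs[rep] = item_msa(np.corrcoef(X, rowvar=False))

mean_msa = msa_runs.mean(axis=0)
print("mean MSA, good items:", np.round(mean_msa[:-1], 2))
print("mean MSA, noise item:", round(float(mean_msa[-1]), 2))
```

Under these assumptions the average MSA of the noise item should lie in the neighborhood of .50, well below that of the items that measure the factor, although in any single sample its value can vary considerably around that figure.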
Kaiser and Rice (1974) proposed .50 as a cutoff value for discarding items that do not conform to the EFA Model 1. As discussed above, this cutoff seems reasonable for “noisy” items. Furthermore, simulation results based on random or almost-random items repeatedly show that their expected MSA values across different conditions are indeed about .50 (Cerny & Kaiser, 1977; Meyer et al., 1977; Shirkey & Dziuban, 1976). For “redundant” items, however, this cutoff is less clear. Our preliminary research suggests that an item of this type will be flagged with this cutoff only if it 1) has low loadings on the common factors, and 2) has several strong residual correlations with a small group of items within the pool. More generally, our research suggests that direct inspection of the partial correlation matrix P in Equation 2, or indices based only on this matrix, provide more powerful methods than MSA for detecting “redundant” items (Ferrando et al., 2021). Even so, it should be pointed out that MSA is also expected to flag redundant items under certain conditions, even though it is not a very sensitive index for this type of inappropriateness.
Noisy, low-discriminating items and redundant items are amongst those expected to give rise to more problems when an EFA solution is fitted for purposes of item analysis. Noisy items are those that do not show substantial loadings on any factor when multiple (usually rotated) solutions are tried. Faced with this result, the researcher, who does not know whether these items measure a different factor or are pure noise, tries solutions with an increasing number of common factors. Usually this ends up in overfactoring, and with some of the nondiscriminating items having non-negligible, totally artifactual, loadings on some of the obtained factors. As for redundant items, the problems are discussed in detail in Ferrando et al. (2021) but can be summarized in three points: 1) bad model-data fit, 2) spurious evidence of multidimensionality, and 3) biased parameter estimates.
From the discussion above, it should be clear that, if items with MSA estimates below .50 are discarded before starting the IFA process, researchers can save themselves a lot of trouble. Furthermore, the overall suitability of the debugged inter-item correlation matrix as measured by the KMO would necessarily increase. Indeed, an inspection of Equation 3 and Equation 4 clearly suggests that the overall KMO is some sort of average of the item MSAs. More specifically (proof can be obtained from the authors), the KMO is a weighted average (a linear composite) of the MSAs. This result is used in the proposal that follows.
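One direct way to see the weighted-average relation, offered here as a sketch and not necessarily the proof the authors have in mind, uses the shorthand $a_j$ and $b_j$ (symbols introduced only for this purpose) for the two sums of squares in Equation 4:

```latex
\begin{aligned}
a_j &= \sum_{k \neq j}^{m} r_{jk}^{2}, \qquad
b_j = \sum_{k \neq j}^{m} p_{jk}^{2}, \qquad
MSA_j = \frac{a_j}{a_j + b_j}, \\[4pt]
KMO &= \frac{\sum_{j=1}^{m} a_j}{\sum_{j=1}^{m} (a_j + b_j)}
     = \sum_{j=1}^{m} \frac{a_j + b_j}{\sum_{l=1}^{m} (a_l + b_l)} \cdot \frac{a_j}{a_j + b_j}
     = \sum_{j=1}^{m} w_j \, MSA_j, \\[4pt]
w_j &= \frac{a_j + b_j}{\sum_{l=1}^{m} (a_l + b_l)}, \qquad w_j \ge 0, \qquad \sum_{j=1}^{m} w_j = 1 .
\end{aligned}
```

The weights depend on the data: an item whose first-order and partial correlations are all close to zero receives a small weight, so it lowers the KMO less than an equally weighted mean of the MSAs would suggest.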
A limitation of the ‘original’ MSA discussed so far is that it is a purely descriptive index, subject to sampling fluctuation and so potentially affected by capitalization on chance. To overcome these shortcomings, we propose below a robust procedure that 1) provides confidence intervals for MSA point estimates, and 2) minimizes the risk of capitalization on chance by using a cross-validation assessment schema.
Robust MSA
As stated above, our proposal for assessing MSA within a robust context is twofold. On the one hand, we propose using bootstrap resampling to estimate confidence intervals (CIs) for MSA. If the lower end of the CI is above Kaiser’s .50 threshold, then the corresponding item can be retained in the analysis; otherwise, the item should be removed. On the other hand, we also propose to assess the replicability of the decisions obtained from a calibration sample on the basis of further analyses in a second sample in order to avoid capitalization on chance. When the available sample is large enough to be split into two subsamples, the first subsample can be used to decide which items are to be discarded, and the second subsample to assess whether the increase in the KMO value observed in the first subsample can also be replicated.
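The bootstrap part of the proposal can be sketched as follows. This is a minimal illustration in Python/NumPy, not the code of the distributed scripts: the percentile method for building the CI is an assumption (the type of bootstrap interval is not specified above), and the function names, simulated data, seed, and number of resamples are hypothetical.

```python
import numpy as np

def item_msa(R):
    """Equation 4: one MSA value per item, from a correlation matrix R."""
    R_inv = np.linalg.inv(R)
    s = 1.0 / np.sqrt(np.diag(R_inv))
    P = -s[:, None] * R_inv * s[None, :]          # off-diagonal partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    a = np.where(off, R ** 2, 0.0).sum(axis=1)
    b = np.where(off, P ** 2, 0.0).sum(axis=1)
    return a / (a + b)

def bootstrap_msa(X, n_boot=500, level=0.95, threshold=0.50, seed=0):
    """Percentile bootstrap CIs for item MSAs (the CI type is an assumption)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    point = item_msa(np.corrcoef(X, rowvar=False))
    boot = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)         # resample respondents with replacement
        boot[b] = item_msa(np.corrcoef(X[rows], rowvar=False))
    alpha = 1.0 - level
    lower, upper = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    flagged = lower <= threshold                  # retain only if the lower end exceeds the threshold
    return point, lower, upper, flagged

# Simulated stand-in data: six items loading .70 on one factor plus one noise item.
rng = np.random.default_rng(11)
n = 400
factor = rng.standard_normal((n, 1))
good = 0.7 * factor + np.sqrt(1 - 0.49) * rng.standard_normal((n, 6))
X = np.hstack([good, rng.standard_normal((n, 1))])

point, lower, upper, flagged = bootstrap_msa(X)
for j in range(X.shape[1]):
    print(j + 1, round(point[j], 3), round(lower[j], 3), round(upper[j], 3), bool(flagged[j]))
```

With real item responses, a resample may contain an item with zero variance or yield a singular correlation matrix; the sketch does not handle those cases.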
Implementation of Robust MSA
We implemented the Robust MSA procedure in three different statistical programs, and made it available (see Supplementary Materials). The utilities developed are:

The R script “RobustMSA.r”. This script uses only native functions in R, so no packages need to be downloaded. In order to use it, researchers have to store participants’ responses in a text file, update the name of the input file, and execute the script. The number of bootstrap samples, the confidence level of the interval, and the threshold MSA value can also be configured.

The SPSS script “RobustMSA.sps”. Again, to use this script, researchers must have participants’ responses in an SPSS data file, and execute the script. The same parameters as in the R script can be configured.

The Matlab function “RobustMSA.m”. To use this function, researchers must have participants’ responses in a Matlab matrix, and call the function. The same parameters as in the R script can be configured.

Finally, we implemented the Robust MSA method in our factor analysis program, which can be downloaded free of charge from the site https://psico.fcep.urv.cat/utilitats/factor/. MSA is computed by default when the quality of the correlation matrix to be analyzed using factor analysis is assessed. If bootstrap sampling is active in the program, the program also computes the 95% CI.
Method: Illustrative Analysis of a Real Dataset
In this section, we illustrate how robust MSA can be used to decide whether some items need to be removed from the item pool before an exploratory IFA is performed for purposes of item analysis.
Participants
The sample consisted of 1,156 participants (37.2% females), aged between 16 and 53 years (M = 21.2, SD = 4.2). This is the sample that was used to validate the test for the Spanish culture (Piera et al., 1993).
Instruments
The sample responded to the Spanish version of the Reducer-Augmenter Scale (Piera et al., 1993), which has 61 binary items. The test is intended to be unidimensional. In the Spanish adaptation, the original 61 items were translated, and the authors decided to maintain them all, even though some showed a low loading on the factor that was retained. The estimated reliability of the sum scores (Cronbach’s alpha) was .847, and sum scores correlated .542 with Extraversion.
Data Analysis
The aim of the present analysis is to reanalyze the original dataset, and to assess whether some of the 61 items could be removed. In order to study replicability, the sample (N = 1,156) was split into two equivalent subsamples (n = 578 each) using the Solomon method (Lorenzo-Seva, in press). This method improves the representativeness of the subsamples (i.e., all possible sources of variance are contained in the subsamples). The first subsample was analyzed using Robust MSA, with 3,000 bootstrap samples and a confidence level of 95%. As a threshold value to decide whether an item could be removed, Kaiser’s proposal of .50 was used: the items retained were those whose entire 95% confidence interval lay above .50. The Kaiser-Meyer-Olkin (KMO) statistic (Kaiser, 1970; Kaiser & Rice, 1974) was computed to assess whether the quality of the reduced correlation matrix actually increased. Finally, the items removed from the first subsample were also removed from the second so that the replicability of the outcomes obtained in the first subsample could be inspected.
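The analysis plan just described can be outlined in code. The sketch below is an assumption-laden illustration: it uses simulated continuous data in place of the questionnaire responses, a plain random split in place of the Solomon method, a percentile bootstrap with only 300 resamples, and hypothetical function names (`msa_kmo`, `lower_bounds`).

```python
import numpy as np

def msa_kmo(X):
    """Item MSAs (Equation 4) and overall KMO (Equation 3) from raw scores."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    s = 1.0 / np.sqrt(np.diag(R_inv))
    P = -s[:, None] * R_inv * s[None, :]          # off-diagonal partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    a = np.where(off, R ** 2, 0.0).sum(axis=1)
    b = np.where(off, P ** 2, 0.0).sum(axis=1)
    return a / (a + b), a.sum() / (a.sum() + b.sum())

def lower_bounds(X, n_boot, level, rng):
    """Lower ends of percentile bootstrap CIs for the item MSAs."""
    n = X.shape[0]
    boot = np.array([msa_kmo(X[rng.integers(0, n, size=n)])[0] for _ in range(n_boot)])
    return np.percentile(boot, 100 * (1 - level) / 2, axis=0)

rng = np.random.default_rng(7)                    # arbitrary seed
# Simulated stand-in data: 8 items measuring one factor and 3 noise items.
n, m_good, m_noise = 1000, 8, 3
factor = rng.standard_normal((n, 1))
good = 0.7 * factor + np.sqrt(1 - 0.49) * rng.standard_normal((n, m_good))
X = np.hstack([good, rng.standard_normal((n, m_noise))])

# Random split into two halves (a simple stand-in for the Solomon method).
order = rng.permutation(n)
first, second = X[order[: n // 2]], X[order[n // 2:]]

# Decide on removals in the first half only, then apply them to both halves.
flagged = lower_bounds(first, n_boot=300, level=0.95, rng=rng) <= 0.50
results = {}
for name, half in (("first", first), ("second", second)):
    kmo_full = msa_kmo(half)[1]
    kmo_trimmed = msa_kmo(half[:, ~flagged])[1]
    results[name] = (kmo_full, kmo_trimmed)
    print(name, "KMO all items:", round(kmo_full, 4), "KMO trimmed:", round(kmo_trimmed, 4))
print("items flagged in first half:", (np.flatnonzero(flagged) + 1).tolist())
```

The point of the design is that the second half plays no role in choosing the items, so an increase in its KMO after trimming cannot be attributed to capitalization on chance in the calibration half.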
Results
The KMO index for the correlation matrix between the 61 items in the first subsample was .8219. MSA sample indices and 95% confidence intervals are shown in Table 1. As can be seen in Table 1, when only the point-estimated sample MSA value was evaluated, only one item was proposed for removal: Item 17, with an MSA value of .467. However, when the 95% confidence intervals obtained with bootstrap sampling were considered, 19 items were proposed for removal (i.e., lower interval ends below .50). When these 19 items were removed, the KMO of the trimmed correlation matrix (i.e., the correlation matrix for the 42 remaining items) was .8664.
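The difference between the two decision rules can be made concrete with four rows copied from Table 1 (Items 1, 2, 17, and 30). The snippet is only a worked illustration in Python; the variable names are arbitrary.

```python
# (item, MSA point estimate, CI lower end, CI upper end), copied from Table 1
rows = [
    (1, 0.654, 0.483, 0.700),
    (2, 0.701, 0.558, 0.731),
    (17, 0.467, 0.355, 0.595),
    (30, 0.699, 0.509, 0.747),
]
threshold = 0.50  # Kaiser's cutoff

# Rule 1: flag an item when its point estimate is below the cutoff.
by_point = [item for item, msa, lo, hi in rows if msa < threshold]
# Rule 2: flag an item when the lower end of its 95% CI is not above the cutoff.
by_ci = [item for item, msa, lo, hi in rows if lo <= threshold]

print(by_point)  # [17]
print(by_ci)     # [1, 17]
```

Item 1 shows the typical pattern behind the 19 removals: its point estimate (.654) is comfortably above .50, but the lower end of its interval (.483) is not.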
In order to inspect the replicability of this outcome, the KMO index for the second subsample was inspected. When all the items were present, the KMO value was .8201, while the KMO of the trimmed correlation matrix was .8663. The conclusion is that the increment in the KMO of the first subsample when the 19 items were removed was reproduced in the second subsample. This suggests that the 19 discarded items were not contributing substantially to the overall adequacy across samples taken from the target population, and that this result is also to be expected in the population.
Table 1. MSA point estimates and 95% bootstrap confidence intervals for the 61 items (first subsample)
Item  MSA  95% CI  Item  MSA  95% CI 

1  .654^{a}  [.483, .700]  32  .601^{a}  [.473, .646] 
2  .701  [.558, .731]  33  .897  [.814, .898] 
3  .786  [.674, .807]  34  .871  [.707, .866] 
4  .599^{a}  [.453, .667]  35  .591^{a}  [.481, .629] 
5  .901  [.775, .893]  36  .775  [.528, .788] 
6  .674^{a}  [.446, .703]  37  .918  [.845, .914] 
7  .808  [.636, .822]  38  .655^{a}  [.490, .715] 
8  .568^{a}  [.412, .621]  39  .871  [.752, .872] 
9  .917  [.827, .910]  40  .871  [.748, .876] 
10  .866  [.733, .862]  41  .747  [.546, .768] 
11  .578^{a}  [.402, .665]  42  .675^{a}  [.493, .723] 
12  .827  [.648, .833]  43  .848  [.733, .860] 
13  .618^{a}  [.431, .675]  44  .847  [.683, .849] 
14  .661  [.561, .696]  45  .868  [.734, .869] 
15  .838  [.731, .850]  46  .549^{a}  [.400, .645] 
16  .571^{a}  [.430, .628]  47  .802  [.659, .809] 
17  .467^{a}  [.355, .595]  48  .716^{a}  [.488, .749] 
18  .781  [.590, .794]  49  .636^{a}  [.468, .689] 
19  .669  [.558, .697]  50  .908  [.831, .906] 
20  .889  [.816, .893]  51  .836  [.662, .840] 
21  .616^{a}  [.424, .689]  52  .865  [.715, .865] 
22  .861  [.752, .861]  53  .800  [.608, .817] 
23  .868  [.764, .871]  54  .677^{a}  [.471, .718] 
24  .755  [.586, .757]  55  .657  [.564, .685] 
25  .845  [.709, .847]  56  .527^{a}  [.385, .636] 
26  .735^{a}  [.474, .762]  57  .743  [.588, .767] 
27  .866  [.760, .875]  58  .920  [.853, .919] 
28  .855  [.772, .864]  59  .760  [.627, .790] 
29  .878  [.797, .880]  60  .839  [.737, .851] 
30  .699  [.509, .747]  61  .790  [.573, .795] 
31  .900  [.836, .903] 
^{a}Items proposed for removal.
Discussion
The authors of this article often review manuscripts dealing with psychometric applications, most of which, as expected, include some type of item EFA for screening or selection purposes (Muñiz & Fonseca-Pedrero, 2019). Now, item selection is not so straightforward, particularly in multidimensional solutions. However, our view is that too much time and effort is often spent on tasks that could be solved in a much simpler way. Indeed, our first recommendation along these lines is to “clean up” the data and discard the most offending items before starting to fit different FA solutions.
In this article we have adopted a multifaceted approach to rescue an old and forgotten index that, in our view, is quite suited to the initial debugging process mentioned above. We first discussed the rationale behind the index and why it is of interest for the task at hand using a more up-to-date FA perspective. Next, we proposed an improved procedure for using the MSA index, which is based on a cross-validation schema and provides confidence intervals around the point-estimated value. In this way, the MSA becomes more of an inferential statistic than a purely descriptive index. Thirdly, we implemented our proposal in a variety of statistical programs. And finally we illustrated its usefulness with real data. We feel that practitioners now have a useful new tool in their panoply. All that remains to be seen now is to what extent it will be used.
Like any proposal of this type, ours has its share of limitations and points that deserve further study, of which we shall discuss two before we close. First, the .50 cutoff value is the expected MSA value for an item that behaves totally at random (i.e., a totally inappropriate item with zero discrimination). However, further research on alternative cutoffs and their practical interest is warranted. A less lenient criterion may well be more useful. Second, our proposal (and the initial MSA proposal for that matter) is solely intended for product-moment correlation matrices, which implies fitting the linear FA model. In principle, the whole procedure could also be applied to tetrachoric/polychoric matrices (and so, to the nonlinear IFA model). However, some preliminary checks suggest that its use in this case might lead to results that are not so interpretable. Polychoric matrices are not product-moment matrices: their elements are estimated on a pairwise basis and have different amounts of sampling error. So, a careful study is needed to assess the behavior of the index in this case. This is also left for future research.