Kaiser’s single-variable measure of sampling adequacy (MSA) is a very useful index for debugging inappropriate items before a factor analysis (FA) solution is fitted to an item-pool dataset for item selection purposes. For reasons discussed in the article, however, MSA is hardly used nowadays in this context. In our view, this is unfortunate. In the present proposal, we first discuss the foundation and rationale of MSA from a ‘modern’ FA view, as well as its usefulness in the item selection process. Second, we embed the index within a robust approach and propose improvements in the preliminary item selection process. Third, we implement the proposal in different statistical programs. Finally, we illustrate its use and advantages with an empirical example in personality measurement.

In the world of factor analysis (FA),

In this article we review an index of the type discussed above:

We shall now discuss the choice of scenario and the potential role that MSA can play. To start with, we consider EFA to be the most appropriate model for item analysis, especially in the initial stages in which the grossly inappropriate items are discarded (

Consider the well-known correlational structure of the EFA model

where

Consider now the following transformation of the inverse of

Clearly, if ^{-1}^{-1}

The most popular (or the least forgotten) index derived from the results just described is the Kaiser-Meyer-Olkin (KMO) measure of overall adequacy (

where

We turn now to the index considered in this paper. Like KMO, individual-item MSA is a relative index that compares the magnitude of the partial correlations in which the item under study is involved to the corresponding first-order correlations. The final version considered here (

Like KMO,

A close scrutiny of

Noisy, low discriminating items and redundant items are amongst those expected to give rise to more problems when an EFA solution is fitted for purposes of item analysis. Noisy items are those that do not show substantial loadings on any factor when multiple (usually rotated) solutions are tried. Faced with this result, the researcher, who does not know whether these items measure a different factor or are pure noise, tries solutions with an increasing number of common factors. Usually this ends up in over-factoring, and with some of the non-discriminating items having non-negligible, totally artifactual, loadings on some of the obtained factors. As for redundant items, the problems are discussed in detail in
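To make the behaviour of a noisy item concrete, the following minimal sketch computes the overall KMO and the individual-item MSA from a correlation matrix. It assumes the commonly cited ratio form of the index: for each item, the sum of squared correlations divided by that sum plus the sum of squared partial (anti-image) correlations. The function name `msa_and_kmo`, the loadings, and the small correlations assigned to the seventh item are hypothetical illustration values, not taken from the authors' scripts or data.

```python
import numpy as np

def msa_and_kmo(R):
    """Return (per-item MSA, overall KMO) for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    # Partial (anti-image) correlations: -r^{jk} / sqrt(r^{jj} * r^{kk}),
    # where r^{jk} are the elements of the inverse correlation matrix.
    Q = -inv / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)   # mask selecting off-diagonal cells
    r2 = np.where(off, R ** 2, 0.0)
    q2 = np.where(off, Q ** 2, 0.0)
    msa = r2.sum(axis=1) / (r2.sum(axis=1) + q2.sum(axis=1))
    kmo = r2.sum() / (r2.sum() + q2.sum())
    return msa, kmo

# Hypothetical population matrix: six items loading .70 on one factor,
# plus a seventh item with only tiny, unsystematic correlations.
lam = np.full(6, 0.7)
R = np.eye(7)
R[:6, :6] = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
noise_r = np.array([0.03, -0.02, 0.01, -0.04, 0.02, -0.01])
R[6, :6] = noise_r
R[:6, 6] = noise_r

msa, kmo = msa_and_kmo(R)
print(np.round(msa, 3), round(float(kmo), 3))
```

In this constructed case the seventh item's MSA falls well below .50, whereas the six items defined by the common factor stay close to .90, which is the pattern that makes the index useful for flagging noisy items before any factor solution is fitted.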

From the discussion above, it should be clear that, if items with MSA estimates below .50 are discarded

A limitation of the ‘original’ MSA discussed so far is that it is a purely descriptive index, subject to sampling fluctuation and so potentially affected by capitalization on chance. To overcome these shortcomings, we propose below a robust procedure that 1) provides confidence intervals for MSA point estimates, and 2) minimizes the risk of capitalization on chance by using a cross-validation assessment schema.

As stated above, our proposal for assessing MSA within a robust context is twofold. On the one hand, we propose using bootstrap resampling to estimate confidence intervals (CIs) for MSA. If the lower bound of the CI is above Kaiser’s .50 threshold, then the corresponding item can be retained in the analysis; otherwise, the item should be removed. On the other hand, we also propose assessing the replicability of the decisions obtained from a calibration sample on the basis of further analyses in a second sample in order to avoid capitalization on chance. When the available sample is large enough to be split into two subsamples, the first subsample can be used to decide which items are to be discarded, and the second subsample to assess whether the increase in the KMO value observed in the first subsample is replicated.
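The bootstrap part of the procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a simple percentile interval (the distributed scripts may construct the interval differently), the same ratio form of MSA based on squared correlations and squared partial correlations, and simulated data. The names `item_msa` and `bootstrap_msa` and all parameter defaults are hypothetical.

```python
import numpy as np

def item_msa(X):
    """Per-item MSA from an (n_persons, n_items) data matrix."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    Q = -inv / np.outer(d, d)               # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off, R ** 2, 0.0).sum(axis=1)
    q2 = np.where(off, Q ** 2, 0.0).sum(axis=1)
    return r2 / (r2 + q2)

def bootstrap_msa(X, n_boot=500, level=0.95, threshold=0.50, seed=0):
    """Percentile bootstrap CIs for MSA plus the retain/remove decision."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    boot = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)   # resample respondents with replacement
        boot[b] = item_msa(X[rows])
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot, [alpha, 1.0 - alpha], axis=0)
    keep = lower > threshold                # retain only if the lower bound clears the cut-off
    return item_msa(X), lower, upper, keep

# Simulated illustration (not the article's data): five items driven by one
# common factor and a sixth item that is pure noise.
rng = np.random.default_rng(1)
n = 400
factor = rng.standard_normal((n, 1))
good = 0.7 * factor + np.sqrt(1 - 0.49) * rng.standard_normal((n, 5))
X = np.hstack([good, rng.standard_normal((n, 1))])

msa, lower, upper, keep = bootstrap_msa(X, n_boot=200)
for j in range(X.shape[1]):
    print(f"item {j + 1}: MSA={msa[j]:.3f}  "
          f"CI=[{lower[j]:.3f}, {upper[j]:.3f}]  keep={keep[j]}")
```

The decision rule is deliberately conservative: an item whose point estimate exceeds .50 is still removed when the lower bound of its interval does not.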

We implemented the Robust MSA procedure in three different statistical programs, and made it available (see

The R script “RobustMSA.r”. This script uses only native functions in R, so no packages need to be downloaded. In order to use it, researchers have to store participants’ responses in a text file, update the name of the input file, and execute the script. The number of bootstrap samples, the confidence interval, and the threshold MSA value can also be configured.

The SPSS script “RobustMSA.sps”. Again, to use this script, researchers must have participants’ responses in an SPSS data file, and execute the script. The same parameters as in the R script can be configured.

The Matlab function “RobustMSA.m”. To use this function, researchers must have participants’ responses in a Matlab matrix, and execute the function. The same parameters as in the R script can be configured.

Finally, we implemented the Robust MSA method in our program to compute factor analysis that can be downloaded free from the site

In this section, we illustrate how robust MSA can be used to decide whether some items need to be removed from the item pool before an exploratory IFA is performed for purposes of item analysis.

The sample consisted of 1,156 participants (37.2% females), aged between 16 and 53 years (

The sample responded to the Spanish version of the Reducer-Augmenter Scale (

The aim of the present analysis is to reanalyze the original dataset, and to assess if some of the 61 items could be removed. In order to study replicability, the sample (

The KMO index for the correlation matrix between the 61 items in the first subsample was .8219. MSA sample indices and 95% confidence intervals are shown in

In order to inspect the replicability of this outcome, the KMO index for the second subsample was inspected. When all the items were present, the KMO value was .8201, while the KMO of the trimmed correlation matrix was .8663. The conclusion is that the increment in the KMO of the first subsample when the 19 items were removed was reproduced in the second subsample. This suggests that the 19 discarded items were not contributing substantially to the overall adequacy across samples taken from the target population, and that this result is also to be expected in the population.
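The replication logic described above can be sketched in a few lines. This is a simplified illustration on simulated data, not a reanalysis of the dataset: the calibration decision here uses the MSA point estimate rather than the bootstrap interval, purely to keep the example short, and the function names `msa_kmo` and `split_sample_check` are hypothetical.

```python
import numpy as np

def msa_kmo(X):
    """Return (per-item MSA, overall KMO) for an (n_persons, n_items) matrix."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    Q = -inv / np.outer(d, d)               # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.where(off, R ** 2, 0.0)
    q2 = np.where(off, Q ** 2, 0.0)
    msa = r2.sum(axis=1) / (r2.sum(axis=1) + q2.sum(axis=1))
    return msa, r2.sum() / (r2.sum() + q2.sum())

def split_sample_check(X, threshold=0.50, seed=0):
    """Decide removals in one random half; compare KMO in the other half."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    half = X.shape[0] // 2
    calib, valid = X[order[:half]], X[order[half:]]
    msa, _ = msa_kmo(calib)
    keep = msa > threshold                  # simplification: point estimate, no CI
    _, kmo_full = msa_kmo(valid)            # all items, validation half
    _, kmo_trim = msa_kmo(valid[:, keep])   # trimmed pool, validation half
    return np.flatnonzero(~keep), kmo_full, kmo_trim

# Simulated illustration: six items sharing one factor, three pure-noise items.
rng = np.random.default_rng(7)
n = 1000
factor = rng.standard_normal((n, 1))
good = 0.7 * factor + np.sqrt(1 - 0.49) * rng.standard_normal((n, 6))
X = np.hstack([good, rng.standard_normal((n, 3))])

dropped, kmo_full, kmo_trim = split_sample_check(X)
print("dropped items (0-based):", dropped)
print(f"validation KMO: full={kmo_full:.4f}, trimmed={kmo_trim:.4f}")
```

If the trimmed KMO in the validation half exceeds the full-pool KMO, the removal decision made in the calibration half has replicated, which mirrors the comparison of the two KMO values reported for the second subsample.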

Item | MSA | 95% CI | Item | MSA | 95% CI |
---|---|---|---|---|---|
1 | .654^{a} | [.483, .700] | 32 | .601^{a} | [.473, .646] |
2 | .701 | [.558, .731] | 33 | .897 | [.814, .898] |
3 | .786 | [.674, .807] | 34 | .871 | [.707, .866] |
4 | .599^{a} | [.453, .667] | 35 | .591^{a} | [.481, .629] |
5 | .901 | [.775, .893] | 36 | .775 | [.528, .788] |
6 | .674^{a} | [.446, .703] | 37 | .918 | [.845, .914] |
7 | .808 | [.636, .822] | 38 | .655^{a} | [.490, .715] |
8 | .568^{a} | [.412, .621] | 39 | .871 | [.752, .872] |
9 | .917 | [.827, .910] | 40 | .871 | [.748, .876] |
10 | .866 | [.733, .862] | 41 | .747 | [.546, .768] |
11 | .578^{a} | [.402, .665] | 42 | .675^{a} | [.493, .723] |
12 | .827 | [.648, .833] | 43 | .848 | [.733, .860] |
13 | .618^{a} | [.431, .675] | 44 | .847 | [.683, .849] |
14 | .661 | [.561, .696] | 45 | .868 | [.734, .869] |
15 | .838 | [.731, .850] | 46 | .549^{a} | [.400, .645] |
16 | .571^{a} | [.430, .628] | 47 | .802 | [.659, .809] |
17 | .467^{a} | [.355, .595] | 48 | .716^{a} | [.488, .749] |
18 | .781 | [.590, .794] | 49 | .636^{a} | [.468, .689] |
19 | .669 | [.558, .697] | 50 | .908 | [.831, .906] |
20 | .889 | [.816, .893] | 51 | .836 | [.662, .840] |
21 | .616^{a} | [.424, .689] | 52 | .865 | [.715, .865] |
22 | .861 | [.752, .861] | 53 | .800 | [.608, .817] |
23 | .868 | [.764, .871] | 54 | .677^{a} | [.471, .718] |
24 | .755 | [.586, .757] | 55 | .657 | [.564, .685] |
25 | .845 | [.709, .847] | 56 | .527^{a} | [.385, .636] |
26 | .735^{a} | [.474, .762] | 57 | .743 | [.588, .767] |
27 | .866 | [.760, .875] | 58 | .920 | [.853, .919] |
28 | .855 | [.772, .864] | 59 | .760 | [.627, .790] |
29 | .878 | [.797, .880] | 60 | .839 | [.737, .851] |
30 | .699 | [.509, .747] | 61 | .790 | [.573, .795] |
31 | .900 | [.836, .903] | | | |

^{a}Items proposed for removal.

The authors of this article often review manuscripts dealing with psychometric applications, most of which, as expected, include some type of item EFA for screening or selection purposes (

In this article we have adopted a multi-faceted approach to rescue an old and forgotten index that, in our view, is well suited to the initial debugging process mentioned above. First, we discussed the rationale behind the index and why it is of interest for the task at hand using a more up-to-date FA perspective. Second, we proposed an improved procedure for using the MSA index, which is based on a cross-validation schema and provides confidence intervals around the point estimate. In this way, the MSA becomes more of an inferential statistic than a purely descriptive index. Third, we implemented our proposal in a variety of statistical programs. Finally, we illustrated its usefulness with real data. We feel that practitioners now have a useful new tool in their panoply. All that remains to be seen now is to what extent it will be used.

Like any proposal of this type, ours has its share of limitations and points that deserve further study, of which we shall discuss two before we close. First, the .50 cut-off value is the expected MSA value for an item that behaves totally at random (i.e., a totally inappropriate item with zero discrimination). However, further research on alternative cut-offs and their practical interest is warranted. A less lenient criterion may well be more useful. Second, our proposal (and the initial MSA proposal for that matter) is solely intended for product-moment correlation matrices, which implies fitting the linear FA model. In principle, the whole procedure could also be applied to tetrachoric/polychoric matrices (and so, to the nonlinear IFA model). However, some preliminary checks suggest that its use in this case might lead to results that are less interpretable. Polychoric matrices are not product-moment matrices: their elements are estimated on a pairwise basis and have different amounts of sampling error. So, a careful study is needed to assess the behavior of the index in this case. This is also left for future research.

Source code for this article in R, SPSS, and Matlab is available via the PsychArchives repository (for access see

The authors have no funding to report.

The authors have declared that no competing interests exist.

This project has been made possible by the support of the Ministerio de Ciencia e Innovación, the Agencia Estatal de Investigación (AEI) and the European Regional Development Fund (ERDF; PID2020-112894GB-I00).