A Simulation-Based Scaled Test Statistic for Assessing Model-Data Fit in Least-Squares Unrestricted Factor-Analysis Solutions

A shortcoming of least-squares unrestricted factor analysis (UFA) procedures, which are widely used in psychometric applications is that a test statistic for assessing model-data fit cannot be easily derived from the minimum fit function value. This paper proposes a chi-square type goodness-of-fit test statistic intended for the principal-axis, MINRES, and minimum-rank UFA procedures. The statistic is empirically obtained via intensive simulation based on a two-stage approach. First, a distribution of minimum fit function values is obtained from a scenario in which the null hypothesis of perfect model-data fit holds. Second, the obtained statistic is non-linearly transformed so that it has its first four moments equal to those of the theoretical reference chi-square distribution with the appropriate degrees of freedom. Extensions of the basic statistic are next proposed that include comparative and relative indexes based on it. Tests of close-fit and power assessment derived from the basic statistic are also proposed.

though it is more indeterminate than a restricted solution, the unrestricted solution is much more flexible and does not require zero-loading constraints to be imposed for identifying the pattern, which means that the variables in a UFA solution are allowed to be factorially complex. This specification-related flexibility makes UFA a very valuable (possibly the best) tool in many psychometric applications, particularly, at the early stages of test development, when the aim is to assess the dimensionality of an item pool without imposing any particular relational structure to the data (Ferrando, 2021). And, at the latter stages, UFA continues to be a flexible and versatile tool for assessing the structure of an instrument when some of its items are complex.
The position taken in this paper is that in the scenario summarized above, not only is generally UFA the most appropriate model but also that, under rather common conditions, an UFA solution is expected to work better when fitted with simple procedures. To be more specific, the common conditions are: (a) a large number of items, (b) not too large samples, and (c) complex structures (e.g., Muñiz & Fonseca-Pedrero, 2019). And, with regards to the simple estimation procedures, we shall consider here the family of UFA procedures based on the unweighted least squares (ULS) criterion (see below). These methods have been considered (somewhat disparagingly) as approximate or second rate with respect to more statistically rigorous methods such as Maximum Likelihood (ML) or Generalized Least Squares (GLS). A literature review, however, (and also our experience) suggests that, when compared to ML or GLS solutions in the scenario considered here, ULS solutions are faster, computationally simpler, robust, stable (particularly for categorical variables), less prone to arrive at improper solutions, and less likely to be affected by minor irrelevant factors (Ferrando & Lorenzo-Seva, 2017). Possibly for these reasons, the ULS-based methods based on both continuous and discrete variables are commonly used in the type of applications considered here (Revelle, 2022).
Meaningful assessment of the appropriateness of an UFA solution in the scenario described above requires a complex and multifaceted approach to be undertaken. Good ness of model-data fit (GOF), as based on the chi-squared test statistic, is, in principle, a basic property that has to be assessed, and here lies the focus of the present proposal.
As for the intended use of the proposal, we do not consider the chi-squared test statistic as the final measure of fit, but as, (a) a useful measure when accompanied by power information and, (b) a necessary basis for obtaining GOF indices that might function well in psychometric applications of the UFA model. Also we do not regard these new indices as substitutes for those that currently exist and that do work, but rather as useful complements to them. Finally, the inherent difficulties of theoretically deriving a chi-squared statistic because of both the properties of the ULS estimator and the characteristics of the item scores (see below), suggest that intensive simulation is an appropriate approach for arriving at this type of statistic.

Aims of the Proposal
The present article proposes an empirical test statistic in chi-square metric for assessing goodness of model-data fit in UFA solutions based on the ULS criterion, specifically: Principal-Axis-Factoring (PAF), ULS-MINRES, and Minimum Rank Factor Analysis (MRFA). The UFA-ULS solutions, in turn can be based on the standard linear FA model in which the variables are treated as continuous or on the non-linear UVA-FA model in which they are treated as ordered-categorical.
Our proposal is fully empirical, and avoids theoretical developments at the cost of intensive simulation which, given the capabilities of modern computers, is perfectly affordable. The basic idea is to combine: (a) the rationale of previous empirical proposals based on the idea of sampling from a simulated scenario in which the model holds exactly with, (b) non-linear transformations that bring the distribution of the resulting fit statistic close to the expected chi-square distribution. We shall label the approach as LOSEFER, as an acronym of Lorenzo-Seva and Ferrando's approach.

Description of the Procedure
Consider a set of m observed variables related to p common factors, the population standardized variance-covariance (i.e., correlation) matrix Σ (m × m) among the set of observed variables, and the corresponding estimate R (m × m) obtained in a sample of N observations. When R is obtained from a large and representative sample from the population, R is expected to be a good estimate of Σ.
The direct UFA model decomposes Σ as, where Γ is the loading matrix (m × p), and Ψ is the diagonal matrix (m × m) with the unique variances in the main diagonal. When an UFA solution is fitted to sample data, the aim is to estimate matrices Γ and Ψ from the observed matrix R. In terms of the sample estimate, matrix R is decomposed as, where A (m × k) and U (m × m) are the corresponding estimates of Γ and Ψ in Expression (1), and E (m × m) represents the amount of observed covariance in R that cannot be accounted for by the sample factor model. When k (the number of factors in the sample model) is chosen to equal p (the number of factors in the population model), the observed values in E will tend to be zero, and the estimated factor model is expected to attain an appropriate goodness-of-fit. However, the typical situation when fitting a UFA solution (especially when used with exploratory purposes) is that the value of p is not known, so that k is assigned a tentative value that aims to achieve an appropriate goodness-of-fit level for the sample factor model.

Goodness-of-Fit Assessment
In order to define a scalar goodness-of fit test statistic for a prescribed UFA solution obtained from the PAF, MINRES or MRFA approaches, the matrix E is derived from Expression (2) as, and the discrepancy function we shall consider here is: where the e ij 2 terms are the non-diagonal elements of E. So, the discrepancy function (4) is the sum of non-duplicated squared residuals between the observed and reproduced correlation matrix. The test statistic we consider based on this discrepancy function is now: The discrepancy function in (4) is the ordinary or unweighted least squares (ULS) function, which is the simplest in covariance structure analysis. As mentioned above, all the methods considered here: PAF, ULS-MINRES and MRFA are essentially ULS methods, and so are based on the minimization of (4). If the ULS estimates were asymptotically efficient, the distributional assumptions mentioned above were met, and the null hypothesis of exact fit in (1) would hold, then (5) would be asymptotically distributed as a chi-square variable with degrees of freedom, (see e.g., Lawley, 1940). Because the ULS estimates are never asymptotically efficient, neither for continuous nor for ordered-categorical solutions, and the fulfillment of the distributional assump tions cannot be taken for granted, a naïve theoretical chi-square reference distribution cannot be claimed for (5). To address this problem, the proposal here is to non-linearly transform the test statistic (5) so that, when the fitted solution holds in the population, the transformed statistic closely approaches a central chi-square distribution with de grees of freedom (6). Next, the sample test statistic undergoes the same transformation, so that, when null hypothesis (H 0 ) holds, it is interpretable as a value sampled from a chi-square distribution. Overall, then, our proposal can be regarded as a correction of a chi-square type test statistics that brings its distribution closer to that expected theoreti cally when the null hypothesis holds. The main basis difference with existing corrections of this type (i.e., robust statistics) is that the correction is not based on asymptotic theory but is empirical. Finally, as for requirements and assumptions for undertaking the transformation, the basic requirement is that both, the observed and reproduced covariance matrices are positive definite (see Lorenzo-Seva & Ferrando, 2021). And, as for the basic assumptions, being (4) a sum of squares that adopts only positive values, it is assumed that the distribution of (5) will be positively skewed, and will approach normality as the model becomes larger.

Empirically Obtaining the Scaled Test Statistic in Chi-Square Metric
Once an UFA solution has been fitted to an observed correlation matrix R, the repro duced variance-covariance matrix is defined as, Let's take this R*, as if it was a true population matrix, and decompose it using Choles ky's method, If Z (N × m) is a random matrix with columns normally distributed N(0, 1), the product, produces a population matrix X (N × m). From this population matrix X, random samples X i (N i × m) of observed scores are sampled for which the corresponding correlation matrix R i is an estimate of the true population matrix R*. Matrix R i is then factor analyzed in order to obtain estimates A i , U i , and Finally, the test statistic c i is obtained using expression (5) applied to matrix E i . The process is repeated for an arbitrary number of times K (i = 1…K), in order to obtain a vector of c that contains the K values c i . The distribution of the elements of this vector is then the distribution of the uncorrected test statistic when the null hypothesis holds. In the studies presented in this document, we used K = 1,000 and N = 100,000, and N i equal to the size of the sample used to obtain observed matrix R, and arrived at acceptable results. In addition, it must be pointed out that, if X is a set of ordinal variables, each X i must be discretized using the empirical thresholds estimated from X, and the computed correlation matrices must be based on polychoric correlations. So far, our proposal is similar to previous existing developments, and particularly to Bollen and Stine's (1992) bootstrapped approach (see also Corrêa Ferraz et al., 2022). The basic idea, in effect, is to sample (or resample in the bootstrapping approach) from a population in which the null hypothesis exactly holds. Furthermore, this condition is obtained by using the parent sample matrix as a basis (i.e., R* is taken as if it was a Σ in Equation 1). More specifically, the original data is transformed (according to (8) and (9) in our proposal) so that is forced to satisfy the null hypothesis.
From here on, however, our proposal differs from the previous related developments. Bollen and Stine (1992) considered only the ML-based scenario for continuous outcomes, in which the test statistic was expected to truly follow a central chi-square distribution under the null hypothesis. Second, Bollen and Stine's (1992) proposal was not intended to make the empirical bootstrap distribution closer to the theoretical chi square distribution, but rather, to obtain reference p values to which the untransformed sample test statistic could be compared.
Continuing with our proposal, once c is available, what we propose is nonlinearly transform it using a third degree polynomial, so that the first moment, and the second, third, and fourth central moment estimates of the transformed c coincide with those of the reference chi-square distribution with degrees of freedom (df) in (6). In more detail, the coefficients of the polynomial (11) are obtained by solving the following system, Where E() is used for expectation. Technical details on how to determine the polynomial coefficients in (11) from the system (12) can be obtained from the authors. As a summary, of the different solving procedures we tried, the most effective was two step. First, the original c values were: (a) cube-root transformed, and (b) transformed to have the first four moments of a standard normal variable (i.e., the first four moments in the system being 0, 1, 0 and 3). Second, the normal-transformed variable was transformed again using Fleishman's (1978) procedure to obtain a chi-square distribution from a standard normal distribution, so that the final transformed variable have the first four moments as close as possible to those in (12). Once the coefficients a, b 1 , b 2 , and b 3 have been obtained, we can now factor analyze the sample correlation matrix R and compute the sample-observed c statistic by using expressions (4) and (5). Next, c is transformed to y using the fitted polynomial (11) and this transformed value is interpreted with relation to a chi-square distribution with degrees of freedom (6).
A crucial point in interpreting the transformed sample c value as a proper chi-square type test statistic is, indeed, that not only the first four moments, but its distribution in general would adhere to the theoretical distribution under the null hypothesis of model-data fit. Strictly speaking, this adherence cannot be guaranteed, so we assessed this point using intensive simulation. Full results are provided below. However, it can be advanced that our proposal shows a considerable viability in this respect.

Beyond the Scaled Test Statistic: GOF Indices Test of Close Fit and Power Analysis
Provided that y approaches closely enough the corresponding reference distribution, it can be further used as a basis for computing meaningful point estimates of selected GOF statistics, so that the proposed solution can be more thoroughly assessed. We shall propose to derive CFI and RMSEA point estimates directly from the transformed y statistic. However, it should be clear that this is only an initial proposal that is expected to be updated as more information about the performance of GOF indices in UFA will become available.
Let λ 1 = y 1 -df 1 the noncentrality parameter estimate for the solution under study, and λ o = y o -df o the corresponding estimate for the solution with zero common factors (i.e., the null or baseline model). In terms of y, the comparative fit index point estimate can then be obtained as: And the RMSEA point estimate as: Our view, however, is that meaningful information of a GOF statistic requires not only the point estimate to be reported, but also the corresponding confidence interval. In the implementation approach in which the indices proposed here are programed (see below), 90% confidence intervals are reported based on bootstrap resampling. The choice of the RMSEA as an index directly derived from the test statistic allows two further pieces of important information to be obtained: the test of close fit and pow er analysis (Lee et al., 2012). In our view, this information is highly relevant for avoiding two common pitfalls when fitting UFA solutions. The first is to use too small samples in order to achieve a better fit (at the cost of a gross loss of power). The second is to over-factor with the same aim, a practice that ends up in considering as relevant trivial, uninterpretable minor factors which are devoid of any substantive interest (Ferrando & Lorenzo-Seva, 2018).
The implementation is straightforward. In our proposal we have set the null (close fit) and alternative hypothesis as, With regards to power assessment, we have chosen the approach by Lee et al. (2012) in which the noncentrality parameter that expresses the lack of fit in the population is obtained by setting RMSEA values under H 0 and H 1 . In particular, in our implementation, power is computed as the capacity for distinguishing between a close fit solution (H 0 : RMSEA = 0.05) and a moderately misspecified solution (H 1 : RMSEA = 0.08).

Simulation Studies
Two simulation studies have been carried out. The first aims to assess if statistic y in (11) actually: (a) has the expected chi square distribution when the true factor model is actually fitted in the sample data, and (b) leads to rejection rates close to those expected under the chi square distribution. The second study aims to assess the rejection rates when the sample factor model is misspecified.

First Simulation Study
A Monte Carlo simulation study was carried out using samples drawn from a true population model. Based on Expression (1), a population loading matrix was defined in which each observed variable had a salient loadings had a values of .70, and unicity equal to 1 minus communality of the variable. The number of factors and the number of salient variables per factor were manipulated in order to produce models with different degrees of freedom: • 5 degrees of freedom: The population matrix was defined by a single factor and five variables. • 64 degrees of freedom: The population matrix was defined by two factors and seven salient variables per factor. • 207 degrees of freedom: The population matrix was defined by three factors and eight salient variables per factor. • 492 degrees of freedom: The population matrix was defined by four factors and nine salient variables per factor.
The sample sizes were also manipulated: 200, 500, and 800. From each true population matrix, 500 samples were obtained, and a total of 6,000 were factor analyzed. For each sample, the true number of factors was extracted using two extraction methods: ULS/Minres, and Principal Axes (PA); and the y statistic computed after each extraction. The statistic y was computed using population samples of 100,000 and the number of random samples extracted from the population were K = 1,000.
The four firsts moments of the empirical distribution of y were computed and com pared to the expected ones for the theoretical distribution of chi square statistic with degrees of freedom df. Table 1 shows the outcomes related to ULS/MINRES extraction. Kurtosis values are printed as zero centered (i.e., kurtosis minus 3). Mean and variances were slightly overestimated, especially in small samples. The estimates of skewness and kurtosis in general do not differ from the values expected in the population, and only in small models (5 degrees of freedom) and large samples (N = 800) the estimates are overestimating the expected values.
Rejection rates after ULS/MINRES factor extraction are shown in Table 2. The worse rejection rates were observed when the sample size was small (N = 200). In addition, the worse rejections rates estimates were the ones expected to be .001. It means that is at the farthest tail of the distribution of statistic y is where less adherence to the chi-square distribution is observed. From a practical point of view, the rejection levels observed are reasonable. Note. 95th confidence intervals are shown in parenthesis.  Note. 95th confidence intervals are shown in parenthesis. Table 3 shows the outcomes related to PA extraction. Mean and variances were slight ly underestimated, especially in large samples. Again, the estimates of skewness and kurtosis in general do not differ from the values expected in the population, and only in small models (5 degrees of freedom) and large samples (N = 800) the estimates are underestimating the expected values. Rejection rates after PA factor extraction are shown in Table 4. The outcomes are quite similar to those obtained after ULS/MINRES extraction pattern. However, PA ex traction obtained slightly better rejection rates than ULS/MINRES. Again, rejection rates improve when sample sizes are large. Finally, RMSEA, CFI and NNFI goodness-of-fit indices were computed. As the model that was fitted to the sample data systematically corresponded to the true population model, the values of these indices should indicate in all cases that an acceptable model fit had been attained. Table 5 shows the mean of goodness-of-fit indices after ULS/MINRES factor extraction. As can be observed, a good model fit was always reported. In addition, as the sample size became larger, the goodness-of-fit values improved for all the indices.   Table 6 shows the mean of goodness-of-fit indices after PA factor extraction. Once again, a good model fit was always reported. It must be pointed out that the values obtained here suggested a better fit than those obtained after ULS/MINRES extraction. In addition, the estimates did not seem so much influenced by the sample size, as it happened after ULS/MINRES extraction. The conclusion of the first simulation study is that, when the correct model is fitted to the sample data, the distribution of the y statistic shows a reasonable adherence to the expected chi-square distribution, and the related goodness-of-fit indices can be safely interpreted.

Second Simulation Study
The second study is mainly concerned with assessing the power and sensitivity of our proposed statistic. To assess so, we replicated the first simulation study with two variations. First, the number of factors retained in the sample data was one less than the true number in the population. Second, only three population models were considered: the models with 64, 207 and 492 degrees-of-freedom in the population.
ULS/MINRES and PA extraction methods were used to extract the incorrect number of factors. As the sample model was not the one that existed in the population, goodnessof-fit indices derived from y should suggest an improper model adjustment. In addition, the values should worsen when the discrepancy between the sample and the population models increases. Thus, if the population model had four factors, and the sample model is adjusted for three factors, the misspecification is not so strong as if the population model had two factors, and the sample model is adjusted for one single factor. The values of the goodness-of-fit indices should be sensitive to the different degrees of misspecification.
RMSEA, CFI and NNFI goodness-of-fit indices were again computed. Table 7 shows the mean of the goodness-of-fit indices after ULS/MINRES factor extraction. As can be observed, an unacceptable model fit was always reported for all the indices, and the worse goodness-of-fit values were related to the solutions with the lowest degrees of freedom. It must be noted that, when the sample size was large, the indices reported better goodness-of-fit values, but these values were still farther away from the usual threshold values used in applied research for judging the fit as acceptable. Table 8 shows the mean of goodness-of-fit indices after PA factor extraction. As can be observed, the values of the goodness-of-fit are quite similar to the ones obtained after ULS/MINRES extraction.
The conclusion of the second simulation study is that, when the sample model is incorrectly specified, the goodness of fit indices derived from the proposed y statistic are expected to detect that the proposed model is wrong under most of the conditions expected to occur in practice.

Implementation
The code file developed is the R script "Losefer.r". This script uses only native functions in R, so no packages need to be downloaded. In order to use it, researchers have to store participants' responses in a text file, update the name of the input file, and execute the script. The script is implemented to allow different extraction procedures (Principal Component Analysis, Centroid, and Principal Axes). With this example script, research ers can easily adapt the code to use other extraction methods. We made it available via the PsychArchives repository.
In addition, we implemented the full item selection proposal in our program to compute factor analysis, that can be downloaded free from the site psico.fcep.urv.cat/util itats/factor. The computing is offered as a LOSEFER chi-square adjustment method, and can be computed with Principal Component Analysis, ULS/MINRES, MRFA and ML. In this software, the response format can be linear variables and graded response variables.

Illustrative Example
A set of six items from the Statistical Anxiety Test (Vigil-Colet et al., 2008) was used for the illustrative example. All the items correspond to the Anxiety to Examination subscale, and are responded on a 5-point graded format. A sample of 459 undergraduate students from the first course of a degree on psychology answered the test.
A Robust Unweighted Least Squares (RULS) solution was first computed, and the chi-square statistic derived from this method was scaled using the mean and variance adjustment. In addition, the statistic was also corrected using the method proposed in this document. Finally, ULS/MINRES, PAF, and Minimum Rank Factor Analysis (MRFA) solutions were computed, and the corresponding y statistics were obtained as proposed in this article.
Initial information was obtained by assessing the existing fit statistics that are not derived from the chi-square test. The output of Parallel Analysis suggested that the unidimensional solution was the most appropriate. The percentage of explained common variance (ECV) was 0.77, below the 0.80 cut-off most commonly used. Finally, the RMSR and GFI estimates were 0.06 and 0.99 respectively. Overall, these results suggest that a single dominant factor underlies the responses to these 6 items, but that this factor is not yet able to fully account for the inter-item correlations.
The chi-squared derived GOF statistics here were RMSEA, and CFI. In addition, the Non-Normed Fit Index (NNFI) was also obtained. In the ULS/MINRES, PAF, and MRFA cases, these indices were computed based on the g values obtained after Losefer adjustment. Table 9 shows a summary of the goodness of fit results. Note. df = 9.
The four methods compared in the study (RULS, ULS/MINRES, PAF, and MRFA) arrived at similar estimates of both the chi square statistic, and goodness-of-fit indices. This outcome could be expected, because they were assessing exactly the same factor solution. However, MRFA reported a lower chi square index, and the goodness-of-fit indices also suggest that the factor solution fitted better than that fitted by the other three extraction methods. As a single factor was extracted in all cases, the only source that can explain this difference is that the loading values estimated by MRFA gave rise to a reproduced correlation matrix that was closer to the observed correlation matrix. Finally, the close-fit and power results based on the PA-Losefer solution (which seems to be the most widely agreed) are reported in Table 10. The chi-square based outcomes agrees with the initial measures of fit above but, as expected, seem to be more sensitive to detect model misspecification, especially the test statistic itself and the RMSEA. Clearly, it cannot be accepted now from the first outcome in Table 10 that a single factor closely fits the scale data in this example. Perhaps still more important, however, the power results suggest that, in such a small model (only 9 degrees of freedom), the power for distinguishing a moderate misspecification from a close fit is still unacceptably low, and a much larger sample should have been collected for this purpose.

Discussion
Adjudging the appropriateness of an UFA solution is a multi-faceted process that goes far beyond goodness of model-data fit (e.g., Ferrando & Lorenzo-Seva, 2018). At the same time, however, we believe that the basis test of fit statistic must necessarily be part of this process. It (a) provides relevant information on its own, especially when accompanied by power information, and (b) is the basis for computing goodness of fit indices that can provide additional information regarding the approximate or the relative fit of the proposed solution.
So far, the chi-squared test of fit statistic is available for certain statistical (in Lawley, 1940's, terms) UFA estimation procedures that either are fully efficient or for which theoretical corrections that compensate for its lack of efficiency are available. The test statistic, however, is not available for more humble UFA procedures that are often refer red to, slightly derogatorily, as "approximate". In certain applied scenarios, however, it happens that these approximate procedures show advantages that more than compensate for their lack of statistical efficiency. So, to derive a basis chi-squared test of fit that can be used with these procedures is an issue of clear interest (e.g., Harman & Jones, 1966).
In this article we have proposed and implemented a test statistic of this type that can be used with ordinary least squares UFA solutions, solutions which, in turn, can be based on both the linear and the non-linear FA model. Overall, we consider that the proposed statistic works quite well under most of the conditions considered in the simulation. As a summary, it (a) closely adheres to the expected distribution under the null hypothesis, (b) demonstrates power and sensitivity for detecting a wrongly specified solution (if enough sample is available), and (c) allows for meaningful goodness of fit indices, tests of close fit, and power estimates to be derived. We acknowledge indeed that we (partly) make use of existing methodology for variable transformation, and also that, at its initial stages, our proposal is based on previous developments, particularly the bootstrapped approach by Bollen and Stine (1992). From here on, however, we consider our proposal to be mostly a new contribution.
As any initial proposal, this has its share of limitations and points that deserve fur ther study. Regarding limitations, we have avoided complex theoretical developments at the cost of intensive simulation. So, the procedure places strong computational demands and can be time consuming. However, the computing power of informatic equipment that is nowadays available for researchers should help to make the proposal rather feasible.
As for points that require further study, to start with further intensive simulation about its functioning in a variety of conditions beyond those considered here are clearly warranted. On the other hand, it is still not clear at present which are, (a) the chi-squarebased fit indices, (b) the appropriate thresholds for these indices, and (c) the close-fit-test or power specifications that work best with UFA solutions. So, most of what is proposed here must be considered as tentative and is expected to be updated as more information will be available.
In spite of the shortcomings noted above, we believe that this proposal has great interest, wide applicability, and will be very useful for the FA practitioner, and more so taking into account that is implemented as a resource in a free, widely known and user-friendly UFA program, as well as in two of the best known statistical programs at present.