We review common situations in Bayesian latent variable models where the prior distribution that a researcher specifies differs from the prior distribution used during estimation. These situations can arise from the positive definite requirement on correlation matrices, from sign indeterminacy of factor loadings, and from order constraints on threshold parameters. The issue is especially problematic for reproducibility and for model checks that involve prior distributions, including prior predictive assessment and Bayes factors. In these cases, one might be assessing the wrong model, casting doubt on the relevance of the results. The most straightforward solution to the issue sometimes involves use of informative prior distributions. We explore other solutions and make recommendations for practice.

In Bayesian modeling, prior distributions for the covariance matrix often involve the inverse-Wishart (IW) distribution due to its conditional conjugacy. However, the IW distribution can be problematic because it assumes the same amount of prior information for the entire covariance matrix. To overcome this challenge, various strategies have been suggested for separately specifying priors on the variance and correlation parameters underlying the covariance matrix (

The situation becomes even more complicated when the covariance matrix has model-imposed constraints, which can arise in SEMs with correlated residuals or with across-group equality constraints. In these models, we cannot impose an IW (or other prior) on the full covariance matrix because those priors will not respect the constraints imposed by the model. An easily implemented approach is to place priors on individual parameters within the covariance matrix (and we consider other approaches later). The problem is that, when we build a covariance matrix using these parameters, the resulting matrix will sometimes be non-positive definite. We elaborate below.

An example comes from the popular “Political Democracy” model originally described by

As described earlier, we could elect to put a univariate prior distribution on each variance (or standard deviation or precision) in this matrix, and also on the six residual correlations that are not fixed to zero. But this is problematic because the univariate priors on correlations can yield non-positive definite correlation matrices. The specific problem depends on the software package. For example, JAGS stops as soon as it encounters a correlation matrix that is not positive definite, so univariate priors on correlations often cannot be used there at all. Stan, on the other hand, reports that a non-positive definite matrix was encountered, rejects it, and continues sampling. We are then left with univariate priors that are collectively constrained to be positive definite. If we consider only the space of positive definite correlation matrices under these priors, the priors are usually more informative than we originally specified. That is, the prior distributions are opaque: the analyst specifies a set of priors that differ from the implied prior distributions, which must obey the constraint of positive definiteness.

How can we characterize the implied prior distributions? We could simply generate thousands of correlation matrices from the prior, then discard matrices that are not positive definite, then visualize what is left. As an example of this, we specified a Uniform(
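The rejection idea can be sketched in a few lines. Below is a minimal illustration in Python with NumPy; the 4x4 matrix with Uniform(-1, 1) priors on all six off-diagonal correlations is an assumption for illustration, not the exact Bollen residual structure:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_pd_correlations(dim=4, n_keep=1000):
    """Rejection sampling: draw Uniform(-1, 1) correlations and keep
    only draws whose full correlation matrix is positive definite."""
    kept = []
    n_lower = dim * (dim - 1) // 2
    while len(kept) < n_keep:
        r = rng.uniform(-1, 1, size=n_lower)
        mat = np.eye(dim)
        mat[np.tril_indices(dim, -1)] = r
        mat = mat + mat.T - np.eye(dim)
        # positive definite iff all eigenvalues are positive
        if np.all(np.linalg.eigvalsh(mat) > 0):
            kept.append(r)
    return np.asarray(kept)

draws = sample_pd_correlations()
# the implied marginal priors concentrate closer to 0 than Uniform(-1, 1),
# whose standard deviation is 1/sqrt(3), about 0.577
print(draws.std(axis=0))
```

Plotting histograms of the retained columns then shows the implied marginal priors.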

To technically describe the positive definite constraint here, we can simultaneously permute the rows and columns of the correlation matrix to obtain a block diagonal matrix. A block diagonal matrix is useful because the determinant of the full matrix is the product of determinants of each individual block within the matrix. This allows us to examine whether or not the full matrix is positive definite, by working with submatrices of smaller dimension.
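The determinant identity is easy to verify numerically. The following sketch uses an invented 4x4 pattern whose nonzero structure hides two independent blocks:

```python
import numpy as np

# A 4x4 correlation matrix with an interleaved sparsity pattern:
# variables (0, 2) and (1, 3) form two independent blocks
R = np.array([
    [1.0, 0.0, 0.3, 0.0],
    [0.0, 1.0, 0.0, 0.5],
    [0.3, 0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0, 1.0],
])

# Simultaneously permute rows and columns to reveal the blocks;
# this permutation leaves the determinant unchanged
perm = [0, 2, 1, 3]
B = R[np.ix_(perm, perm)]

# The determinant of the full matrix is the product of block determinants
blocks = [B[:2, :2], B[2:, 2:]]
prod = np.prod([np.linalg.det(b) for b in blocks])
assert np.isclose(np.linalg.det(R), prod)
# Positive definiteness can likewise be checked block by block
assert all(np.all(np.linalg.eigvalsh(b) > 0) for b in blocks)
```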

The Cuthill-McKee algorithm (
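For matrices of moderate size, a reverse Cuthill-McKee implementation such as the one in SciPy can find a suitable permutation automatically. A brief sketch, applied to the same illustrative interleaved pattern as above:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Illustrative sparsity pattern hiding two blocks: (0, 2) and (1, 3)
R = np.array([
    [1.0, 0.0, 0.3, 0.0],
    [0.0, 1.0, 0.0, 0.5],
    [0.3, 0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0, 1.0],
])

# Reverse Cuthill-McKee permutes the variables so that nonzeros cluster
# near the diagonal, exposing the block-diagonal structure
perm = reverse_cuthill_mckee(csr_matrix(R), symmetric_mode=True)
B = R[np.ix_(perm, perm)]
print(B)  # the off-diagonal 2x2 blocks are now all zero
```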

After applying traditional rules for computing matrix determinants, we can express the determinant of the

This shows analytically how the positive definite constraint of our correlation matrix influences our univariate priors. The univariate priors are collectively constrained by the requirement that

To show how this issue becomes problematic for applied work, we consider the calculation of Bayes factors using the Bollen model. Imagine that we wish to know whether the two residual correlations involving

The Savage-Dickey method is slightly more complicated than usual because the two residual correlations in question influence the possible values that other residual correlations can take (due to positive definiteness). This changes the prior distributions on other residual correlations, as we move from a model with the two focal correlations freed to a model with the two focal correlations fixed.
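To fix ideas about the Savage-Dickey computation itself, here is a sketch in a conjugate toy model (a normal mean with known variance, not the SEM setting; all numbers are illustrative), where the density ratio can be cross-checked against the exact marginal-likelihood ratio:

```python
import numpy as np
from scipy import stats

# Savage-Dickey sketch for a point null H0: mu = 0 in a conjugate toy
# model -- not the SEM residual-correlation setting, but the
# density-ratio logic is identical.
rng = np.random.default_rng(0)
tau2, sigma2 = 4.0, 1.0
y = rng.normal(0.3, 1.0, size=50)
n, ybar = len(y), y.mean()

# Conjugate posterior for mu under H1: mu ~ Normal(0, tau2)
post_var = 1 / (1 / tau2 + n / sigma2)
post_mean = post_var * n * ybar / sigma2

# Savage-Dickey: BF01 = posterior density at 0 / prior density at 0
bf01 = stats.norm.pdf(0, post_mean, np.sqrt(post_var)) / \
       stats.norm.pdf(0, 0, np.sqrt(tau2))

# Cross-check against the exact marginal-likelihood ratio for ybar
m0 = stats.norm.pdf(ybar, 0, np.sqrt(sigma2 / n))
m1 = stats.norm.pdf(ybar, 0, np.sqrt(sigma2 / n + tau2))
print(bf01, m0 / m1)  # the two computations agree
```

In the SEM case, the prior density in the denominator must be the *implied* prior, which is exactly where the positive definite constraint enters.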

If we ignore (or do not realize) all of the above and set Uniform priors on each individual correlation, then the prior density of the two focal correlations at 0 equals

These evaluations lead to two separate Bayes factors for the model with correlations, relative to the model without correlations. Using our incorrect priors that do not account for positive definite constraints, we obtain a log-Bayes factor of 4.66 in favor of the model with correlations. Using our priors that do account for positive definite constraints, along with the correction from

We agree with you, the reader, that these cutoffs are arbitrary and that the Bayes factors do not differ by very much. But the point is that the Bayes factors systematically differ depending on how we compute prior densities. These differences will sometimes lead to different substantive conclusions in practice, with the easier computation (i.e., ignoring the positive definite constraint) being incorrect.

While it was straightforward to visualize implied prior distributions in the Bollen model, the process becomes inefficient for correlation matrices whose dimension is larger than 3 or 4. For such correlation matrices, few of the randomly-generated matrices will be positive definite, and it will take a long time to obtain a sufficient number of positive definite matrices to describe the implied prior. Additionally, every unique structure of correlation matrix will have a unique positive definite constraint, similar to the one from

The simplest, partial solution is to maintain the univariate priors on individual correlations, but make those priors informative around 0. For example, instead of placing Uniform priors on the correlations, we might use Beta distributed priors. The Beta distribution is typically defined for the interval
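Assuming the usual stretch of a Beta(a, a) variable from (0, 1) to (-1, 1) via rho = 2u - 1, a quick simulation shows how larger a concentrates the prior around 0 (the value a = 4 is an arbitrary illustration):

```python
import numpy as np
from scipy import stats

# Stretch a Beta(a, a) draw from (0, 1) to (-1, 1): rho = 2u - 1.
# a = 1 recovers Uniform(-1, 1); larger a concentrates mass near 0.
a = 4.0
u = stats.beta.rvs(a, a, size=100_000, random_state=1)
rho = 2 * u - 1

# The stretched Beta(a, a) has mean 0 and sd 1/sqrt(2a + 1);
# for a = 4 that is 1/3, well below the Uniform(-1, 1) sd of 1/sqrt(3)
print(rho.mean(), rho.std())
```

Because such priors place little mass near the boundary of the positive definite region, the implied priors stay closer to the priors the researcher actually specified.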

A more general solution to this problem comes from putting priors on the Cholesky decomposition of the correlation matrix, which is related to the

For the Political Democracy model, the

A disadvantage of this approach is that the entries of the Cholesky decomposition do not necessarily have intuitive interpretations, so that it is difficult to set informative priors. Each diagonal entry is related to the portion of the corresponding variable’s variance that cannot be accounted for by variables that occur further to the left of the matrix. Each off-diagonal entry is related to a partial correlation conditioned on variables further to the left of the matrix (see
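The construction itself is simple to sketch: any lower-triangular matrix with a positive diagonal and unit-norm rows yields a valid correlation matrix, so priors on the Cholesky entries can never produce a non-positive definite matrix. A minimal Python sketch (not the parameterization of any particular package):

```python
import numpy as np

rng = np.random.default_rng(3)

def correlation_from_cholesky(dim=4):
    """Build a valid correlation matrix from unconstrained draws:
    fill a lower triangle, force a positive diagonal, then normalize
    each row to unit length so that R = L L' has a unit diagonal."""
    L = np.tril(rng.normal(size=(dim, dim)))
    L[np.diag_indices(dim)] = np.abs(L.diagonal())  # positive diagonal
    L /= np.linalg.norm(L, axis=1, keepdims=True)   # unit-norm rows
    return L @ L.T

R = correlation_from_cholesky()
print(np.diag(R))                 # all ones
print(np.linalg.eigvalsh(R) > 0)  # positive definite by construction
```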

A final solution was recently described in a blog post by

We now turn to a problem that is more specific to SEM: sign indeterminacies of loading parameters. It is well known that, if we change the signs of all loadings, the SEM likelihood (usually) stays the same. To avoid this issue, SEM software typically “prefers” positive loadings through various aspects of implementation. First, for both Bayesian and frequentist models, the loadings’ starting values are often set to positive numbers. Additionally, if a single loading is fixed for identification, it is almost always fixed to +1. This often leads other loadings towards positive values.

Especially when using software like JAGS or Stan, researchers commonly fix the latent variance to 1 and place truncated normal priors on the factor loadings, where the distributions are truncated from below at 0 (e.g.,

When a single loading per factor is fixed to 1 for identification, we should not need to fix the signs of any other loadings. If we instead fix the latent variance to 1 for identification, then an improved solution (over fixing all signs to positive) is to employ relabeling algorithms (e.g.,
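A post-hoc relabeling step can be sketched as follows. This is an illustrative sketch rather than any specific published algorithm; the reference rows and the rule of flipping a factor's loadings together with its factor correlations are assumptions:

```python
import numpy as np

def relabel_signs(loadings, factor_corr, reference_rows):
    """For one MCMC draw, flip a factor's loadings (and its factor
    correlations) whenever the factor's reference loading is negative."""
    loadings = loadings.copy()
    factor_corr = factor_corr.copy()
    for k, ref in enumerate(reference_rows):
        if loadings[ref, k] < 0:
            loadings[:, k] *= -1     # flip the factor's loadings
            factor_corr[k, :] *= -1  # and its correlations with other
            factor_corr[:, k] *= -1  # factors (diagonal flips back to 1)
    return loadings, factor_corr

# A draw where the second factor's loadings came out negative
lam = np.array([[0.8, 0.0],
                [0.7, 0.0],
                [0.0, -0.6],
                [0.0, -0.9]])
corr = np.array([[1.0, -0.3],
                 [-0.3, 1.0]])
new_lam, new_corr = relabel_signs(lam, corr, reference_rows=[0, 2])
print(new_lam[:, 1], new_corr[0, 1])  # loadings flipped; correlation now 0.3
```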

The relabeling algorithms’ preferences for positive loadings can conflict with researchers’ desires to use noninformative prior distributions for the loadings (say, Normal with a mean of 0 and a large variance). That is, the software’s preference for positive loadings conflicts with the noninformative prior distributions, which state that both positive and negative loadings are equally likely. More generally, factor loadings are influenced by the model identification constraints (e.g.,

To illustrate the interaction between sign indeterminacy and prior distributions, it is sufficient to consider the usual confirmatory factor model that is fit to the

The Holzinger-Swineford factor model has five types of model parameters: intercepts, loadings, factor standard deviations, factor correlations, and residual standard deviations. We assign true (“population”) values to all these parameters. Intercepts receive true values of 0, factor standard deviations receive true values of 1, factor correlations receive true values of 0, and residual standard deviations receive true values of 1. Finally and importantly, loadings receive true values of
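The data-generation step can be sketched as follows. The paper's true loading values are elided above, so the value 0.7 and the 9-variable, 3-factor simple structure are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000

# Illustrative simple structure: 9 variables, 3 factors, 3 items each.
# True loadings of 0.7 are an assumption for this sketch.
lam = np.zeros((9, 3))
for k in range(3):
    lam[3 * k:3 * (k + 1), k] = 0.7

eta = rng.normal(size=(n, 3))  # factors: sd 1, uncorrelated
eps = rng.normal(size=(n, 9))  # residuals: sd 1
y = eta @ lam.T + eps          # intercepts of 0
print(y.shape)  # (1000, 9)
```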

Using these true values, we generated a dataset of 1,000 observations and re-fit the 3-factor model back to the data (where true values were treated as unknown). We used common, non-informative priors for the parameters of the estimated model, which are currently the defaults in

The resulting posterior distributions of the loadings appear in

At this point, readers might object that this merely illustrates sign indeterminacy, and that posterior inferences about factor loadings are not generally impacted here. But the example highlights that, for loadings, a noninformative prior centered at 0 usually ignores the identification constraint that was chosen. That is, when employing noninformative priors, we are usually attempting to say that we have no idea about the loadings’ values, or to avoid influencing the results of model estimation. But the prior ignores the fact that (i) we typically fix a loading to be positive for identification, and (ii) observed variables are usually positively correlated, so we can expect other loadings to be positive. So our original priors, which were intended to be

Sign indeterminacy also complicates MCMC algorithm validation, which is used to ensure that MCMC samplers are working correctly. The validation process is difficult even without sign indeterminacy, because randomness is inherent in MCMC sampling. This means that we cannot simply examine whether posterior means and standard deviations match those of other samplers to many decimal places. While it is possible to obtain analytic posterior distributions for certain models, analytic results are the exception rather than the rule for models estimated via MCMC.

1. Generate many sets of parameter values from the prior distribution.

2. For each set of parameters from Step 1, generate an artificial dataset.

3. Fit the model of interest to each artificial dataset from Step 2.

4. Examine whether the resulting posterior distributions look like the prior distribution.

If the MCMC sampler is working correctly, then the posteriors from Step 3 should look like the priors from which we started. We can graphically examine this idea by comparing the posterior means from Step 3 to the parameter values from Step 1; we should see an identity line when plotting the parameter values against the posterior means.
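The four steps can be sketched with a conjugate toy model, where an exact posterior stands in for an MCMC fit. This is illustrative only; real validation would use the actual sampler and, e.g., rank statistics rather than a simple correlation:

```python
import numpy as np

rng = np.random.default_rng(4)
tau, sigma, n_obs, n_reps = 1.0, 1.0, 20, 500

prior_draws, post_means = [], []
for _ in range(n_reps):
    mu = rng.normal(0, tau)                    # Step 1: draw from the prior
    y = rng.normal(mu, sigma, size=n_obs)      # Step 2: simulate a dataset
    post_var = 1 / (1 / tau**2 + n_obs / sigma**2)
    post_mean = post_var * y.sum() / sigma**2  # Step 3: conjugate "fit"
    prior_draws.append(mu)
    post_means.append(post_mean)

# Step 4: posterior means should track the generating values,
# falling along the identity line when plotted against them
r = np.corrcoef(prior_draws, post_means)[0, 1]
print(r)  # near 1 when the posterior computation is correct
```

With sign indeterminacy, a loading drawn as negative in Step 1 can come back positive in Step 3, pulling points off the identity line even when the sampler is correct.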

Using the same model from the previous section, we used the

Results for the Set 1 and Set 2 priors are shown in

If researchers are aware of how their software handles sign indeterminacy, then they can potentially avoid the issues described here and successfully employ noninformative prior distributions. Barring that, we advise researchers to explicitly consider the loadings’ expected signs when setting prior distributions. In many models, we expect that all the observed variables corresponding to a factor will have the same direction of relationship with that factor. Additionally, a single loading is often set to 1 for identification. In a situation like this, it is often reasonable to place priors on the free loadings that have a mean of 1 and a standard deviation of, say, .5. These priors look very informative at first glance, as compared to, say, a Normal prior with a mean of 0 and a variance of 10,000. But the suggested priors better represent what the researcher knows about the signs of the loadings, combined with the fact that some loadings are being fixed to 1.

Instead of fixing a single loading to 1 for identification, researchers may fix the latent variance to 1. In this case, as described by

There are alternative prior distributions that avoid the issue and/or that make it easier to specify informative prior distributions, though they are not readily available in popular software.

Finally, we describe prior distributions for order-constrained parameters, which are commonly seen in SEMs for ordinal variables with more than two categories. For these models, there exist threshold parameters that chop each underlying continuous variable into observed, ordered categories. The threshold parameters must be ordered so that they correspond to the ordering of the observed variables. For example, the lowest threshold chops off the lowest category, the second threshold chops off the bottom two categories, and so on.
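The mapping from thresholds to observed categories can be sketched in a few lines (the threshold values are illustrative):

```python
import numpy as np

# Three ordered thresholds chop a continuous variable into 4 categories
thresholds = np.array([-1.0, 0.2, 1.1])  # illustrative values

latent = np.array([-2.3, -0.5, 0.7, 1.5])
# searchsorted counts how many thresholds each value exceeds,
# which is exactly the observed ordinal category (0, 1, 2, or 3)
categories = np.searchsorted(thresholds, latent)
print(categories)  # [0 1 2 3]
```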

The prior distributions for threshold parameters are often opaque because the priors that researchers specify usually have no order constraints. This is commonly done to improve the software’s ease of use: researchers are accustomed to setting univariate Normal priors on individual parameters, and priors with order constraints typically do not have simple forms. But the software always imposes order constraints here, which changes the prior distribution in various ways. Researchers often do not realize that anything has happened, which may be especially problematic when setting informative priors.

Say that a researcher fits a factor analysis model to a set of 4-category ordinal variables, and that she specifies a Normal(0,5) prior on all threshold parameters in the model. Because there are four categories per variable, we require three order-constrained thresholds per variable. We wish to know what these priors look like, after accounting for the order constraints.

We consider two ways that we could translate Normal(0,5) priors to three ordered parameters (also see

For both of these translations, the threshold parameters’ prior distributions differ from the Normal(0,5) distribution that the researcher originally declared. We expand on this point below, separately for the two methods.

When we draw three values from the Normal distribution and then order them, the act of ordering influences the resulting prior distributions. The specific distributions can be described via statistical theory on order statistics. For our example, the Normal(0,5) priors translate into the following probability density functions (pdfs) for individual thresholds:
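These densities follow standard order-statistic theory and can be checked by simulation. In the sketch below we treat the 5 in Normal(0, 5) as a standard deviation, purely for illustration, and compare the simulated means of the sorted draws to their closed-form values:

```python
import numpy as np
from math import sqrt, pi

rng = np.random.default_rng(5)
s = 5.0  # treating the 5 in Normal(0, 5) as a standard deviation

# Draw three values per replication and sort; the sorted values are
# order statistics, with pdf f_(k)(x) = k*C(3,k)*F^(k-1)*(1-F)^(3-k)*f
draws = np.sort(rng.normal(0, s, size=(100_000, 3)), axis=1)

# For the normal, the means of the min, median, and max of three
# draws are -3s/(2*sqrt(pi)), 0, and +3s/(2*sqrt(pi))
analytic = np.array([-3 * s / (2 * sqrt(pi)), 0.0, 3 * s / (2 * sqrt(pi))])
print(draws.mean(axis=0), analytic)  # simulated means match the theory
```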

The

As in the previous section, the above priors can be translated into priors on individual thresholds. It is obvious that the prior for

While the priors for

Unlike the previous issues with positive definite constraints and sign constraints, researchers do not have to consider changing their priors in order to address order constraints. The main solution is to be aware of the fact that, if one places univariate prior distributions on a set of order-constrained parameters, then some translation will take place to ensure that the parameters are ordered correctly. And this translation will influence the implied prior distribution of each parameter. It is worthwhile to understand how this is handled by one’s software, especially for prior predictive assessments, Bayes factor calculation, and simulation-based calibration methods.

In this paper, we considered the idea of

The three issues that we considered were (i) positive definite constraints on model covariance matrices; (ii) sign indeterminacy and constraints used to identify model parameters (typically factor loadings); and (iii) order constraints on subsets of model parameters (typically thresholds/intercepts). These issues occur with different frequencies, with (ii) and (iii) occurring more often than (i). To expand on this, issue (iii) occurs for most models that have ordinal variables with more than two categories, issue (ii) occurs for most measurement models (with free loadings), and issue (i) occurs for models with residual covariances, or other combinations of fixed and free covariances. We could have a worst-case scenario, such as a multiple group model with ordinal variables and across-group parameter constraints, where all three issues occur at once.

To avoid problems associated with opaque priors, we offer the following recommendations for practice:

If one’s model involves covariance matrices without parameter constraints, use a single prior for the full covariance (or correlation) matrix (LKJ, inverse Wishart, etc.).

If one’s model involves a covariance matrix with parameter constraints, consider putting a prior on the Cholesky decomposition, or use matrix identities to see whether the full matrix can be broken into blocks that are easier to handle. If these are unavailable, use informative priors on the correlations that place more density close to 0.

For factor loadings, consider the expected direction of the relationship between each observed variable and the corresponding latent variable(s), along with the loading identification constraints. Use priors that place most density in this expected direction.

Be aware of how order constraints influence priors for thresholds, especially if one is doing model assessments that directly involve prior evaluation.

Out of these recommendations, the priors on constrained covariance matrices are most difficult to handle. Future work could make it easier for researchers to place reasonable priors on constrained covariance matrices.

Importantly, the issue of opaque priors does

Opaque priors are vaguely similar to applied modeling of ordinal variables (e.g.,

We conclude by considering that some non-Bayesian researchers may find this paper appealing, because they can use it to justify phrases like “Bayesian methods are difficult to use.” We agree that priors present extra complications that do not exist for other methods, but we find the extra complications to be worthwhile. In our experience, wrestling with prior distributions can lead to a deeper, more sober understanding of one’s model and how it interacts with data. This understanding might be achieved via other, non-Bayesian routes, but it will require the time and effort that Bayesians devote to prior distributions.

The R code to replicate the results of the current study is freely available and can be found in the

The supplementary materials provided are the R code scripts to replicate the results of the current study (see

All results were obtained using the

This work was supported by the Institute of Education Sciences, United States Department of Education, Grant R305D210044 to the University of Missouri.

The authors have declared that no competing interests exist.

We thank Bob Carpenter and two reviewers for comments that improved the paper. All remaining errors are due to the authors.