Configural Frequency Analysis (CFA) is a statistical method for the analysis of multiway contingency tables and an appropriate tool for person-oriented or person-centered research. In complex contingency tables, patterns or configurations are analyzed by comparing observed cell frequencies with expected frequencies. Significant differences between observed and expected frequencies lead to the emergence of Types and Antitypes. Types are patterns or configurations that are observed significantly more often than expected; Antitypes are configurations that are observed significantly less often than expected. The R-package confreq is easy-to-use software for conducting CFAs; another useful shareware program for running CFAs was developed by Alexander von Eye. Here, CFA is presented based on the log-linear modeling approach. CFA may be used together with interval-level variables, which can be added as covariates to the design matrix. In this article, a real data example and the use of confreq are presented. In sum, the use of a covariate may bring the estimated cell frequencies closer to the observed cell frequencies. In those cases, the number of Types or Antitypes may decrease. However, in rare cases, the Type-Antitype pattern can change, with new Types or Antitypes emerging.

This article describes the use of the package confreq for conducting CFA. CFA compares observed ($f_o$) with expected ($f_e$) frequencies and searches for over- or underfrequented cells. Configurations that are observed more frequently than expected ($f_o > f_e$) are called Types; configurations that are observed less frequently than expected ($f_o < f_e$) are called Antitypes.

In the English speaking literature CFA is closely related to the name Alexander von Eye (

In the field of developmental psychology, person-oriented research is mainly represented by David Magnusson and Lars Bergman from the University of Stockholm in Sweden (see also

Here we give the reader a brief overview of the interdisciplinary use of CFA (see also

Researchers from the

CFA belongs to the person-oriented analytic approach for the analysis of frequencies in multi-way contingency tables (cf.

On the statistical level and for the use of CFA, individuals, animals or objects are grouped in cross-tabulations into disjunct categories based on their respective patterns or configurations. Configurations with observed cell frequencies ($o_{ijk}$) that occur significantly more often than their corresponding expected cell frequencies ($e_{ijk}$) constitute CFA

In log-linear modeling the expected frequencies are estimated by using the Generalized Linear Model (GLM). The General Linear Model is a special case of the Generalized Linear Model. The GLM is:

The function

where Y is a column vector including the dependent variable.

If we replace the parameters β by λ, we obtain the equation for log-linear modeling:
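For orientation, both models can be written in common textbook notation. The symbols below are assumptions that follow standard usage and may differ from the authors' own notation: $f$ is the link function, $X$ the design matrix, $\mathrm{E}[Y]$ the expected value of the dependent variable, and $\hat{e}$ the vector of expected cell frequencies.

```latex
% Generalized Linear Model: the link function f connects the expected
% value of Y to the linear predictor X * beta
f\bigl(\mathrm{E}[Y]\bigr) = X\beta

% Log-linear model: the link is the natural logarithm, the dependent
% variable is the vector of cell frequencies, and beta is renamed lambda
\log \hat{e} = X\lambda
```

In the log-linear case each row of $X$ corresponds to one cell of the contingency table, so adding a column to $X$ adds one parameter and costs one degree of freedom.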

The basis of CFA is the analysis of frequencies in multi-way contingency tables. Each individual case is cross-tabulated into disjunct categories based on his or her respective pattern or configuration. The underlying logic is the comparison of observed frequencies $f_{(o)}$ with expected frequencies $f_{(e)}$. To this end, a global chi-square, a goodness-of-fit statistic, is calculated (the following formula is, for didactic reasons, presented for three variables but can easily be extended to any number of variables):

$$\chi^2 = \sum_{i}\sum_{j}\sum_{k} \frac{(o_{ijk} - e_{ijk})^2}{e_{ijk}}$$

where $o_{ijk}$ denotes the observed cell frequencies and $e_{ijk}$ the expected cell frequencies,

and the general formula for the degrees of freedom for a contingency table with main effects is:

$$df = \prod_{d} c_d - \sum_{d} (c_d - 1) - 1$$

with $c_d$ the number of categories of a variable.

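An important alternative goodness-of-fit statistic to Pearson's chi-square is the Likelihood Ratio chi-square (LR):

$$LR = 2 \sum_{i}\sum_{j}\sum_{k} o_{ijk} \ln\left(\frac{o_{ijk}}{e_{ijk}}\right)$$

with $o_{ijk}$ and $e_{ijk}$ the observed and expected cell frequencies.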

The global chi-square tests the following statistical hypotheses ($H_0$ and $H_1$). Again, the following formulas are, for didactic reasons, presented for three variables but may easily be extended to any number of variables:

In semantic terms, the null ($H_0$) and alternative ($H_1$) hypotheses are expressed as follows:

$H_0$: There are no significant (local) associations between the variables involved, that is, the variables are independent of each other.

$H_1$: There are significant (local) associations between the variables involved, that is, the variables are not independent of each other.

The alternative hypothesis includes also higher-order associations. In non-hierarchical log-linear models, lower-order associations are omitted (cf.

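The expected frequencies were calculated according to the assumption of independence:

$$e_{ijk} = \frac{o_{i..}\; o_{.j.}\; o_{..k}}{N^2}$$

where $o_{i..}$, $o_{.j.}$ and $o_{..k}$ are the observed marginal frequencies of the three variables and $N$ is the sample size.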
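As a minimal sketch in base R, the expected frequencies under independence and both global fit statistics can be computed by hand from the eight observed frequencies of the 5th grade data reported in this article. The object names (`cells`, `pearson`, `lr`) are illustrative and not part of confreq.

```r
# Observed frequencies in the order of the results table:
# pattern = (Att, Peer, Offend), with Offend varying fastest
cells <- expand.grid(Offend = c("-", "+"),
                     Peer   = c("-", "+"),
                     Att    = c("-", "+"))
cells$obs <- c(486, 130, 8, 7, 46, 31, 8, 14)
N <- sum(cells$obs)

# Expected frequency of each cell = product of its three marginal totals / N^2
cells$exp <- ave(cells$obs, cells$Att,    FUN = sum) *
             ave(cells$obs, cells$Peer,   FUN = sum) *
             ave(cells$obs, cells$Offend, FUN = sum) / N^2

# Global goodness-of-fit statistics
pearson <- sum((cells$obs - cells$exp)^2 / cells$exp)
lr      <- 2 * sum(cells$obs * log(cells$obs / cells$exp))

# Degrees of freedom for the main effects model: cells - sum(c_d - 1) - 1
df <- nrow(cells) - sum(c(2, 2, 2) - 1) - 1

print(cbind(cells[, c("Att", "Peer", "Offend", "obs")], exp = round(cells$exp, 2)))
print(round(c(pearson = pearson, LR = lr, df = df), 2))
```

With these data the sketch reproduces the expected frequencies in the first order CFA table (for example 449.67 for the pattern - - -) as well as the global statistics reported for that model.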

A CFA that is based on the assumption of independence is called first order CFA. In addition, we differentiate between a

Significant local chi-square values represent

The data for the present paper relate to the project “CURL”. The longitudinal sample from $t_1$ to $t_2$ (time gap: two years) included 775 juveniles with complete data on delinquency. Of the 189 offenders at $t_1$, 114 (ca. 60%) remained in the longitudinal data file, and about one half (48.2%) reported having committed another crime at $t_2$.

The selection of variables, here possible risk factors, for the following analyses was based on a publication with the title “Risk factors for the development of antisocial behavior in childhood and youth” (German translation: Risikofaktoren für die Entwicklung dissozialen Verhaltens in der Kindheit und Jugend;

In 5th grade

| Att | Peer | Offend | $f_{(o)}$ | $f_{(e)}$ | $z$ | $p$ | Type/Antitype |
|---|---|---|---|---|---|---|---|
| - | - | - | 486 | 449.67 | 1.71 | .043 | |
| - | - | + | 130 | 149.34 | −1.58 | .057 | |
| - | + | - | 8 | 24.01 | −3.27 | .001 | Antitype |
| - | + | + | 7 | 7.97 | −0.35 | .365 | |
| + | - | - | 46 | 70.55 | −2.92 | .002 | Antitype |
| + | - | + | 31 | 23.43 | 1.56 | .059 | |
| + | + | - | 8 | 3.77 | 2.18 | .015 | |
| + | + | + | 14 | 1.25 | 11.40 | .000 | Type |

The first three columns (Att, Peer, Offend) define the patterns. Type (overfrequented cell): $f_{(o)} > f_{(e)}$; Antitype (underfrequented cell): $f_{(o)} < f_{(e)}$.

Both goodness-of-fit statistics suggested a poor fit: LR = 77.72, $\chi^2$ = 161.90,
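The local tests behind the Type/Antitype column can be sketched in base R. The sketch assumes a standardized residual $z = (f_{(o)} - f_{(e)}) / \sqrt{f_{(e)}}$, a one-sided normal p-value, and a Bonferroni-adjusted alpha of .05 / 8. These assumptions are not taken from the authors' description; they are chosen because they reproduce the labels in the table above. The expected frequencies are copied from that table.

```r
pattern <- c("- - -", "- - +", "- + -", "- + +",
             "+ - -", "+ - +", "+ + -", "+ + +")
obs  <- c(486, 130, 8, 7, 46, 31, 8, 14)
expd <- c(449.67, 149.34, 24.01, 7.97, 70.55, 23.43, 3.77, 1.25)

# Standardized residual per cell and its one-sided p-value
z <- (obs - expd) / sqrt(expd)
p <- pnorm(abs(z), lower.tail = FALSE)

# Bonferroni adjustment for eight local tests (assumption, see lead-in)
alpha_adj <- 0.05 / length(obs)

# Significant positive residual = Type, significant negative residual = Antitype
label <- ifelse(p >= alpha_adj, "", ifelse(z > 0, "Type", "Antitype"))

print(data.frame(pattern, obs, expd, z = round(z, 2), p = round(p, 3), label))
```

Under these assumptions the cell + + - (p = .015) is not labeled a Type, because its p-value exceeds the adjusted alpha of .00625.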

Alexander von Eye (Michigan State University) has written a CFA program (

Within R, one can read in a frequency table by typing the patterns and their frequencies into a spreadsheet file. Data in this form are typically named as

The following R syntax reproduces the first order CFA results shown in the table above:
```
# read in an Excel file saved in csv format (semicolon-separated)
order1 <- read.table("5thgrade.csv", sep=";", header=TRUE, quote="\"")
order1
# load the R-package confreq
# do not use zeros as configural patterns!
library("confreq")
# convert the data to patterned frequencies
order1pat <- dat2fre(fre2dat(order1))
order1pat
# first order CFA
resd1 <- CFA(order1pat, alpha=0.05, form="~ Offender + Delinqpeer + Attitude")
summary(resd1)
# inspect the design matrix of the first order CFA
resd1$designmatrix
```

The resulting design matrix for the base model (see last syntax line in the box above) looks like the following:
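As an illustration of what such a design matrix contains, the following base R sketch builds a main effects design matrix for a 2 × 2 × 2 table. Effect coding (1 / −1) is assumed here; the exact coding, column order and column names produced by confreq may differ.

```r
# One row per configuration, in the order (Att, Peer, Offend)
cells <- expand.grid(Offend = c("-", "+"),
                     Peer   = c("-", "+"),
                     Att    = c("-", "+"))

# Effect coding via contr.sum: the first level ("-") is coded 1,
# the second level ("+") is coded -1
X <- model.matrix(~ Att + Peer + Offend, data = cells,
                  contrasts.arg = list(Att    = "contr.sum",
                                       Peer   = "contr.sum",
                                       Offend = "contr.sum"))
print(X)
```

The matrix has eight rows (one per cell) and four columns: a constant plus one column per main effect. A covariate would enter as a fifth column.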

The underlying idea is that covariates are employed in the loglinear base model to compute the expected frequencies. As the first covariate we added _{c} = parameter for the covariate.

The resulting model belongs to the family of nonstandard log-linear models. The literature cautions that parameters from such nonstandard models can be ambiguous to interpret.
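The effect of adding a covariate column can be sketched with a Poisson log-linear model in base R, using `glm()` rather than confreq. The observed frequencies are those of the 5th grade data; the covariate values in `cells$cov` are purely hypothetical placeholders and are not the CURL cell means.

```r
cells <- expand.grid(Offend = c("-", "+"),
                     Peer   = c("-", "+"),
                     Att    = c("-", "+"))
cells$obs <- c(486, 130, 8, 7, 46, 31, 8, 14)

# HYPOTHETICAL cell means of a covariate, one value per configuration
cells$cov <- c(3.9, 3.7, 3.8, 3.4, 3.6, 3.3, 3.5, 3.0)

# Base model: main effects only (first order CFA base model)
base <- glm(obs ~ Att + Peer + Offend, family = poisson, data = cells)

# Extended model: the covariate enters as one additional column
with_cov <- glm(obs ~ Att + Peer + Offend + cov, family = poisson, data = cells)

# The residual deviance of a Poisson log-linear model equals the LR chi-square
print(c(df_base = df.residual(base), df_cov = df.residual(with_cov)))
print(c(LR_base = deviance(base), LR_cov = deviance(with_cov)))
```

The base model reproduces the LR statistic of the first order CFA with four degrees of freedom. Each added covariate uses one more degree of freedom, and the LR statistic cannot increase because the models are nested; how much it drops depends entirely on the covariate values.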

As the

Let’s have a look at the R-syntax with one covariate:
```
##### the covariates from CURL 5th grade ---------------------
co <- read.csv2(file = "covariate.csv", header = TRUE)
co
# run a CFA with one covariate, here Apq_pe (Parental Engagement)
# 'cova = co$Apq_pe' adds the covariate to the design matrix
erg4_PE <- CFA(order1pat, cova = co$Apq_pe)
summary(erg4_PE, showall = TRUE, type = "pChi")
# have a closer look at the design matrix
erg4_PE$designmatrix
```

The resulting design matrix with the means of

| Att | Peer | Offend | $f_{(o)}$ | $f_{(e)}$ | $z$ | $p$ | Type/Antitype |
|---|---|---|---|---|---|---|---|
| - | - | - | 486 | 458.35 | 1.29 | .098 | |
| - | - | + | 130 | 147.39 | −1.43 | .076 | |
| - | + | - | 8 | 9.61 | −0.52 | .302 | |
| - | + | + | 7 | 15.65 | −2.19 | .014 | |
| + | - | - | 46 | 73.08 | −3.17 | .001 | Antitype |
| + | - | + | 31 | 14.14 | 4.46 | .000 | Type |
| + | + | - | 8 | 6.96 | 0.39 | .347 | |
| + | + | + | 14 | 4.78 | 4.22 | .000 | Type |

The first three columns (Att, Peer, Offend) define the patterns. Type (overfrequented cell): $f_{(o)} > f_{(e)}$; Antitype (underfrequented cell): $f_{(o)} < f_{(e)}$.

The fit is still not perfect: there are significant differences between the observed and expected frequencies. However, the AIC and BIC were reduced at the cost of one degree of freedom: LR = 48.32, $\chi^2$ = 56.71,

Next to

Let’s have a look at the R-syntax with two covariates:
```
##### the covariates from CURL 5th grade ---------------------
co <- read.csv2(file = "covariate.csv", header = TRUE)
co
# run a CFA with two covariates, here Parental Engagement and Corporal Punishment
erg5_CP <- CFA(order1pat, cova = cbind(co$Apq_pe, co$Apq_cp))
summary(erg5_CP, showall = TRUE, type = "pChi")
# have a closer look at the design matrix
erg5_CP$designmatrix
```

The results of the first order CFA with two covariates can be found in

| Att | Peer | Offend | $f_{(o)}$ | $f_{(e)}$ | $z$ | $p$ | Type/Antitype |
|---|---|---|---|---|---|---|---|
| - | - | - | 486 | 486.759 | −0.034 | .486 | |
| - | - | + | 130 | 127.086 | 0.258 | .398 | |
| - | + | - | 8 | 8.110 | −0.038 | .485 | |
| - | + | + | 7 | 9.047 | −0.680 | .248 | |
| + | - | - | 46 | 47.592 | −0.231 | .409 | |
| + | - | + | 31 | 31.564 | −0.100 | .460 | |
| + | + | - | 8 | 5.541 | 1.045 | .148 | |
| + | + | + | 14 | 14.303 | −0.080 | .468 | |

The first three columns (Att, Peer, Offend) define the patterns. Type (overfrequented cell): $f_{(o)} > f_{(e)}$; Antitype (underfrequented cell): $f_{(o)} < f_{(e)}$.

With two covariates, the significant differences between the observed and expected frequencies vanished. We invested another degree of freedom but obtained a reasonable fit: LR = 1.60, $\chi^2$ = 1.69,

In CFA, covariates which correlate with the residuals decrease the differences between the observed and expected cell frequencies. However, covariates which do not correlate can lead to the emergence of new Types and Antitypes (

We demonstrated the use of CFA with covariates. CFA is a very useful tool in the realm of person-oriented research and is related to other statistical methods that analyze patterns or configurations of information, such as latent class analysis (LCA), latent profile analysis, and general growth mixture models (GGMM). GGMM are basically growth curve models performed for different latent classes (

Compared to other person-oriented data analysis approaches, CFA can be distinguished according to different aspects. First, in comparison to LCA and the mixture models associated with it (

The use of additional covariates makes CFA even more flexible, in particular if one investigates variables of different scale levels (e.g., categorical and interval-level variables). In person-oriented research, a covariate that is significantly related to the variables under investigation brings the expected frequencies closer to the observed frequencies; this reduces the number of Types and Antitypes. Moreover, this disappearance is probably causally related to the covariate.

Using the log-linear modeling (LLM) approach to CFA, covariates are included in the design matrix of a first order CFA as additional columns of means, medians, or even percentages.
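How such a covariate column might be derived from person-level data can be sketched in base R: the covariate is summarized once per configuration, giving one value per cell. The data frame `raw` and the variable `score` are hypothetical and only serve as an illustration.

```r
# HYPOTHETICAL raw data: one row per person, two binary variables and
# an interval-level covariate
raw <- data.frame(
  Att   = factor(c("-", "-", "-", "+", "+", "+", "-", "+"), levels = c("-", "+")),
  Peer  = factor(c("-", "-", "+", "+", "-", "-", "+", "+"), levels = c("-", "+")),
  score = c(4.0, 3.0, 2.5, 1.5, 3.5, 2.5, 3.5, 2.5)
)

# One summary value per configuration: cell means and cell medians
cell_stats <- aggregate(score ~ Att + Peer, data = raw, FUN = mean)
names(cell_stats)[3] <- "mean_score"
cell_stats$median_score <- aggregate(score ~ Att + Peer, data = raw, FUN = median)$score

print(cell_stats)
```

Before such a column is passed on as a covariate, its rows need to be in the same order as the configurations in the pattern frequency table; otherwise the values are attached to the wrong cells.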

In addition, the use of the R-package

The authors have no funding to report.

The authors have declared that no competing interests exist.

The authors have no additional (i.e., non-financial) support to report.