
Consider a one-way or two-way ANOVA design. Typically, groups are compared based on some measure of location. The paper suggests alternative methods where measures of location are replaced by a robust measure of effect size that is based in part on a robust measure of dispersion. The measure of effect size used here does not assume that the groups have a common measure of dispersion. That is, it deals with heteroscedasticity. It is fairly evident that no single method reveals everything of interest regarding how groups differ. Certainly, comparing measures of location provides useful information. But as illustrated, comparing measures of effect size can provide a deeper understanding of how groups compare.

Momentarily consider two independent groups and let

where it is assumed that

as a measure of effect size.

where

A parameter is said to be non-robust if a small change in a distribution has a large
impact on its value. Formal mathematical methods for characterizing robust
parameters are summarized by

Here, robust versions of (3) and (4) are used, which are based on a trimmed mean and
a Winsorized variance. This mimics the basic approach used by
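As a point of reference, the two robust ingredients named above can be sketched in a few lines of Python. This is only an illustration of the standard definitions of a trimmed mean and a Winsorized variance, not the author's R code; the function names and the default of 20% trimming (`tr=0.2`) are assumptions made for the example.

```python
import math
import statistics


def trimmed_mean(x, tr=0.2):
    """Average of what remains after dropping the g smallest and g largest values."""
    xs = sorted(x)
    g = math.floor(tr * len(xs))  # number of observations trimmed from each tail
    return statistics.fmean(xs[g:len(xs) - g])


def winsorize(x, tr=0.2):
    """Pull the g smallest and g largest values in to the nearest retained value."""
    xs = sorted(x)
    g = math.floor(tr * len(xs))
    lo, hi = xs[g], xs[len(xs) - g - 1]
    return [min(max(v, lo), hi) for v in x]


def winsorized_variance(x, tr=0.2):
    """Usual sample variance (divisor n - 1) of the Winsorized values."""
    return statistics.variance(winsorize(x, tr))


# Toy data with one gross outlier; the values are made up for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trimmed_mean(x), winsorized_variance(x))
```

With the toy data, the outlier 100 has no influence on either quantity, which is the sense in which both are robust.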

The choice

Winsorizing

where

where

Consider the case where

A basic percentile bootstrap method is used to test (1). Momentarily focus on two
independent groups. First, generate a bootstrap sample from each group. That is,
randomly sample with replacement

where

When

If

If

If
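The resampling scheme can be sketched generically as follows. This is a minimal Python sketch of a percentile bootstrap for two independent groups, not the author's implementation: the statistic `stand_in_effect` is a hypothetical placeholder (a difference between trimmed means divided by a scale built from Winsorized variances), and the function names, the number of bootstrap samples, and the handling of ties are all assumptions.

```python
import math
import random
import statistics


def tmean(x, tr=0.2):
    xs = sorted(x)
    g = math.floor(tr * len(xs))
    return statistics.fmean(xs[g:len(xs) - g])


def wvar(x, tr=0.2):
    xs = sorted(x)
    g = math.floor(tr * len(xs))
    lo, hi = xs[g], xs[len(xs) - g - 1]
    return statistics.variance([min(max(v, lo), hi) for v in xs])


def stand_in_effect(x, y, tr=0.2):
    # HYPOTHETICAL statistic used only to make the sketch runnable;
    # it is not the measure of effect size defined in the paper.
    diff = tmean(x, tr) - tmean(y, tr)
    scale = math.sqrt((wvar(x, tr) + wvar(y, tr)) / 2)
    if scale == 0:  # degenerate resample with no spread
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / scale


def percentile_bootstrap(x, y, stat=stand_in_effect, n_boot=2000, alpha=0.05, seed=0):
    """Return (p_value, (ci_low, ci_high)) for H0: the statistic equals zero."""
    rng = random.Random(seed)
    boot = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))  # resample group 1 with replacement
        yb = rng.choices(y, k=len(y))  # resample group 2 with replacement
        boot.append(stat(xb, yb))
    boot.sort()
    # Proportion of bootstrap estimates below zero, counting ties as one half.
    p_hat = (sum(b < 0 for b in boot) + 0.5 * sum(b == 0 for b in boot)) / n_boot
    p_value = 2 * min(p_hat, 1 - p_hat)
    ell = round(alpha * n_boot / 2)
    return p_value, (boot[ell], boot[n_boot - ell - 1])
```

Because the statistic is passed in as an argument, the same routine could in principle be reused with any estimator of effect size.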

As is evident, the percentile bootstrap method is readily adapted to testing (2)
where

Let

has a g-and-h distribution, where

Here, an estimate of the null distribution of

Let

That is, an estimate of the upper quantile of the distribution of

Some comments about a 2-by-2 design are helpful. For this special case, let

is the same as testing

That is, it is irrelevant whether differences are based within rows rather than within columns. Note that testing

is an analog of testing for an interaction. Rather than comparing measures of effect size based on the difference between measures of location only, a measure of effect size is used that is based in part on the Winsorized variance within each group. But testing (7) is not necessarily the same as testing

That is, comparing measures of effect size corresponding to the levels of the first factor differs from comparing measures of effect size corresponding to the levels of the second factor.

Simulations were used to assess the small sample properties of methods M1 and M2. Data were generated from four types of distributions: normal, symmetric and heavy-tailed, asymmetric and relatively light-tailed, and asymmetric and relatively heavy-tailed, where heavy-tailed roughly means that outliers tend to be common. More specifically, data were generated from four g-and-h distributions. The four distributions used here are the standard normal (

g | h | Skewness | Kurtosis |
---|---|---|---|
0.0 | 0.0 | 0.00 | 3.00 |
0.0 | 0.2 | 0.00 | 21.46 |
1.0 | 0.0 | 0.61 | 3.68 |
1.0 | 0.2 | 32.81 | 2295.98 |
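For readers unfamiliar with the g-and-h family, an observation is usually obtained by transforming a standard normal variate Z: X = ((exp(gZ) − 1)/g) exp(hZ²/2) when g > 0, and X = Z exp(hZ²/2) when g = 0, so that g = h = 0 gives the standard normal. The following Python sketch generates such data; the function names and the seed argument are assumptions made for illustration and are not taken from the simulation code used in the paper.

```python
import math
import random


def g_and_h(z, g=0.0, h=0.0):
    """Transform a standard normal value z into a g-and-h value."""
    core = z if g == 0 else (math.exp(g * z) - 1) / g  # g controls skewness
    return core * math.exp(h * z * z / 2)               # h controls tail heaviness


def rgh(n, g=0.0, h=0.0, seed=None):
    """Draw n observations from a g-and-h distribution."""
    rng = random.Random(seed)
    return [g_and_h(rng.gauss(0.0, 1.0), g, h) for _ in range(n)]


# One sample per (g, h) pair listed in the table above.
samples = {(g, h): rgh(20, g, h, seed=1) for g in (0.0, 1.0) for h in (0.0, 0.2)}
```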

The range of distributions used here is motivated by a review of several studies aimed at characterizing the extent to which distributions differ from normality (

For method M1, simulations were run for

N1 | |||
---|---|---|---|

0.0 | 0.0 | 0.055 | 0.055 |

0.0 | 0.2 | 0.053 | 0.051 |

1.0 | 0.0 | 0.056 | 0.057 |

1.0 | 0.2 | 0.041 | 0.053 |

N2 | |||

0.0 | 0.0 | 0.048 | 0.053 |

0.0 | 0.2 | 0.047 | 0.068 |

1.0 | 0.0 | 0.040 | 0.051 |

1.0 | 0.2 | 0.047 | 0.048 |

There is a well-established heteroscedastic method for performing all pairwise comparisons based on trimmed means (e.g.,

As for method M2, simulations were run for

N1 | ||
---|---|---|

0.0 | 0.0 | 0.021 |

0.0 | 0.2 | 0.016 |

1.0 | 0.0 | 0.071 |

1.0 | 0.2 | 0.070 |

N2 | ||

0.0 | 0.0 | 0.032 |

0.0 | 0.2 | 0.029 |

1.0 | 0.0 | 0.063 |

1.0 | 0.2 | 0.063 |

N3 | ||

0.0 | 0.0 | 0.043 |

0.0 | 0.2 | 0.042 |

1.0 | 0.0 | 0.055 |

1.0 | 0.2 | 0.057 |

N4 | ||

0.0 | 0.0 | 0.029 |

0.0 | 0.2 | 0.028 |

1.0 | 0.0 | 0.065 |

1.0 | 0.2 | 0.065 |

As can be seen, the estimates satisfy Bradley’s criterion in all situations except N1 when dealing with a symmetric distribution (g = 0), where the estimates are less than 0.025, the lowest estimate being 0.016.
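The check can be made explicit with a short Python sketch. Bradley's criterion is commonly stated as requiring the actual level to lie between 0.5α and 1.5α; the lower bound of 0.025 agrees with the text, while the nominal level of 0.05 and the upper bound of 0.075 are assumptions here. The estimates are copied from the table above, and the function name is hypothetical.

```python
def satisfies_bradley(alpha_hat, alpha=0.05):
    """True when the estimated level lies in [0.5 * alpha, 1.5 * alpha]."""
    return 0.5 * alpha <= alpha_hat <= 1.5 * alpha


# Estimated Type I error probabilities for method M2, by sample-size configuration.
m2_estimates = {
    "N1": [0.021, 0.016, 0.071, 0.070],
    "N2": [0.032, 0.029, 0.063, 0.063],
    "N3": [0.043, 0.042, 0.055, 0.057],
    "N4": [0.029, 0.028, 0.065, 0.065],
}

violations = [
    (config, est)
    for config, ests in m2_estimates.items()
    for est in ests
    if not satisfies_bradley(est)
]
print(violations)
```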

Method T1 is readily extended to a two-way ANOVA design where the goal is to perform all pairwise comparisons of the levels of the first or second factor, which is called method T2 henceforth. The details are in

| | | M2 | T2 |
---|---|---|---|

N1 | |||

1.0 | 1.0 | 0.447 | 0.301 |

1.0 | 0.5 | 0.798 | 0.229 |

1.0 | 2.0 | 0.284 | 0.206 |

N3 | |||

0.5 | 1.0 | 0.712 | 0.361 |

0.5 | 0.5 | 0.993 | 0.407 |

0.5 | 3.0 | 0.290 | 0.213 |

Methods M1 and M2 are illustrated based on data stemming from a study of an intervention program aimed at improving the physical and mental health of older adults (

Group | Group | M1 | T1 |
---|---|---|---|

1 | 2 | 0.084 | 0.060 |

1 | 3 | 0.000 | 0.002 |

1 | 5 | 0.000 | 0.001 |

2 | 3 | 0.352 | 0.534 |

2 | 4 | 0.260 | 0.325 |

2 | 5 | 0.564 | 0.103 |

3 | 4 | 0.564 | 0.999 |

3 | 5 | 0.564 | 0.825 |

4 | 5 | 0.564 | 0.997 |

Presumably, what constitutes a large effect size can depend on the situation. For illustrative purposes, suppose

The second illustration deals with the goal of understanding the association between a measure of meaningful activities (MAPA) and two independent variables: a measure of life satisfaction (LSIZ) and a participant’s cortisol awakening response (CAR), which is the difference between cortisol measured upon awakening and cortisol measured again about 30 to 45 minutes later. The focus here is on measures taken after intervention.

For illustrative purposes, the data are split into four groups based on the medians of the two independent variables, resulting in a two-by-two ANOVA design.
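The split can be sketched in Python as follows. This is an illustration only and not the R function KMSgridRC; the function name `median_split_2x2`, the rule that values equal to the median are placed in the low group, and the toy data values are assumptions.

```python
import statistics


def median_split_2x2(rows, a_key, b_key, y_key):
    """Group the outcome y into four cells defined by the medians of a and b."""
    med_a = statistics.median([r[a_key] for r in rows])
    med_b = statistics.median([r[b_key] for r in rows])
    cells = {(a, b): [] for a in ("low", "high") for b in ("low", "high")}
    for r in rows:
        a = "low" if r[a_key] <= med_a else "high"  # ties go to "low" (assumption)
        b = "low" if r[b_key] <= med_b else "high"
        cells[(a, b)].append(r[y_key])
    return cells


# Made-up values, used only to show the shape of the input and output.
rows = [
    {"LSIZ": 10, "CAR": -0.2, "MAPA": 30},
    {"LSIZ": 20, "CAR": 0.1, "MAPA": 35},
    {"LSIZ": 30, "CAR": -0.1, "MAPA": 28},
    {"LSIZ": 40, "CAR": 0.3, "MAPA": 40},
]
cells = median_split_2x2(rows, "LSIZ", "CAR", "MAPA")
```

Each of the four lists in `cells` then plays the role of one group in the two-by-two design.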

As for method M2, for low LSIZ, the estimate of

As previously noted, based on trimmed means only, a significant interaction was obtained. The results based on

It should be stressed that this paper does not suggest abandoning methods based on measures of location only. Surely, they provide useful information about how groups compare. The only suggestion is that comparing groups based on a robust measure of effect size that includes some measure of variation can provide additional insights about how groups compare.

There are several other ways for dealing with an interaction, based on a heteroscedastic measure of effect size, beyond the approach used here (

For this article, data are freely available (see

For this article, the following supplementary materials are available (see

The file TWO-WAY_KMS_illustration.pdf contains the R code that was used in the illustrations plus some additional analyses of the data.

The R functions for applying the methods in this paper are stored in the file Rallfun-v39. The function KMSmcp.ci applies method M1 and AN2GLOB.KMS applies method M2. In the illustration involving LSIZ and CAR, method M2 was applied via the R function KMSgridRC, which provides a convenient way of splitting the data as was described. The function KMS2way performs all relevant pairwise comparisons and tests all tetrad interactions.

The code used in the simulations is stored in the files KMSmcp_ci_sim.tex (method M1) and ANOG2KMS_sim.tex (method M2).

The author has no funding to report.

The author has declared that no competing interests exist.

The author has no additional (i.e., non-financial) support to report.