
Confidence intervals (CIs) constitute the most popular alternative to widely criticized null hypothesis significance tests. CIs provide more information than significance tests and lend themselves well to visual displays. Although CIs are no better than significance tests when used solely as significance tests, researchers need not limit themselves to this use of CIs. Rather, CIs can be used to estimate the precision of the data, and it is this precision argument that may place CIs in a superior position to significance tests. We tested two versions of the precision argument with computer simulations that assessed how well sample-based CIs estimate a priori CIs. One version pertains to precision of width whereas the other pertains to precision of location. Under both versions, sample-based CIs estimated a priori CIs poorly at typical sample sizes and performed better as sample sizes increased.

The null hypothesis significance testing procedure is increasingly coming under attack (see

Although most statistically savvy researchers favor CIs over significance tests, CIs also can be criticized. The most popular use of CIs is as an alternative form of significance testing: if the null-hypothesized value falls outside the CI, the finding is “significant.” When used in this way, CIs fail to improve on traditional significance tests. Alternatively, some have promoted CIs for parameter estimation, but this can be done in a naïve or a sophisticated way. A naïve example is a researcher who computes a sample mean, constructs a 95% CI around it, and concludes that the population mean has a 95% chance of lying within the constructed CI. The unfortunate fact is that there is no way to know this probability, and serious frequentists would argue that such probabilities are irrelevant, as the parameter either is in the CI or is not. The researcher’s lack of knowledge about whether the parameter is in the interval does not justify assigning a probability.

But if CIs should not be used as an alternative form of significance testing, nor to assign probabilities with respect to the placement of population parameters, what is the potential contribution? The usual answer given by CI sophisticates is that CIs provide researchers with information about the precision of the data (e.g.,

There is empirical support for the precision argument.

But is the precision argument misdirected? We argue that it is. Put briefly, we see the ability of sample-based CIs to capture sample means in the following simulations as not very relevant, because the estimation goal concerns population parameters, not sample statistics. The more relevant issue, as will become clear in the ensuing discussion, is whether sample-based CIs accurately estimate

What are

As a quick example, suppose

Although CIs provide the basis for the ^{1}

More complex a priori equations have been developed since

For a computer simulation employing the standard normal distribution (mean equals 0 and standard deviation equals 1),
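For the standard normal setting just described, an a priori interval can be sketched as follows. This is a minimal illustration, assuming the z-based formula with known population parameters; the function name and defaults are ours, not the authors':

```python
import math
from statistics import NormalDist

def a_priori_ci(mu=0.0, sigma=1.0, n=10, confidence=0.95):
    """A priori CI for the sample mean when mu and sigma are known (z-based)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g., about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

lower, upper = a_priori_ci(n=10)  # roughly (-0.62, 0.62) for N(0, 1), n = 10
```

Because the population mean and standard deviation are fixed in advance, this interval depends only on the sample size, which is what makes it a useful benchmark for the sample-based intervals below.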

How well will sample-based CIs approximate

There are at least three ways in which sample-based CIs can do well or badly at approximating

The simulation was based on the manipulation of sample size. Sample sizes ranged from 10 to 1,000 increasing by 10 (i.e., 10, 20, …, 1,000). For the simulation, pseudo-random data were obtained from the standard normal distribution with mean and variance equal to zero and one, respectively. A random seed was set to 12 to ensure the results could be perfectly replicated. The simulation ran 10,000 times for each sample size. Each sample was then subjected to a one-sample
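A simulation cell of the kind described above can be sketched as follows. This is our illustrative reconstruction, not the authors' code: we assume t-based 95% CIs, and the NumPy seed of 12 mirrors the text's seed only in spirit, since different software generates different pseudo-random streams:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)  # fixed seed for replicability, as in the text

def simulate_cis(n, reps=10_000, confidence=0.95):
    """Draw `reps` samples of size n from N(0, 1); return t-based CI limits."""
    samples = rng.standard_normal((reps, n))
    means = samples.mean(axis=1)
    sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return means - t_crit * sems, means + t_crit * sems

# One cell of the design: 10,000 sample-based CIs at n = 10
lowers, uppers = simulate_cis(n=10)
widths = uppers - lowers
```

Repeating this over sample sizes 10, 20, …, 1,000 yields, for each size, an empirical distribution of CI widths and limits that can be compared against the a priori interval.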

Before comparing sample-based CIs to

With respect to location, the data pertaining to lower limits and upper limits are very similar, so

Also, because lower limits are single numbers, in contrast to widths being intervals, there was no way to calculate a percentage of a lower limit in the way that we did for widths, and we simply used absolute numbers to create ranges. For example, in the 2.5% case, we determined the percentage of sample-based lower limits between each

A possible reason the findings were so unflattering to sample-based CIs is because we used 95% CIs that can be considered extreme.^{2}

We thank an anonymous reviewer for suggesting this possibility.

To address this issue, we performed analyses resembling the foregoing, but using 50% CIs instead of 95% CIs.

Although our main points have been made, there is one final matter. It might be useful to gain an idea of the effect of sample sizes on empirical distributions in a more general way than is conveyed by ^{3}

We thank an anonymous reviewer for suggesting this possibility.

One way to accomplish this is to consider the absolute value of the mean difference score between each empirically generated range and the a priori range, within each sample size. The expectation is that mean difference scores should decrease as sample sizes increase. A second way is to consider the standard deviations of the empirically generated ranges within each sample size, which likewise should decrease as sample sizes increase.

Illustrations analogous to those in the foregoing paragraph can be applied to lower limits too.^{4}
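The two width summaries just described can be sketched as follows. This is a simplified, z-based illustration of ours (the paper uses t-based intervals); the function name is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(12)

Z95 = 1.959964  # two-sided 95% critical value; z-based for brevity

def width_summary(n, reps=10_000):
    """Mean |empirical width - a priori width| and SD of widths at size n."""
    samples = rng.standard_normal((reps, n))
    widths = 2 * Z95 * samples.std(axis=1, ddof=1) / np.sqrt(n)
    a_priori_width = 2 * Z95 / np.sqrt(n)  # sigma = 1 is known a priori
    return np.abs(widths - a_priori_width).mean(), widths.std()

# Both summaries shrink as the sample size grows
diff_10, sd_10 = width_summary(10)
diff_1000, sd_1000 = width_summary(1000)
```

The same pattern of summaries can be computed for lower limits by substituting the empirical lower limits and the a priori lower limit for the widths.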

As before, lower limits and upper limits generate similar data, so we remained with lower limits.

We considered the difference between empirically generated lower limits and the lower limit of the a priori interval, at each sample size.

Finally, as a check on the programming, we calculated, at all sample sizes, the percentages of CIs that enclosed the population mean, using both 95% and 50% sample-based CIs. Supporting the validity of the programming, all percentages pertaining to 95% CIs and 50% CIs were very close to 95% and 50%, respectively.^{5}
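The coverage check described above can be sketched as follows; this assumes t-based intervals, and the function name and sample size shown are ours:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)

def coverage(n, confidence, reps=10_000, mu=0.0, sigma=1.0):
    """Proportion of t-based CIs that enclose the population mean."""
    samples = mu + sigma * rng.standard_normal((reps, n))
    means = samples.mean(axis=1)
    sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
    t = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return float(np.mean((means - t * sems <= mu) & (mu <= means + t * sems)))

cov_95 = coverage(n=30, confidence=0.95)  # expect a value near 0.95
cov_50 = coverage(n=30, confidence=0.50)  # expect a value near 0.50
```

That the long-run coverage matches the nominal level is exactly what the frequentist CI guarantees; as the surrounding text stresses, it says nothing about the probability that any single interval contains the parameter.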

We also investigated medians but found nothing sufficiently interesting to be reported here.

However, we reiterate a point made earlier: knowing the percentage of sample-based CIs that enclose the population mean does not justify drawing a conclusion about the probability of a population mean being within a single sample-based CI. There is no way to know this latter probability.

CIs are not much of an improvement over significance tests if they are merely used as significance tests. Nor can CIs be used to estimate the probability that the population parameter of interest (e.g., the population mean) is within the constructed CI. Sophisticated users of CIs know these points and argue instead that CIs are useful for estimating the precision of the data. On the contrary, however,

Well, then, if sample-based CIs do not work well for precision, how do they contribute to statistical inference? Our answer is that researchers should eschew sample-based CIs in favor of

We end by admitting an important limitation of the

The authors have no funding to report.

The authors have declared that no competing interests exist.

The authors have no support to report.