
High statistical power and precise estimation are both essential to statistical practice; however, sample size planning that considers the two jointly remains limited in the literature, especially for two-group mean comparisons with unequal/unknown variances and unequal sampling costs. Furthermore, because confidence intervals are often neglected or misused, the present study aims to promote probabilistic thinking by finding optimal allocations of sample sizes such that researchers can claim that the null hypothesis is rejected, the desired confidence-interval width for the mean difference is achieved, and/or the true difference is encompassed in the interval. Cost effectiveness was also considered in finding the optimal sample size. The simulation showed that the proposed approach maintains the desired probability level for the conditional/unconditional probabilities of events and yields good confidence-interval coverage rates. This study thus advances the understanding of both sample size planning and confidence intervals. Three R Shiny apps are provided in the Supplementary Materials for easy application.

Sample size planning is a classic problem in research design: the aim is the greatest statistical power to reject the null hypothesis with the smallest, and hence most economical, sample size. Moreover, obtaining confidence intervals (CIs) in data analysis is an increasingly important topic because CIs offer substantive advantages over the controversial null hypothesis significance testing by providing informative results and facilitating the accumulation of knowledge from insufficient data (

For comparing two-group means, recent developments in sample size planning have heightened the need to integrate the notions of power and precision through probabilistic thinking. An increasingly recognized solution to this integration is to reject the null hypothesis (event rejection, R), to encompass the true mean difference for a

In view of the methodological justification mentioned so far, sample size calculation methods have mostly been restricted to cases of equal variances. In practice, there is evidence that extreme variance ratios do occur (

It should be noted that formulas for the sample sizes needed for interval estimation, especially when the cost factor is taken into account, are generally not found in elementary textbooks and are thus ignored by most researchers (

The major problem with the aforementioned methods suggests that easy-to-use computer applications for practitioners and applied researchers are critically needed. Thus, in the context of two-group mean comparisons where variances are unequal/unknown, the aim of the present study was to develop several R Shiny apps with which practitioners can find the optimal sample size for the probability of an event of interest under two distinct cost-effectiveness motives. These motives are: (a) to achieve a desired probability level at minimal total sampling cost; and (b) to maximize the probability of the event for a given total sampling cost. In an attempt to comprehensively satisfy researchers’ needs and synthesize the literature, the event probabilities (cases) considered in the present study fall into two categories:

Unconditional: 1.

Conditional: 6.

This thorough discussion can provide an exciting opportunity to advance our knowledge of event probability. Another advantage of our method over

The remainder of this article is organized as follows. In the section immediately following, sample size planning while considering corresponding probabilities is proposed. In the first sub-section, the Welch test statistic is introduced and the sample size for event

Let the given data

with approximate Type I error rate

where

(

where

The CI width is then
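As a reference point, the standard textbook form of the Welch quantities can be written as follows. The symbols $\bar{X}_j$, $S_j^2$, $n_j$ (sample mean, sample variance, and size of group $j$), $\hat{\nu}$ (approximate degrees of freedom), and $\alpha$ are notation assumed for this sketch and may differ from the article's own.

```latex
T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}},
\qquad
\hat{\nu} = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}
                 {\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}},

\text{CI}_{1-\alpha}:\;
(\bar{X}_1 - \bar{X}_2) \pm t_{1-\alpha/2,\,\hat{\nu}}
\sqrt{S_1^2/n_1 + S_2^2/n_2},
\qquad
\text{width} = 2\, t_{1-\alpha/2,\,\hat{\nu}} \sqrt{S_1^2/n_1 + S_2^2/n_2}.
```

Because the width depends on the sample variances, it is a random quantity, which is why achieving a desired width is treated as an event with a probability rather than as a guarantee.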

If the CI width is less than or equal to a desired width value

where

It should be noted that, in

Based on

It is known that

where

In the present study, let
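To make the quantities in this section concrete, the following minimal Python sketch computes the Welch statistic, the Satterthwaite degrees of freedom, and the two-sided CI from summary statistics, then flags whether the null hypothesis is rejected and whether the interval meets a desired width. The function names, the argument `omega` for the desired width, and the Cornish-Fisher approximation to the t quantile (used only to keep the example free of third-party dependencies) are assumptions of this sketch, not the article's implementation.

```python
import math
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion in 1/df).

    An approximation chosen for this sketch; an exact quantile function
    would normally be used instead.
    """
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def welch_summary(mean1, var1, n1, mean2, var2, n2, alpha=0.05, omega=None):
    """Welch test and CI from summary statistics (hypothetical helper)."""
    a, b = var1 / n1, var2 / n2
    se = math.sqrt(a + b)                      # standard error of the difference
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))  # Satterthwaite df
    half = t_quantile(1 - alpha / 2, df) * se  # CI half-width
    diff = mean1 - mean2
    return {
        "t": diff / se,
        "df": df,
        "ci": (diff - half, diff + half),
        "width": 2 * half,
        "reject": abs(diff) > half,            # two-sided rejection event
        "width_ok": None if omega is None else 2 * half <= omega,
    }
```

For example, `welch_summary(1.0, 1.0, 10, 0.0, 1.0, 10, omega=2.0)` describes two groups of 10 with unit variances and a mean difference of 1; with equal variances and equal group sizes the Satterthwaite formula reduces to 18 degrees of freedom.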

For motive (a), achieving a desired probability level for minimal total cost, the provided App (I) can be executed by exhaustion algorithms (
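The exhaustive search for motive (a) can be illustrated with a small sketch. It is an illustration under stated assumptions rather than the App (I) code: the probability is computed with a large-sample normal approximation to the power of the two-sided test instead of the exact Welch-based probability, only the rejection event is considered, and `approx_power`, `min_cost_allocation`, and `n_max` are hypothetical names.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def approx_power(n1, n2, delta, var1, var2, alpha=0.05):
    """Normal approximation to the power of the two-sided two-group test.

    Assumption of this sketch: the t reference distribution is replaced
    by the standard normal, so small-sample results are optimistic.
    """
    se = math.sqrt(var1 / n1 + var2 / n2)
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    return _NORMAL.cdf(delta / se - z) + _NORMAL.cdf(-delta / se - z)


def min_cost_allocation(delta, var1, var2, c1, c2,
                        target=0.90, alpha=0.05, n_max=500):
    """Search all (n1, n2) up to n_max for the cheapest design meeting target.

    Returns (total_cost, n1, n2, probability), or None if no design in the
    search range reaches the target.
    """
    best = None
    for n1 in range(2, n_max + 1):
        for n2 in range(2, n_max + 1):
            cost = c1 * n1 + c2 * n2
            if best is not None and cost >= best[0]:
                break  # cost only grows with n2, so this n1 cannot improve
            p = approx_power(n1, n2, delta, var1, var2, alpha)
            if p >= target:
                best = (cost, n1, n2, p)
                break  # smallest feasible n2 is the cheapest for this n1
    return best
```

A call such as `min_cost_allocation(delta=1.0, var1=1.0, var2=1.0, c1=1, c2=1)` returns the cheapest allocation found under the approximation. Ties in cost are resolved in favor of the first allocation encountered, and an exact Welch-based probability would generally require slightly larger samples.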

For motive (b), on the other hand, if there is a budget constraint and the total sampling cost is limited, we need to choose an optimal sample size such that a maximal probability of the event can be obtained. Thus,

For this task, App (II) (see
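The budget-constrained search for motive (b) can be sketched in the same spirit. This is an illustration under stated assumptions, not the App (II) code: the event is rejection only, its probability uses a large-sample normal approximation, and `approx_power` and `max_power_allocation` are hypothetical names.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def approx_power(n1, n2, delta, var1, var2, alpha=0.05):
    """Normal approximation to the power of the two-sided two-group test."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    z = _NORMAL.inv_cdf(1 - alpha / 2)
    return _NORMAL.cdf(delta / se - z) + _NORMAL.cdf(-delta / se - z)


def max_power_allocation(budget, delta, var1, var2, c1, c2, alpha=0.05):
    """Find (n1, n2) with c1*n1 + c2*n2 <= budget that maximizes the probability.

    Returns (probability, n1, n2), or None if the budget cannot fund at
    least two observations per group.
    """
    best = None
    n1 = 2
    while c1 * n1 + c2 * 2 <= budget:
        # The probability increases with n2, so spend the remainder on group 2.
        n2 = int((budget - c1 * n1) // c2)
        p = approx_power(n1, n2, delta, var1, var2, alpha)
        if best is None or p > best[0]:
            best = (p, n1, n2)
        n1 += 1
    return best
```

With a budget of 80, equal variances, and unit costs of 1 and 3, the search favors the cheaper group, in line with the familiar rule that the size ratio should grow with the ratio of standard deviations and shrink with the square root of the cost ratio.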

In this section, several comparisons are described to see whether the proposed approach gives results consistent with, or better than, those of existing methods for two-tailed tests or two-sided CIs. For motive (a), firstly, when variances are equal, in

Variances with unit costs | Proposed App (I): total cost | n1 | n2 | Probability | Existing method: total cost | n1 | n2 | Probability |
---|---|---|---|---|---|---|---|---|
(1,1) with (1,1) | 45 | 22 | 23 | .906142 | 45 | 23 | 22 | .9057 |
(1,1) with (1,3) | 83 | 29 | 18 | .900254 | 84 | 30 | 18 | .9032 |
(1/9,1) with (1,1) | 21 | 5 | 16 | .902258 | 22 | 6 | 16 | .9144 |
(1/9,1) with (1,2) | 36 | 6 | 15 | .900894 | 37 | 7 | 15 | .9086 |

Variances with unit costs | Proposed App (I): total cost | n1 | n2 | Probability | Existing method: total cost | n1 | n2 | Probability |
---|---|---|---|---|---|---|---|---|
(4,1) with (1,3) | 253 | 130 | 41 | .901892 | 254 | 128 | 42 | .9060 |
(9,1) with (1,2) | 337 | 225 | 56 | .900217 | 338 | 226 | 56 | .9057 |
(9,1) with (1,3) | 390 | 240 | 50 | .900709 | 391 | 238 | 51 | .9042 |

For motive (b), with a fixed total cost, we employed

Variances with unit costs | Fixed total cost | Proposed App (II): n1 | n2 | Probability | Existing method: n1 | n2 | Probability |
---|---|---|---|---|---|---|---|
(1/9,1) with (1,3) | 50 | 8 | 14 | .152415 | 11 | 13 | .1546 |
(1,1) with (1,3) | 80 | 35 | 15 | .066037 | 38 | 14 | .0679 |
(4,1) with (1,2) | 180 | 104 | 38 | .470735 | 106 | 37 | .4723 |

It should be noted that, besides the true mean difference and the desired width, the values of the variances and unit costs are also key elements in allocating group sizes. Thus, these four parameters were varied to present the features of sample size planning in the following two sample size tables for the nine probability cases under motive (a) (all sample sizes are rounded up to the nearest integer). In

Case | (2, 3) | (4, 3) | (8, 3) |
---|---|---|---|
1. | 393, 394 | 99, 100 | 26, 26 |
2. | 358, 358 | 358, 358 | 358, 358 |
3. | 395, 395 | 358, 358 | 358, 358 |
4. | 361, 361 | 361, 361 | 361, 361 |
5. | 420, 420 | 361, 361 | 361, 361 |
6. | 358, 358 | 358, 358 | 358, 358 |
7. | 385, 386 | 358, 358 | 358, 358 |
8. | 357, 358 | 358, 358 | 358, 358 |
9. | 359, 360 | 361, 361 | 361, 361 |

Case | (2, 4) | (4, 4) | (8, 4) |
---|---|---|---|
1. | 393, 394 | 99, 100 | 26, 26 |
2. | 205, 205 | 205, 205 | 205, 205 |
3. | 393, 394 | 205, 206 | 205, 205 |
4. | 207, 207 | 207, 207 | 207, 207 |
5. | 420, 420 | 207, 207 | 207, 207 |
6. | 205, 205 | 205, 205 | 205, 205 |
7. | 379, 379 | 205, 205 | 205, 205 |
8. | 204, 204 | 204, 205 | 205, 205 |
9. | 206, 206 | 206, 206 | 207, 207 |

These sample sizes are consistent with the results in Table 2 of

Case | (1) | (2) | (3) | (4) |
---|---|---|---|---|
1. | 64, 64 | 49, 24 | 63, 17 | 41, 39 |
2. | 36, 37 | 29, 15 | 36, 11 | 25, 23 |
3. | 64, 64 | 49, 24 | 63, 17 | 41, 39 |
4. | 37, 38 | 30, 15 | 39, 11 | 26, 23 |
5. | 68, 69 | 52, 26 | 68, 18 | 44, 41 |
6. | 36, 37 | 29, 15 | 37, 11 | 26, 20 |
7. | 61, 62 | 47, 24 | 62, 16 | 40, 36 |
8. | 36, 36 | 28, 14 | 37, 10 | 24, 23 |
9. | 36, 37 | 29, 15 | 36, 11 | 25, 23 |

In

To carry out a computer simulation, we chose the optimal sample sizes (26, 20) in Case 6
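A simulation of this kind can be sketched as a small Monte Carlo experiment. The sketch below draws normal samples, applies the Welch interval, and estimates the probabilities of rejection (R), of achieving the desired width, of covering the true difference, and of one intersection and one conditional event. Only the group sizes (26, 20) come from the text; the means, standard deviations, desired width `omega`, event labels `W` and `V`, and the Cornish-Fisher t quantile are assumptions chosen for illustration, and this is not the article's simulation code.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def t_quantile(p, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion in 1/df)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def simulate(n1, n2, mu1, mu2, sd1, sd2, omega,
             alpha=0.05, reps=5000, seed=1):
    """Estimate event probabilities for the Welch test and CI by simulation."""
    rng = random.Random(seed)
    delta = mu1 - mu2
    counts = {"R": 0, "W": 0, "V": 0, "R&W": 0, "V&W": 0}
    for _ in range(reps):
        x = [rng.gauss(mu1, sd1) for _ in range(n1)]
        y = [rng.gauss(mu2, sd2) for _ in range(n2)]
        a, b = variance(x) / n1, variance(y) / n2
        df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
        half = t_quantile(1 - alpha / 2, df) * math.sqrt(a + b)
        diff = fmean(x) - fmean(y)
        r = abs(diff) > half            # R: null hypothesis rejected
        w = 2 * half <= omega           # W: desired width achieved
        v = abs(diff - delta) <= half   # V: true difference covered
        counts["R"] += r
        counts["W"] += w
        counts["V"] += v
        counts["R&W"] += r and w
        counts["V&W"] += v and w
    est = {k: c / reps for k, c in counts.items()}
    # Conditional coverage given that the width requirement is met.
    est["V|W"] = counts["V&W"] / counts["W"] if counts["W"] else float("nan")
    return est
```

For instance, `simulate(26, 20, 1.0, 0.0, 1.0, 1.0, omega=1.2)` uses hypothetical population values. The estimated coverage should sit near the nominal 95% level, while the other estimates depend strongly on the chosen difference and width.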

Current discussions of sample size planning focus on fulfilling one or more goals, such as power-based statistical tests, precision of estimation, minimal costs, or some other criteria. To avoid the pitfall of treating inferential statistics as descriptive statistics, probabilistic thinking is essential to any scheme of sample size determination. Along with rapid advances in ready-to-use computer applications, the present study aimed to contribute to this growing area and developed R Shiny

As

Since the advocacy of CIs is a central theme in statistical reform, additional methods and formulas for sample size determination based on various unconditional/conditional probabilities will be required in the future so that power and precision can be obtained simultaneously. The current findings add to a growing body of literature on sample size planning, and there is abundant room for further progress in multiple comparisons (simultaneous confidence intervals) (

To prepare the probability of the intersection event of

where

For

Then,

For the probability of the intersection of events

For

Finally, other conditional probabilities can be found as follows:

The author thanks Emeritus Professor Jiin-Huarng Guo of National Pingtung University, Taiwan for his guidance on derivation and programming.

For this article, three R Shiny apps are available online to facilitate integrative consideration of sample size planning (for access see

App I: Test and Confidence Intervals for Cost Constraints

App II: Test and Confidence Intervals for Fixed Total Cost

App III: Test and Confidence Intervals for Fixed n1

This research was supported by a National Science Council grant, Taiwan (NSC98-2410-H-006-067-MY3).

The author has declared that no competing interests exist.