Regression coefficients are crucial in the sciences, as researchers use them to determine which independent variables best explain the dependent variable. However, researchers obtain regression coefficients from data samples and wish to generalize to populations; without reason to believe that sample regression coefficients are good estimates of the corresponding population regression coefficients, their usefulness is curtailed. Larger sample sizes, in turn, provide better estimates than smaller ones. There is much recent literature on the a priori procedure (APP), which was designed for the general purpose of determining the sample sizes needed to obtain sample statistics that are good estimates of the corresponding population parameters. We provide an extension of the APP to regression coefficients, which works for both standardized and unstandardized regression coefficients. A simulation study and a real data example support the mathematical derivations. We also include a free and user-friendly computer program to aid researchers in making the calculations.

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent (response) variable and one or more independent (explanatory) variables. The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion, such as least squares.
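As a minimal illustration of fitting by a specific mathematical criterion, the sketch below computes the least-squares line for a single predictor using the closed-form formulas. It is not taken from the article: the function name `fit_simple_ols` and the toy data are hypothetical, and the one-predictor case is chosen only to keep the example short.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor.

    Hypothetical helper for illustration; it minimizes the sum of squared
    residuals sum((y_i - intercept - slope * x_i) ** 2).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of squared deviations of x, and cross-product of deviations.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data lying exactly on y = 2 + 3x, so the fit recovers that line.
intercept, slope = fit_simple_ols([0, 1, 2, 3], [2, 5, 8, 11])
print(intercept, slope)
```

With several predictors the same criterion is applied to a coefficient vector rather than to a single slope.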

The multiple linear regression model for

Let

We can also express the vector of regression coefficients

Note that the

In this section, we will establish the APP for estimating

The proof of Theorem 1, together with the density of

Note that Theorem 1 still holds for finding the necessary sample size needed to trust

If previous data sets are available, we can use them to obtain

The density curves of

If the noncentrality parameter

From Theorem 1, we can construct a confidence region for

To illustrate the above results for the case where

In this section, we conduct a simulation study and present a real data analysis to evaluate the performance of the APP proposed above. The necessary sample sizes (

f | 0.1 | 0.1 | 0.15 | 0.15 | 0.2 | 0.2 | 0.25 | 0.25 |
c | 0.95 | 0.9 | 0.95 | 0.9 | 0.95 | 0.9 | 0.95 | 0.9 |
n | 308 | 244 | 138 | 114 | 84 | 62 | 58 | 49 |
cr | 0.95027 | 0.89956 | 0.95006 | 0.89995 | 0.95005 | 0.89968 | 0.94963 | 0.90019 |

f | 0.1 | 0.1 | 0.15 | 0.15 | 0.2 | 0.2 | 0.25 | 0.25 |
c | 0.95 | 0.9 | 0.95 | 0.9 | 0.95 | 0.9 | 0.95 | 0.9 |
n | 275 | 222 | 129 | 104 | 74 | 60 | 49 | 38 |
cr | 0.94976 | 0.90035 | 0.94999 | 0.90042 | 0.94974 | 0.90039 | 0.94991 | 0.90003 |

f | 0.1 | 0.1 | 0.15 | 0.15 | 0.2 | 0.2 | 0.25 | 0.25 |
c | 0.95 | 0.9 | 0.95 | 0.9 | 0.95 | 0.9 | 0.95 | 0.9 |
n | 229 | 195 | 113 | 96 | 71 | 51 | 43 | 38 |
cr | 0.95009 | 0.90000 | 0.94962 | 0.90004 | 0.95003 | 0.89946 | 0.94992 | 0.90030 |

f | 0.1 | 0.1 | 0.15 | 0.15 | 0.2 | 0.2 | 0.25 | 0.25 |
c | 0.95 | 0.9 | 0.95 | 0.9 | 0.95 | 0.9 | 0.95 | 0.9 |
n | 192 | 172 | 98 | 78 | 58 | 46 | 38 | 35 |
cr | 0.95009 | 0.89961 | 0.94995 | 0.89959 | 0.94991 | 0.89957 | 0.94971 | 0.89984 |
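The general idea of checking a sample size by simulation can be sketched as follows. This is only an illustration under assumptions that are not the article's: it uses a single standard normal predictor, and it counts a sample as a "hit" when the estimated slope lies within `f` error standard deviations of the true slope. That closeness criterion, the function name `simulated_coverage`, and all default values are hypothetical and do not reproduce Theorem 1, so the output is not expected to match the tables above.

```python
import random


def simulated_coverage(n, f, beta=0.5, sigma=1.0, reps=1000, seed=0):
    """Monte Carlo share of samples whose OLS slope is close to beta.

    Assumed criterion (not the article's): |slope - beta| <= f * sigma,
    with one standard normal predictor and normal errors with SD sigma.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n from the assumed model.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        # Closed-form least-squares slope for a model with an intercept.
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        if abs(slope - beta) <= f * sigma:
            hits += 1
    return hits / reps


# The estimated rate should rise as the sample size grows.
for n in (50, 100, 200, 400):
    print(n, simulated_coverage(n, f=0.1))
```

Under this assumed criterion, the estimated rate rises toward 1 as `n` grows, which mirrors the qualitative trade-off between sample size, precision, and confidence.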

The tables indicate the following. First, the required sample size

For calculating the necessary sample sizes needed to estimate the regression coefficients

To use the program for finding the sample size needed to estimate the regression coefficients, it is necessary to make three entries. In the first box, type in the number (

The data set was obtained from the R Package named

The verification of the assumptions of normality, homoscedasticity and influential values is provided in the C section of the

In the introduction, we explained why the size of regression coefficients, not just whether they are statistically significant, is important, especially for applied research. Even if a regression coefficient is statistically significant, it might not be sufficiently large to justify the expenditures necessary for a policy change (

The present work, together with the free and user-friendly program, can be used in two ways. The first is the original purpose of the APP: planning the sample sizes necessary to achieve a researcher's goals for precision and confidence. The second is after data collection, for example to evaluate an already published regression coefficient. If a researcher reports a seemingly impressive regression coefficient, the trust that coefficient deserves can be assessed using the present program. If the reported sample size is less than what is necessary to meet assessors', reviewers', or policy makers' criteria for precision and confidence, the applicability of the sample regression coefficient can be discounted accordingly. Alternatively, if the reported sample size exceeds what is necessary to meet the criteria for precision and confidence, trust in the sample regression coefficient can be augmented accordingly.
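The post-data-collection use can be sketched as a simple comparison between a reported sample size and a required one. In the sketch below, the required sample sizes are copied from the first table of the simulation section; the text here does not state which model setting that table corresponds to, so the dictionary is illustrative only. The names `REQUIRED_N` and `assess_reported_n` are hypothetical and are not part of the authors' program.

```python
# Required n keyed by (precision f, confidence c), copied from the first
# table of the simulation section. Illustrative only: the model setting
# behind that table is not stated in this text.
REQUIRED_N = {
    (0.10, 0.95): 308, (0.10, 0.90): 244,
    (0.15, 0.95): 138, (0.15, 0.90): 114,
    (0.20, 0.95): 84, (0.20, 0.90): 62,
    (0.25, 0.95): 58, (0.25, 0.90): 49,
}


def assess_reported_n(reported_n, f, c, table=REQUIRED_N):
    """Compare a reported sample size with the tabulated requirement.

    Hypothetical helper: returns the required n, whether the reported n
    meets it, and the shortfall (0 when the requirement is met).
    """
    key = (round(f, 2), round(c, 2))
    if key not in table:
        raise KeyError(f"no tabulated sample size for f={f}, c={c}")
    required = table[key]
    return {
        "required_n": required,
        "meets_criteria": reported_n >= required,
        "shortfall": max(0, required - reported_n),
    }


# A study reporting n = 100, judged against f = 0.15 and c = 0.95.
print(assess_reported_n(100, f=0.15, c=0.95))
```

A reported sample size below the requirement suggests discounting the coefficient, while one at or above it supports greater trust, as described above.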

We also wish to be upfront about an important limitation: the assumption of multivariate normality. Future work, which we intend to pursue, could start from more general assumptions. For example, rather than assuming a multivariate normal distribution, a further advance would be to extend the APP to regression coefficients under a multivariate skew normal distribution. In the meantime, the present work remains useful even if the assumption of multivariate normality is violated. To see why, consider that skewness decreases the sample sizes necessary to meet specifications for precision and confidence (e.g.,

Finally, applied researchers should consider potential applications of their research, and explicitly consider how accurate the estimation needs to be to base an intervention or policy change on the regression coefficients they obtain. They can render specifications for precision and confidence accordingly. Also, the total cost of collecting a sample with required sample size

In conclusion, we hope and expect that the present contribution provides an alternative to power analysis for researchers who use correlational designs that feature regression coefficients. If the goal is to attain statistical significance, power analysis makes sense; but if the goal is to obtain sample regression coefficients that are trustworthy estimators of the corresponding population regression coefficients, the present APP extension is the more appropriate tool.

By Lemma 1 we know that

Now, we set up the APP for estimating

The density of

Now we check the assumptions of normality, homoscedasticity and influential values by

Data is freely available at

For this article, a Shiny App can be found online to calculate the necessary sample sizes needed to estimate the regression coefficients

App: Necessary Sample Sizes to Estimate Regression Coefficients

The authors have no funding to report.

The authors have declared that no competing interests exist.

The authors have no additional (i.e., non-financial) support to report.