\begin{abstract}
The commonly cited rule of thumb for regression analysis, which holds
that a sample size of \(n \geq 30\) is sufficient to ensure valid
inferences, is frequently referenced but rarely scrutinized. This
research note evaluates the lower bound on the number of observations
required for regression analysis by exploring how distributional
characteristics, such as skewness and kurtosis, influence
the convergence of t-values to the t-distribution in linear regression
models. Through an extensive simulation study involving over 22 billion
regression models, this paper examines a range of symmetric,
platykurtic, and skewed distributions, testing sample sizes from 4 to
10,000. The results reveal that it is sufficient for either the
dependent or the independent variable to follow a symmetric distribution
for the t-values to converge to the t-distribution at sample sizes much
smaller than \(n = 30\). This runs contrary to previous guidance, which
suggests that the error term needs to be normally distributed for this
convergence to occur at low \(n\). On the other hand, if both the
dependent and independent variables are highly skewed, the required
sample size is substantially higher. In cases of extreme skewness, even
sample sizes of 10,000 do not ensure convergence. These findings suggest
that the \(n \geq 30\) rule is too permissive in some cases but overly
conservative in others, depending on the underlying distributional
characteristics. This study offers revised guidelines for determining
the minimum sample size necessary for valid regression analysis.
\end{abstract}