Sample Size and Power Analysis: Comprehensive Reference Guide
This guide covers power analysis fundamentals, sample size calculations for a range of study designs, effect size determination, and practical considerations for planning statistical studies, with mathematical formulations and interpretation guidance.
Overview
Sample size and power analysis are crucial components of study design that determine the ability to detect meaningful effects and ensure adequate statistical power. Proper planning prevents underpowered studies and resource waste while maintaining scientific rigor.
Power Analysis Fundamentals
1. Statistical Errors
Type I Error (α):
- Probability of rejecting true null hypothesis
- False positive rate
- Typically set at 0.05 (5%)
Type II Error (β):
- Probability of failing to reject false null hypothesis
- False negative rate
- Typically set at 0.10 or 0.20
Statistical Power (1-β):
- Probability of correctly rejecting false null hypothesis
- Ability to detect true effect
- Typically desired at 0.80 (80%) or 0.90 (90%)
2. Effect Size
Definition: Standardized measure of the magnitude of difference or association.
Cohen's Conventions:
- Small effect: d = 0.2, r = 0.1, f = 0.1
- Medium effect: d = 0.5, r = 0.3, f = 0.25
- Large effect: d = 0.8, r = 0.5, f = 0.4
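The metrics can be converted between one another for the two-group case; note that Cohen's conventions are separate benchmarks, not exact conversions (a medium d = 0.5 maps to r ≈ 0.24, not 0.3). A minimal Python sketch (function names are mine):

```python
import math

def d_to_r(d: float) -> float:
    """Convert Cohen's d to a point-biserial r (two equal-size groups)."""
    return d / math.sqrt(d**2 + 4)

def d_to_f(d: float) -> float:
    """Convert Cohen's d to Cohen's f for the two-group case (f = d/2)."""
    return d / 2

# A medium d of 0.5 corresponds to r ≈ 0.24 and f = 0.25
print(round(d_to_r(0.5), 3), d_to_f(0.5))
```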
Cohen's d (standardized mean difference): d = (μ1 − μ2) / σ_pooled, where σ_pooled is the pooled standard deviation
Correlation coefficient (r): the strength of linear association; r² is the proportion of variance explained (for two equal-size groups, r = d / √(d² + 4))
Cohen's f (ANOVA effect size): f = σ_means / σ_error, the standard deviation of the group means divided by the common within-group standard deviation; equivalently f² = η² / (1 − η²)
3. Factors Affecting Power
- Effect size: Larger effects easier to detect
- Sample size: Larger samples increase power
- Significance level (α): Lower α decreases power
- Variability: Lower variability increases power
- Study design: More efficient designs increase power
Sample Size Calculations for Different Study Designs
1. One-Sample Tests
One-Sample t-test (mean): n = [(z_{1-α/2} + z_{1-β}) · σ / (μ1 − μ0)]²  (normal approximation; exact calculations use the noncentral t distribution)
One-Sample z-test (proportion): n = [z_{1-α/2}·√(p0(1 − p0)) + z_{1-β}·√(p1(1 − p1))]² / (p1 − p0)²
Where:
- μ0 = null hypothesis mean
- μ1 = alternative hypothesis mean
- p0 = null hypothesis proportion
- p1 = alternative hypothesis proportion
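The one-sample formulas can be sketched in Python as follows (function names are mine; assumes SciPy is available):

```python
import math
from scipy.stats import norm

def n_one_sample_mean(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Normal-approximation n for a two-sided one-sample test of a mean."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(((z_a + z_b) * sigma / (mu1 - mu0)) ** 2)

def n_one_sample_prop(p0, p1, alpha=0.05, power=0.80):
    """n for a two-sided one-sample z-test of a proportion."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    num = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
    return math.ceil((num / (p1 - p0)) ** 2)
```

For example, detecting a half-standard-deviation shift from the null mean at α = 0.05 and 80% power requires about 32 observations under the normal approximation.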
2. Two-Sample Tests
Independent samples t-test (equal variances): n per group = 2σ²·(z_{1-α/2} + z_{1-β})² / (μ1 − μ2)²
Independent samples t-test (unequal variances): n1 = (σ1² + σ2²/k)·(z_{1-α/2} + z_{1-β})² / (μ1 − μ2)², with n2 = k·n1
Where k = n2/n1 (allocation ratio)
Two-sample z-test (proportions): n per group = [z_{1-α/2}·√(2p̄(1 − p̄)) + z_{1-β}·√(p1(1 − p1) + p2(1 − p2))]² / (p1 − p2)²
Where p̄ = (p1 + p2)/2 (pooled proportion)
Paired t-test: n = σ_d²·(z_{1-α/2} + z_{1-β})² / d̄²
Where:
- σ_d = standard deviation of differences
- d̄ = mean difference
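The two-proportion formula translates directly into code; a sketch under the same two-sided, equal-allocation assumptions (function name is mine):

```python
import math
from scipy.stats import norm

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample z-test of proportions."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p1 + p2) / 2                      # pooled proportion
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil((num / (p1 - p2)) ** 2)

# Detecting 60% vs 40% at alpha = 0.05 and 80% power
print(n_two_proportions(0.6, 0.4))
```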
3. ANOVA Designs
One-way ANOVA: choose total N so that the noncentral F distribution with noncentrality λ = f²·N, numerator df = k − 1, and denominator df = N − k gives the desired power at the critical value F_{1-α}
Simplified formula: N ≈ λ(α, power, ν_effect) / f², where λ is the required noncentrality parameter taken from tables or software
Two-way ANOVA: apply the same noncentral-F calculation separately to each main effect and the interaction, using that effect's f and degrees of freedom, with a correction factor for the cell structure of the design
Where:
- f = Cohen's f effect size
- c = correction factor based on design
- ν_effect = degrees of freedom for effect
Repeated Measures ANOVA: for a within-subjects factor, the correlation among measures inflates the noncentrality, λ = f²·n·k / (1 − ρ), so the required n falls as ρ rises
Where:
- k = number of repeated measures
- ρ = correlation between repeated measures
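The one-way noncentral-F calculation can be sketched as a simple search over per-group n (function name is mine; assumes SciPy):

```python
import math
from scipy.stats import f as f_dist, ncf

def n_oneway_anova(f_effect, k, alpha=0.05, power=0.80):
    """Smallest per-group n whose noncentral-F power reaches the target."""
    n = 2
    while True:
        N = n * k
        df1, df2 = k - 1, N - k
        lam = f_effect**2 * N                    # noncentrality λ = f²·N
        f_crit = f_dist.ppf(1 - alpha, df1, df2)
        achieved = 1 - ncf.cdf(f_crit, df1, df2, lam)
        if achieved >= power:
            return n, N
        n += 1
```

For a medium effect (f = 0.25) across three groups at α = 0.05 and 80% power, this lands in the neighborhood of 52–53 participants per group, consistent with standard power software.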
4. Factorial Designs
2×2 Factorial Design: each main effect compares two marginal means with df = 1, so the two-sample formula applies, with the resulting total N spread over the four cells
For interaction effect: the interaction contrast (a difference of differences) is estimated with greater variance than a main-effect contrast, so an interaction of comparable magnitude typically requires on the order of four times the sample size
General factorial design: use the noncentral-F approach with the appropriate f and ν_effect for each effect of interest; the largest N required across effects determines the study size
Correlation and Regression Studies
1. Correlation Analysis
Sample size for correlation: n = [(z_{1-α/2} + z_{1-β}) / z′]² + 3
Fisher's z-transformation: z′ = ½·ln[(1 + r) / (1 − r)]
Power for given sample size: power = Φ(|z′|·√(n − 3) − z_{1-α/2})
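The Fisher-z sample size formula is short enough to sketch directly (function name is mine; assumes SciPy):

```python
import math
from scipy.stats import norm

def n_correlation(r, alpha=0.05, power=0.80):
    """n to detect correlation r via Fisher's z-transformation."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    z_prime = 0.5 * math.log((1 + r) / (1 - r))   # Fisher's z
    return math.ceil(((z_a + z_b) / z_prime) ** 2 + 3)

# Detecting a medium correlation of r = 0.3 at alpha = 0.05, 80% power
print(n_correlation(0.3))
```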
2. Linear Regression
Simple linear regression: N ≈ λ(α, power, u) / f² + u + 1
Where:
- u = number of predictors (u = 1 for simple regression)
- f² = R² / (1 − R²) (effect size)
Multiple regression: same formula with u > 1; for testing a subset of predictors, f² = (R²_full − R²_reduced) / (1 − R²_full)
Logistic regression (Hsieh's approximation for a single standardized covariate): n = (z_{1-α/2} + z_{1-β})² / [p(1 − p)·(ln OR)²]
Where:
- p = proportion of events
- OR = odds ratio per standard deviation of the covariate
Non-Parametric Test Sample Sizes
1. Mann-Whitney U Test
Asymptotic relative efficiency (ARE) relative to the t-test under normality = 0.955 (= 3/π): n ≈ n_t-test / 0.955
Direct formula (Noether's approximation, equal groups): N = (z_{1-α/2} + z_{1-β})² / [3·(p′ − ½)²], where p′ = P(X > Y)
2. Wilcoxon Signed-Rank Test
ARE = 0.955 relative to paired t-test: n ≈ n_paired-t / 0.955
3. Kruskal-Wallis Test
ARE = 0.864 relative to one-way ANOVA (worst-case bound): n ≈ n_ANOVA / 0.864
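The ARE adjustment is the same in every case: compute the parametric sample size, then divide by the ARE and round up. A one-line sketch (function name is mine):

```python
import math

def nonparametric_n(n_parametric, are):
    """Inflate a parametric sample size by dividing by the ARE."""
    return math.ceil(n_parametric / are)

# A t-test requiring 64 per group needs about 68 under Mann-Whitney (ARE 0.955)
print(nonparametric_n(64, 0.955))
```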
Survival Analysis Sample Size Calculations
1. Log-Rank Test
Formula:
Where:
- = proportions in each group
- HR = hazard ratio
With censoring:
Where = probability of observing event
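The event-driven logic above can be sketched in Python (function names are mine; assumes SciPy):

```python
import math
from scipy.stats import norm

def logrank_events(hr, pi1=0.5, pi2=0.5, alpha=0.05, power=0.80):
    """Required number of events (Schoenfeld) for a two-sided log-rank test."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return (z_a + z_b) ** 2 / (pi1 * pi2 * math.log(hr) ** 2)

def logrank_total_n(hr, p_event, **kw):
    """Total N after dividing the required events by the event probability."""
    return math.ceil(logrank_events(hr, **kw) / p_event)
```

For a hazard ratio of 0.7 with equal allocation at α = 0.05 and 80% power, about 247 events are needed; if only 60% of participants are expected to have an event, the total sample size rises to 412.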
2. Cox Proportional Hazards
Number of events needed (Schoenfeld's formula): E = (z_{1-α/2} + z_{1-β})² / [p(1 − p)·(ln HR)²], where p = proportion allocated to one group
Total sample size: N = E / P(event)
3. Exponential Survival
Equal allocation: n per group ≈ (z_{1-α/2} + z_{1-β})²·[1/P1(T) + 1/P2(T)] / [ln(λ1/λ2)]²
Where:
- λ1, λ2 = hazard rates in each group, with Pi(T) = 1 − e^(−λi·T) the probability of an event by time T
- T = study duration
Cluster Randomized Trials and Multilevel Studies
1. Cluster Randomized Trials
Design effect: DE = 1 + (m − 1)·ρ
Where:
- m = average cluster size
- ρ = intracluster correlation coefficient
Adjusted sample size: n_cluster = n_individual × DE
Number of clusters: k = n_cluster / m per arm, rounded up
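The design-effect adjustment is mechanical and worth sketching (function name is mine):

```python
import math

def cluster_adjusted_n(n_individual, m, icc):
    """Inflate an individually randomized sample size by the design effect."""
    de = 1 + (m - 1) * icc                 # design effect DE = 1 + (m-1)·rho
    n_adj = math.ceil(n_individual * de)   # adjusted sample size
    clusters = math.ceil(n_adj / m)        # clusters needed at size m
    return de, n_adj, clusters

# 128 individuals, clusters of 20, ICC = 0.05 -> DE 1.95, 250 people, 13 clusters
print(cluster_adjusted_n(128, 20, 0.05))
```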
2. Multilevel Models
Two-level design: inflate the individually randomized sample size by DE = 1 + (m − 1)·ρ, where m is the number of level-1 units per level-2 unit
Three-level design: design effects accumulate across levels, approximately DE = 1 + (n1 − 1)·ρ2 + n1·(n2 − 1)·ρ3, with n1 level-1 units per level-2 unit, n2 level-2 units per level-3 unit, and ρ2, ρ3 the intracluster correlations at each level
3. Stepped Wedge Designs
Sample size adjustment: multiply the parallel cluster-randomized sample size by a stepped-wedge design effect (e.g., from the Hussey and Hughes model or the Woertman et al. correction), which depends on the number of steps and the intracluster correlation; with more time periods the required sample size generally decreases
Where:
- T = number of time periods
- ρ = intracluster correlation
Post-Hoc Power Analysis Considerations
1. Observed Power
Problems with observed power:
- Circular reasoning when non-significant
- Misleading interpretation
- Not useful for study interpretation
Formula: observed power is the a priori power formula evaluated at the sample estimate of the effect size (e.g., d̂ in place of d); it is a one-to-one function of the p-value and therefore adds no information beyond the test result
2. Confidence Intervals
Preferred approach:
- Report confidence intervals instead of post-hoc power
- Provides information about precision
- Indicates practical significance
Relationship to power: a (1 − α) confidence interval excludes the null value exactly when the test is significant at level α, and its width conveys the precision that the power calculation was designed to achieve
3. Effect Size Estimation
Retrospective effect size: d̂ = (x̄1 − x̄2) / s_pooled
Confidence interval for effect size: d̂ ± z_{1-α/2}·SE(d̂), with SE(d̂) ≈ √[(n1 + n2)/(n1·n2) + d̂²/(2(n1 + n2))] (exact intervals use the noncentral t distribution)
Software Recommendations and Practical Guidelines
1. Software Options
Specialized Software:
- G*Power (free, comprehensive)
- PASS (commercial, extensive)
- nQuery (commercial, clinical trials)
- SAS/PROC POWER
- R packages (pwr, PowerTOST)
General Statistical Software:
- SPSS (limited power analysis)
- Stata (sampsi, power commands)
- SAS (PROC POWER)
- R (multiple packages)
2. Practical Considerations
Planning Phase:
- Define primary endpoint clearly
- Specify effect size of interest
- Consider feasibility constraints
- Plan for attrition/dropout
- Consider multiple comparisons
Effect Size Determination:
- Literature review
- Pilot studies
- Clinical significance
- Regulatory guidelines
- Expert opinion
Sample Size Inflation:
- Dropout rate: multiply by 1/(1-dropout rate)
- Non-compliance: adjust for dilution effect
- Multiple comparisons: Bonferroni or other corrections
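The dropout inflation rule is easy to get wrong (dividing by 1 − dropout, not multiplying); a one-line sketch (function name is mine):

```python
import math

def inflate_for_dropout(n, dropout_rate):
    """Recruitment target after accounting for expected attrition."""
    return math.ceil(n / (1 - dropout_rate))

# 128 analyzable participants with 20% expected dropout -> recruit 160
print(inflate_for_dropout(128, 0.20))
```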
3. Reporting Guidelines
Essential Elements:
- Primary hypothesis and endpoint
- Effect size and justification
- Power and significance level
- Sample size calculation method
- Assumptions made
- Software used
Example: "Sample size was calculated for a two-sided t-test comparing mean scores between groups. Assuming a medium effect size (Cohen's d = 0.5), α = 0.05, and power = 0.80, a total sample size of 128 participants (64 per group) was required. Accounting for 20% attrition, we aimed to recruit 160 participants."
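The numbers in this example statement can be checked with an exact noncentral-t calculation; a sketch (function name is mine; assumes SciPy):

```python
import math
from scipy.stats import t as t_dist, nct

def n_two_sample_t(d, alpha=0.05, power=0.80):
    """Smallest per-group n for a two-sided two-sample t-test (noncentral t)."""
    n = 2
    while True:
        df = 2 * n - 2
        ncp = d * math.sqrt(n / 2)                 # noncentrality parameter
        t_crit = t_dist.ppf(1 - alpha / 2, df)
        achieved = (1 - nct.cdf(t_crit, df, ncp)) + nct.cdf(-t_crit, df, ncp)
        if achieved >= power:
            return n
        n += 1

# d = 0.5, alpha = 0.05, power = 0.80 -> 64 per group, 128 total
print(n_two_sample_t(0.5))
```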
4. Common Mistakes
Avoid These Errors:
- Using post-hoc power analysis for interpretation
- Ignoring multiple comparisons
- Unrealistic effect size assumptions
- Inadequate consideration of dropout
- Confusing statistical and clinical significance
This comprehensive guide provides the foundation for understanding and conducting proper sample size and power analysis for various study designs and statistical tests.