Sample Size and Power Analysis

Comprehensive reference guide for sample size calculations and power analysis.

Sample Size and Power Analysis: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of statistical power and sample size determination all the way through advanced interpretation, reporting, assumption checking, and practical usage within the DataStatPro application. Whether you are planning a new study for the first time or deepening your understanding of how to design adequately powered research, this guide builds your knowledge systematically from the ground up.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What are Sample Size and Power Analysis?
  3. The Mathematics Behind Power Analysis
  4. Considerations and Planning Checklist
  5. Power Analysis for Common Statistical Tests
  6. Using the Sample Size and Power Analysis Calculator Component
  7. Step-by-Step Procedure
  8. Interpreting the Output
  9. Visualising Power and Sample Size
  10. Sensitivity Analysis and Robustness Checks
  11. Advanced Topics
  12. Worked Examples
  13. Common Mistakes and How to Avoid Them
  14. Troubleshooting
  15. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into sample size and power analysis, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 Hypothesis Testing Framework

All power analyses are grounded in the hypothesis testing framework. A statistical test evaluates the evidence in a dataset against a null hypothesis:

  • Null hypothesis ($H_0$): The default position, typically that no effect exists, no difference is present, or variables are unassociated.
  • Alternative hypothesis ($H_1$): The research hypothesis, namely that an effect exists, a difference is present, or variables are associated.

The test produces a test statistic (e.g., $t$, $F$, $\chi^2$, $z$) and a corresponding p-value. If $p \leq \alpha$, we reject $H_0$ in favour of $H_1$.

1.2 The Four Outcomes of a Hypothesis Test

Every hypothesis test results in one of four possible outcomes, two of which are correct decisions and two of which are errors:

| Decision | $H_0$ is TRUE | $H_0$ is FALSE |
|---|---|---|
| Fail to reject $H_0$ | ✅ Correct decision (true negative) | ❌ Type II error (false negative) |
| Reject $H_0$ | ❌ Type I error (false positive) | ✅ Correct decision (true positive) |

The probabilities associated with each outcome:

| Outcome | Symbol | Definition |
|---|---|---|
| Type I error rate (false positive rate) | $\alpha$ | $P(\text{Reject } H_0 \mid H_0 \text{ is true})$ |
| Type II error rate (false negative rate) | $\beta$ | $P(\text{Fail to reject } H_0 \mid H_0 \text{ is false})$ |
| Significance level | $\alpha$ | Controlled by the researcher; conventionally $.05$ |
| Statistical power | $1 - \beta$ | $P(\text{Reject } H_0 \mid H_0 \text{ is false})$ |
| Specificity | $1 - \alpha$ | $P(\text{Fail to reject } H_0 \mid H_0 \text{ is true})$ |

1.3 The Significance Level (α)

The significance level $\alpha$ is the maximum acceptable probability of a Type I error, that is, the probability of declaring a significant result when the null hypothesis is actually true. The researcher chooses $\alpha$ before collecting data.

Conventional values:

| $\alpha$ | Context |
|---|---|
| $.05$ | Standard in most social, behavioural, and health sciences |
| $.01$ | More stringent; clinical trials, policy-relevant decisions |
| $.001$ | Very stringent; genomics, physics, large-scale testing |
| $.10$ | Sometimes used in exploratory research or small pilot studies |

1.4 The p-Value

The p-value is the probability of observing a test statistic as extreme as or more extreme than the one obtained, assuming $H_0$ is true:

$$p = P(\text{test statistic} \geq t_{obs} \mid H_0)$$

This is the one-tailed form; for a two-tailed test, extremeness is measured in both directions, e.g. $p = P(|T| \geq |t_{obs}| \mid H_0)$.

A small p-value (below $\alpha$) means the observed result is unlikely under $H_0$ and constitutes evidence against $H_0$. Crucially, the p-value does not tell you the probability that $H_0$ is true, nor does it measure the size or practical importance of an effect.

1.5 Effect Size

An effect size is a standardised, scale-free measure of the magnitude of a phenomenon. It is the single most important input to any power analysis. Common effect size measures by test type:

| Test | Effect Size Measure | Symbol | Range |
|---|---|---|---|
| t-test (two groups) | Cohen's $d$ | $d$ | $(-\infty, +\infty)$ |
| ANOVA (multiple groups) | Cohen's $f$ | $f$ | $[0, \infty)$ |
| Correlation | Pearson correlation | $r$ | $[-1, +1]$ |
| Chi-square test | Cramér's $V$ (or $\phi$) | $V$, $\phi$ | $[0, 1]$ |
| Regression (multiple) | Cohen's $f^2$ | $f^2$ | $[0, \infty)$ |
| Proportion test | Cohen's $h$ (arcsine difference) | $h$ | $[-\pi, +\pi]$ |
| Repeated measures | Cohen's $d_{rm}$ or $f$ | $d_{rm}$, $f$ | $d_{rm}$: $(-\infty, +\infty)$; $f$: $[0, \infty)$ |

1.6 Cohen's Conventions for Effect Size

Jacob Cohen (1988) proposed widely used benchmarks for effect size magnitudes across common statistical tests. These are conventions of last resort — domain knowledge always supersedes them:

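| Test | Small | Medium | Large |
|---|---|---|---|
| t-test ($d$) | 0.20 | 0.50 | 0.80 |
| ANOVA ($f$) | 0.10 | 0.25 | 0.40 |
| Correlation ($r$) | 0.10 | 0.30 | 0.50 |
| Chi-square ($w$) | 0.10 | 0.30 | 0.50 |
| Regression ($f^2$) | 0.02 | 0.15 | 0.35 |
| Proportion test ($h$) | 0.20 | 0.50 | 0.80 |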

1.7 The Normal and Non-Central Distributions

Power analysis relies on understanding how test statistics are distributed under two scenarios:

  • Under $H_0$: The test statistic follows a central distribution (e.g., standard normal $\mathcal{N}(0,1)$, central t, central $\chi^2$, central F).
  • Under $H_1$: The test statistic follows a non-central distribution: the same family, but shifted by a non-centrality parameter $\lambda$, which depends on both the true size of the effect and the sample size:

$$\lambda = \delta \times \sqrt{n} \quad \text{(for z/t-tests, simplified form)}$$

Where $\delta$ is the standardised effect size (Cohen's $d$ in the one-sample case).

Power is the probability that a non-centrally distributed test statistic exceeds the critical value derived from the central distribution.

1.8 Directionality: One-Tailed vs. Two-Tailed Tests

The directionality of a test affects the critical region and therefore the power:

  • Two-tailed test: The critical region is split across both tails of the distribution. Rejects $H_0$ for both very large and very small test statistics. Critical value: $z_{\alpha/2}$ (e.g., $1.96$ for $\alpha = .05$).
  • One-tailed test: The critical region is entirely in one tail. More powerful for detecting effects in the predicted direction but cannot detect effects in the opposite direction. Critical value: $z_{\alpha}$ (e.g., $1.645$ for $\alpha = .05$).

For a given effect size and sample size, a one-tailed test has greater power than a two-tailed test. However, one-tailed tests require a strong directional a priori justification and are vulnerable to criticism if the actual effect is in the opposite direction.

⚠️ Most journals and reporting guidelines recommend two-tailed tests unless there is a strong, pre-registered, directional theoretical justification. DataStatPro defaults to two-tailed tests for all procedures.


2. What are Sample Size and Power Analysis?

2.1 The Core Questions

Power analysis and sample size determination address four deeply interconnected questions in research design:

  1. Power analysis (post-hoc): Given my sample size, effect size, and $\alpha$, what was the probability of detecting a true effect?
  2. Sample size calculation (a priori): Given a desired power, effect size, and $\alpha$, how many participants do I need?
  3. Minimum detectable effect: Given my sample size and $\alpha$, what is the smallest effect I have adequate power to detect?
  4. Sensitivity analysis: Given my sample size and $\alpha$, how does power vary across a range of plausible effect sizes?

These four questions form the four modes of power analysis. A priori sample size calculation — performed before data collection — is the most important and is the focus of most of this tutorial.

2.2 Why Power Analysis Matters

| Consequence of Ignoring Power | Effect |
|---|---|
| Underpowered study | High probability of Type II error; genuine effects missed; wasted resources |
| Overpowered study | Resources wasted; trivially small, practically meaningless effects declared significant |
| Post-hoc power gaming | Misleading; "observed power" computed from the observed effect size is about 50% when $p = \alpha$ |
| Non-replicable findings | Underpowered studies that reach significance produce inflated effect size estimates (the "Winner's Curse") |
| Ethical implications | Exposing participants to risk or burden without an adequate chance of meaningful results |
| Grant and ethics requirements | Most funding bodies and ethics committees require an a priori power justification |

2.3 The Four Elements of Power Analysis

Every power analysis involves exactly four quantities, any one of which can be computed from the other three:

Power (1 - β)  ←─────────────────────────────┐
         │                                   │
Sample size (N) ─────────────────────►       │
         ↕                                   │
Effect size (ES) ───────────────────►         │
         │                                   │
Significance level (α) ─────────────►         │
         └───────────────────────────────────┘

Specify any three → solve for the fourth.

2.4 Four Modes of Power Analysis

| Mode | Fixed Inputs | Solved Output | When Used |
|---|---|---|---|
| A priori | $\alpha$, $1-\beta$, $ES$ | $N$ | Before data collection (study planning) |
| Post-hoc | $\alpha$, $N$, $ES$ | $1-\beta$ | After data collection (result interpretation) |
| Criterion | $N$, $1-\beta$, $ES$ | $\alpha$ | Rare; sometimes used in quality control |
| Sensitivity | $\alpha$, $N$, $1-\beta$ | $ES_{min}$ | Before or after collection; what can I detect? |

⚠️ Post-hoc power analysis computed using the observed effect size is widely regarded as uninformative and should be avoided. When $p > \alpha$, observed power will be below roughly 50%, but this is a mathematical consequence of the non-significant result (observed power is a one-to-one function of the p-value), not an independent finding. Instead of post-hoc power, report the 95% confidence interval for the effect size and a sensitivity power analysis.
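The point that observed power is a mathematical consequence of the p-value can be checked directly. The sketch below is illustrative only: it assumes a two-sided z-test, uses Python's standard-library `statistics.NormalDist`, and the function name `observed_power_two_sided` is hypothetical rather than a DataStatPro feature. Plugging the observed z statistic in as if it were the true non-centrality parameter shows that "observed power" depends on nothing but $p$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def observed_power_two_sided(z_obs: float, alpha: float = 0.05) -> float:
    """'Observed' power of a two-sided z-test: the observed statistic is
    plugged in as if it were the true non-centrality parameter."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    lam = abs(z_obs)
    # Probability of landing in either rejection region under N(lam, 1)
    return (1 - STD_NORMAL.cdf(z_crit - lam)) + STD_NORMAL.cdf(-z_crit - lam)


# A result exactly at the threshold (p = .05) has observed power of about 50%
z_threshold = STD_NORMAL.inv_cdf(1 - 0.05 / 2)
print(round(observed_power_two_sided(z_threshold), 3))  # 0.5

# Observed power is a one-to-one function of p: larger p, lower "power"
for p in (0.01, 0.05, 0.20, 0.50):
    z = STD_NORMAL.inv_cdf(1 - p / 2)
    print(f"p = {p:.2f} -> observed power = {observed_power_two_sided(z):.3f}")
```

Because the mapping from $p$ to observed power is fixed, reporting both adds no information.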

2.5 Desired Power: Choosing 1 − β

The conventional target for statistical power is 0.80 (80%), implying a 20% Type II error rate. Higher power targets are increasingly recommended:

| Power ($1 - \beta$) | $\beta$ | Context |
|---|---|---|
| 0.80 | .20 | Minimum conventional standard (Cohen, 1988) |
| 0.90 | .10 | Recommended for clinical trials; replication studies |
| 0.95 | .05 | High-stakes research; confirmatory studies |
| 0.99 | .01 | Safety-critical or regulatory contexts |

The ratio of Type I to Type II error rates is also informative:

  • At $\alpha = .05$ and power $= .80$: ratio $= \beta/\alpha = .20/.05 = 4$ (Type II errors are 4× more likely than Type I errors).
  • At $\alpha = .05$ and power $= .95$: ratio $= .05/.05 = 1$ (equal error rates).

2.6 Real-World Applications

| Field | Context | Typical Power Target |
|---|---|---|
| Clinical Trials | RCT comparing drug vs. placebo | 0.90–0.95 |
| Psychology | Between-subjects experiment | 0.80–0.90 |
| Education Research | Intervention effectiveness study | 0.80 |
| Epidemiology | Case-control study; cohort study | 0.80–0.90 |
| Genomics / GWAS | Association study ($\alpha = 5 \times 10^{-8}$) | 0.80 |
| Marketing Research | A/B test for conversion rate | 0.80–0.90 |
| Quality Control | Detecting process shift | 0.90–0.95 |
| Pilot Studies | Feasibility; parameter estimation | 0.60–0.80 |

3. The Mathematics Behind Power Analysis

3.1 The General Power Framework

For any test with test statistic $T$ and critical value $t_{crit}$:

$$\text{Power} = 1 - \beta = P(T \geq t_{crit} \mid H_1 \text{ is true})$$

Under $H_1$, $T$ follows a non-central distribution with non-centrality parameter $\lambda$. Power is the probability that this non-centrally distributed test statistic exceeds the critical value determined under $H_0$.

The critical value $t_{crit}$ is determined by $\alpha$ and the test type:

  • Two-tailed: $t_{crit} = z_{1-\alpha/2}$ (e.g., $1.960$ for $\alpha = .05$)
  • One-tailed: $t_{crit} = z_{1-\alpha}$ (e.g., $1.645$ for $\alpha = .05$)

Here $z_{1-\alpha/2}$ denotes the $(1-\alpha/2)$ quantile of the standard normal distribution; the formulas below write the same upper critical value as $z_{\alpha/2}$, while $z_{1-\beta}$ denotes the quantile for the desired power.
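The critical values and power quantiles used throughout this section can be reproduced with Python's standard library. This is a minimal sketch: the helper names are hypothetical, and `statistics.NormalDist().inv_cdf` supplies the standard normal quantile function.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_two_tailed(alpha: float) -> float:
    """Upper critical value z_{1-alpha/2} for a two-tailed test."""
    return Z.inv_cdf(1 - alpha / 2)


def z_one_tailed(alpha: float) -> float:
    """Upper critical value z_{1-alpha} for a one-tailed test."""
    return Z.inv_cdf(1 - alpha)


def z_for_power(power: float) -> float:
    """Standard normal quantile z_{1-beta} for the desired power."""
    return Z.inv_cdf(power)


print(f"{z_two_tailed(0.05):.3f}")  # 1.960
print(f"{z_one_tailed(0.05):.3f}")  # 1.645
for target in (0.80, 0.90, 0.95):
    print(f"power {target:.2f}: z = {z_for_power(target):.3f}")
# power 0.80: z = 0.842
# power 0.90: z = 1.282
# power 0.95: z = 1.645
```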

3.2 Power for the One-Sample z-Test

The simplest case: testing whether a population mean $\mu$ equals a known value $\mu_0$, with known population SD $\sigma$ and sample size $n$.

Non-centrality parameter:

$$\lambda = \frac{\mu_1 - \mu_0}{\sigma / \sqrt{n}} = d \times \sqrt{n}$$

Where $d = (\mu_1 - \mu_0)/\sigma$ is Cohen's $d$ for the one-sample case.

Power (two-tailed):

$$1 - \beta = 1 - \Phi(z_{\alpha/2} - \lambda) + \Phi(-z_{\alpha/2} - \lambda)$$

For practical purposes (when $\lambda$ is positive and not near zero), the last term is negligible and:

$$1 - \beta \approx \Phi(\lambda - z_{\alpha/2})$$

Where $\Phi$ is the standard normal CDF.

Sample size formula (solving for $n$):

$$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d}\right)^2$$

Where $z_{1-\beta}$ is the standard normal quantile for the desired power (e.g., $0.842$ for $1-\beta = 0.80$; $1.282$ for $1-\beta = 0.90$; $1.645$ for $1-\beta = 0.95$). Round $n$ up to the next whole number.
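A minimal sketch of the one-sample z-test formulas above, assuming a known $\sigma$ and using only the standard library; the function names are illustrative. It evaluates the exact two-tailed power expression and the closed-form sample size, rounding up because participants come in whole units.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_one_sample_z(d: float, n: int, alpha: float = 0.05) -> float:
    """Exact two-tailed power of the one-sample z-test (sigma known)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    lam = d * math.sqrt(n)  # non-centrality parameter
    return 1 - Z.cdf(z_crit - lam) + Z.cdf(-z_crit - lam)


def n_one_sample_z(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Closed-form sample size, rounded up to a whole participant."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil((z_sum / d) ** 2)


n = n_one_sample_z(0.5)
print(n)  # 32
print(f"power at n = {n}: {power_one_sample_z(0.5, n):.3f}")
```

Because a real study usually estimates $\sigma$ from the data, the exact non-central t calculation for a one-sample t-test typically requires slightly more participants than this z-based answer.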

3.3 Power for the Two-Sample Independent t-Test

Testing whether two population means differ ($H_1: \mu_1 \neq \mu_2$), with equal group sizes $n_1 = n_2 = n$ and pooled SD $\sigma_{pooled}$.

Cohen's $d$ (standardised mean difference):

$$d = \frac{\mu_1 - \mu_2}{\sigma_{pooled}}$$

Non-centrality parameter:

$$\lambda = \frac{d}{\sqrt{1/n_1 + 1/n_2}} = d \times \sqrt{\frac{n}{2}}$$

Sample size per group (approximate):

$$n_{per\;group} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{d^2}$$

Unequal group sizes: Let $n_2 = k \times n_1$ (allocation ratio $k$). The total sample size is minimised when $k = 1$ (equal groups). For unequal allocation:

$$n_1 = \frac{(z_{\alpha/2} + z_{1-\beta})^2(1 + 1/k)}{d^2}, \qquad n_2 = k \times n_1$$

⚠️ Unequal group sizes are less efficient than equal groups for a given total $N$. Unless there is a compelling reason (e.g., one group is more expensive to recruit, or a 2:1 allocation is ethically required), equal groups maximise power per participant.
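The cost of unequal allocation can be explored with a short sketch (normal approximation, standard library only; the function name is hypothetical). It computes $n_1$ from the allocation formula above and sets $n_2 = k \times n_1$, rounding each group up, so the total for $k > 1$ can be compared with equal allocation.

```python
import math
from statistics import NormalDist
from typing import Tuple

Z = NormalDist()  # standard normal distribution


def n_two_sample(d: float, k: float = 1.0, power: float = 0.80,
                 alpha: float = 0.05) -> Tuple[int, int]:
    """Normal-approximation group sizes (n1, n2) with n2 = k * n1."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    n1_exact = z_sum ** 2 * (1 + 1 / k) / d ** 2
    # Round each group up separately so neither falls short of the target
    return math.ceil(n1_exact), math.ceil(k * n1_exact)


for k in (1, 2, 3):
    n1, n2 = n_two_sample(0.5, k=k)
    print(f"k = {k}: n1 = {n1}, n2 = {n2}, total = {n1 + n2}")
# k = 1: n1 = 63, n2 = 63, total = 126
# k = 2: n1 = 48, n2 = 95, total = 143
# k = 3: n1 = 42, n2 = 126, total = 168
```

For $d = 0.50$, equal allocation needs 126 participants in total, whereas a 2:1 split needs 143. The exact non-central t calculation gives 64 per group rather than 63, so treat these normal-approximation figures as slight underestimates.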

3.4 Power for the Paired t-Test

Testing whether the mean of paired differences $\mu_D = \mu_1 - \mu_2$ equals zero.

Cohen's $d_z$ (based on the SD of differences):

$$d_z = \frac{\mu_D}{\sigma_D}$$

Where $\sigma_D = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$ and $\rho$ is the correlation between paired measurements.

The relationship to Cohen's $d$ for independent groups (assuming $\sigma_1 = \sigma_2$):

$$d_z = \frac{d}{\sqrt{2(1-\rho)}}$$

This shows that paired designs are more powerful when $\rho > 0$: the higher the correlation between paired measurements, the greater the efficiency gain over an independent design.

Sample size (number of pairs):

$$n_{pairs} = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d_z}\right)^2$$
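The following sketch (standard library only; function names are hypothetical) converts a between-groups $d$ into $d_z$ under the equal-variance assumption and applies the number-of-pairs formula, showing how quickly the required number of pairs falls as $\rho$ rises.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def d_z_from_d(d: float, rho: float) -> float:
    """Convert between-groups d to d_z, assuming sigma_1 == sigma_2."""
    return d / math.sqrt(2 * (1 - rho))


def n_pairs(d_z: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Number of pairs from the normal-approximation formula, rounded up."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil((z_sum / d_z) ** 2)


for rho in (0.0, 0.5, 0.8):
    dz = d_z_from_d(0.5, rho)
    print(f"rho = {rho:.1f}: d_z = {dz:.3f}, pairs = {n_pairs(dz)}")
# rho = 0.0: d_z = 0.354, pairs = 63
# rho = 0.5: d_z = 0.500, pairs = 32
# rho = 0.8: d_z = 0.791, pairs = 13
```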

3.5 Power for One-Way ANOVA

Testing whether $k$ group means are equal ($H_1$: at least two means differ).

Cohen's $f$ (standardised SD of group means):

$$f = \frac{\sigma_m}{\sigma_{within}} = \sqrt{\frac{\sum_{j=1}^k n_j(\mu_j - \mu)^2 / N}{\sigma_{within}^2}}$$

Where $\sigma_m$ is the SD of the group means, $\mu$ is the grand mean, and $\sigma_{within}$ is the common within-group SD.

Relationship to $\eta^2$ (eta-squared, the proportion of variance explained):

$$f = \sqrt{\frac{\eta^2}{1 - \eta^2}}$$

Non-centrality parameter:

$$\lambda = f^2 \times N$$

Under $H_1$, the F-statistic follows a non-central F distribution with numerator $df = k - 1$, denominator $df = N - k$, and non-centrality parameter $\lambda$.

Sample size per group (equal groups):

$$n_{per\;group} = \frac{\lambda_{required}}{k \times f^2}$$

Where $\lambda_{required}$ is the non-centrality parameter needed to achieve the desired power at the specified $\alpha$, $df_1 = k-1$, and $df_2 = k(n-1)$ (solved iteratively, as $df_2$ depends on $n$).

3.6 Power for Pearson Correlation

Testing whether the population correlation $\rho = 0$ ($H_1: \rho \neq 0$).

Effect size: The population correlation coefficient $\rho$ itself.

Fisher's z-transformation: To stabilise the variance:

$$z_r = \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) = \text{arctanh}(r), \qquad SE_{z_r} = \frac{1}{\sqrt{n-3}}$$

Non-centrality parameter:

$$\lambda = z_\rho \times \sqrt{n - 3}$$

Where $z_\rho = \text{arctanh}(\rho)$.

Sample size:

$$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{z_\rho}\right)^2 + 3$$
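A sketch of the Fisher-z sample size formula, using `math.atanh` for $\text{arctanh}$ and standard-library normal quantiles; the function name is illustrative. This is the normal approximation, so exact calculations based on the t-distribution can differ by about one participant.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_correlation(rho: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Fisher-z sample size for testing H0: rho = 0 (two-tailed)."""
    z_rho = math.atanh(rho)  # Fisher's z-transformation of the target rho
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil((z_sum / z_rho) ** 2 + 3)


for rho in (0.10, 0.30, 0.50):
    print(f"rho = {rho:.2f}: n = {n_correlation(rho)}")
# rho = 0.10: n = 783
# rho = 0.30: n = 85
# rho = 0.50: n = 30
```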

3.7 Power for the Chi-Square Test of Association

Testing whether two categorical variables are independent. See also the chi-square tutorial for additional detail.

Effect size (Cohen's $w$):

$$w = \sqrt{\sum_{i}\sum_{j} \frac{(P_{ij} - P_{i\cdot}P_{\cdot j})^2}{P_{i\cdot}P_{\cdot j}}}$$

Where $P_{ij}$ is the cell proportion under $H_1$ and $P_{i\cdot}$, $P_{\cdot j}$ are the row and column marginal proportions.

For $2 \times 2$ tables, $w = \phi$ (phi coefficient); for larger tables, $w$ is related to Cramér's $V$:

$$w = V \times \sqrt{\min(r-1,\; c-1)}$$

Non-centrality parameter:

$$\lambda = w^2 \times N$$

Under $H_1$, $\chi^2$ follows a non-central chi-square distribution with $df = (r-1)(c-1)$ and non-centrality parameter $\lambda$.

Sample size:

$$N = \frac{\lambda_{required}}{w^2}$$
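Unlike the z-based formulas, $\lambda_{required}$ for the chi-square test has no closed form, so $N$ is found by searching. The sketch below is a self-contained illustration, not DataStatPro's implementation, and all function names are hypothetical. It computes the non-central chi-square CDF as a Poisson mixture of central chi-square CDFs (a standard identity), evaluates the central CDF with the power series for the regularized lower incomplete gamma function, finds the critical value by bisection, and then increases $N$ until the target power is reached. It assumes moderate $\lambda$ (very large values would underflow `math.exp`); in production code a library routine such as SciPy's `scipy.stats.ncx2` would be the usual choice.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """Central chi-square CDF: regularized lower incomplete gamma
    P(df/2, x/2), evaluated with its power series."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= y / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def ncx2_cdf(x: float, df: float, lam: float) -> float:
    """Non-central chi-square CDF as a Poisson(lam / 2) mixture of central
    chi-square CDFs with df + 2j degrees of freedom."""
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0
    total = cum_weight = 0.0
    j = 0
    while cum_weight < 1 - 1e-12 and j < 10_000:
        total += weight * chi2_cdf(x, df + 2 * j)
        cum_weight += weight
        j += 1
        weight *= half / j
    return total


def chi2_critical(alpha: float, df: float) -> float:
    """Upper-alpha critical value of the central chi-square, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


def chi2_sample_size(w: float, df: int, power: float = 0.80,
                     alpha: float = 0.05) -> int:
    """Smallest N with P(non-central chi2 > critical value) >= power."""
    crit = chi2_critical(alpha, df)
    n = 2
    while 1 - ncx2_cdf(crit, df, w ** 2 * n) < power:  # lambda = w^2 * N
        n += 1
    return n


print(f"{chi2_critical(0.05, 1):.3f}")  # 3.841
print(chi2_sample_size(0.30, df=1))  # 88
```

For $w = 0.30$ and $df = 1$ this returns 88, which matches the $2 \times 2$ case solved with the normal approximation ($7.85 / 0.09 \approx 87.2$, rounded up).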

3.8 Power for Proportion Tests

3.8.1 One-Sample Proportion Test

Testing $H_0: \pi = \pi_0$ vs. $H_1: \pi \neq \pi_0$.

Cohen's $h$ (arcsine difference):

$$h = 2\arcsin(\sqrt{\pi_1}) - 2\arcsin(\sqrt{\pi_0})$$

Sample size:

$$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$$

3.8.2 Two-Sample Proportion Test

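Testing $H_0: \pi_1 = \pi_2$ vs. $H_1: \pi_1 \neq \pi_2$.

$$h = 2\arcsin(\sqrt{\pi_1}) - 2\arcsin(\sqrt{\pi_2})$$

Because each arcsine-transformed sample proportion $2\arcsin(\sqrt{\hat\pi})$ has variance of about $1/n$, the difference between two independent groups has variance of about $2/n$, which doubles the requirement relative to the one-sample case:

$$n_{per\;group} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{h^2}$$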
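A sketch of Cohen's $h$ and both proportion formulas (standard library only; function names are illustrative). The two-sample version includes the factor of 2 that comes from the variance of a difference between two independent arcsine-transformed proportions ($1/n + 1/n = 2/n$).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def cohens_h(p1: float, p2: float) -> float:
    """Arcsine-transformed difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def _z_sum(power: float, alpha: float) -> float:
    """z_{1-alpha/2} + z_{1-beta} for a two-tailed test."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)


def n_one_proportion(p1: float, p0: float, power: float = 0.80,
                     alpha: float = 0.05) -> int:
    """Sample size for H0: pi = p0 when the true proportion is p1."""
    return math.ceil((_z_sum(power, alpha) / cohens_h(p1, p0)) ** 2)


def n_two_proportions(p1: float, p2: float, power: float = 0.80,
                      alpha: float = 0.05) -> int:
    """Per-group sample size for H0: pi1 = pi2 (equal groups)."""
    return math.ceil(2 * _z_sum(power, alpha) ** 2 / cohens_h(p1, p2) ** 2)


print(f"h = {cohens_h(0.50, 0.40):.3f}")  # h = 0.201
print(n_one_proportion(0.50, 0.40))       # 194
print(n_two_proportions(0.50, 0.40))      # 388
```

Comparing 50% with 40% ($h \approx 0.20$, a "small" effect) needs 194 participants against a fixed benchmark, but 388 per group when both proportions must be estimated.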

3.9 Power for Multiple Regression

Testing whether a set of predictors explains a meaningful proportion of variance in an outcome, or whether a specific predictor contributes above and beyond others.

Cohen's $f^2$ (for the overall model or incremental $R^2$):

$$f^2 = \frac{R^2}{1 - R^2}$$

For testing an increment in $R^2$ when adding $u$ new predictors:

$$f^2 = \frac{R^2_{full} - R^2_{reduced}}{1 - R^2_{full}}$$

Non-centrality parameter:

$$\lambda = f^2 \times N$$

Under $H_1$, the F-statistic for testing $u$ predictors follows a non-central F distribution with $df_1 = u$ and $df_2 = N - p - 1$ (where $p$ is the total number of predictors in the full model).

Sample size (solved iteratively, because $\lambda_{required}$ depends on $df_2$ and therefore on $N$):

$$N = \frac{\lambda_{required}}{f^2}$$

3.10 Summary of Key Sample Size Formulae

| Test | Effect Size | Sample Size Formula |
|---|---|---|
| One-sample z/t | $d$ | $n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d}\right)^2$ |
| Two-sample t (equal groups) | $d$ | $n_{per} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{d^2}$ |
| Paired t | $d_z$ | $n_{pairs} = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d_z}\right)^2$ |
| Pearson correlation | $\rho$ ($z_\rho$) | $n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{z_\rho}\right)^2 + 3$ |
| One proportion | $h$ | $n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$ |
| Two proportions | $h$ | $n_{per} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{h^2}$ |
| Chi-square | $w$ | $N = \frac{\lambda_{req}}{w^2}$ |
| ANOVA | $f$ | Iterative (non-central F) |
| Regression ($f^2$) | $f^2$ | $N = \frac{\lambda_{req}}{f^2}$ (iterative, non-central F) |

3.11 The Power Curve

The power curve plots power ($1 - \beta$) as a function of one of the four analysis inputs while holding the others constant. The most common power curves are:

  • Power vs. $N$: Shows how power increases as sample size grows. For most tests, power rises steeply at first and then plateaus. Used to identify the point of diminishing returns.
  • Power vs. effect size: Shows how power changes across a range of effect sizes for a fixed $N$. Used for sensitivity analysis.
  • Power vs. $\alpha$: Shows the trade-off between Type I and Type II error rates.

The power curve always satisfies:

  • $1 - \beta \to \alpha$ as $N \to 0$ or effect size $\to 0$ (cannot do better than chance).
  • $1 - \beta \to 1$ as $N \to \infty$ or effect size $\to \infty$.
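A power curve is easy to tabulate once power can be computed for any $N$. This sketch uses the normal approximation for two equal groups (standard library only; the function name is hypothetical), prints power across a grid of per-group sizes for $d = 0.50$, and confirms the limiting case that power equals $\alpha$ when the effect is zero.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation two-tailed power for two equal groups."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    lam = d * math.sqrt(n_per_group / 2)  # non-centrality parameter
    return 1 - Z.cdf(z_crit - lam) + Z.cdf(-z_crit - lam)


grid = (10, 20, 40, 64, 100, 200)
curve = [(n, power_two_sample(0.5, n)) for n in grid]
for n, pw in curve:
    print(f"n per group = {n:>3}: power = {pw:.3f}")

# Limiting case: with no true effect, power collapses to alpha
print(f"{power_two_sample(0.0, 100):.3f}")  # 0.050
```

The printed values climb steeply up to about 64 per group and then flatten, which is the diminishing-returns pattern a power-vs-$N$ curve is used to locate.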

4. Considerations and Planning Checklist

4.1 Specifying the Effect Size: The Critical Decision

The effect size is the single most consequential input to a power analysis. Poorly specified effect sizes lead to either seriously underpowered or wastefully overpowered studies. Use the following hierarchy of evidence for effect size specification:

| Priority | Source | Description |
|---|---|---|
| 1 | Domain-specific smallest effect size of interest (SESOI) | The smallest effect that would be practically meaningful. Defined by theory, clinical guidelines, or cost-benefit analysis. |
| 2 | Prior studies or meta-analyses | Effect sizes from published research on the same or very similar questions. Apply a discount for publication bias. |
| 3 | Pilot study | A small preliminary study; note that pilot effect size estimates are imprecise and should be used with caution. |
| 4 | Expert opinion or theoretical prediction | Informed estimates from domain experts or mathematical models. |
| 5 | Cohen's conventions | Use as a last resort only. Small/medium/large benchmarks as described in Section 1.6. |

⚠️ Do not use the effect size from a pilot study directly. Pilot effect sizes are estimated from small samples and are highly unstable. The pilot effect size will often overestimate the true effect (publication bias in miniature). Use pilot data to confirm feasibility and estimate nuisance parameters (e.g., SD, ICC), not to determine the effect size for the power calculation.

4.2 Choosing the Significance Level (α)

The choice of $\alpha$ should be deliberate and justified:

| Context | Recommended $\alpha$ | Rationale |
|---|---|---|
| Standard social/behavioural science | $.05$ | Convention; acceptable Type I:II error ratio |
| Clinical trial (efficacy) | $.05$ or $.025$ (one-sided) | Regulatory convention |
| Safety outcomes | $.01$ or smaller | Consequences of false positives are severe |
| Exploratory / hypothesis-generating | $.10$ | Higher sensitivity acceptable |
| Multiple primary outcomes | $.05 / k$ (Bonferroni) | Controlling familywise error rate |
| Genomics / GWAS | $5 \times 10^{-8}$ | Multiple testing across millions of SNPs |
| Equivalence testing | $.05$ (but applied differently) | TOST framework |

4.3 Choosing the Power Target (1 − β)

The power target should balance the cost of missing a real effect against the cost of increasing sample size:

| Consider Higher Power When | Consider Lower Power When |
|---|---|
| The cost of a false negative is high (clinical safety) | Resources are very limited |
| The study is confirmatory and pre-registered | The study is exploratory |
| The effect is expected to be small | The effect is expected to be large |
| The study aims to replicate prior findings | A pilot study to assess feasibility |
| Regulatory approval depends on the result | Multiple outcomes with confirmatory follow-up planned |

4.4 Identifying the Primary Outcome and Test

Power analysis is conducted for a single primary hypothesis and its associated primary test. Secondary hypotheses should have their own power analyses if they are to be formally tested.

Before beginning the analysis, clearly specify:

  1. What is the primary outcome variable (and its scale)?
  2. What is the primary comparison or test (e.g., mean difference, correlation)?
  3. What is the statistical test that will be applied?
  4. Is the test one-tailed or two-tailed?
  5. What are the key assumptions (e.g., equal variances, paired vs. independent)?

4.5 Accounting for Anticipated Attrition and Missing Data

In longitudinal studies or clinical trials, participants drop out or produce missing data. The required sample size at enrolment must account for this:

$$N_{enroll} = \frac{N_{analysis}}{1 - r_{attrition}}$$

Where $r_{attrition}$ is the expected attrition rate (as a proportion).

Example: A study needs $N_{analysis} = 120$ completers and expects 15% attrition:

$$N_{enroll} = \frac{120}{1 - 0.15} = \frac{120}{0.85} = 141.2 \rightarrow 142 \text{ (rounded up)}$$

For multi-wave longitudinal studies, apply the attrition correction at each wave or use the cumulative attrition rate.
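The attrition adjustment is a one-liner. The sketch below (hypothetical function names, standard library only) rounds up and also shows one way to turn a per-wave dropout rate into a cumulative rate, assuming the same independent attrition rate at every wave.

```python
import math


def enrolment_target(n_analysis: int, attrition_rate: float) -> int:
    """Participants to enrol so that n_analysis remain after attrition."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return math.ceil(n_analysis / (1 - attrition_rate))


def cumulative_attrition(per_wave_rate: float, waves: int) -> float:
    """Overall dropout, assuming the same independent rate at every wave."""
    return 1 - (1 - per_wave_rate) ** waves


print(enrolment_target(120, 0.15))                           # 142
print(enrolment_target(120, cumulative_attrition(0.10, 3)))  # 165
```

With 10% attrition at each of three waves, retention is $0.9^3 = 0.729$, so the enrolment target rises to 165.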

4.6 Accounting for Stratification and Clustering

In studies with complex sampling designs:

  • Stratified designs: Power analysis proceeds separately within strata, then sample sizes are combined.
  • Clustered designs (e.g., school classes, clinical sites): The design effect (DEFF) inflates the required sample size to account for within-cluster correlation (intraclass correlation, ICC):

$$DEFF = 1 + (m - 1) \times ICC$$

$$N_{cluster} = N_{simple} \times DEFF$$

Where $m$ is the average cluster size and $ICC$ is the intraclass correlation coefficient. For cluster randomised trials (CRTs), the number of clusters needed is:

$$n_{clusters} = \frac{N_{cluster}}{m}$$

| ICC | DEFF ($m = 10$) | DEFF ($m = 20$) | DEFF ($m = 30$) |
|---|---|---|---|
| 0.01 | 1.09 | 1.19 | 1.29 |
| 0.05 | 1.45 | 1.95 | 2.45 |
| 0.10 | 1.90 | 2.90 | 3.90 |
| 0.20 | 2.80 | 4.80 | 6.80 |
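The design-effect table can be regenerated, and a clustered sample size derived from an unclustered one, with a few lines of Python. The sketch uses hypothetical function names and treats $N_{simple}$ as the total sample size from the corresponding individually randomised calculation.

```python
import math
from typing import Tuple


def design_effect(m: float, icc: float) -> float:
    """DEFF = 1 + (m - 1) * ICC for average cluster size m."""
    return 1 + (m - 1) * icc


def clustered_sample_size(n_simple: int, m: int, icc: float) -> Tuple[int, int]:
    """Inflate an unclustered total N by DEFF; return (N_cluster, n_clusters)."""
    n_cluster = math.ceil(n_simple * design_effect(m, icc))
    n_clusters = math.ceil(n_cluster / m)
    return n_cluster, n_clusters


# Regenerate the DEFF table (columns: m = 10, 20, 30)
for icc in (0.01, 0.05, 0.10, 0.20):
    row = "  ".join(f"{design_effect(m, icc):.2f}" for m in (10, 20, 30))
    print(f"ICC = {icc:.2f}: {row}")

print(clustered_sample_size(128, m=20, icc=0.05))  # (250, 13)
```

For example, 128 participants (64 per group) under individual randomisation become 250 with $m = 20$ and ICC $= 0.05$, which requires 13 clusters of 20.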

4.7 Multiple Testing Corrections

When testing $k$ hypotheses simultaneously, control the familywise error rate (FWER) or false discovery rate (FDR):

Bonferroni correction (FWER):

$$\alpha' = \frac{\alpha}{k}$$

For each test, use $\alpha'$ in the power analysis. This increases the required sample size substantially for large $k$.

Holm-Bonferroni (less conservative): Apply a sequential correction; compute power for the $j$-th most significant test using $\alpha' = \alpha / (k - j + 1)$.

Benjamini-Hochberg (FDR): Controls the expected proportion of false positives among significant results. Less conservative than Bonferroni for large-scale testing.
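To see how much a Bonferroni correction costs, the sketch below recomputes the two-sample normal-approximation sample size at $\alpha' = .05/k$ for $d = 0.50$ (standard library only; function names are hypothetical). It also lists the Holm thresholds $\alpha/(k - j + 1)$, from the most to the least significant test.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(d: float, alpha: float, power: float = 0.80) -> int:
    """Two-sample normal-approximation sample size per group (two-tailed)."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 / d ** 2)


def holm_thresholds(alpha: float, k: int) -> list:
    """Holm thresholds alpha / (k - j + 1) for j = 1 (smallest p) .. k."""
    return [alpha / (k - j + 1) for j in range(1, k + 1)]


for k in (1, 2, 5, 10):
    alpha_adj = 0.05 / k  # Bonferroni-adjusted alpha
    print(f"k = {k:>2}: alpha' = {alpha_adj:.4f}, "
          f"n per group = {n_per_group(0.5, alpha_adj)}")

print(holm_thresholds(0.05, 4))
```

With five primary outcomes ($\alpha' = .01$), the per-group requirement rises from 63 to 94.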

4.8 Reporting the Power Analysis: Documentation Standards

A complete power analysis report must include:

| Element | Description |
|---|---|
| Analysis type | A priori, post-hoc, sensitivity, or criterion |
| Statistical test | Exact test and variant used |
| Effect size and justification | Value, measure used, and source/rationale |
| Significance level ($\alpha$) | Value and directionality (one- or two-tailed) |
| Desired power ($1 - \beta$) | Value and rationale |
| Computed sample size | Total $N$ and per-group $n$ if applicable |
| Attrition/non-compliance adjustment | If applicable |
| Design effect | If clustered or stratified |
| Multiple testing correction | If multiple primary outcomes |
| Software and version | e.g., DataStatPro v4.2 |

5. Power Analysis for Common Statistical Tests

5.1 One-Sample t-Test

Research question: Does the population mean differ from a known or hypothesised value $\mu_0$?

Effect size: $d = (\mu_1 - \mu_0) / \sigma$

Key inputs:

  • $\mu_0$: Hypothesised value under $H_0$
  • $\mu_1$: Expected true mean under $H_1$
  • $\sigma$: Population or estimated SD

Power formula: Based on the non-central t-distribution with $df = n - 1$ and non-centrality parameter $\lambda = d\sqrt{n}$.

5.2 Two-Sample Independent t-Test

Research question: Do two independent group means differ?

Effect size: $d = (\mu_1 - \mu_2) / \sigma_{pooled}$

Key inputs:

  • $\mu_1$, $\mu_2$: Expected means for each group
  • $\sigma_{pooled}$: Pooled within-group SD
  • $k$: Allocation ratio $n_2 / n_1$ (default $k = 1$, equal groups)

Assumptions for power analysis:

  • Equal variances (if a Welch correction is planned, add $\approx 10\%$ to $N$)
  • Normally distributed outcomes within groups
  • Normally distributed outcomes within groups

5.3 Paired t-Test

Research question: Does the mean of within-subject or matched-pair differences differ from zero?

Effect size: $d_z = \mu_D / \sigma_D$

Key additional input:

  • $\rho$: Expected correlation between paired measurements (used to derive $\sigma_D$ from $\sigma_1$ and $\sigma_2$)

Efficiency gain over independent t-test (assuming $\sigma_1 = \sigma_2$):

$$\text{Required pairs} = \text{Required per group (independent)} \times (1 - \rho) = \frac{\text{Required total } N \text{ (independent)}}{2} \times (1 - \rho)$$

A within-subjects design with $\rho = 0.50$ therefore needs half as many participants as one group of the independent design, i.e. approximately one quarter of the independent design's total $N$, for the same power.

5.4 One-Way ANOVA (Fixed Effects)

Research question: Do $k$ group means differ?

Effect size: $f = \sigma_m / \sigma_{within}$

Key inputs:

  • $k$: Number of groups
  • $\mu_1, \ldots, \mu_k$: Expected group means (or $\sigma_m$, the SD of means)
  • $\sigma_{within}$: Common within-group SD
  • $n_{per\;group}$: Equal or specified group sizes

Important: ANOVA power analysis requires specifying the pattern of means (which groups differ, and by how much), not just the overall effect size. The same $f$ can arise from very different mean patterns.

5.5 Factorial ANOVA

Research question: Do main effects and/or interactions exist in a factorial design?

Each effect (main effect A, main effect B, interaction A×B) has its own effect size ff and its own power analysis. Key additional considerations:

  • Power for interaction effects is typically much lower than for main effects of the same nominal magnitude.
  • Interaction effect sizes should be estimated directly (not derived from main effects).
  • For a $2 \times 2$ factorial design, the interaction effect size for a crossover interaction is often set to half the main effect size as a conservative estimate.

5.6 Repeated Measures ANOVA

Research question: Does a measured variable change across $k$ time points or conditions within subjects?

Effect size: Cohen's $f$ based on within-person variance.

Key additional input:

  • $\rho$: Average correlation among repeated measures (intraclass correlation). Higher $\rho$ → greater power advantage of repeated measures over independent groups.

Non-centrality parameter adjustment for repeated measures ($n$ participants, each measured $k$ times, assuming sphericity):

$$\lambda_{RM} = \frac{f^2 \times n \times k}{1 - \rho}$$

5.7 Pearson Correlation

Research question: Is there a linear relationship between two continuous variables?

Effect size: $\rho$ (population correlation coefficient)

Sample size:

$$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{z_\rho}\right)^2 + 3$$

Where $z_\rho = \text{arctanh}(\rho) = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}$.

Note: Power for correlation tests is low for small correlations. Detecting $\rho = 0.30$ with 80% power at $\alpha = .05$ requires $n \approx 84$ (the Fisher-z approximation above gives 85).

5.8 Multiple Regression

Research question: Does a set of $p$ predictors explain variance in the outcome? Or does adding $u$ predictors significantly improve prediction?

Effect size: $f^2 = R^2 / (1 - R^2)$ (overall model) or $f^2 = \Delta R^2 / (1 - R^2_{full})$ (incremental)

Key inputs:

  • $u$: Number of predictors being tested
  • $p$: Total predictors in the full model
  • $R^2$ or $\Delta R^2$: Expected variance explained

Important: The numerator degrees of freedom equal the number of predictors being tested ($u$), not the total number in the model; $p$ enters only through $df_2 = N - p - 1$. Testing a single predictor in a model with many covariates uses $u = 1$; testing the full model uses $u = p$.

5.9 Chi-Square Test of Association

Research question: Are two categorical variables associated?

Effect size: Cohen's ww (related to Cramér's VV: w=Vmin(r1,c1)w = V\sqrt{\min(r-1, c-1)})

Key inputs:

  • Table dimensions: rr rows, cc columns
  • df=(r1)(c1)df = (r-1)(c-1)
  • Expected cell proportions under H1H_1

Important: For 2×22 \times 2 tables, w=ϕw = \phi and the formula simplifies to the two-proportion case. For larger tables, specifying the full expected cell proportion matrix provides the most accurate power estimate.

5.10 Test Type Comparison Table

| Test | Effect Size | $df$ | Non-Central Dist. | Equal Groups Optimal? |
|---|---|---|---|---|
| One-sample t | $d$ | $n-1$ | Non-central t | N/A |
| Two-sample t | $d$ | $n_1+n_2-2$ | Non-central t | Yes |
| Paired t | $d_z$ | $n-1$ | Non-central t | N/A |
| One-way ANOVA | $f$ | $k-1$; $N-k$ | Non-central F | Yes |
| Correlation | $\rho$ | $n-2$ | Non-central t | N/A |
| Multiple regression | $f^2$ | $u$; $N-p-1$ | Non-central F | N/A |
| Chi-square | $w$ | $(r-1)(c-1)$ | Non-central $\chi^2$ | Yes |
| Proportion (one) | $h$ | n/a | Normal | N/A |
| Proportion (two) | $h$ | n/a | Normal | Yes |

6. Using the Sample Size and Power Analysis Calculator Component

The Sample Size and Power Analysis Calculator in DataStatPro provides a comprehensive tool for conducting, visualising, and reporting power analyses for all common statistical tests.

Step-by-Step Guide

Step 1 — Navigate to the Component

Go to Study Design → Sample Size and Power Analysis.

Step 2 — Select the Analysis Mode

Choose one of the four analysis modes:

  • A priori: Compute required sample size.
  • Post-hoc: Compute achieved power.
  • Sensitivity: Compute minimum detectable effect size.
  • Criterion: Compute required $\alpha$ (advanced use).

Step 3 — Select the Statistical Test

Choose the test family and specific test from the hierarchical menu:

  • Mean Tests
    • One-Sample t-Test
    • Two-Sample Independent t-Test
    • Paired t-Test
    • One-Way ANOVA
    • Factorial ANOVA (Two-Way, Three-Way)
    • Repeated Measures ANOVA
  • Association Tests
    • Pearson Correlation
    • Spearman Correlation
    • Multiple Regression (Linear)
  • Categorical Tests
    • Chi-Square Test of Association
    • Chi-Square Goodness-of-Fit
    • One-Sample Proportion Test
    • Two-Sample Proportion Test
  • Survival Analysis
    • Log-Rank Test
    • Cox Proportional Hazards
  • Advanced
    • Equivalence Test (TOST)
    • Non-Inferiority Test
    • Clustered Design (CRT)
    • Generic Non-Central Distribution

Step 4 — Specify Effect Size

Choose your effect size specification method:

  • Direct entry: Enter the effect size measure directly (e.g., $d = 0.50$).
  • From parameters: Enter the raw parameters (e.g., $\mu_1 = 100$, $\mu_2 = 95$, $\sigma = 15$) and DataStatPro computes the effect size automatically.
  • From proportions: Enter $\pi_1$ and $\pi_2$ for proportion tests; DataStatPro computes Cohen's $h$ automatically.
  • From expected table: Enter the full expected cell proportion matrix for chi-square tests.
  • Effect size calculator: Use DataStatPro's built-in effect size converter to transform between $d$, $r$, $f$, $\eta^2$, $\omega^2$, OR, and RR.

Step 5 — Specify Remaining Parameters

Depending on the analysis mode, enter the known quantities:

  • Significance level ($\alpha$): Default .05; specify .01 or .001 if needed.
  • Desired power ($1 - \beta$): Default .80; options .80, .85, .90, .95, .99, or custom.
  • Directionality: Two-tailed (default) or one-tailed.
  • Number of groups / predictors: As applicable to the selected test.
  • Allocation ratio $k = n_2/n_1$: For two-group tests (default $k = 1$).
  • ICC and cluster size: For clustered designs.
  • Attrition rate: For enrollment adjustment.

Step 6 — Set Display Options

  • ✅ Primary result: Required $N$ (or power, or MDE) with exact formula.
  • ✅ Enrollment $N$ (attrition-adjusted).
  • ✅ Per-group $n$ breakdown.
  • ✅ Power curve: Power vs. $N$ for current effect size and $\alpha$.
  • ✅ Sensitivity curve: Power vs. effect size for current $N$ and $\alpha$.
  • ✅ Power contour plot: Power as a function of both $N$ and effect size.
  • ✅ Non-centrality parameter $\lambda$ and critical value.
  • ✅ Type I error rate ($\alpha$), Type II error rate ($\beta$), power ($1-\beta$).
  • ✅ Summary table: Power at $N \pm 10\%$, $N \pm 25\%$, $N \pm 50\%$.
  • ✅ Design effect and ICC-adjusted $N$ (for clustered designs).
  • ✅ APA 7th edition power analysis paragraph (auto-generated).

Step 7 — Run the Analysis

Click "Compute Sample Size / Power". DataStatPro will:

  1. Convert effect size inputs to the required format (apply transformations if needed).
  2. Solve for the requested output using exact non-central distribution methods.
  3. Apply attrition, ICC, and multiple testing corrections if specified.
  4. Generate power curve, sensitivity curve, and contour plot.
  5. Produce the APA-compliant power analysis reporting paragraph.

7. Step-by-Step Procedure

7.1 Full Manual Procedure for A Priori Sample Size Calculation

Step 1 — Identify the Primary Research Question

State the primary outcome, the comparison of interest, and the direction of the hypothesised effect. Confirm the appropriate statistical test.

Step 2 — Choose the Significance Level and Directionality

State $\alpha$ and justify the choice. Specify whether the test is one-tailed or two-tailed, with justification.

Identify $z_{\alpha/2}$ (two-tailed) or $z_\alpha$ (one-tailed) from the standard normal distribution:

$\alpha$ | $z_{\alpha/2}$ (two-tailed) | $z_\alpha$ (one-tailed)
--- | --- | ---
.10 | 1.645 | 1.282
.05 | 1.960 | 1.645
.025 | 2.241 | 1.960
.01 | 2.576 | 2.326
.001 | 3.291 | 3.090

Step 3 — Choose the Power Target

State the desired power $1 - \beta$ and justify the choice. Identify $z_{1-\beta}$:

Power ($1-\beta$) | $z_{1-\beta}$
--- | ---
0.70 | 0.524
0.80 | 0.842
0.85 | 1.036
0.90 | 1.282
0.95 | 1.645
0.99 | 2.326
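The tabled constants in Steps 2 and 3 are standard normal quantiles, so they can be regenerated rather than looked up. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper names `z_alpha` and `z_power` are illustrative, not part of any DataStatPro API.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_alpha(alpha, tails=2):
    """Critical value: z_{alpha/2} for a two-tailed test, z_alpha for one-tailed."""
    return STD_NORMAL.inv_cdf(1 - alpha / tails)


def z_power(power):
    """z_{1-beta}: the standard normal quantile at the target power."""
    return STD_NORMAL.inv_cdf(power)


for a in (0.10, 0.05, 0.025, 0.01, 0.001):
    print(f"alpha={a}: two-tailed {z_alpha(a):.3f}, one-tailed {z_alpha(a, tails=1):.3f}")
for p in (0.70, 0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"power={p}: z = {z_power(p):.3f}")
```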

Step 4 — Specify and Justify the Effect Size

State the effect size measure, its numerical value, and the source or rationale. Convert raw parameters to a standardised effect size using the appropriate formula (Section 3).

Step 5 — Apply the Sample Size Formula

Substitute the values of $z_{\alpha/2}$, $z_{1-\beta}$, and the effect size into the appropriate formula from Section 3.10. Round up to the nearest whole number.

⚠️ Always round the required $N$ UP, never down. Rounding down results in a study with slightly less power than targeted.

Step 6 — Adjust for Unequal Groups (If Applicable)

For two-group designs with unequal allocation (ratio $k \neq 1$):

$$n_1 = \left\lceil \frac{(z_{\alpha/2} + z_{1-\beta})^2 (1 + 1/k)}{d^2} \right\rceil, \qquad n_2 = \lceil k \times n_1 \rceil$$

Verify that the total $N = n_1 + n_2$ achieves the target power with exact non-central distribution methods.
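Step 6 can be sketched as follows, assuming the normal-approximation formula above; `two_group_n` is a hypothetical helper name. Because it uses $z$ rather than $t$ quantiles it can come out one participant per group short (for $d = 0.5$ with equal allocation it gives 63, whereas the exact non-central $t$ answer is 64), which is why the exact verification matters.

```python
import math
from statistics import NormalDist


def two_group_n(d, alpha=0.05, power=0.80, k=1.0):
    """Normal-approximation group sizes for a two-tailed two-sample comparison.

    k = n2 / n1 is the allocation ratio. Confirm the result with an exact
    non-central t calculation before finalising the design.
    """
    z = NormalDist()
    zsum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n1 = math.ceil(zsum ** 2 * (1 + 1 / k) / d ** 2)
    n2 = math.ceil(k * n1)
    return n1, n2


print(two_group_n(0.5))          # equal allocation: (63, 63)
print(two_group_n(0.5, k=2.0))   # 2:1 allocation:   (48, 96)
```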

Step 7 — Adjust for Attrition

$$N_{enroll} = \left\lceil \frac{N_{analysis}}{1 - r_{attrition}} \right\rceil$$

Step 8 — Adjust for Clustering (If Applicable)

$$DEFF = 1 + (m - 1) \times ICC, \qquad N_{cluster} = \lceil N_{simple} \times DEFF \rceil$$

Number of clusters: $n_{clusters} = \lceil N_{cluster} / m \rceil$, where $m$ is the cluster size.

Step 9 — Adjust for Multiple Testing (If Applicable)

Replace $\alpha$ with $\alpha' = \alpha/k$ in the sample size formula (Bonferroni correction), where $k$ is the number of primary hypotheses.
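Steps 7 to 9 are simple arithmetic adjustments. The sketch below strings them together; the function names and the example inputs (128 participants, clusters of 20, ICC = .05, three hypotheses) are hypothetical and chosen only for illustration.

```python
import math


def enrollment_n(n_analysis, attrition):
    """Step 7: inflate the analysable N for the expected attrition rate."""
    return math.ceil(n_analysis / (1 - attrition))


def cluster_n(n_simple, m, icc):
    """Step 8: design effect, inflated N, and number of clusters of size m."""
    deff = 1 + (m - 1) * icc
    n_cluster = math.ceil(n_simple * deff)
    return deff, n_cluster, math.ceil(n_cluster / m)


def bonferroni_alpha(alpha, n_hypotheses):
    """Step 9: per-hypothesis alpha under a Bonferroni correction."""
    return alpha / n_hypotheses


print(enrollment_n(156, 0.12))                                          # 178
deff, n_cluster, n_clusters = cluster_n(128, m=20, icc=0.05)
print(f"DEFF = {deff:.2f}, N = {n_cluster}, clusters = {n_clusters}")  # DEFF = 1.95, N = 250, clusters = 13
print(f"{bonferroni_alpha(0.05, 3):.4f}")                               # 0.0167
```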

Step 10 — Verify with Power Curve

Using the computed $N$, confirm the achieved power with exact non-central distribution calculations. Plot the power curve to show power at values of $N$ above and below the target, and confirm that the achieved power is at or above the target.

Step 11 — Conduct Sensitivity Analysis

Report the minimum detectable effect size (MDE) at the computed $N$ (two-sample $t$, with $n$ per group):

$$d_{MDE} = \frac{z_{\alpha/2} + z_{1-\beta}}{\sqrt{n/2}}$$

This tells stakeholders the smallest effect the study is designed to detect.
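The MDE formula is easy to evaluate directly. A minimal sketch under the same normal approximation (the function name is illustrative); for 64 per group at 80% power and $\alpha = .05$ it gives $d_{MDE} \approx 0.495$.

```python
import math
from statistics import NormalDist


def mde_two_sample(n_per_group, alpha=0.05, power=0.80):
    """Normal-approximation MDE (Cohen's d) for a two-tailed two-sample test."""
    z = NormalDist()
    zsum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return zsum / math.sqrt(n_per_group / 2)


for n in (35, 64, 100):
    print(n, round(mde_two_sample(n), 3))
```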

Step 12 — Document and Report

Compile all inputs and outputs into a complete power analysis report (APA format provided in Section 15). Retain all working for audit and reproducibility.


8. Interpreting the Output

8.1 Reading the Required Sample Size

Output Feature | Interpretation
--- | ---
Total $N$ | Minimum number of valid observations needed for the analysis
Per-group $n$ | Number needed in each arm (for multi-group tests)
Enrollment $N$ | Total $N$ inflated for the expected attrition rate; the number of participants to recruit
Exact achieved power | Power at the computed (rounded-up) $N$; should be $\geq$ the target
Power at $N - 1$ | Confirms that one fewer participant would fall below the target power

8.2 Understanding the Power Curve

Feature of the Power Curve | Meaning
--- | ---
Steep rise at low $N$ | Each additional participant greatly increases power in this range
Plateau at high $N$ | Diminishing returns; additional participants add little power
Power curve above target line | Current $N$ meets or exceeds the power requirement
Power curve crossing 0.50 | At this $N$, the study has only even (coin-flip) odds of detecting the true effect
Lower end of the curve | As $N$ shrinks, power falls toward $\alpha$ (the false positive rate); with a true effect in the tested direction, power never drops below $\alpha$

8.3 Sensitivity Output: Minimum Detectable Effect

The minimum detectable effect (MDE) is the smallest effect the study has the specified power to detect:

MDE Interpretation | Action
--- | ---
MDE < SESOI | Study is adequately powered to detect the smallest meaningful effect
MDE = SESOI | Study is precisely powered; it just barely detects the minimum meaningful effect
MDE > SESOI | Study is underpowered; it cannot reliably detect the minimum meaningful effect

Report the MDE in original units (not just in standardised form) to make it interpretable to domain experts who may not be familiar with Cohen's dd.

8.4 The Non-Centrality Parameter ($\lambda$)

The non-centrality parameter $\lambda$ summarises the total signal in the study; it captures both the effect size and the sample size:

$$\lambda = ES^2 \times N \quad \text{(approximately, for many tests)}$$

$\lambda$ Interpretation | Meaning
--- | ---
$\lambda \to 0$ | Power $\to \alpha$; the study cannot distinguish $H_1$ from $H_0$
$\lambda = \lambda_{crit}$ | Power is exactly at the target level
$\lambda$ large | High power; the test statistic distribution under $H_1$ is well separated from the critical value

8.5 Interpreting Achieved Post-Hoc Power

Post-hoc power (computed after data collection using the observed effect size) has a deterministic relationship with the p-value:

Observed Result | Post-Hoc Power
--- | ---
$p = \alpha$ exactly | $\approx 0.50$ (exactly 0.50 for a one-tailed $z$-test); a mathematical consequence, not an empirical finding
$p < \alpha$ | $> 0.50$
$p > \alpha$ | $< 0.50$

Because of this mathematical relationship, post-hoc power using the observed effect size adds no information beyond the p-value. Instead, report:

  • The 95% CI for the effect size.
  • A sensitivity analysis: "What is the power for a range of plausible true effect sizes?"
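The relationship in the table can be demonstrated for a two-tailed $z$-test: recover the observed $z$ from $p$, plug it in as if it were the true effect, and the resulting "observed power" depends on $p$ alone. This is a minimal sketch assuming a $z$-test (for $t$-tests the relationship holds approximately); `posthoc_power` is a hypothetical helper. At $p = .05$ it returns approximately 0.50.

```python
from statistics import NormalDist

Z = NormalDist()


def posthoc_power(p_value, alpha=0.05):
    """'Observed' power of a two-tailed z-test, computed from its p-value alone.

    The observed z statistic is recovered from p and then treated as if it
    were the true standardised effect, so the result carries no new information.
    """
    z_obs = Z.inv_cdf(1 - p_value / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(z_obs - z_crit) + Z.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.30):
    print(p, round(posthoc_power(p), 3))
```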

8.6 Interpreting the Contour Plot

The power contour plot displays power as a function of both $N$ and effect size simultaneously, with contour lines at specific power levels (e.g., 0.60, 0.70, 0.80, 0.90, 0.95):

Region of the Contour Plot | Interpretation
--- | ---
Above the 0.80 contour | Combinations of $N$ and effect size where power $\geq 0.80$
Below the 0.80 contour | Underpowered combinations of $N$ and effect size
Current study position | Marked on the plot; shows where the study falls relative to the power targets
Closely spaced contours | Power changes rapidly with $N$ in this region
Widely spaced contours | Diminishing returns; large increases in $N$ are needed for modest power gains

9. Visualising Power and Sample Size

9.1 Power Curve (Power vs. Sample Size)

The power curve is the primary visualisation for a priori power analysis. It plots statistical power ($y$-axis) as a function of sample size ($x$-axis) for fixed $\alpha$ and effect size.

Key annotations on the DataStatPro power curve:

  • A horizontal dashed line at the target power level (e.g., 0.80).
  • A vertical dashed line at the required $N$.
  • The intersection point, highlighted and labelled with the exact $N$ and power.
  • A shaded region below the target power: the underpowered zone.
  • Optional: Multiple curves for different effect sizes or $\alpha$ levels.

Best practices:

  • Set the x-axis range to display at least $[1, 3 \times N_{required}]$ to show the full shape of the curve.
  • Annotate the MDE at the required $N$.
  • Use a logarithmic x-axis when the required $N$ is very large to avoid compressing the curve at small $N$.

9.2 Sensitivity Curve (Power vs. Effect Size)

The sensitivity curve plots power ($y$-axis) as a function of effect size ($x$-axis) for a fixed $N$ and $\alpha$.

Use cases:

  • Assessing robustness: "What happens to power if the true effect is smaller than anticipated?"
  • Reporting minimum detectable effect: The effect size at which the curve crosses the target power line.
  • Communicating uncertainty about the effect size to stakeholders.

Best practices:

  • Mark Cohen's small/medium/large benchmarks as vertical reference lines.
  • Annotate the MDE (the effect size where the curve intersects the target power line).
  • Shade the "adequately powered" region (to the right of the MDE).

9.3 Power Contour Plot (Power as a Function of $N$ and Effect Size)

The contour plot provides the most comprehensive two-dimensional view of how power depends on both sample size and effect size. Contour lines connect combinations of $(N, ES)$ that yield equal power.

Reading the contour plot:

  • The study's planned $(N, ES)$ combination is marked.
  • The power target contour (e.g., 0.80) divides the plot into adequate and inadequate power regions.
  • Following a contour shows the trade-off directly: a larger assumed effect size allows a smaller $N$ at the same power. Because the required $N$ scales roughly with $1/ES^2$, doubling the assumed effect size cuts the required $N$ to about a quarter.

9.4 Error Rate Trade-Off Plot

The error rate trade-off plot visualises the relationship between $\alpha$ (Type I error rate) and $\beta$ (Type II error rate $= 1 -$ power) for a fixed $N$ and effect size:

  • As $\alpha$ decreases (stricter threshold), $\beta$ increases (lower power).
  • The optimal trade-off depends on the relative costs of Type I and Type II errors in the specific application.

Useful for:

  • Choosing between $\alpha = .05$ and $\alpha = .01$ given a limited sample size.
  • Demonstrating to reviewers the implications of changing the significance threshold.

9.5 G*Power-Style Distribution Plot

DataStatPro generates the classic two-distribution diagram showing:

  • The central distribution of the test statistic under $H_0$ (blue curve).
  • The non-central distribution of the test statistic under $H_1$ (orange curve).
  • The critical value ($t_{crit}$) as a vertical dashed line.
  • The $\alpha$ region (critical region under $H_0$: the right tail of the blue curve, or both tails for a two-tailed test).
  • The power ($1 - \beta$) region (area under the orange curve beyond $t_{crit}$).
  • The $\beta$ region (area under the orange curve to the left of $t_{crit}$).

This plot is highly effective for teaching the concept of power and for communicating results to non-statistician audiences.

9.6 Sample Size Comparison Table Plot

For multiple scenarios (e.g., small/medium/large effect; power = 0.80/0.90/0.95), DataStatPro generates a bubble chart or heatmap where:

  • Rows represent power targets.
  • Columns represent effect sizes.
  • Cell values (or bubble sizes) represent the required $N$.

This provides a rapid overview of how sample size requirements vary across the range of plausible inputs, supporting scenario planning.

9.7 Attrition-Adjusted Recruitment Funnel

For longitudinal studies or clinical trials, DataStatPro generates a funnel diagram showing:

  • Enrollment target (accounting for attrition).
  • Expected completers at each wave.
  • Final analytic sample.
  • Required sample vs. expected completers — highlighting any shortfall.

10. Sensitivity Analysis and Robustness Checks

10.1 What Is a Sensitivity Analysis in Power Planning?

A sensitivity analysis for power examines how the required sample size (or achieved power) changes as input parameters vary within plausible ranges. It answers: "How robust is my power calculation to uncertainty in the assumed effect size, standard deviation, or other inputs?"

10.2 Varying the Effect Size

The most important sensitivity analysis varies the effect size across a range defined by:

  • The SESOI (lower bound — the smallest effect that matters).
  • The expected effect from prior literature (central estimate).
  • A larger, optimistic effect (upper bound).

Report $N_{required}$ for each scenario:

Scenario | Effect Size | $N_{required}$ (power = 0.80) | $N_{required}$ (power = 0.90)
--- | --- | --- | ---
Pessimistic (SESOI) | Small | Largest | Largest
Most likely | Medium | Target | Target
Optimistic | Large | Smallest | Smallest

Decision rule: Plan for the scenario that produces the largest $N$ to ensure adequate power across all plausible effect sizes.

10.3 Varying the Standard Deviation

For mean-based tests, the effect size $d = \Delta\mu / \sigma$ depends on $\sigma$. If $\sigma$ is estimated with uncertainty from a pilot study or the literature, the sensitivity analysis should vary $\sigma$ across a plausible range (e.g., $\sigma \pm 20\%$):

$$d_{lower} = \frac{\Delta\mu}{\sigma_{upper}}, \qquad d_{upper} = \frac{\Delta\mu}{\sigma_{lower}}$$

Report $N_{required}$ for $d_{lower}$ and $d_{upper}$ as the worst and best cases.
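A minimal sketch of the $\sigma \pm 20\%$ check, assuming the normal-approximation sample size formula for a two-sample $t$-test; the raw difference (5 points) and SD (10 points) are hypothetical inputs. A 20% larger SD shrinks $d$ from 0.50 to about 0.42 and raises the required $n$ per group from 63 to 91.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation n per group for a two-tailed two-sample t-test."""
    z = NormalDist()
    zsum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * zsum ** 2 / d ** 2)


delta_mu, sigma = 5.0, 10.0        # hypothetical raw difference and SD
for label, s in (("best case (SD - 20%)", 0.8 * sigma),
                 ("central", sigma),
                 ("worst case (SD + 20%)", 1.2 * sigma)):
    d = delta_mu / s
    print(f"{label}: SD = {s:.1f}, d = {d:.3f}, n per group = {n_per_group(d)}")
```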

10.4 The "What If" Power Table

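A comprehensive "What If" power table reports power for a grid of per-group sample sizes and effect sizes, enabling researchers and reviewers to assess robustness:

$n$ per group | $d = 0.20$ | $d = 0.30$ | $d = 0.40$ | $d = 0.50$ | $d = 0.80$
--- | --- | --- | --- | --- | ---
20 | .10 | .16 | .24 | .35 | .72
30 | .12 | .21 | .34 | .49 | .87
50 | .17 | .32 | .52 | .71 | .98
80 | .24 | .48 | .72 | .89 | >.99
100 | .29 | .56 | .81 | .94 | >.99
150 | .41 | .74 | .93 | .99 | >.99
200 | .52 | .85 | .98 | >.99 | >.99

(Two-sample independent $t$-test, $\alpha = .05$, two-tailed. Values use the normal approximation $1 - \beta \approx \Phi(\delta - z_{\alpha/2}) + \Phi(-\delta - z_{\alpha/2})$ with $\delta = d\sqrt{n/2}$; exact non-central $t$ power is slightly lower, by at most about .02, with the largest differences at the smallest $n$.)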
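The table can be regenerated, or extended to other cells, with a few lines of Python. This sketch implements the normal approximation stated under the table (it is not an exact non-central $t$ calculation) using only the standard library; `power_two_sample` is an illustrative name.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power_two_sample(n_per_group, d, alpha=0.05):
    """Normal-approximation power of a two-tailed two-sample test."""
    delta = d * math.sqrt(n_per_group / 2)       # non-centrality
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)


effects = (0.20, 0.30, 0.40, 0.50, 0.80)
for n in (20, 30, 50, 80, 100, 150, 200):
    cells = []
    for d in effects:
        p = power_two_sample(n, d)
        cells.append(">.99" if p > 0.995 else f"{p:.2f}".lstrip("0"))
    print(n, " ".join(cells))
```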

10.5 Bayesian Power Analysis

Classical power analysis assumes a fixed, known effect size. Bayesian power analysis incorporates uncertainty about the effect size by averaging power over a prior distribution of effect sizes:

$$\overline{Power} = \int_0^\infty (1 - \beta(d)) \, p(d) \, \mathrm{d}d$$

Where $p(d)$ is the prior distribution on the effect size $d$ (e.g., a half-normal or truncated normal distribution).

When the power at the expected effect size is high (e.g., 0.80), average power is typically lower than that nominal value, because the power curve flattens as it approaches 1: plausible effects above the expected value add little power, while those below it remove a lot. If the prior is wide (high uncertainty), average power can be substantially lower than the nominal target. DataStatPro supports average power calculations under normal, half-normal, and uniform prior distributions.
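Average power can be approximated by numerical integration. The sketch below assumes a normal prior on $d$ truncated at zero (matching the integral's lower limit), the normal-approximation power function for a two-sample test, and a simple midpoint rule; the prior means and SDs and the choice of 64 per group are hypothetical. With a prior SD of 0.10 the average power falls from about 0.81 to below 0.78, and a wider prior lowers it further.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power_two_sample(n_per_group, d, alpha=0.05):
    """Normal-approximation two-tailed power of a two-sample comparison."""
    delta = d * math.sqrt(n_per_group / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)


def average_power(n_per_group, prior_mean, prior_sd, steps=2000):
    """Power averaged over a normal prior on d truncated at 0 (midpoint rule)."""
    prior = NormalDist(prior_mean, prior_sd)
    lo = max(0.0, prior_mean - 6 * prior_sd)
    hi = prior_mean + 6 * prior_sd
    width = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps):
        d = lo + (i + 0.5) * width
        weight = prior.pdf(d)      # weights are renormalised below, handling the truncation
        num += weight * power_two_sample(n_per_group, d)
        den += weight
    return num / den


n = 64
nominal = power_two_sample(n, 0.50)
narrow = average_power(n, 0.50, 0.10)
wide = average_power(n, 0.50, 0.25)
print(f"power at d = 0.50: {nominal:.3f}")
print(f"average power, prior SD 0.10: {narrow:.3f}")
print(f"average power, prior SD 0.25: {wide:.3f}")
```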

10.6 Sequential and Adaptive Designs

Traditional power analysis assumes a fixed sample size collected before any analysis. Sequential designs allow interim analyses with pre-specified stopping rules, which can reduce the expected sample size while controlling error rates.

Key concepts:

Concept | Description
--- | ---
Group sequential design | Planned interim analyses with O'Brien-Fleming or Pocock stopping boundaries
Alpha spending | Controls the overall Type I error rate (FWER) across all interim and final analyses
Expected sample size | Average $N$ under $H_0$ and $H_1$; may be less than for a fixed design
Inflation factor | The maximum $N$ is larger than the fixed-design $N$, to preserve power despite the interim looks

DataStatPro supports group sequential design power analysis with O'Brien-Fleming, Pocock, and Kim-DeMets (power family) alpha spending functions.

10.7 Equivalence and Non-Inferiority Tests

Standard power analysis targets superiority — detecting that an effect is non-zero. Equivalence tests (TOST) and non-inferiority tests have different frameworks:

Equivalence (TOST — Two One-Sided Tests):

$H_0$: $|\mu_1 - \mu_2| \geq \Delta_E$ (effect is outside the equivalence bounds)
$H_1$: $|\mu_1 - \mu_2| < \Delta_E$ (effect is within the equivalence bounds)

Sample size for TOST (per group):

$$n = \frac{2(z_\alpha + z_{1-\beta})^2 \sigma^2}{(\Delta_E - |\delta|)^2}$$

Where $\Delta_E$ is the equivalence margin and $\delta$ is the assumed true difference. Each one-sided test is conducted at level $\alpha$, so $z_\alpha$ (not $z_{\alpha/2}$) appears. When $\delta = 0$, both one-sided tests must reject, and $z_{1-\beta}$ is replaced by $z_{1-\beta/2}$.

Non-inferiority:

$H_0$: $\mu_1 - \mu_2 \leq -\Delta_{NI}$ (treatment is inferior by more than the margin)
$H_1$: $\mu_1 - \mu_2 > -\Delta_{NI}$ (treatment is not inferior)

Sample size (per group, two equal groups):

$$n = \frac{2(z_\alpha + z_{1-\beta})^2 \sigma^2}{(\delta + \Delta_{NI})^2}$$

Where $\Delta_{NI}$ is the non-inferiority margin and $\delta$ is the expected true difference ($\delta = 0$ for a conservative assumption).
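Both per-group formulas can be coded directly. A minimal sketch with hypothetical inputs ($\sigma = 10$, margin $= 5$); the function names are illustrative. The TOST helper uses $z_{1-\beta/2}$ when $\delta = 0$ (both one-sided tests must then reject, a standard refinement), and the non-inferiority helper assumes two equal groups, hence the factor 2 for the variance of the difference. With these inputs the TOST design needs 69 per group against 50 for non-inferiority.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def n_tost(sigma, margin, delta=0.0, alpha=0.05, power=0.80):
    """Per-group n for a two-group TOST equivalence test (normal approximation)."""
    z_a = Z.inv_cdf(1 - alpha)          # each one-sided test runs at level alpha
    # With delta = 0 both one-sided tests must reject, so z_{1-beta/2} is used.
    z_b = Z.inv_cdf(1 - (1 - power) / 2) if delta == 0 else Z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / (margin - abs(delta)) ** 2)


def n_noninferiority(sigma, margin, delta=0.0, alpha=0.05, power=0.80):
    """Per-group n for a two-group non-inferiority test (equal group sizes)."""
    z_a, z_b = Z.inv_cdf(1 - alpha), Z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / (delta + margin) ** 2)


print(n_tost(sigma=10, margin=5))             # 69
print(n_noninferiority(sigma=10, margin=5))   # 50
```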


11. Advanced Topics

11.1 Effect Size Conversion

It is often necessary to convert between effect size measures. DataStatPro's built-in converter handles all common transformations:

From | To | Formula
--- | --- | ---
$r$ (correlation) | $d$ | $d = \frac{2r}{\sqrt{1-r^2}}$
$d$ | $r$ | $r = \frac{d}{\sqrt{d^2 + 4}}$
$d$ | $\eta^2$ | $\eta^2 = \frac{d^2}{d^2 + 4}$
$f$ | $\eta^2$ | $\eta^2 = \frac{f^2}{1 + f^2}$
$f^2$ | $R^2$ | $R^2 = \frac{f^2}{1 + f^2}$
$OR$ | $d$ | $d = \frac{\ln(OR)}{\pi/\sqrt{3}} \approx \frac{\ln(OR)}{1.814}$
$\phi$ | $d$ | $d = \frac{2\phi}{\sqrt{1-\phi^2}}$
$h$ | $\phi$ | $\phi = \sin(h/2)$ (approximately, for small $h$)

The conversions between $d$ and $r$ (or $\eta^2$) assume two groups of equal size.
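Most of these conversions are one-liners. A minimal sketch of four of them (function names are illustrative; the $d$ to $r$ conversion assumes equal group sizes), with a round trip as a sanity check.

```python
import math


def r_to_d(r):
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d):
    # Assumes two groups of equal size.
    return d / math.sqrt(d ** 2 + 4)


def f_to_eta_sq(f):
    return f ** 2 / (1 + f ** 2)


def odds_ratio_to_d(odds_ratio):
    # Logistic-distribution conversion: ln(OR) * sqrt(3) / pi.
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


d = r_to_d(0.30)
print(round(d, 3), round(d_to_r(d), 3))     # 0.629 0.3
print(round(f_to_eta_sq(0.25), 4))          # 0.0588
print(round(odds_ratio_to_d(2.0), 3))       # 0.382
```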

11.2 The Winner's Curse and Effect Size Inflation

Low-powered studies that happen to produce a significant result tend to produce inflated effect size estimates. This phenomenon, the "Winner's Curse", occurs because a small-$n$ study can reach significance only when the observed effect happens, by chance, to be larger than the true effect.

Consequences:

  • Effect sizes from small, significant studies overestimate the true population effect.
  • Replication studies using these inflated effect sizes are often underpowered.
  • The "replication crisis" in psychology and other sciences is partly driven by this phenomenon.

Mitigation:

  • Base power calculations on conservative (smaller) effect size estimates.
  • Use effect sizes from meta-analyses rather than individual significant studies.
  • Apply a shrinkage factor (e.g., $d_{planned} = 0.75 \times d_{published}$) as a conservative hedge.

11.3 Power Analysis for Multilevel Models

For multilevel (hierarchical) models with data nested within clusters (students within schools, patients within clinics):

The effective sample size for a cluster-randomised trial depends on both the number of clusters $J$ and the cluster size $m$:

$$N_{eff} = \frac{J \times m}{1 + (m-1) \times ICC}$$

Power depends primarily on the number of clusters (not the number of individuals per cluster) when the ICC is high. Doubling the number of individuals per cluster has diminishing returns once $m > 1/ICC$.

Optimal allocation: Add more clusters (not more individuals per cluster) when the ICC is high or when between-cluster variance is the limiting factor.

11.4 Power for Survival Analysis (Log-Rank Test)

For survival outcomes (time to event), the log-rank test's power depends on the number of events (not the sample size):

Required number of events (for two-group comparison):

$$E = \frac{4(z_{\alpha/2} + z_{1-\beta})^2}{(\ln HR)^2}$$

Where $HR$ is the hypothesised hazard ratio under $H_1$; the factor 4 corresponds to equal (1:1) allocation between the two groups.

Required total $N$ (accounting for censoring rate $c$, the expected proportion of participants who will not experience the event during follow-up):

$$N = \frac{E}{1 - c}$$

The key insight is that studies with high censoring rates need a larger $N$ to accumulate enough events; extending the follow-up period is often more efficient than increasing $N$.
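A minimal sketch of the events calculation and the censoring adjustment, assuming a two-tailed test with 1:1 allocation; the hazard ratio of 0.70 and the 40% censoring rate are hypothetical inputs, and `logrank_events` is an illustrative name. The events formula is commonly attributed to Schoenfeld.

```python
import math
from statistics import NormalDist


def logrank_events(hr, alpha=0.05, power=0.80):
    """Required number of events for a two-tailed log-rank test, 1:1 allocation."""
    z = NormalDist()
    zsum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 4 * zsum ** 2 / math.log(hr) ** 2


events = logrank_events(0.70)
n_total = math.ceil(events / (1 - 0.40))     # hypothetical 40% censoring
print(math.ceil(events), n_total)            # 247 412
```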

11.5 Precision Analysis: Planning for Confidence Interval Width

An alternative to power analysis is precision analysis: planning $N$ to achieve a desired confidence interval width rather than a desired power level. This is consistent with an estimation-focused approach and does not require specifying the effect size under $H_1$.

Required $n$ for a 95% CI of half-width $\delta$ (i.e., estimate $\pm \delta$) for the mean:

$$n = \left(\frac{1.96 \times \sigma}{\delta}\right)^2$$

Required $n$ for a 95% CI of half-width $\delta$ for a proportion:

$$n = \frac{1.96^2 \times \hat{p}(1-\hat{p})}{\delta^2}$$

Round both results up. Using $\hat{p} = 0.50$ gives the most conservative (largest) $n$.
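A minimal sketch of both precision formulas; the inputs ($\sigma = 15$ with a half-width of 3 points, and a proportion estimated to within $\pm 0.03$) and the function names are hypothetical.

```python
import math


def n_for_mean(sigma, half_width, z=1.96):
    """n so that a 95% CI for a mean has the requested half-width."""
    return math.ceil((z * sigma / half_width) ** 2)


def n_for_proportion(half_width, p_hat=0.5, z=1.96):
    """n so that a 95% CI for a proportion has the requested half-width."""
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / half_width ** 2)


print(n_for_mean(sigma=15, half_width=3))    # 97
print(n_for_proportion(half_width=0.03))     # 1068
```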

11.6 Prospective Power Analysis for Replication Studies

When planning a replication study of a previously published finding:

  1. Extract the original study's effect size and its SE (or CI).
  2. Apply the shrinkage factor: $d_{rep} = 0.75 \times d_{original}$ (conservative hedge).
  3. Compute the required $N$ for $d_{rep}$ at power $= 0.90$ (higher than 0.80 to account for uncertainty).
  4. Report both the nominal power (if $d_{rep}$ is correct) and the power at $d = 0.50 \times d_{original}$ (robustness check).

11.7 Negative Findings and Equivalence: Planning for Both

A study designed to test for superiority may fail to reject $H_0$ without being able to demonstrate equivalence either. Planning for both outcomes requires pre-specifying:

  1. Equivalence margin $\Delta_E$: The largest effect that would be practically negligible.
  2. A TOST equivalence test as a secondary analysis alongside the primary superiority test.
  3. Sufficient power for both: The sample size is the maximum of $N_{superiority}$ and $N_{equivalence}$.

11.8 Reporting Power in Pre-Registration

Pre-registration of power analyses on platforms such as the Open Science Framework (OSF), ClinicalTrials.gov, or AsPredicted.org requires:

Element | Required Detail
--- | ---
Research question and primary hypothesis | Specific and testable
Primary outcome and statistical test | Named explicitly
Effect size and justification | Value, measure, and source
$\alpha$, power target, directionality | All three specified
Computed $N$ | Total and per group
Attrition and design adjustments | If applicable
Software used | Name and version
Deviation policy | What will happen if $N$ cannot be reached

Pre-registration creates a public record of the planned analysis and protects against post-hoc power manipulation and researcher degrees of freedom.


12. Worked Examples

Example 1: A Priori — Two-Sample Independent t-Test

A clinical researcher plans to compare the effectiveness of a new cognitive training programme (Group A) vs. standard care (Group B) on memory scores. Based on a published meta-analysis, the expected Cohen's $d = 0.45$. The researcher wants $\alpha = .05$ (two-tailed) and power $= 0.80$.

Step 1 — Effect size: $d = 0.45$ (from meta-analysis).

Step 2 — Look up constants:

  • $z_{\alpha/2} = z_{0.025} = 1.960$
  • $z_{1-\beta} = z_{0.80} = 0.842$

Step 3 — Apply formula:

$$n_{per\;group} = \frac{2(1.960 + 0.842)^2}{0.45^2} = \frac{2 \times 7.851}{0.2025} = \frac{15.702}{0.2025} = 77.54$$

Round up: $n_{per\;group} = 78$ by the normal approximation.

Step 4 — Verify with exact non-central t:

At $n = 78$ per group: $\lambda = 0.45 \times \sqrt{78/2} = 0.45 \times 6.245 = 2.810$

$$1 - \beta = P(t_{154} > t_{crit} \mid \lambda = 2.810)$$

The non-central $t$-distribution gives $1 - \beta \approx 0.798$, just short of the 0.80 target: the normal formula uses 1.960, whereas the $t$ critical value with $df = 154$ is about 1.975. One more participant per group fixes this: at $n = 79$ ($df = 156$, $\lambda = 0.45 \times \sqrt{79/2} = 2.828$), $1 - \beta \approx 0.803$ ✅. The design therefore uses $n = 79$ per group, $N_{total} = 158$.

Step 5 — Attrition adjustment (expecting 12% dropout):

$$N_{enroll} = \lceil 158 / (1 - 0.12) \rceil = \lceil 158 / 0.88 \rceil = \lceil 179.5 \rceil = 180$$

Step 6 — MDE at $n = 79$ per group:

$$d_{MDE} = \frac{1.960 + 0.842}{\sqrt{79/2}} = \frac{2.802}{6.285} = 0.446$$

The study is designed to detect effects of $d \geq 0.45$ with at least 80% power.

Summary:

Parameter | Value
--- | ---
Test | Two-sample independent $t$-test (two-tailed)
Effect size | $d = 0.45$ (meta-analysis)
$\alpha$ | .05
Power target | 0.80
$n$ per group (analysis) | 79
$N$ total (analysis) | 158
Achieved power | 0.803
$N$ total (enrollment; 12% attrition) | 180
MDE | $d = 0.446$

APA write-up: "An a priori power analysis conducted in DataStatPro indicated that 79 participants per group (total $N = 158$) were required to detect an effect of $d = 0.45$ with 80% power at a two-tailed $\alpha = .05$ (achieved power = .80). The effect size was based on a published meta-analysis. Assuming 12% attrition, 90 participants per group (total $N = 180$) will be recruited."
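The exact check in Step 4 needs the non-central $t$ distribution, which Python's standard library does not provide. The sketch below substitutes a standard normal approximation to the non-central $t$ (and a Cornish-Fisher approximation to the $t$ quantile); it is an illustration, not DataStatPro's algorithm, and the helper names are hypothetical. For a familiar benchmark ($d = 0.5$, 64 per group) it reproduces a power of about 0.80, and for this example it shows 78 per group falling just short of 0.80 while 79 succeeds.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def t_quantile(q, df):
    """Approximate Student t quantile via a Cornish-Fisher expansion (good for df > ~30)."""
    z = Z.inv_cdf(q)
    return z + (z ** 3 + z) / (4 * df) + (5 * z ** 5 + 16 * z ** 3 + 3 * z) / (96 * df ** 2)


def approx_t_power(n_per_group, d, alpha=0.05):
    """Approximate power of a two-tailed two-sample t-test.

    Uses a normal approximation to the non-central t and counts only the
    upper rejection region (the lower one is negligible for these effects).
    """
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)
    t_crit = t_quantile(1 - alpha / 2, df)
    num = ncp - t_crit * (1 - 1 / (4 * df))
    den = math.sqrt(1 + t_crit ** 2 / (2 * df))
    return Z.cdf(num / den)


d = 0.45
zsum = Z.inv_cdf(0.975) + Z.inv_cdf(0.80)
n_normal = math.ceil(2 * zsum ** 2 / d ** 2)     # normal-approximation answer
n = n_normal
while approx_t_power(n, d) < 0.80:                # step up until the t-based check passes
    n += 1
print(f"normal approximation: {n_normal} per group")
print(f"t-based check: {n} per group, power {approx_t_power(n, d):.3f}")
```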


Example 2: A Priori — One-Way ANOVA (Three Groups)

An education researcher compares three teaching methods on test performance. Literature suggests group means of 65, 70, and 68 with a common within-group SD of 12. $\alpha = .05$, power target $= 0.80$.

Step 1 — Compute Cohen's $f$:

Grand mean: $\mu = (65 + 70 + 68)/3 = 67.67$

$$\sigma_m = \sqrt{\frac{(65-67.67)^2 + (70-67.67)^2 + (68-67.67)^2}{3}} = \sqrt{\frac{7.11 + 5.44 + 0.11}{3}} = \sqrt{4.22} = 2.054$$

$$f = \frac{\sigma_m}{\sigma_{within}} = \frac{2.054}{12} = 0.171$$

This is between Cohen's small ($f = 0.10$) and medium ($f = 0.25$) benchmarks.

Step 2 — Compute the $\eta^2$ equivalent:

$$\eta^2 = \frac{f^2}{1 + f^2} = \frac{0.0293}{1.0293} = 0.0285$$

Step 3 — Required $N$ (iterative, via DataStatPro):

The test uses the non-central $F$ distribution with $df_1 = 2$, $df_2 = N - 3$, and non-centrality $\lambda = f^2 N$. In the large-sample limit, 80% power at $\alpha = .05$ with $df_1 = 2$ requires $\lambda \approx 9.64$, so

$$N \approx \frac{9.64}{f^2} = \frac{9.64}{0.0293} \approx 329$$

The exact non-central $F$ calculation, which accounts for the finite $df_2$, needs a slightly larger $\lambda$; iterating on $N$ gives approximately $n = 111$ per group ($N = 333$, $\lambda \approx 9.76$), with power just above 0.80. For comparison, a medium effect ($f = 0.25$) would need only 53 per group ($N = 159$): the smaller $f$ roughly doubles the required sample.

Step 4 — Attrition adjustment (8%):

$$N_{enroll} = \lceil 333 / 0.92 \rceil = \lceil 361.96 \rceil = 362$$

Rounding up to a multiple of three gives 121 per group (363 in total).

Summary:

Parameter | Value
--- | ---
Test | One-way ANOVA ($k = 3$)
Effect size | $f = 0.171$; $\eta^2 = .028$
$\alpha$ | .05
Power target | 0.80
$n$ per group (analysis) | 111
$N$ total (analysis) | 333
Achieved power | $\approx 0.80$ (just above target)
$N$ total (enrollment; 8% attrition) | 363 (121 per group)

APA write-up: "A priori power analysis for a one-way ANOVA with three groups indicated that 111 participants per group (total $N = 333$) were required to detect $f = 0.17$ ($\eta^2 = .03$) with 80% power at $\alpha = .05$. The expected group means ($M = 65$, 70, and 68; pooled $SD = 12$) were derived from the literature. With anticipated 8% attrition, 121 participants per group (total $N = 363$) will be recruited."
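A large-sample version of Step 3 can be computed with the standard library alone: as $df_2$ grows, $df_1 \times F$ approaches a non-central $\chi^2$ with $df_1$ degrees of freedom. The sketch below (hypothetical helper names; not DataStatPro's algorithm) evaluates that non-central $\chi^2$ power as a Poisson mixture of central $\chi^2$ tails and searches for the smallest $n$ per group. Because it ignores the finite denominator degrees of freedom it slightly understates the required sample, returning about 110 per group rather than the exact $F$ result.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a central chi-square with integer df (closed-form recursion)."""
    if df % 2 == 0:
        total, k = math.exp(-x / 2), 2             # df = 2 base case
    else:
        total, k = math.erfc(math.sqrt(x / 2)), 1  # df = 1 base case
    while k < df:
        # Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        total += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return total


def chi2_crit(alpha, df):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 1e-9, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ncx2_power(lam, df, alpha=0.05):
    """Power of a chi-square test with non-centrality lam > 0 (Poisson mixture)."""
    crit = chi2_crit(alpha, df)
    terms = int(lam / 2 + 12 * math.sqrt(lam / 2) + 30)
    return sum(
        math.exp(-lam / 2 + j * math.log(lam / 2) - math.lgamma(j + 1))
        * chi2_sf(crit, df + 2 * j)
        for j in range(terms)
    )


# Large-sample (chi-square) approximation to the one-way ANOVA F-test.
means, sd_within = (65, 70, 68), 12
k = len(means)
grand = sum(means) / k
f = math.sqrt(sum((m - grand) ** 2 for m in means) / k) / sd_within
eta_sq = f ** 2 / (1 + f ** 2)

n = 2
while ncx2_power(f ** 2 * k * n, df=k - 1) < 0.80:
    n += 1
print(f"f = {f:.3f}, eta^2 = {eta_sq:.4f}, n = {n} per group (N = {k * n})")
```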


Example 3: A Priori — Pearson Correlation

A developmental psychologist hypothesises a moderate correlation ($\rho = 0.35$) between parental involvement (hours/week) and child academic achievement. $\alpha = .05$ (two-tailed), power $= 0.90$.

Step 1 — Fisher z-transformation:

$$z_\rho = \text{arctanh}(0.35) = \frac{1}{2}\ln\!\left(\frac{1.35}{0.65}\right) = 0.3654$$

Step 2 — Look up constants:

  • $z_{\alpha/2} = 1.960$
  • $z_{1-\beta} = z_{0.90} = 1.282$

Step 3 — Apply formula:

$$n = \left(\frac{1.960 + 1.282}{0.3654}\right)^2 + 3 = \left(\frac{3.242}{0.3654}\right)^2 + 3 = (8.872)^2 + 3 = 78.71 + 3 = 81.71$$

Round up: $n = 82$.

Step 4 — MDE (minimum detectable correlation at $n = 82$, power $= 0.90$):

$$z_{\rho_{MDE}} = \frac{z_{\alpha/2} + z_{1-\beta}}{\sqrt{n-3}} = \frac{1.960 + 1.282}{\sqrt{79}} = \frac{3.242}{8.888} = 0.3648$$

$$\rho_{MDE} = \tanh(0.3648) = 0.349$$

Summary:

Parameter | Value
--- | ---
Test | Pearson correlation (two-tailed)
Effect size | $\rho = 0.35$ (literature)
$\alpha$ | .05
Power target | 0.90
Required $n$ | 82
Achieved power | 0.900
MDE | $\rho = 0.349$

APA write-up: "Based on an expected correlation of $\rho = .35$, a priori power analysis indicated that $n = 82$ participants were required to achieve 90% power at $\alpha = .05$ (two-tailed). Calculations were conducted using DataStatPro."
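The Fisher-$z$ calculation in Steps 1 to 4 can be sketched as below (function names are illustrative). Being an approximation, it can differ by one participant from exact methods: for $\rho = 0.30$ at 80% power it gives 85, whereas the exact value quoted in Section 5 is about 84.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def n_for_correlation(rho, alpha=0.05, power=0.80):
    """Fisher z approximation: n = ((z_{a/2} + z_{1-b}) / atanh(rho))^2 + 3."""
    zsum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil((zsum / math.atanh(rho)) ** 2 + 3)


def min_detectable_rho(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable with the given n (Fisher z approximation)."""
    zsum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.tanh(zsum / math.sqrt(n - 3))


n = n_for_correlation(0.35, power=0.90)
print(n, round(min_detectable_rho(n, power=0.90), 3))   # 82 0.349
print(n_for_correlation(0.30))                           # 85
```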


Example 4: A Priori — Chi-Square Test of Association (3 × 3 Table)

A sociologist examines the association between age group (18–34, 35–54, 55+) and preferred news source (online, print, broadcast). Based on the literature, the expected cell proportions are:

Age group | Online | Print | Broadcast
--- | --- | --- | ---
18–34 | .18 | .04 | .11
35–54 | .10 | .09 | .14
55+ | .06 | .12 | .16

$\alpha = .05$, power $= 0.80$.

Step 1 — Compute marginal proportions:

Row marginals: $P_{18-34} = .33$, $P_{35-54} = .33$, $P_{55+} = .34$

Column marginals: $P_{online} = .34$, $P_{print} = .25$, $P_{broadcast} = .41$

Step 2 — Compute Cohen's $w$:

$$w = \sqrt{\sum_{i}\sum_{j} \frac{(P_{ij} - P_{i\cdot}P_{\cdot j})^2}{P_{i\cdot}P_{\cdot j}}}$$

Each cell compares its expected proportion with the product of its marginals. For example, the 18–34/online cell contributes $(.18 - .33 \times .34)^2 / (.33 \times .34) = .0678^2 / .1122 = 0.0410$. Summing all nine contributions gives 0.1140, so $w = \sqrt{0.1140} = 0.338$ (a medium effect by Cohen's benchmarks).

Step 3 — Degrees of freedom:

$$df = (3-1)(3-1) = 4$$

Step 4 — Required $N$ (non-central chi-square, DataStatPro):

$$N_{required} = \frac{\lambda_{required}}{w^2}$$

At $df = 4$, $\alpha = .05$, power $= 0.80$: $\lambda_{required} = 11.94$

$$N = \frac{11.94}{w^2} = \frac{11.94}{0.1140} = 104.7$$

Round up: $N = 105$.

Summary:

Parameter | Value
--- | ---
Test | Chi-square test of association ($3 \times 3$ table)
Effect size | $w = 0.338$ (from expected cell proportions)
$\alpha$ | .05
Power target | 0.80
Required $N$ | 105
Achieved power | 0.801

APA write-up: "An a priori power analysis for a $3 \times 3$ chi-square test of association indicated that $N = 105$ participants were required to detect $w = 0.34$ with 80% power at $\alpha = .05$. The expected cell proportions were derived from prior survey data."
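Steps 2 to 4 can be checked end to end. The sketch below (standard library only; helper names hypothetical, and not DataStatPro's implementation) computes $w$ from the expected cell proportions, evaluates non-central $\chi^2$ power as a Poisson mixture of central $\chi^2$ tails, and increases $N$ until power reaches 0.80. Under these assumptions it reports $w \approx 0.338$, $df = 4$, and $N = 105$.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a central chi-square with integer df (closed-form recursion)."""
    if df % 2 == 0:
        total, k = math.exp(-x / 2), 2
    else:
        total, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        total += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return total


def chi2_crit(alpha, df):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 1e-9, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ncx2_power(lam, df, alpha=0.05):
    """Power of a chi-square test with non-centrality lam > 0 (Poisson mixture)."""
    crit = chi2_crit(alpha, df)
    terms = int(lam / 2 + 12 * math.sqrt(lam / 2) + 30)
    return sum(
        math.exp(-lam / 2 + j * math.log(lam / 2) - math.lgamma(j + 1))
        * chi2_sf(crit, df + 2 * j)
        for j in range(terms)
    )


cells = [
    [0.18, 0.04, 0.11],   # 18-34: online, print, broadcast
    [0.10, 0.09, 0.14],   # 35-54
    [0.06, 0.12, 0.16],   # 55+
]
rows = [sum(r) for r in cells]
cols = [sum(c) for c in zip(*cells)]
w = math.sqrt(sum((cells[i][j] - rows[i] * cols[j]) ** 2 / (rows[i] * cols[j])
                  for i in range(len(rows)) for j in range(len(cols))))
df = (len(rows) - 1) * (len(cols) - 1)

n_required = 10
while ncx2_power(w ** 2 * n_required, df) < 0.80:   # lambda = w^2 * N
    n_required += 1
print(f"w = {w:.3f}, df = {df}, N = {n_required}")   # w = 0.338, df = 4, N = 105
```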


Example 5: Sensitivity Analysis — Post-Hoc Assessment

A completed study of exam score differences between two teaching conditions found $\bar{x}_1 = 71.2$, $\bar{x}_2 = 68.4$, $s_1 = 11.8$, $s_2 = 12.2$, $n_1 = n_2 = 35$. The result was non-significant ($t(68) = 0.98$, $p = .33$).

Observed effect size:

$$d_{obs} = \frac{71.2 - 68.4}{\sqrt{(34 \times 11.8^2 + 34 \times 12.2^2)/68}} = \frac{2.8}{12.0} = 0.233$$
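The summary statistics can be turned into $d$, the $t$ statistic and an approximate 95% CI for $d$ with a short Python sketch. Two simplifications are assumptions of this sketch, not necessarily what DataStatPro does:

- The CI uses the common large-sample standard error of $d$ with a normal critical value of 1.96.
- The p-value uses a normal approximation; the exact $t(68)$ p-value is slightly larger.

```python
import math
from statistics import NormalDist

# Summary statistics from the completed study.
m1, m2 = 71.2, 68.4
s1, s2 = 11.8, 12.2
n1, n2 = 35, 35

# Pooled SD, Cohen's d and the Student t statistic.
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
d = (m1 - m2) / sp
t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Normal-approximation two-tailed p-value.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

# Large-sample standard error of d and an approximate 95% CI.
se_d = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
ci = (d - 1.96 * se_d, d + 1.96 * se_d)

print(f"sp = {sp:.2f}, d = {d:.3f}, t = {t:.2f}, p ~ {p_approx:.2f}")
print(f"95% CI for d: [{ci[0]:.2f}, {ci[1]:.2f}]")
```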

Post-hoc power (observed $d$, NOT recommended as a standalone result):

At $n = 35$ per group, $d = 0.233$, $\alpha = .05$: Power $= 0.16$.

This is low, but that is mathematically expected given the non-significant result.

More useful — Sensitivity analysis (power vs. effect size at $n = 35$ per group):

| True $d$ | Power at $n = 35$ per group |
|---|---|
| 0.2 | 0.13 |
| 0.3 | 0.24 |
| 0.4 | 0.38 |
| 0.5 | 0.54 |
| 0.6 | 0.70 |
| 0.8 | 0.91 |

95% CI for observed $d$: $[-0.24,\; 0.70]$ (computed via DataStatPro).

Interpretation: The study had adequate power (at least 80%) only for large effects ($d \geq 0.80$). The non-significant result is uninformative about effects in the small-to-medium range. The 95% CI for $d$ is wide ($[-0.24, 0.70]$). It runs from a small effect in the opposite direction to a medium-to-large effect. A future study designed to detect $d = 0.30$ with 80% power would require $n = 176$ per group.

APA write-up: "The sample of $n = 35$ per group provided 54% power to detect a medium effect of $d = 0.50$ at $\alpha = .05$ (two-tailed), indicating the study was substantially underpowered for effects of practical interest. The 95% CI for Cohen's $d$, $[-0.24, 0.70]$, spans a wide range of effects. An a priori power analysis for a future study indicated that detecting $d = 0.30$ with 80% power at $\alpha = .05$ would require $n = 176$ per group. The non-significant result should therefore be interpreted with caution rather than as evidence of no effect."
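A sensitivity table like this can be approximated without a statistics package. The Python sketch below makes two substitutions:

- A classic normal approximation replaces the exact non-central $t$ distribution.
- A Cornish-Fisher expansion supplies the $t$ critical value.

Both are assumptions of this sketch rather than DataStatPro's exact method, so results may differ from exact values in the second decimal place. The sketch prints approximate power at $n = 35$ per group for each $d$ in the table. It also prints the per-group $n$ needed for $d = 0.30$ at 80% power. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def t_quantile(p, df):
    """Approximate Student-t quantile via a Cornish-Fisher expansion about z."""
    z = _N.inv_cdf(p)
    return (z + (z ** 3 + z) / (4 * df)
            + (5 * z ** 5 + 16 * z ** 3 + 3 * z) / (96 * df ** 2))


def power_two_sample_t(d, n, alpha=0.05):
    """Approximate two-tailed power of an equal-n two-sample t-test.

    Normal approximation to the non-central t with df = 2n - 2 and
    non-centrality delta = d * sqrt(n / 2).
    """
    df = 2 * n - 2
    delta = d * math.sqrt(n / 2)
    t_crit = t_quantile(1 - alpha / 2, df)
    shift = t_crit * (1 - 1 / (4 * df))
    scale = math.sqrt(1 + t_crit ** 2 / (2 * df))
    upper = _N.cdf((delta - shift) / scale)    # significant in the true direction
    lower = _N.cdf((-delta - shift) / scale)   # significant in the wrong direction
    return upper + lower


def n_per_group(d, alpha=0.05, target=0.80):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while power_two_sample_t(d, n, alpha) < target:
        n += 1
    return n


for d in (0.2, 0.3, 0.4, 0.5, 0.6, 0.8):
    print(f"d = {d:.1f}: power = {power_two_sample_t(d, 35):.2f}")
print("n per group for d = 0.30:", n_per_group(0.30))
```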


13. Common Mistakes and How to Avoid Them

Mistake 1: Using Post-Hoc Power with the Observed Effect Size

Problem: Computing "observed power" from the effect size estimated in the completed study's own data and presenting it as an independent finding. Observed power is a one-to-one, monotonically decreasing function of the p-value: the smaller the p-value, the higher the observed power. For a two-tailed z-test, $p = \alpha$ always gives observed power of almost exactly $0.50$. Observed power therefore adds no information beyond the p-value itself.

Solution: Replace post-hoc power with (a) a 95% CI for the effect size and (b) a sensitivity power analysis showing power for a range of plausible true effect sizes. Together these show what the study could and could not detect.
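The one-to-one link between p and observed power is easy to see for a two-tailed z-test. The sketch below computes observed power from nothing but the p-value. It assumes a z-test for simplicity; for t-tests the relationship is approximately the same. The function name is illustrative.

```python
from statistics import NormalDist

_N = NormalDist()


def observed_power(p, alpha=0.05):
    """'Observed' power of a two-tailed z-test, computed from its own p-value."""
    z_obs = _N.inv_cdf(1 - p / 2)          # |z| implied by the two-tailed p-value
    z_crit = _N.inv_cdf(1 - alpha / 2)
    return _N.cdf(z_obs - z_crit) + _N.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.3f}")
```

Because observed power is fully determined by p, reporting both adds nothing. Larger p-values always map to lower observed power.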


Mistake 2: Basing Effect Size on a Single Pilot Study

Problem: Running a pilot study ($n = 20$), observing $d = 0.65$, and using this value directly in a power calculation. Small pilot studies produce highly unstable effect size estimates. The true effect could easily be $d = 0.20$, leading to a seriously underpowered main study.

Solution: Use pilot studies for feasibility and nuisance parameter estimation (SD, retention rate, ICC) only. Determine the target effect size from the SESOI, published literature, or meta-analyses. If a pilot effect size must be used, apply a conservative discount factor (e.g., multiply by 0.60–0.75).
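The cost of discounting a pilot estimate can be previewed with the normal-approximation formula for a two-sample t-test, $n_{per} = 2(z_{\alpha/2} + z_{1-\beta})^2 / d^2$. The sketch applies the 0.60–0.75 discount range to the pilot value $d = 0.65$. It uses the normal approximation only, so exact t-based figures are usually one or two participants higher. The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_normal(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sample comparison (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 / d ** 2)


pilot_d = 0.65
for factor in (1.00, 0.75, 0.60):
    d = pilot_d * factor
    print(f"discount {factor:.2f}: d = {d:.4f}, n per group = {n_per_group_normal(d)}")
```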


Mistake 3: Confusing Total $N$ with Per-Group $n$

Problem: A formula yields $n = 50$ per group, but the researcher enrolls 50 participants in total (25 per group). If the calculation targeted 80% power, halving the per-group $n$ cuts power to roughly 50%.

Solution: Always explicitly distinguish total $N$ from per-group $n$ in both calculations and reports. DataStatPro reports both total $N$ and the per-group breakdown on all output screens.
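The effect of halving the per-group sample can be checked with a normal-approximation sketch. The sketch first finds the effect size for which $n = 50$ per group gives 80% power. It then recomputes power at 25 per group. This simplifies the exact t-based calculation, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def power_normal(d, n_per_group, alpha=0.05):
    """Two-tailed power of a two-sample comparison (normal approximation)."""
    delta = d * math.sqrt(n_per_group / 2)
    z_crit = _N.inv_cdf(1 - alpha / 2)
    return _N.cdf(delta - z_crit) + _N.cdf(-delta - z_crit)


# Effect size for which n = 50 per group gives 80% power.
d = (_N.inv_cdf(0.975) + _N.inv_cdf(0.80)) * math.sqrt(2 / 50)
print(f"planned, 50 per group: {power_normal(d, 50):.2f}")
print(f"actual, 25 per group:  {power_normal(d, 25):.2f}")
```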


Mistake 4: Not Adjusting for Attrition

Problem: Calculating that 120 completers are needed and recruiting exactly 120 participants, then losing 18 to dropout — leaving 102 completers with power substantially below target.

Solution: Always calculate the attrition-adjusted enrollment target: $N_{enroll} = N_{analysis} / (1 - r_{attrition})$. Obtain attrition estimates from the literature or previous studies in the same population. Be conservative (overestimate attrition rates).
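Applied to the scenario above, the losses correspond to an attrition rate of 18/120 = 15%. The sketch below applies the enrollment formula with rounding up. The function name is illustrative.

```python
import math


def enrollment_target(n_analysis, attrition_rate):
    """Participants to enroll so that about n_analysis complete the study."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return math.ceil(n_analysis / (1 - attrition_rate))


rate = 18 / 120                        # 15% dropout
print(enrollment_target(120, rate))    # 142
```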


Mistake 5: Ignoring the Design Effect in Clustered Studies

Problem: Treating a clustered design (e.g., 20 students per class) as if observations were independent, underestimating the required number of clusters by a factor of DEFF.

Solution: Always specify the expected ICC and average cluster size, and apply the design effect: $N_{cluster} = N_{simple} \times DEFF$. Err on the side of overestimating the ICC. Use DataStatPro's clustered design module.
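A minimal sketch of the design-effect adjustment, applied per arm, is shown below. The illustrative inputs are 64 per arm (the two-sample t figure for $d = 0.50$ at 80% power), clusters of $m = 20$, and an assumed ICC of 0.05. The function name is hypothetical. Rounding up to whole clusters means the final sample can exceed the inflated $N$.

```python
import math


def clustered_requirements(n_simple_per_arm, cluster_size, icc):
    """Inflate a per-arm n by the design effect and convert it to whole clusters."""
    deff = 1 + (cluster_size - 1) * icc
    n_per_arm = math.ceil(n_simple_per_arm * deff)
    clusters_per_arm = math.ceil(n_per_arm / cluster_size)
    return deff, n_per_arm, clusters_per_arm


deff, n_arm, k_arm = clustered_requirements(64, cluster_size=20, icc=0.05)
print(round(deff, 2), n_arm, k_arm)   # 1.95 125 7
```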


Mistake 6: Using Cohen's Conventions as the Default Effect Size

Problem: Entering $d = 0.50$ ("medium") into a power calculation simply because it is conventional, without any scientific justification. This produces a sample size that may be completely inappropriate for the specific research question. If the true effect is $d = 0.10$, about 25× as many participants are needed, because the required $n$ scales with $1/d^2$.

Solution: Always justify the effect size from the SESOI, prior literature, or meta-analysis. Use Cohen's conventions only as an absolute last resort, and document that they were used in the absence of domain-specific information. Never present Cohen's conventions as though they represent the expected effect.


Mistake 7: Performing a Power Analysis for the Wrong Test

Problem: Computing power for a two-sample t-test when the actual analysis will be a mixed ANOVA (within × between), or computing power for a chi-square test when logistic regression will be used. Different tests have different power functions.

Solution: Identify the exact statistical test to be used (including model specification, covariates, and correction methods) before computing power. The power analysis must match the planned analysis.


Mistake 8: Conducting Multiple Tests but Powering Only for One

Problem: Planning 5 outcome variables but computing power only for the most important one, without applying a multiple testing correction. The familywise false-positive rate for 5 independent tests at $\alpha = .05$ is $1 - 0.95^5 \approx .23$.

Solution: Clearly specify the single primary outcome, power the study for it, and label the remaining outcomes as secondary. If more than one outcome must be primary, apply a Bonferroni or Holm-Bonferroni correction, $\alpha' = .05/k$, where $k$ is the number of primary hypotheses. Then compute power at $\alpha'$ for all primary outcomes, or justify a less conservative correction.
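The familywise error rate and the sample-size cost of a Bonferroni-adjusted $\alpha'$ can be sketched as follows. The example uses a two-sample comparison with $d = 0.50$ at 80% power and the normal-approximation formula, which is a simplification. Helper names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def familywise_error(alpha, k):
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def n_per_group_normal(d, alpha, power=0.80):
    """Per-group n for a two-sample comparison (normal approximation)."""
    z_sum = _N.inv_cdf(1 - alpha / 2) + _N.inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 / d ** 2)


k = 5
print(round(familywise_error(0.05, k), 3))          # 0.226
alpha_bonf = 0.05 / k                               # 0.01 per test
print(n_per_group_normal(0.5, 0.05), n_per_group_normal(0.5, alpha_bonf))  # 63 94
```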


Mistake 9: Treating Non-Significant Results as Evidence of No Effect

Problem: A study with $n = 30$ per group fails to reject $H_0$ ($p = .34$) and concludes "the two conditions are equivalent". With $n = 30$ per group, power for a medium effect ($d = 0.50$) is only about 50%. The non-significant result is as consistent with a medium true effect as with no effect.

Solution: Distinguish between "no evidence of an effect" and "evidence of no effect". To provide evidence of equivalence, use a TOST equivalence test with a pre-specified equivalence margin, or present the 95% CI for the effect size to show that meaningful effects can be ruled out. Power the study for equivalence, not just superiority, if equivalence is a potential conclusion.


Mistake 10: Reporting Sample Size Without Justification

Problem: Stating only "sample size was $N = 200$" in a methods section with no reference to power, effect size, or target power. Readers (and reviewers) cannot assess whether the study was adequately powered.

Solution: Always include a complete power analysis justification in the methods section: test used, effect size with source, $\alpha$, power target, computed $N$, and software. Pre-register the power analysis before data collection.


14. Troubleshooting

| Problem | Likely cause | Solution |
|---|---|---|
| Required $N$ is extremely large (e.g., $> 10{,}000$) | Effect size is very small; $\alpha$ is very small; power target is very high | Check whether the effect size is realistic; consider whether the study is feasible; explore precision analysis as an alternative |
| Required $N$ is smaller than expected | Effect size is large; one-tailed test used; power target is low (e.g., 0.70) | Verify inputs; confirm directionality; consider increasing the power target |
| Power does not reach target even with very large $N$ | Effect size is effectively zero, or the design has an inherent power ceiling (e.g., a clustered design with a fixed number of clusters) | Check whether $H_1$ is correctly specified; an effect size of zero gives power $= \alpha$ regardless of $N$ |
| Post-hoc power is very low (e.g., $< 0.20$) | Study was substantially underpowered; effect is genuinely small | Expected when $p > \alpha$; replace post-hoc power with a CI for the effect size and a sensitivity analysis |
| DataStatPro gives a different $N$ than another power calculator | Different rounding conventions, approximation formulae, or non-central distribution methods | Both may be correct; use exact non-central distribution methods (DataStatPro default); the difference is typically 0–2 participants |
| Design effect is very large ($> 5$) | Very high ICC or very large cluster size | Consider increasing the number of clusters rather than cluster size; add cluster-level covariates to reduce the ICC |
| Power is not improved by doubling $N$ (clustered design) | ICC is high; adding individuals within clusters is inefficient | Add more clusters, not more individuals per cluster; consult a biostatistician |
| Power for an interaction effect is very low | Interaction effects are inherently smaller and harder to detect than main effects | Plan for 4× the $N$ needed for the main effect to detect a crossover interaction; report as a limitation |
| Cohen's $h$ is unusually large | Proportions are near 0 or near 1; the arcsine transformation stretches the scale there | Verify $\pi_1$ and $\pi_2$; the arcsine transformation is mathematically correct; a large $h$ reflects high sensitivity in that region |
| Achieved power slightly below target after rounding | The $N$ formula gives a non-integer; rounding up reaches the target power, rounding down falls just below | Always round up, never down; add 1–2 participants as a buffer |
| Equivalence test requires a much larger $N$ than a superiority test | Equivalence requires showing the effect lies within a narrow margin; inherently conservative | Use a realistic equivalence margin; check that the margin is defined appropriately |
| Sample size for ANOVA with many groups is surprisingly large | For a fixed total $N$, many groups means a small $n$ per group and reduced power | Concentrate on the most important pairwise contrasts; consider a planned contrast rather than an omnibus ANOVA |
| Attrition-adjusted $N$ is unrealistically large | Very high assumed attrition rate | Revisit attrition estimates; consider strategies to reduce dropout; report as a study limitation if $N$ is infeasible |
| Power analysis for regression gives a very different $N$ from a t-test | Different effect size frameworks ($f^2$ vs. $d$); different $df$ | Convert between effect sizes using DataStatPro's converter; confirm that $u$ (number of predictors tested) is specified correctly |

15. Quick Reference Cheat Sheet

The Four Elements of Power Analysis

$$\text{Power}\ (1-\beta) \quad \longleftrightarrow \quad N \quad \longleftrightarrow \quad \text{Effect Size} \quad \longleftrightarrow \quad \alpha$$

Specify any three → solve for the fourth.

Core Sample Size Formulae

| Test | Effect size | Per-group $n$ or total $N$ |
|---|---|---|
| One-sample $t$ | $d$ | $n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d}\right)^2$ |
| Two-sample $t$ (equal $n$) | $d$ | $n_{per} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{d^2}$ |
| Paired $t$ | $d_z$ | $n_{pairs} = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d_z}\right)^2$ |
| Correlation | $\rho$ ($z_\rho = \text{arctanh}\,\rho$) | $n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{z_\rho}\right)^2 + 3$ |
| Proportion (one) | $h$ | $n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$ |
| Proportion (two) | $h$ | $n_{per} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{h^2}$ |
| Chi-square | $w$ | $N = \lambda_{req} / w^2$ |
| Regression | $f^2$ | $N = \lambda_{req} / f^2 + p + 1$ |
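The closed-form, normal-approximation formulas for two-tailed tests translate directly into code. The sketch below covers them; the chi-square and regression rows are omitted because they need a non-central distribution. These approximations can fall one or two participants short of exact non-central methods; for example, $d = 0.50$ gives 63 per group here versus 64 from the exact t-based calculation.

For two independent proportions, each arcsine-transformed proportion has variance $1/n$, so their difference has variance $2/n$. This is why that formula carries the same factor of 2 as the two-sample $t$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def z_sum(alpha=0.05, power=0.80):
    """z_{alpha/2} + z_{1-beta} for a two-tailed test."""
    return _N.inv_cdf(1 - alpha / 2) + _N.inv_cdf(power)


def n_one_sample(d, alpha=0.05, power=0.80):
    return math.ceil((z_sum(alpha, power) / d) ** 2)


def n_two_sample_per_group(d, alpha=0.05, power=0.80):
    return math.ceil(2 * z_sum(alpha, power) ** 2 / d ** 2)


def n_paired(dz, alpha=0.05, power=0.80):
    return math.ceil((z_sum(alpha, power) / dz) ** 2)


def n_correlation(rho, alpha=0.05, power=0.80):
    return math.ceil((z_sum(alpha, power) / math.atanh(rho)) ** 2 + 3)


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def n_one_proportion(h, alpha=0.05, power=0.80):
    return math.ceil((z_sum(alpha, power) / abs(h)) ** 2)


def n_two_proportions_per_group(h, alpha=0.05, power=0.80):
    # The difference of two arcsine-transformed proportions has variance 2/n.
    return math.ceil(2 * (z_sum(alpha, power) / abs(h)) ** 2)


print(n_two_sample_per_group(0.5))                        # 63
print(n_correlation(0.3))                                 # 85
print(n_two_proportions_per_group(cohens_h(0.5, 0.3)))    # 93
```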

Key Z-Score Constants

| $\alpha$ (two-tailed) | $z_{\alpha/2}$ | Power ($1-\beta$) | $z_{1-\beta}$ |
|---|---|---|---|
| .10 | 1.645 | 0.70 | 0.524 |
| .05 | 1.960 | 0.80 | 0.842 |
| .025 | 2.241 | 0.85 | 1.036 |
| .01 | 2.576 | 0.90 | 1.282 |
| .001 | 3.291 | 0.95 | 1.645 |
| | | 0.99 | 2.326 |

Sample Size for Two-Sample t-Test ($\alpha = .05$, Two-Tailed)

| $d$ | Power = 0.70 | Power = 0.80 | Power = 0.90 | Power = 0.95 |
|---|---|---|---|---|
| 0.20 (small) | 310 | 394 | 527 | 651 |
| 0.30 | 139 | 176 | 235 | 290 |
| 0.50 (medium) | 51 | 64 | 86 | 105 |
| 0.80 (large) | 21 | 26 | 34 | 42 |
| 1.00 | 14 | 17 | 23 | 27 |
| 1.20 | 10 | 12 | 16 | 20 |

(Figures are per group; multiply by 2 for total $N$.)

Sample Size for Correlation ($\alpha = .05$, Two-Tailed)

| $\rho$ | Power = 0.80 | Power = 0.90 |
|---|---|---|
| .10 | 782 | 1046 |
| .20 | 194 | 259 |
| .30 | 84 | 112 |
| .40 | 46 | 61 |
| .50 | 28 | 37 |
| .70 | 12 | 16 |

Cohen's Effect Size Conventions

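| Test | Small | Medium | Large |
|---|---|---|---|
| t-test ($d$) | 0.20 | 0.50 | 0.80 |
| ANOVA ($f$) | 0.10 | 0.25 | 0.40 |
| Correlation ($r$) | 0.10 | 0.30 | 0.50 |
| Chi-square ($w$) | 0.10 | 0.30 | 0.50 |
| Regression ($f^2$) | 0.02 | 0.15 | 0.35 |
| Proportion ($h$) | 0.20 | 0.50 | 0.80 |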

Attrition Adjustment

$$N_{enroll} = \left\lceil \frac{N_{analysis}}{1 - r_{attrition}} \right\rceil$$

| Attrition rate | Inflation factor |
|---|---|
| 5% | × 1.053 |
| 10% | × 1.111 |
| 15% | × 1.176 |
| 20% | × 1.250 |
| 25% | × 1.333 |
| 30% | × 1.429 |

Design Effect for Clustered Studies

$$DEFF = 1 + (m - 1) \times ICC, \qquad N_{cluster} = N_{simple} \times DEFF$$

| ICC | Cluster size $m = 10$ | $m = 20$ | $m = 30$ |
|---|---|---|---|
| 0.01 | 1.09 | 1.19 | 1.29 |
| 0.05 | 1.45 | 1.95 | 2.45 |
| 0.10 | 1.90 | 2.90 | 3.90 |
| 0.20 | 2.80 | 4.80 | 6.80 |
0.202.804.806.80

Effect Size Conversions

| From | To | Formula |
|---|---|---|
| $r$ | $d$ | $d = 2r/\sqrt{1-r^2}$ |
| $d$ | $r$ | $r = d/\sqrt{d^2+4}$ |
| $f$ | $\eta^2$ | $\eta^2 = f^2/(1+f^2)$ |
| $f^2$ | $R^2$ | $R^2 = f^2/(1+f^2)$ |
| $OR$ | $d$ | $d = \ln(OR)/1.814$ |
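The conversions are one-liners in code, as the sketch below shows. The $d \to r$ formula assumes equal group sizes. The constant 1.814 approximates $\pi/\sqrt{3}$, the standard deviation of the standard logistic distribution. Function names are illustrative.

```python
import math


def r_to_d(r):
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d):
    # Assumes equal group sizes.
    return d / math.sqrt(d ** 2 + 4)


def f_to_eta_squared(f):
    return f ** 2 / (1 + f ** 2)


def f2_to_r_squared(f2):
    return f2 / (1 + f2)


def odds_ratio_to_d(odds_ratio):
    # 1.814 is approximately pi / sqrt(3).
    return math.log(odds_ratio) / 1.814


print(round(r_to_d(0.30), 3), round(d_to_r(0.50), 3), round(odds_ratio_to_d(2.0), 3))
```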

Four Modes of Power Analysis Decision Guide

| Goal | Analysis mode | Fixed | Solved |
|---|---|---|---|
| Plan sample size before data collection | A priori | $\alpha$, $1-\beta$, $ES$ | $N$ |
| Assess power of a completed study | Post-hoc | $\alpha$, $N$, $ES$ | $1-\beta$ |
| Find smallest detectable effect | Sensitivity | $\alpha$, $N$, $1-\beta$ | $ES_{min}$ |
| Justify a non-standard $\alpha$ | Criterion | $N$, $1-\beta$, $ES$ | $\alpha$ |

APA 7th Edition Power Analysis Reporting Templates

A priori (standard): "An a priori power analysis conducted in DataStatPro indicated that [N per group / total N] participants were required to detect [effect size measure] = [value] with [power]% power at a [one/two]-tailed $\alpha$ = [value] (achieved power = [value]). The effect size was based on [source/justification]."

A priori (with attrition): "[As above]. Assuming [X]% attrition, [inflated N] participants will be recruited."

A priori (clustered design): "[As above]. Assuming an ICC of [value] and an average cluster size of [m], the design effect was [DEFF], yielding a required [N clusters] clusters of [m] participants each (total $N$ = [value])."

Sensitivity analysis: "With [N] participants per group, the study had [power]% power to detect an effect of [ES measure] = [MDE value] at $\alpha$ = [value] (two-tailed). Power for a range of effect sizes is provided in [Table/Figure X]."

Non-significant result with sensitivity: "With [N] per group, the study had [power]% power to detect [ES measure] = [value] at $\alpha$ = [value]. The 95% CI for [effect size] was [[LB], [UB]], indicating that effects as large as [UB value] cannot be ruled out. A future study powered to detect [ES measure] = [target value] with 80% power would require [future N] per group."

Power Analysis Reporting Checklist

| Element | Required |
|---|---|
| Analysis mode (a priori / post-hoc / sensitivity) | ✅ Always |
| Statistical test named exactly | ✅ Always |
| Effect size measure and value | ✅ Always |
| Effect size source and justification | ✅ Always |
| Significance level $\alpha$ and directionality | ✅ Always |
| Power target ($1 - \beta$) | ✅ Always |
| Computed $N$ total and per group | ✅ Always |
| Achieved power at computed $N$ | ✅ Always |
| Software and version | ✅ Always |
| Attrition rate and enrollment $N$ | ✅ When attrition is anticipated |
| Design effect, ICC, cluster size | ✅ For clustered designs |
| Multiple testing correction and adjusted $\alpha'$ | ✅ When there are multiple primary outcomes |
| MDE in original units | ✅ Recommended |
| Sensitivity power table or curve | ✅ Recommended |
| Equivalence margin (for TOST) | ✅ For equivalence studies |
| Pre-registration reference | ✅ When pre-registered |
| Discussion of feasibility | ✅ When $N$ is large or constrained |
| Bayesian / average power | ✅ When prior uncertainty about effect size is substantial |

This tutorial provides a comprehensive foundation for understanding, conducting, interpreting, visualising, and reporting sample size and power analyses within the DataStatPro application. For further reading, consult Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988) for foundational theory and conventions; Lakens, Scheel & Isager's "Equivalence Testing for Psychological Research: A Tutorial" (2018) for TOST methods; Gelman & Carlin's "Beyond Power Calculations" (2014) for design analysis and Type M/S errors; Faul, Erdfelder, Lang & Buchner's "G*Power 3" (2007) for computational methods; and Zar's "Biostatistical Analysis" (5th ed., 2010) for biological and health science applications. For feature requests or support, contact the DataStatPro team.