Sample Size and Power Analysis: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of statistical power and sample size determination all the way through advanced interpretation, reporting, assumption checking, and practical usage within the DataStatPro application. Whether you are planning a new study for the first time or deepening your understanding of how to design adequately powered research, this guide builds your knowledge systematically from the ground up.

Prerequisites and Background Concepts
What are Sample Size and Power Analysis?
The Mathematics Behind Power Analysis
Considerations and Planning Checklist
Power Analysis for Common Statistical Tests
Using the Sample Size and Power Analysis Calculator Component
Step-by-Step Procedure
Interpreting the Output
Visualising Power and Sample Size
Sensitivity Analysis and Robustness Checks
Advanced Topics
Worked Examples
Common Mistakes and How to Avoid Them
Troubleshooting
Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into sample size and power analysis, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 Hypothesis Testing Framework

All power analyses are grounded in the hypothesis testing framework. A statistical test evaluates the evidence in a dataset against a null hypothesis:

Null hypothesis ( $H_0$ ): The default position — typically, that no effect exists, no difference is present, or variables are unassociated.
Alternative hypothesis ( $H_1$ ): The research hypothesis — that an effect exists, a difference is present, or variables are associated.

The test produces a test statistic (e.g., $t$ , $F$ , $\chi^2$ , $z$ ) and a corresponding p-value. If $p \leq \alpha$ , we reject $H_0$ in favour of $H_1$ .

1.2 The Four Outcomes of a Hypothesis Test

Every hypothesis test results in one of four possible outcomes, two of which are correct decisions and two of which are errors:

	$H_0$ is TRUE	$H_0$ is FALSE
Fail to reject $H_0$	✅ Correct decision (True negative)	❌ Type II error (False negative)
Reject $H_0$	❌ Type I error (False positive)	✅ Correct decision (True positive)

The probabilities associated with each outcome:

Outcome	Symbol	Definition
Type I error rate (false positive rate)	$\alpha$	$P(\text{Reject } H_0 \mid H_0 \text{ is true})$
Type II error rate (false negative rate)	$\beta$	$P(\text{Fail to reject } H_0 \mid H_0 \text{ is false})$
Significance level	$\alpha$	Controlled by the researcher; conventionally $.05$
Statistical power	$1 - \beta$	$P(\text{Reject } H_0 \mid H_0 \text{ is false})$
Specificity	$1 - \alpha$	$P(\text{Fail to reject } H_0 \mid H_0 \text{ is true})$

1.3 The Significance Level ( $\alpha$ )

The significance level $\alpha$ is the maximum acceptable probability of a Type I error — that is, the probability of declaring a significant result when the null hypothesis is actually true. The researcher chooses $\alpha$ before collecting data.

Conventional values:

$\alpha$	Context
$.05$	Standard in most social, behavioural, and health sciences
$.01$	More stringent; clinical trials, policy-relevant decisions
$.001$	Very stringent; genomics, physics, large-scale testing
$.10$	Sometimes used in exploratory research or small pilot studies

1.4 The p-Value

The p-value is the probability of observing a test statistic as extreme as or more extreme than the one obtained, assuming $H_0$ is true:

$p = P(T \geq t_{obs} \mid H_0) \quad \text{(upper-tailed)}, \qquad p = P(|T| \geq |t_{obs}| \mid H_0) \quad \text{(two-tailed)}$

A small p-value (below $\alpha$ ) means the observed result is unlikely under $H_0$ and constitutes evidence against $H_0$ . Crucially, the p-value does not tell you the probability that $H_0$ is true, nor does it measure the size or practical importance of an effect.

1.5 Effect Size

An effect size is a standardised, scale-free measure of the magnitude of a phenomenon. It is the single most important input to any power analysis. Common effect size measures by test type:

Test	Effect Size Measure	Symbol	Range
t-test (two groups)	Cohen's $d$	$d$	$(-\infty, +\infty)$
ANOVA (multiple groups)	Cohen's $f$	$f$	$[0, \infty)$
Correlation	Pearson correlation	$r$	$[-1, +1]$
Chi-square test	Cramér's $V$ (or $\phi$ )	$V$ , $\phi$	$[0, 1]$
Regression (multiple)	Cohen's $f^2$	$f^2$	$[0, \infty)$
Proportion test	Cohen's $h$ (arcsine difference)	$h$	$(-\infty, +\infty)$
Repeated measures	Cohen's $d_{rm}$ or $f$	—	—

1.6 Cohen's Conventions for Effect Size

Jacob Cohen (1988) proposed widely used benchmarks for effect size magnitudes across common statistical tests. These are conventions of last resort — domain knowledge always supersedes them:

Test	Small	Medium	Large
t-test ( $d$ )	0.20	0.50	0.80
ANOVA ( $f$ )	0.10	0.25	0.40
Correlation ( $r$ )	0.10	0.30	0.50
Chi-square ( $w$ )	0.10	0.30	0.50
Regression ( $f^2$ )	0.02	0.15	0.35
Proportion test ( $h$ )	0.20	0.50	0.80

1.7 The Normal and Non-Central Distributions

Power analysis relies on understanding how test statistics are distributed under two scenarios:

Under $H_0$ : The test statistic follows a central distribution (e.g., standard normal $\mathcal{N}(0,1)$ , central t, central $\chi^2$ , central F).
Under $H_1$ : The test statistic follows a non-central distribution, from the same family but shifted by a non-centrality parameter $\lambda$ that grows with both the true effect size and the sample size:

$\lambda = d \times \sqrt{n} \quad \text{(one-sample z/t-tests, simplified form; } d \text{ is the standardised effect size)}$

Power is the probability that a non-centrally distributed test statistic exceeds the critical value derived from the central distribution.

1.8 Directionality: One-Tailed vs. Two-Tailed Tests

The directionality of a test affects the critical region and therefore the power:

Two-tailed test: The critical region is split across both tails of the distribution. Rejects $H_0$ for both very large and very small test statistics. Critical value: $z_{\alpha/2}$ (e.g., $1.96$ for $\alpha = .05$ ).
One-tailed test: The critical region is entirely in one tail. More powerful for detecting effects in the predicted direction but cannot detect effects in the opposite direction. Critical value: $z_{\alpha}$ (e.g., $1.645$ for $\alpha = .05$ ).

For a given effect size and sample size, a one-tailed test has greater power than a two-tailed test. However, one-tailed tests require a strong directional a priori justification and are vulnerable to criticism if the actual effect is in the opposite direction.
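The power difference between the two directionality choices can be checked numerically. The following is a minimal sketch for a one-sample z-test under the normal approximation; the function name `z_test_power` and the example values ( $d = 0.5$ , $n = 30$ ) are illustrative assumptions, not DataStatPro's implementation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def z_test_power(d, n, alpha=0.05, two_tailed=True):
    """Approximate power of a one-sample z-test for standardised effect d."""
    shift = d * sqrt(n)  # how far H1 moves the centre of the test statistic
    if two_tailed:
        z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
        # Rejection can occur in either tail.
        return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Rejection only in the upper (predicted) tail.
    return 1 - STD_NORMAL.cdf(z_crit - shift)

two_sided = z_test_power(d=0.5, n=30)
one_sided = z_test_power(d=0.5, n=30, two_tailed=False)
# Same one-tailed test, but the true effect lies in the unpredicted direction.
wrong_direction = z_test_power(d=-0.5, n=30, two_tailed=False)

print(f"two-tailed power:            {two_sided:.3f}")
print(f"one-tailed power:            {one_sided:.3f}")
print(f"one-tailed, opposite effect: {wrong_direction:.6f}")
```

The last line illustrates the cost of a one-tailed test: when the effect runs against the prediction, the chance of detecting it is essentially zero.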

⚠️ Most journals and reporting guidelines recommend two-tailed tests unless there is a strong, pre-registered, directional theoretical justification. DataStatPro defaults to two-tailed tests for all procedures.

2. What are Sample Size and Power Analysis?

2.1 The Core Questions

Power analysis and sample size determination address four deeply interconnected questions in research design:

Power analysis (post-hoc): Given my sample size, effect size, and $\alpha$ , what was the probability of detecting a true effect?
Sample size calculation (a priori): Given a desired power, effect size, and $\alpha$ , how many participants do I need?
Minimum detectable effect: Given my sample size and $\alpha$ , what is the smallest effect I have adequate power to detect?
Sensitivity analysis: Given my sample size and $\alpha$ , how does power vary across a range of plausible effect sizes?

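These questions correspond to the modes of power analysis summarised in Section 2.4: the minimum detectable effect and the sensitivity analysis are two views of the same mode, and the rarely used criterion mode (solving for $\alpha$ ) completes the set. A priori sample size calculation, performed before data collection, is the most important and is the focus of most of this tutorial.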

2.2 Why Power Analysis Matters

Consequence of Ignoring Power	Effect
Underpowered study	High probability of Type II error; genuine effects missed; wasted resources
Overpowered study	Resources wasted; trivially small, practically meaningless effects declared significant
Post-hoc power gaming	Misleading; observed power with observed effect size is 50% when $p = \alpha$
Non-replicable findings	Underpowered studies produce inflated effect size estimates (the "Winner's Curse")
Ethical implications	Exposing participants to risk or burden without adequate chance of meaningful results
Grant and ethics requirements	Most funding bodies and ethics committees require a priori power justification

2.3 The Four Elements of Power Analysis

Every power analysis involves exactly four quantities, any one of which can be computed from the other three:

Power (1 - β)  ←─────────────────────────────┐
         │                                   │
Sample size (N) ─────────────────────►       │
         ↕                                   │
Effect size (ES) ───────────────────►         │
         │                                   │
Significance level (α) ─────────────►         │
         └───────────────────────────────────┘

Specify any three → solve for the fourth.

2.4 Four Modes of Power Analysis

Mode	Fixed Inputs	Solved Output	When Used
A priori	$\alpha$ , $1-\beta$ , $ES$	$N$	Before data collection (study planning)
Post-hoc	$\alpha$ , $N$ , $ES$	$1-\beta$	After data collection (result interpretation)
Criterion	$N$ , $1-\beta$ , $ES$	$\alpha$	Rare; sometimes used in quality control
Sensitivity	$\alpha$ , $N$ , $1-\beta$	$ES_{min}$	Before or after collection; what can I detect?
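As an illustration of the sensitivity mode, the sketch below solves for the smallest detectable Cohen's $d$ in a two-group design with equal group sizes. It assumes the large-sample normal approximation (rearranging $n = 2(z_{\alpha/2} + z_{1-\beta})^2 / d^2$ ) rather than the exact non-central t method, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable by a two-tailed, two-sample test
    (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)

for n in (20, 50, 100, 200):
    print(f"n per group = {n:>3}: minimum detectable d = {minimum_detectable_d(n):.2f}")
```

Reading the output against the smallest effect of practical interest shows whether a fixed sample is informative for the question at hand.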

⚠️ Post-hoc power analysis computed using the observed effect size is widely regarded as uninformative and should be avoided. When $p > \alpha$ , observed power will typically be low — but this is a mathematical consequence of the non-significant result, not an independent finding. Instead of post-hoc power, report the 95% confidence interval for the effect size and a sensitivity power analysis.

2.5 Desired Power: Choosing $1 - \beta$

The conventional target for statistical power is 0.80 (80%), implying a 20% Type II error rate. Higher power targets are increasingly recommended:

Power ( $1 - \beta$ )	$\beta$	Context
$0.80$	$.20$	Minimum conventional standard (Cohen, 1988)
$0.90$	$.10$	Recommended for clinical trials; replication studies
$0.95$	$.05$	High-stakes research; confirmatory studies
$0.99$	$.01$	Safety-critical or regulatory contexts

The ratio of Type II to Type I error rates ( $\beta/\alpha$ ) is also informative:

At $\alpha = .05$ and power $= .80$ : ratio $= \beta/\alpha = .20/.05 = 4$ (Type II errors are 4× more likely than Type I errors).
At $\alpha = .05$ and power $= .95$ : ratio $= .05/.05 = 1$ (equal error rates).

2.6 Real-World Applications

Field	Context	Typical Power Target
Clinical Trials	RCT comparing drug vs. placebo	$0.90$ – $0.95$
Psychology	Between-subjects experiment	$0.80$ – $0.90$
Education Research	Intervention effectiveness study	$0.80$
Epidemiology	Case-control study; cohort study	$0.80$ – $0.90$
Genomics / GWAS	Association study ( $\alpha = 5 \times 10^{-8}$ )	$0.80$
Marketing Research	A/B test for conversion rate	$0.80$ – $0.90$
Quality Control	Detecting process shift	$0.90$ – $0.95$
Pilot Studies	Feasibility; parameter estimation	$0.60$ – $0.80$

3. The Mathematics Behind Power Analysis

3.1 The General Power Framework

For any test with test statistic $T$ and critical value $t_{crit}$ :

$\text{Power} = 1 - \beta = P(T \geq t_{crit} \mid H_1 \text{ is true})$

Under $H_1$ , $T$ follows a non-central distribution with non-centrality parameter $\lambda$ . Power is the probability that this non-centrally distributed test statistic exceeds the critical value determined under $H_0$ .

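The critical value $t_{crit}$ is determined by $\alpha$ and the test type. For a z-test, with $z_{\alpha/2}$ denoting the upper-tail critical value of the standard normal distribution (the notation used in Section 1.8 and in the formulas that follow):

Two-tailed: $t_{crit} = z_{\alpha/2}$ (e.g., $1.960$ for $\alpha = .05$ )
One-tailed: $t_{crit} = z_{\alpha}$ (e.g., $1.645$ for $\alpha = .05$ )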

3.2 Power for the One-Sample z-Test

The simplest case: testing whether a population mean $\mu$ equals a known value $\mu_0$ , with known population SD $\sigma$ and sample size $n$ .

Non-centrality parameter:

$\lambda = \frac{(\mu_1 - \mu_0)}{\sigma / \sqrt{n}} = d \times \sqrt{n}$

Where $d = (\mu_1 - \mu_0)/\sigma$ is Cohen's $d$ for the one-sample case.

Power (two-tailed):

$1 - \beta = 1 - \Phi(z_{\alpha/2} - \lambda) + \Phi(-z_{\alpha/2} - \lambda)$

For practical purposes (when $\lambda$ is not near zero):

$1 - \beta \approx \Phi(\lambda - z_{\alpha/2})$

Where $\Phi$ is the standard normal CDF.

Sample size formula (solving for $n$ ):

$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d}\right)^2$

Where $z_{1-\beta}$ is the standard normal quantile corresponding to the desired power (e.g., $0.842$ for $1-\beta = 0.80$ ; $1.282$ for $1-\beta = 0.90$ ; $1.645$ for $1-\beta = 0.95$ ).
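The sample size formula for the one-sample z-test is easy to evaluate directly. The sketch below uses Python's standard library; the function name `n_one_sample_z` and the choice $d = 0.5$ are illustrative assumptions. Results are rounded up, since a fractional participant cannot be recruited.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()

def n_one_sample_z(d, alpha=0.05, power=0.80):
    """Required n for a two-tailed one-sample z-test, rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    z_power = STD_NORMAL.inv_cdf(power)          # z_{1-beta}
    return ceil(((z_alpha + z_power) / d) ** 2)

for power in (0.80, 0.90, 0.95):
    z_power = STD_NORMAL.inv_cdf(power)
    n = n_one_sample_z(d=0.5, power=power)
    print(f"power = {power:.2f}: z_(1-beta) = {z_power:.3f}, n = {n}")
```

Because the formula treats $\sigma$ as known, a t-test planned with the same inputs will typically need slightly more observations.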

3.3 Power for the Two-Sample Independent t-Test

Testing whether two population means differ ( $H_1: \mu_1 \neq \mu_2$ ), with equal group sizes $n_1 = n_2 = n$ and pooled SD $\sigma_{pooled}$ .

Cohen's $d$ (standardised mean difference):

$d = \frac{\mu_1 - \mu_2}{\sigma_{pooled}}$

Non-centrality parameter:

$\lambda = \frac{d}{\sqrt{1/n_1 + 1/n_2}} = d \times \sqrt{\frac{n}{2}}$

Sample size per group (approximate):

$n_{per\;group} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{d^2}$

Unequal group sizes: Let $n_2 = k \times n_1$ (allocation ratio $k$ ). The total sample size is minimised when $k = 1$ (equal groups). For unequal allocation:

$n_1 = \frac{(z_{\alpha/2} + z_{1-\beta})^2(1 + 1/k)}{d^2}, \qquad n_2 = k \times n_1$
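To see how the allocation ratio affects the total sample size, the unequal-allocation formula can be evaluated for several values of $k$ . This is a sketch under the normal approximation; `n_two_sample` is an illustrative name and $d = 0.5$ an arbitrary example value.

```python
from math import ceil
from statistics import NormalDist

def n_two_sample(d, alpha=0.05, power=0.80, k=1.0):
    """Return (n1, n2) for a two-tailed two-sample test with n2 = k * n1."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n1 = z_sum ** 2 * (1 + 1 / k) / d ** 2
    return ceil(n1), ceil(k * n1)

for k in (1, 2, 3):
    n1, n2 = n_two_sample(d=0.5, k=k)
    print(f"k = {k}: n1 = {n1}, n2 = {n2}, total = {n1 + n2}")
```

With $k = 1$ the function reduces to the equal-groups formula; larger ratios shrink $n_1$ but increase the total.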

⚠️ Unequal group sizes are less efficient than equal groups for a given total $N$ . Unless there is a compelling reason (e.g., one group is more expensive to recruit, or a 2:1 allocation is ethically required), equal groups maximise power per participant.

3.4 Power for the Paired t-Test

Testing whether the mean of paired differences $\mu_D = \mu_1 - \mu_2$ equals zero.

Cohen's $d_z$ (based on the SD of differences):

$d_z = \frac{\mu_D}{\sigma_D}$

Where $\sigma_D = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$ and $\rho$ is the correlation between paired measurements.

The relationship to Cohen's $d$ for independent groups:

$d_z = \frac{d}{\sqrt{2(1-\rho)}}$

This shows that paired designs are more powerful when $\rho > 0$ — the higher the correlation between paired measurements, the greater the efficiency gain over an independent design.

Sample size (number of pairs):

$n_{pairs} = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d_z}\right)^2$
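The effect of the correlation $\rho$ on the required number of pairs can be tabulated with a few lines of code. The sketch assumes equal SDs at both measurements (as the conversion from $d$ to $d_z$ does) and the normal approximation; function names and the value $d = 0.5$ are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def d_z_from_d(d, rho):
    """Convert an independent-groups d to d_z (equal SDs assumed)."""
    return d / sqrt(2 * (1 - rho))

def n_pairs(d_z, alpha=0.05, power=0.80):
    """Required number of pairs for a two-tailed paired test, rounded up."""
    z = NormalDist()
    return ceil(((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d_z) ** 2)

for rho in (0.0, 0.3, 0.5, 0.7, 0.9):
    dz = d_z_from_d(0.5, rho)
    print(f"rho = {rho:.1f}: d_z = {dz:.3f}, pairs = {n_pairs(dz)}")
```

The required number of pairs falls steadily as $\rho$ rises, which is why a realistic estimate of $\rho$ matters as much as the estimate of $d$ itself.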

3.5 Power for One-Way ANOVA

Testing whether $k$ group means are equal ( $H_1$ : at least two means differ).

Cohen's $f$ (standardised SD of group means):

$f = \frac{\sigma_m}{\sigma_{within}} = \sqrt{\frac{\sum_{j=1}^k n_j(\mu_j - \mu)^2 / N}{\sigma_{within}^2}}$

Where $\sigma_m$ is the SD of the group means and $\sigma_{within}$ is the common within-group SD.

Relationship to $\eta^2$ (eta-squared, the proportion of variance explained):

$f = \sqrt{\frac{\eta^2}{1 - \eta^2}}$
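In practice, $f$ is often derived from a set of expected group means. The following sketch computes $f$ from hypothetical means and a common within-group SD, and converts between $f$ and $\eta^2$ ; the means (100, 105, 110) and SD (15) are made-up illustration values.

```python
from math import sqrt

def cohens_f(means, sd_within, sizes=None):
    """Cohen's f from expected group means and a common within-group SD."""
    if sizes is None:
        sizes = [1] * len(means)  # equal group sizes act as equal weights
    total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total
    var_means = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / total
    return sqrt(var_means) / sd_within

def f_from_eta_squared(eta_sq):
    return sqrt(eta_sq / (1 - eta_sq))

def eta_squared_from_f(f):
    return f ** 2 / (1 + f ** 2)

f = cohens_f(means=[100, 105, 110], sd_within=15)
print(f"f = {f:.3f}, eta squared = {eta_squared_from_f(f):.3f}")
```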

Non-centrality parameter:

$\lambda = f^2 \times N$

Under $H_1$ , the F-statistic follows a non-central F distribution with numerator df $= k - 1$ , denominator df $= N - k$ , and non-centrality parameter $\lambda$ .

Sample size per group (equal groups):

$n_{per\;group} = \frac{\lambda_{required}}{k \times f^2}$

Where $\lambda_{required}$ is the non-centrality parameter needed to achieve the desired power at the specified $\alpha$ , $df_1 = k-1$ , and $df_2 = k(n-1)$ (solved iteratively as $df_2$ depends on $n$ ).

3.6 Power for Pearson Correlation

Testing whether the population correlation $\rho = 0$ ( $H_1: \rho \neq 0$ ).

Effect size: The population correlation coefficient $\rho$ itself.

Fisher's z-transformation: To stabilise the variance:

$z_r = \frac{1}{2}\ln\!\left(\frac{1+r}{1-r}\right) = \text{arctanh}(r), \qquad SE_{z_r} = \frac{1}{\sqrt{n-3}}$

Non-centrality parameter:

$\lambda = z_\rho \times \sqrt{n - 3}$

Where $z_\rho = \text{arctanh}(\rho)$ .

Sample size:

$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{z_\rho}\right)^2 + 3$
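The Fisher-transformation formula can be evaluated as follows. This is a sketch of the approximation only; exact routines based on the sampling distribution of $r$ may differ by a participant or two. The function name is illustrative.

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(rho, alpha=0.05, power=0.80):
    """Required n to detect correlation rho (two-tailed, Fisher z approximation)."""
    z = NormalDist()
    z_rho = atanh(rho)  # Fisher's z-transformation of rho
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((z_sum / z_rho) ** 2 + 3)

for rho in (0.10, 0.30, 0.50):
    print(f"rho = {rho:.2f}: n = {n_for_correlation(rho)}")
```

The steep rise in $n$ for small correlations reflects the $1/z_\rho^2$ term in the formula.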

3.7 Power for the Chi-Square Test of Association

Testing whether two categorical variables are independent. See also the chi-square tutorial for additional detail.

Effect size — Cohen's $w$ :

$w = \sqrt{\sum_{i}\sum_{j} \frac{(P_{ij} - P_{i\cdot}P_{\cdot j})^2}{P_{i\cdot}P_{\cdot j}}}$

For $2 \times 2$ tables, $w = \phi$ (phi coefficient); for larger tables, $w$ is related to Cramér's $V$ :

$w = V \times \sqrt{\min(r-1,\; c-1)}$

Non-centrality parameter:

$\lambda = w^2 \times N$

Under $H_1$ , $\chi^2$ follows a non-central chi-square distribution with $df = (r-1)(c-1)$ and non-centrality parameter $\lambda$ .

Sample size:

$N = \frac{\lambda_{required}}{w^2}$
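Cohen's $w$ can be computed from a table of joint cell proportions expected under $H_1$ . The sketch below uses a hypothetical $2 \times 2$ table; obtaining $\lambda_{required}$ still needs a non-central chi-square routine, which is not shown here.

```python
from math import sqrt

def cohens_w(joint):
    """Cohen's w from joint cell proportions under H1 (all cells sum to 1)."""
    row_tot = [sum(row) for row in joint]
    col_tot = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_ij in enumerate(row):
            expected = row_tot[i] * col_tot[j]  # cell proportion under independence
            total += (p_ij - expected) ** 2 / expected
    return sqrt(total)

def cramers_v(w, n_rows, n_cols):
    """Convert w to Cramér's V for an r x c table."""
    return w / sqrt(min(n_rows - 1, n_cols - 1))

table = [[0.30, 0.20],
         [0.20, 0.30]]  # hypothetical expected proportions
w = cohens_w(table)
print(f"w = {w:.3f}, V = {cramers_v(w, 2, 2):.3f}")
```

For a $2 \times 2$ table the two measures coincide, as the conversion formula implies.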

3.8 Power for Proportion Tests

3.8.1 One-Sample Proportion Test

Testing $H_0: \pi = \pi_0$ vs. $H_1: \pi \neq \pi_0$ .

Cohen's $h$ (arcsine difference):

$h = 2\arcsin(\sqrt{\pi_1}) - 2\arcsin(\sqrt{\pi_0})$

Sample size:

$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$
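A short numerical sketch of the one-sample case follows. The proportions ( $\pi_0 = 0.50$ , $\pi_1 = 0.60$ ) are hypothetical and the function names are illustrative.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

def cohens_h(p1, p0):
    """Arcsine-transformed difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0))

def n_one_proportion(p1, p0, alpha=0.05, power=0.80):
    """Required n for a two-tailed one-sample proportion test, rounded up."""
    z = NormalDist()
    h = cohens_h(p1, p0)
    return ceil(((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / h) ** 2)

h = cohens_h(0.60, 0.50)
n = n_one_proportion(0.60, 0.50)
print(f"h = {h:.3f}, n = {n}")
```

A ten-point difference around 0.50 corresponds to a small $h$ by Cohen's benchmarks, which is why the required $n$ is large.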

3.8.2 Two-Sample Proportion Test

Testing $H_0: \pi_1 = \pi_2$ vs. $H_1: \pi_1 \neq \pi_2$ .

$h = 2\arcsin(\sqrt{\pi_1}) - 2\arcsin(\sqrt{\pi_2})$

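$n_{per\;group} = 2\left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$

The factor of 2 arises because the difference between two independent arcsine-transformed proportions has variance $2/n$ rather than $1/n$ , exactly as in the two-sample t-test formula of Section 3.3.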

3.9 Power for Multiple Regression

Testing whether a set of predictors explains a meaningful proportion of variance in an outcome, or whether a specific predictor contributes above and beyond others.

Cohen's $f^2$ (for the overall model or incremental $R^2$ ):

$f^2 = \frac{R^2}{1 - R^2}$

For testing an increment in $R^2$ when adding $u$ new predictors:

$f^2 = \frac{R^2_{full} - R^2_{reduced}}{1 - R^2_{full}}$
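Both versions of $f^2$ are simple ratios, as the sketch below shows. The $R^2$ values are hypothetical illustration inputs.

```python
def f2_overall(r2):
    """Cohen's f-squared for the overall model."""
    return r2 / (1 - r2)

def f2_increment(r2_full, r2_reduced):
    """Cohen's f-squared for the R-squared added by the tested predictors."""
    return (r2_full - r2_reduced) / (1 - r2_full)

print(f"overall model, R2 = 0.13:        f2 = {f2_overall(0.13):.3f}")
print(f"increment, R2 from 0.25 to 0.30: f2 = {f2_increment(0.30, 0.25):.3f}")
```

Note that the incremental version divides by the unexplained variance of the full model, so the same $\Delta R^2$ yields a larger $f^2$ when the covariates already explain a lot.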

Non-centrality parameter:

$\lambda = f^2 \times N$

Under $H_1$ , the F-statistic for testing $u$ predictors follows a non-central F distribution with $df_1 = u$ and $df_2 = N - p - 1$ (where $p$ is the total number of predictors in the full model).

Sample size:

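$N = \frac{\lambda_{required}}{f^2}$

This follows directly from $\lambda = f^2 \times N$ . Because $df_2 = N - p - 1$ depends on $N$ , $\lambda_{required}$ and $N$ are solved iteratively, and $N$ must exceed $p + 1$ for the model to be estimable.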

3.10 Summary of Key Sample Size Formulae

Test	Effect Size	Sample Size Formula
One-sample z/t	$d$	$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d}\right)^2$
Two-sample t (equal groups)	$d$	$n_{per} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{d^2}$
Paired t	$d_z$	$n_{pairs} = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d_z}\right)^2$
Pearson correlation	$\rho$ ( $z_\rho$ )	$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{z_\rho}\right)^2 + 3$
One proportion	$h$	$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$
Two proportions	$h$	$n_{per} = 2\left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$
Chi-square	$w$	$N = \frac{\lambda_{req}}{w^2}$
ANOVA	$f$	Iterative (non-central F)
Regression ( $f^2$ )	$f^2$	$N = \frac{\lambda_{req}}{f^2}$ (iterative; $df_2 = N - p - 1$ )

3.11 The Power Curve

The power curve plots power ( $1 - \beta$ ) as a function of one of the four analysis inputs while holding the others constant. The most common power curve plots:

Power vs. $N$ : Shows how power increases as sample size grows. For most tests, power rises steeply at first and then plateaus. Used to identify the point of diminishing returns.
Power vs. effect size: Shows how power changes across a range of effect sizes for a fixed $N$ . Used for sensitivity analysis.
Power vs. $\alpha$ : Shows the trade-off between Type I and Type II error rates.

The power curve always satisfies:

$1 - \beta \to \alpha$ as $N \to 0$ or effect size $\to 0$ (with no effect or no data, the test rejects only at its false-positive rate).
$1 - \beta \to 1$ as $N \to \infty$ or effect size $\to \infty$ .
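A power-versus- $N$ curve and its limiting behaviour can be sketched in plain text. The example assumes a two-tailed two-sample test under the normal approximation with $d = 0.5$ ; the function name and the grid of sample sizes are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power for two equal groups."""
    z = NormalDist()
    lam = d * sqrt(n_per_group / 2)  # non-centrality parameter
    z_crit = z.inv_cdf(1 - alpha / 2)
    return 1 - z.cdf(z_crit - lam) + z.cdf(-z_crit - lam)

sizes = (5, 10, 20, 40, 64, 100, 200)
powers = [two_sample_power(0.5, n) for n in sizes]

for n, p in zip(sizes, powers):
    bar = "#" * round(p * 40)  # crude text rendering of the curve
    print(f"n = {n:>3}  power = {p:.3f}  {bar}")

# With no true effect, "power" collapses to the Type I error rate.
print(f"d = 0: {two_sample_power(0.0, 50):.3f}")
```

The bars lengthen quickly at small $n$ and then flatten, which is the diminishing-returns pattern described above.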

4. Considerations and Planning Checklist

4.1 Specifying the Effect Size: The Critical Decision

The effect size is the single most consequential input to a power analysis. Poorly specified effect sizes lead to either seriously underpowered or wastefully overpowered studies. Use the following hierarchy of evidence for effect size specification:

Priority	Source	Description
1	Domain-specific minimum effect of interest (SESOI)	The smallest effect that would be practically meaningful. Defined by theory, clinical guidelines, or cost-benefit analysis.
2	Prior studies or meta-analyses	Effect sizes from published research on the same or very similar questions. Apply a discount for publication bias.
3	Pilot study	A small preliminary study; note that pilot effect size estimates are imprecise and should be used with caution.
4	Expert opinion or theoretical prediction	Informed estimates from domain experts or mathematical models.
5	Cohen's conventions	Use as a last resort only. Small/medium/large benchmarks as described in Section 1.6.

⚠️ Do not use the effect size from a pilot study directly. Pilot effect sizes are estimated from small samples and are highly unstable. The pilot effect size will often overestimate the true effect (publication bias in miniature). Use pilot data to confirm feasibility and estimate nuisance parameters (e.g., SD, ICC), not to determine the effect size for the power calculation.

4.2 Choosing the Significance Level ( $\alpha$ )

The choice of $\alpha$ should be deliberate and justified:

Context	Recommended $\alpha$	Rationale
Standard social/behavioural science	$.05$	Convention; acceptable Type I:II error ratio
Clinical trial (efficacy)	$.05$ or $.025$ (one-sided)	Regulatory convention
Safety outcomes	$.01$ or smaller	Consequences of false positives are severe
Exploratory / hypothesis-generating	$.10$	Higher sensitivity acceptable
Multiple primary outcomes	$.05 / k$ (Bonferroni)	Controlling familywise error rate
Genomics / GWAS	$5 \times 10^{-8}$	Multiple testing across millions of SNPs
Equivalence testing	$.05$ (but applied differently)	TOST framework

4.3 Choosing the Power Target ( $1 - \beta$ )

The power target should balance the cost of missing a real effect against the cost of increasing sample size:

Consider Higher Power When	Consider Lower Power When
The cost of a false negative is high (clinical safety)	Resources are very limited
The study is confirmatory and pre-registered	The study is exploratory
The effect is expected to be small	The effect is expected to be large
The study aims to replicate prior findings	A pilot study to assess feasibility
Regulatory approval depends on the result	Multiple outcomes with confirmatory follow-up planned

4.4 Identifying the Primary Outcome and Test

Power analysis is conducted for a single primary hypothesis and its associated primary test. Secondary hypotheses should have their own power analyses if they are to be formally tested.

Before beginning the analysis, clearly specify:

What is the primary outcome variable (and its scale)?
What is the primary comparison or test (e.g., mean difference, correlation)?
What is the statistical test that will be applied?
Is the test one-tailed or two-tailed?
What are the key assumptions (e.g., equal variances, paired vs. independent)?

4.5 Accounting for Anticipated Attrition and Missing Data

In longitudinal studies or clinical trials, participants drop out or produce missing data. The required sample size at enrollment must account for this:

$N_{enroll} = \frac{N_{analysis}}{1 - r_{attrition}}$

Where $r_{attrition}$ is the expected attrition rate (as a proportion).

Example: A study needs $N_{analysis} = 120$ completers and expects 15% attrition:

$N_{enroll} = \frac{120}{1 - 0.15} = \frac{120}{0.85} \approx 141.2 \;\Rightarrow\; 142 \text{ (rounded up)}$

For multi-wave longitudinal studies, apply the attrition correction at each wave or use the cumulative attrition rate.
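The attrition adjustment is sketched below. The cumulative-attrition helper assumes that dropout compounds multiplicatively across waves, which is a simplifying assumption rather than something the formula above requires; the per-wave rates are hypothetical.

```python
from math import ceil

def enrollment_n(n_analysis, attrition_rate):
    """Enrolment target so that n_analysis participants remain after dropout."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return ceil(n_analysis / (1 - attrition_rate))

def cumulative_attrition(per_wave_rates):
    """Overall attrition when each wave loses a fraction of those remaining."""
    retained = 1.0
    for rate in per_wave_rates:
        retained *= 1 - rate
    return 1 - retained

print(enrollment_n(120, 0.15))               # single-wave example from the text
overall = cumulative_attrition([0.10, 0.10])  # two waves, 10% loss each
print(f"{overall:.2f}", enrollment_n(120, overall))
```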

4.6 Accounting for Stratification and Clustering

In studies with complex sampling designs:

Stratified designs: Power analysis proceeds separately within strata, then sample sizes are combined.
Clustered designs (e.g., school classes, clinical sites): The design effect (DEFF) inflates the required sample size to account for within-cluster correlation (intraclass correlation, ICC):

$DEFF = 1 + (m - 1) \times ICC$

$N_{cluster} = N_{simple} \times DEFF$

Where $m$ is the average cluster size and $ICC$ is the intraclass correlation coefficient. For clustered randomised trials (CRTs):

$n_{clusters} = \frac{N_{cluster}}{m}$

ICC	$m = 10$	$m = 20$	$m = 30$
0.01	DEFF = 1.09	1.19	1.29
0.05	DEFF = 1.45	1.95	2.45
0.10	DEFF = 1.90	2.90	3.90
0.20	DEFF = 2.80	4.80	6.80
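The design-effect calculation can be scripted as follows. The starting value of 128 participants and the function names are illustrative assumptions; the sketch also ignores practical constraints such as needing an even number of clusters for a two-arm trial.

```python
from math import ceil

def design_effect(m, icc):
    """DEFF = 1 + (m - 1) * ICC for average cluster size m."""
    return 1 + (m - 1) * icc

def clustered_sample(n_simple, m, icc):
    """Return (total N, number of clusters) after inflating by the design effect."""
    n_total = ceil(n_simple * design_effect(m, icc))
    n_clusters = ceil(n_total / m)
    return n_total, n_clusters

for icc in (0.01, 0.05, 0.10, 0.20):
    deffs = ", ".join(f"{design_effect(m, icc):.2f}" for m in (10, 20, 30))
    print(f"ICC = {icc:.2f}: DEFF for m = 10, 20, 30 -> {deffs}")

print(clustered_sample(n_simple=128, m=20, icc=0.05))
```

The loop reproduces the rows of the table above, which makes it easy to extend the table to other cluster sizes.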

4.7 Multiple Testing Corrections

When testing $k$ hypotheses simultaneously, control the familywise error rate (FWER) or false discovery rate (FDR):

Bonferroni correction (FWER):

$\alpha' = \frac{\alpha}{k}$

For each test, use $\alpha'$ in the power analysis. This increases the required sample size substantially for large $k$ .

Holm-Bonferroni (less conservative): Apply a sequential correction; compute power for the $j$ -th most significant test using $\alpha' = \alpha / (k - j + 1)$ .

Benjamini-Hochberg (FDR): Controls the expected proportion of false positives among significant results. Less conservative than Bonferroni for large-scale testing.
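The sample size cost of a Bonferroni correction, and the Holm thresholds, can be illustrated with a short sketch. It assumes a two-tailed two-sample comparison with $d = 0.5$ under the normal approximation; names and values are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha, power=0.80):
    """Per-group n for a two-tailed two-sample test (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * z_sum ** 2 / d ** 2)

def holm_alphas(alpha, k):
    """Holm thresholds for the 1st to k-th smallest p-values."""
    return [alpha / (k - j + 1) for j in range(1, k + 1)]

for k in (1, 2, 5, 10):
    adjusted = 0.05 / k  # Bonferroni-adjusted alpha
    print(f"k = {k:>2}: alpha' = {adjusted:.4f}, n per group = {n_per_group(0.5, adjusted)}")

print([round(a, 4) for a in holm_alphas(0.05, 3)])
```

Planning with the Bonferroni threshold is conservative for Holm, because every Holm threshold is at least as large as $\alpha / k$ .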

4.8 Reporting the Power Analysis: Documentation Standards

A complete power analysis report must include:

Element	Description
Analysis type	A priori, post-hoc, sensitivity, or criterion
Statistical test	Exact test and variant used
Effect size and justification	Value, measure used, and source/rationale
Significance level ( $\alpha$ )	Value and directionality (one- or two-tailed)
Desired power ( $1 - \beta$ )	Value and rationale
Computed sample size	Total $N$ and per-group $n$ if applicable
Attrition/non-compliance adjustment	If applicable
Design effect	If clustered or stratified
Multiple testing correction	If multiple primary outcomes
Software and version	e.g., DataStatPro v4.2

5. Power Analysis for Common Statistical Tests

5.1 One-Sample t-Test

Research question: Does the population mean differ from a known or hypothesised value $\mu_0$ ?

Effect size: $d = (\mu_1 - \mu_0) / \sigma$

Key inputs:

$\mu_0$ : Hypothesised value under $H_0$
$\mu_1$ : Expected true mean under $H_1$
$\sigma$ : Population or estimated SD

Power formula: Based on non-central t-distribution with $df = n - 1$ and non-centrality parameter $\lambda = d\sqrt{n}$ .

5.2 Two-Sample Independent t-Test

Research question: Do two independent group means differ?

Effect size: $d = (\mu_1 - \mu_2) / \sigma_{pooled}$

Key inputs:

$\mu_1$ , $\mu_2$ : Expected means for each group
$\sigma_{pooled}$ : Pooled within-group SD
$k$ : Allocation ratio $n_2 / n_1$ (default $k = 1$ , equal groups)

Assumptions for power analysis:

Equal variances (if Welch correction is planned, add $\approx 10\%$ to $N$ )
Normally distributed outcomes within groups

5.3 Paired t-Test

Research question: Does the mean of within-subject or matched-pair differences differ from zero?

Effect size: $d_z = \mu_D / \sigma_D$

Key additional input:

$\rho$ : Expected correlation between paired measurements (used to derive $\sigma_D$ from $\sigma_1$ and $\sigma_2$ )

Efficiency gain over independent t-test:

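$\text{Required pairs} \approx \text{Required per group (independent)} \times (1 - \rho)$

This follows from $d_z = d / \sqrt{2(1-\rho)}$ (Section 3.4), assuming equal SDs at both measurements. With $\rho = 0.50$ , the number of pairs is about half the per-group $n$ of an independent-groups design; because each participant in a within-subjects design supplies both measurements, that is roughly one quarter of the independent design's total $N$ .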

5.4 One-Way ANOVA (Fixed Effects)

Research question: Do $k$ group means differ?

Effect size: $f = \sigma_m / \sigma_{within}$

Key inputs:

$k$ : Number of groups
$\mu_1, \ldots, \mu_k$ : Expected group means (or $\sigma_m$ , the SD of means)
$\sigma_{within}$ : Common within-group SD
$n_{per\;group}$ : Equal or specified group sizes

Important: ANOVA power analysis requires specifying the pattern of means (which groups differ by how much), not just the overall effect size. The same $f$ can arise from very different mean patterns.

5.5 Factorial ANOVA

Research question: Do main effects and/or interactions exist in a factorial design?

Each effect (main effect A, main effect B, interaction A×B) has its own effect size $f$ and its own power analysis. Key additional considerations:

Power for interaction effects is typically much lower than for main effects of the same nominal magnitude.
Interaction effect sizes should be estimated directly (not derived from main effects).
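For a $2 \times 2$ factorial design in which an effect is present at one level of the other factor and absent at the other (an attenuated, or ordinal, interaction), the interaction effect size $f$ is half that of the corresponding simple effect; a full crossover interaction can be as large as the simple effect itself.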

5.6 Repeated Measures ANOVA

Research question: Does a measured variable change across $k$ time points or conditions within subjects?

Effect size: Cohen's $f$ based on within-person variance.

Key additional input:

$\rho$ : Average correlation among repeated measures (intraclass correlation). Higher $\rho$ → greater power advantage of repeated measures over independent groups.

Non-centrality parameter adjustment for repeated measures:

$\lambda_{RM} = \frac{f^2 \times n \times k}{1 - \rho}$

5.7 Pearson Correlation

Research question: Is there a linear relationship between two continuous variables?

Effect size: $\rho$ (population correlation coefficient)

Sample size:

$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{z_\rho}\right)^2 + 3$

Where $z_\rho = \text{arctanh}(\rho) = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}$ .

Note: Power for correlation tests is low for small correlations. Detecting $\rho = 0.30$ with 80% power at $\alpha = .05$ requires $n \approx 84$ .

5.8 Multiple Regression

Research question: Does a set of $p$ predictors explain variance in the outcome? Or does adding $u$ predictors significantly improve prediction?

Effect size: $f^2 = R^2 / (1 - R^2)$ (overall model) or $f^2 = \Delta R^2 / (1 - R^2_{full})$ (incremental)

Key inputs:

$u$ : Number of predictors being tested
$p$ : Total predictors in the full model
$R^2$ or $\Delta R^2$ : Expected variance explained

Important: Power in multiple regression is driven mainly by the number of predictors being tested ( $u$ ), which sets $df_1$ ; the total number of predictors $p$ enters only through $df_2 = N - p - 1$ . Testing a single predictor in a model with many covariates uses $u = 1$ ; testing the full model uses $u = p$ .

5.9 Chi-Square Test of Association

Research question: Are two categorical variables associated?

Effect size: Cohen's $w$ (related to Cramér's $V$ : $w = V\sqrt{\min(r-1, c-1)}$ )

Key inputs:

Table dimensions: $r$ rows, $c$ columns
$df = (r-1)(c-1)$
Expected cell proportions under $H_1$

Important: For $2 \times 2$ tables, $w = \phi$ and the formula simplifies to the two-proportion case. For larger tables, specifying the full expected cell proportion matrix provides the most accurate power estimate.

5.10 Test Type Comparison Table

Test	Effect Size	$df$	Non-Central Dist.	Equal Groups Optimal?
One-sample t	$d$	$n-1$	Non-central t	N/A
Two-sample t	$d$	$n_1+n_2-2$	Non-central t	Yes
Paired t	$d_z$	$n-1$	Non-central t	N/A
One-way ANOVA	$f$	$k-1$ ; $N-k$	Non-central F	Yes
Correlation	$\rho$	$n-2$	Non-central t	N/A
Multiple regression	$f^2$	$u$ ; $N-p-1$	Non-central F	N/A
Chi-square	$w$	$(r-1)(c-1)$	Non-central $\chi^2$	Yes
Proportion (one)	$h$	—	Normal	N/A
Proportion (two)	$h$	—	Normal	Yes

6. Using the Sample Size and Power Analysis Calculator Component

The Sample Size and Power Analysis Calculator in DataStatPro provides a comprehensive tool for conducting, visualising, and reporting power analyses for all common statistical tests.

Step-by-Step Guide

Step 1 — Navigate to the Component

Go to Study Design → Sample Size and Power Analysis.

Step 2 — Select the Analysis Mode

Choose one of the four analysis modes:

A priori: Compute required sample size.
Post-hoc: Compute achieved power.
Sensitivity: Compute minimum detectable effect size.
Criterion: Compute required $\alpha$ (advanced use).

Step 3 — Select the Statistical Test

Choose the test family and specific test from the hierarchical menu:

Mean Tests
- One-Sample t-Test
- Two-Sample Independent t-Test
- Paired t-Test
- One-Way ANOVA
- Factorial ANOVA (Two-Way, Three-Way)
- Repeated Measures ANOVA
Association Tests
- Pearson Correlation
- Spearman Correlation
- Multiple Regression (Linear)
Categorical Tests
- Chi-Square Test of Association
- Chi-Square Goodness-of-Fit
- One-Sample Proportion Test
- Two-Sample Proportion Test
Survival Analysis
- Log-Rank Test
- Cox Proportional Hazards
Advanced
- Equivalence Test (TOST)
- Non-Inferiority Test
- Clustered Design (CRT)
- Generic Non-Central Distribution

Step 4 — Specify Effect Size

Choose your effect size specification method:

Direct entry: Enter the effect size measure directly (e.g., $d = 0.50$ ).
From parameters: Enter the raw parameters (e.g., $\mu_1 = 100$ , $\mu_2 = 95$ , $\sigma = 15$ ) and DataStatPro computes the effect size automatically.
From proportions: Enter $\pi_1$ and $\pi_2$ for proportion tests; DataStatPro computes Cohen's $h$ automatically.
From expected table: Enter the full expected cell proportion matrix for chi-square tests.
Effect size calculator: Use DataStatPro's built-in effect size converter to transform between $d$ , $r$ , $f$ , $\eta^2$ , $\omega^2$ , OR, and RR.
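
The "from parameters" and "from proportions" options can be illustrated with a minimal sketch. The function names and the example proportions (0.60 vs. 0.50) are assumptions made here for illustration; they are not DataStatPro's internal API.

```python
import math

def cohens_d(mu1, mu2, sigma):
    """Standardised mean difference from raw means and a common SD."""
    return (mu1 - mu2) / sigma

def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

d = cohens_d(100, 95, 15)      # raw parameters from the text
h = cohens_h(0.60, 0.50)       # hypothetical proportions
print(f"d = {d:.3f}, h = {h:.3f}")
```

Entering the raw parameters rather than a pre-computed $d$ keeps the assumed means and SD visible in the report.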

Step 5 — Specify Remaining Parameters

Depending on the analysis mode, enter the known quantities:

Significance level ( $\alpha$ ): Default $.05$ ; specify $.01$ or $.001$ if needed.
Desired power ( $1 - \beta$ ): Default $.80$ ; options $.80$ , $.85$ , $.90$ , $.95$ , $.99$ , or custom.
Directionality: Two-tailed (default) or one-tailed.
Number of groups / predictors: As applicable to the selected test.
Allocation ratio $k = n_2/n_1$ : For two-group tests (default $k = 1$ ).
ICC and cluster size: For clustered designs.
Attrition rate: For enrollment adjustment.

Step 6 — Set Display Options

✅ Primary result: Required $N$ (or power, or MDE) with exact formula.
✅ Enrollment $N$ (attrition-adjusted).
✅ Per-group $n$ breakdown.
✅ Power curve: Power vs. $N$ for current effect size and $\alpha$ .
✅ Sensitivity curve: Power vs. effect size for current $N$ and $\alpha$ .
✅ Power contour plot: Power as a function of both $N$ and effect size.
✅ Non-centrality parameter $\lambda$ and critical value.
✅ Type I error rate ( $\alpha$ ), Type II error rate ( $\beta$ ), power ( $1-\beta$ ).
✅ Summary table: Power at $N \pm 10\%$ , $N \pm 25\%$ , $N \pm 50\%$ .
✅ Design effect and ICC-adjusted $N$ (for clustered designs).
✅ APA 7th edition power analysis paragraph (auto-generated).

Step 7 — Run the Analysis

Click "Compute Sample Size / Power". DataStatPro will:

Convert effect size inputs to the required format (apply transformations if needed).
Solve for the requested output using exact non-central distribution methods.
Apply attrition, ICC, and multiple testing corrections if specified.
Generate power curve, sensitivity curve, and contour plot.
Produce the APA-compliant power analysis reporting paragraph.

7. Step-by-Step Procedure

7.1 Full Manual Procedure for A Priori Sample Size Calculation

Step 1 — Identify the Primary Research Question

State the primary outcome, the comparison of interest, and the direction of the hypothesised effect. Confirm the appropriate statistical test.

Step 2 — Choose the Significance Level and Directionality

State $\alpha$ and justify the choice. Specify whether the test is one-tailed or two-tailed, with justification.

Identify $z_{\alpha/2}$ (two-tailed) or $z_\alpha$ (one-tailed) from the standard normal distribution:

$\alpha$	$z_{\alpha/2}$ (two-tailed)	$z_\alpha$ (one-tailed)
$.10$	1.645	1.282
$.05$	1.960	1.645
$.025$	2.241	1.960
$.01$	2.576	2.326
$.001$	3.291	3.090

Step 3 — Choose the Power Target

State the desired power $1 - \beta$ and justify the choice. Identify $z_{1-\beta}$ :

Power ( $1-\beta$ )	$z_{1-\beta}$
$0.70$	0.524
$0.80$	0.842
$0.85$	1.036
$0.90$	1.282
$0.95$	1.645
$0.99$	2.326
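
The constants in the two tables above need not be looked up by hand. As a small sketch using only Python's standard library, they can be reproduced from the inverse standard normal CDF:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse CDF of the standard normal

# Critical values: two-tailed uses alpha/2 in each tail, one-tailed uses alpha.
for alpha in (0.10, 0.05, 0.025, 0.01, 0.001):
    print(f"alpha={alpha}: two-tailed {z(1 - alpha / 2):.3f}, one-tailed {z(1 - alpha):.3f}")

# Power constants z_{1-beta}.
for power in (0.70, 0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"power={power}: z = {z(power):.3f}")
```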

Step 4 — Specify and Justify the Effect Size

State the effect size measure, its numerical value, and the source or rationale. Convert raw parameters to a standardised effect size using the appropriate formula (Section 3).

Step 5 — Apply the Sample Size Formula

Substitute the values of $z_{\alpha/2}$ , $z_{1-\beta}$ , and the effect size into the appropriate formula from Section 3.10. Round up to the nearest whole number.

⚠️ Always round the required $N$ UP, never down. Rounding down results in a study with slightly less power than targeted.

Step 6 — Adjust for Unequal Groups (If Applicable)

For two-group designs with unequal allocation (ratio $k \neq 1$ ):

$n_1 = \left\lceil \frac{(z_{\alpha/2} + z_{1-\beta})^2(1 + 1/k)}{d^2} \right\rceil, \qquad n_2 = \lceil k \times n_1 \rceil$

Verify that the total $N = n_1 + n_2$ achieves the target power with exact non-central distribution methods.
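
Steps 5 and 6 can be sketched together in a few lines. This is a normal-approximation calculation under the formula above, with `ratio` standing for the allocation ratio $k = n_2/n_1$; the function name is an assumption for illustration. The result is a starting value that should still be checked against the exact non-central distribution.

```python
import math
from statistics import NormalDist

def n_two_sample(d, alpha=0.05, power=0.80, ratio=1.0, two_tailed=True):
    """Normal-approximation group sizes (n1, n2) for a two-sample comparison."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_beta = z(power)
    n1 = math.ceil((z_alpha + z_beta) ** 2 * (1 + 1 / ratio) / d ** 2)
    n2 = math.ceil(ratio * n1)   # always round up, never down
    return n1, n2

print(n_two_sample(0.5))             # equal allocation
print(n_two_sample(0.5, ratio=2.0))  # twice as many in group 2
```

With unequal allocation the total $n_1 + n_2$ is larger than under equal allocation, which is why equal groups are listed as optimal for two-group tests.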

Step 7 — Adjust for Attrition

$N_{enroll} = \left\lceil \frac{N_{analysis}}{1 - r_{attrition}} \right\rceil$

Step 8 — Adjust for Clustering (If Applicable)

$DEFF = 1 + (m - 1) \times ICC, \qquad N_{cluster} = \lceil N_{simple} \times DEFF \rceil$

Number of clusters: $n_{clusters} = \lceil N_{cluster} / m \rceil$
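
The attrition and clustering adjustments of Steps 7 and 8 are simple enough to script. The sketch below applies the two formulas as written; the input values (128 participants, clusters of 20, ICC of 0.05) are hypothetical.

```python
import math

def enrollment_n(n_analysis, attrition):
    """Participants to recruit so that n_analysis are expected to complete."""
    return math.ceil(n_analysis / (1 - attrition))

def design_effect(m, icc):
    """DEFF = 1 + (m - 1) * ICC for clusters of average size m."""
    return 1 + (m - 1) * icc

def clustered_n(n_simple, m, icc):
    """Return (inflated total N, number of clusters) for a clustered design."""
    n_cluster = math.ceil(n_simple * design_effect(m, icc))
    return n_cluster, math.ceil(n_cluster / m)

print(enrollment_n(156, 0.12))          # 12% expected dropout
print(clustered_n(128, m=20, icc=0.05)) # hypothetical clustered design
```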

Step 9 — Adjust for Multiple Testing (If Applicable)

Replace $\alpha$ with $\alpha' = \alpha/k$ in the sample size formula (Bonferroni), where $k$ is the number of primary hypotheses.
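
To see what the Bonferroni adjustment costs in sample size, the following sketch recomputes the equal-groups, two-tailed normal-approximation formula with $\alpha' = \alpha / k$. The argument `n_hypotheses` plays the role of $k$ in this step (the number of primary hypotheses, not the allocation ratio); the function name is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group_bonferroni(d, n_hypotheses, alpha=0.05, power=0.80):
    """Per-group n for a two-tailed two-sample test at alpha / n_hypotheses."""
    z = NormalDist().inv_cdf
    alpha_adj = alpha / n_hypotheses          # Bonferroni-adjusted alpha
    return math.ceil(2 * (z(1 - alpha_adj / 2) + z(power)) ** 2 / d ** 2)

for k in (1, 2, 3, 5):
    print(k, n_per_group_bonferroni(0.5, k))
```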

Step 10 — Verify with Power Curve

Using the computed $N$ , confirm the achieved power with exact non-central distribution calculations. Plot the power curve to show power at values of $N$ above and below the target. Confirm the achieved power is at or above the target.
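
One way to carry out this verification without specialised software is sketched below. It computes two-sample t-test power from the non-central t-distribution by numerical integration (standard library only) and increases $n$ until the target is met. The helper names, the integration grid, and the bisection bounds are choices made for this sketch, not DataStatPro's implementation.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def upper_tail(t, delta, nu, steps=2000):
    """P(T' > t) for a non-central t with nu df and non-centrality delta.

    Writes T' = (Z + delta) / S with S = sqrt(chi2_nu / nu) and integrates
    P(Z > t*s - delta) against the density of S using Simpson's rule.
    """
    sd = 1.0 / math.sqrt(2.0 * nu)                 # approximate sd of S
    lo, hi = max(1e-9, 1 - 12 * sd), 1 + 12 * sd   # integration limits
    h = (hi - lo) / steps
    log_norm = (nu / 2) * math.log(2) + math.lgamma(nu / 2)
    total = 0.0
    for i in range(steps + 1):
        s = lo + i * h
        v = nu * s * s
        dens = math.exp((nu / 2 - 1) * math.log(v) - v / 2 - log_norm) * 2 * nu * s
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * PHI(delta - t * s) * dens
    return total * h / 3

def t_crit(alpha, nu):
    """Two-tailed critical value: solves P(T > t) = alpha / 2 by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if upper_tail(mid, 0.0, nu) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power_two_sample_t(n, d, alpha=0.05):
    """Two-tailed power with n per group and standardised effect d."""
    nu = 2 * n - 2
    delta = d * math.sqrt(n / 2)
    tc = t_crit(alpha, nu)
    return upper_tail(tc, delta, nu) + upper_tail(tc, -delta, nu)

def smallest_n(d, start, target=0.80, alpha=0.05):
    """Increase n from a normal-approximation start until power >= target."""
    n = start
    while power_two_sample_t(n, d, alpha) < target:
        n += 1
    return n

print(smallest_n(0.5, start=63))
```

Starting the search from the normal-approximation value matters because that formula tends to fall one or two participants short of what the t-test requires.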

Step 11 — Conduct Sensitivity Analysis

Report the minimum detectable effect size (MDE) at the computed $N$ :

$d_{MDE} = \frac{z_{\alpha/2} + z_{1-\beta}}{\sqrt{n/2}}$ (two-sample t, per group)

This tells stakeholders the smallest effect the study is designed to detect.
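
A short sketch of the MDE formula, including conversion back to raw units. The SD of 15 points is a hypothetical value used only to show the conversion.

```python
import math
from statistics import NormalDist

def mde_two_sample(n_per_group, alpha=0.05, power=0.80):
    """Minimum detectable d (normal approximation, two-tailed, equal groups)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) / math.sqrt(n_per_group / 2)

d_mde = mde_two_sample(78)
sigma = 15                      # hypothetical outcome SD in raw units
print(f"MDE: d = {d_mde:.3f}, i.e. {d_mde * sigma:.1f} raw-score points")
```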

Step 12 — Document and Report

Compile all inputs and outputs into a complete power analysis report (APA format provided in Section 15). Retain all working for audit and reproducibility.

8. Interpreting the Output

8.1 Reading the Required Sample Size

Output Feature	Interpretation
Total $N$	Minimum valid observations needed to complete the analysis
Per-group $n$	Number needed in each arm (for multi-group tests)
Enrollment $N$	Inflate total $N$ by attrition rate; participants to recruit
Exact achieved power	Power at the computed (rounded up) $N$ ; should be $\geq$ target
Power at $N - 1$	Confirms one fewer participant would fall below the target power

8.2 Understanding the Power Curve

Feature of the Power Curve	Meaning
Steep rise at low $N$	Each additional participant greatly increases power in this range
Plateau at high $N$	Diminishing returns; additional participants add little power
Power curve above target line	Current $N$ meets or exceeds the power requirement
Power curve crossing 0.50	This $N$ yields coin-flip odds of detecting the true effect
Power as $N \to 0$	Approaches $\alpha$ (the false positive rate); power cannot fall below this floor

8.3 Sensitivity Output: Minimum Detectable Effect

The minimum detectable effect (MDE) is the smallest effect the study has the specified power to detect:

MDE Interpretation	Action
MDE $<$ SESOI	Study is adequately powered to detect the smallest meaningful effect
MDE $=$ SESOI	Study is precisely powered; just barely detects the minimum meaningful effect
MDE $>$ SESOI	Study is underpowered; cannot reliably detect the minimum meaningful effect

Report the MDE in original units (not just in standardised form) to make it interpretable to domain experts who may not be familiar with Cohen's $d$ .

8.4 The Non-Centrality Parameter ( $\lambda$ )

The non-centrality parameter $\lambda$ summarises the total signal in the study — it captures both the effect size and the sample size:

$\lambda = ES^2 \times N \quad \text{(approximately, for many tests)}$

$\lambda$ Interpretation	Meaning
$\lambda \to 0$	Power $\to \alpha$ ; study cannot distinguish $H_1$ from $H_0$
$\lambda = \lambda_{crit}$	Power is exactly at the target level
$\lambda$ large	High power; test statistic distribution under $H_1$ well separated from critical value

8.5 Interpreting Achieved Post-Hoc Power

Post-hoc power (computed after data collection using the observed effect size) has a deterministic relationship with the p-value:

Post-Hoc Power Relationship	Meaning
$p = \alpha$ exactly	Post-hoc power $\approx 0.50$ (exactly 0.50 for a one-sided $z$-test; a close approximation for two-sided and $t$-based tests)
$p < \alpha$	Post-hoc power $> 0.50$
$p > \alpha$	Post-hoc power $< 0.50$

Because of this mathematical relationship, post-hoc power using the observed effect size adds no information beyond the p-value. Instead, report:

The 95% CI for the effect size.
A sensitivity analysis: "What is the power for a range of plausible true effect sizes?"
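
The link between the p-value and observed power can be made concrete for a two-sided $z$-test. This is a simplified sketch; $t$-based tests behave almost identically at moderate sample sizes.

```python
from statistics import NormalDist

def observed_power(p, alpha=0.05):
    """Post-hoc power implied by a two-sided z-test p-value.

    The observed |z| is treated as if it were the true non-centrality.
    """
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p / 2)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z_obs - z_crit) + nd.cdf(-z_obs - z_crit)

for p in (0.001, 0.01, 0.05, 0.30, 0.80):
    print(f"p = {p}: observed power = {observed_power(p):.3f}")
```

Because the function takes only $p$ and $\alpha$ as inputs, observed power cannot carry information that the p-value does not already contain.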

8.6 Interpreting the Contour Plot

The power contour plot displays power as a function of both $N$ and effect size simultaneously, with contour lines at specific power levels (e.g., 0.60, 0.70, 0.80, 0.90, 0.95):

Region of the Contour Plot	Interpretation
Above the 0.80 contour	Combinations of $N$ and effect size where power $\geq 0.80$
Below the 0.80 contour	Underpowered for those $N$ and effect size combinations
Current study position	Marked on the plot; shows where the study falls relative to power targets
Steep contours	Power changes rapidly with $N$ in this region
Flat contours	Diminishing returns; large increases in $N$ needed for modest power gains

9. Visualising Power and Sample Size

9.1 Power Curve (Power vs. Sample Size)

The power curve is the primary visualisation for a priori power analysis. It plots the statistical power ( $y$ -axis) as a function of sample size ( $x$ -axis) for fixed $\alpha$ and effect size.

Key annotations on the DataStatPro power curve:

A horizontal dashed line at the target power level (e.g., 0.80).
A vertical dashed line at the required $N$ .
The intersection point highlighted and labelled with exact $N$ and power.
Shaded region below the target power: underpowered zone.
Optional: Multiple curves for different effect sizes or $\alpha$ levels.

Best practices:

Set the x-axis range to display at least $[1, 3 \times N_{required}]$ to show the full shape of the curve.
Annotate the MDE at the required $N$ .
Use a logarithmic x-axis when the required $N$ is very large to avoid compression of the curve at small $N$ .

9.2 Sensitivity Curve (Power vs. Effect Size)

The sensitivity curve plots power ( $y$ -axis) as a function of effect size ( $x$ -axis) for a fixed $N$ and $\alpha$ .

Use cases:

Assessing robustness: "What happens to power if the true effect is smaller than anticipated?"
Reporting minimum detectable effect: The effect size at which the curve crosses the target power line.
Communicating uncertainty about the effect size to stakeholders.

Best practices:

Mark Cohen's small/medium/large benchmarks as vertical reference lines.
Annotate the MDE (the effect size where the curve intersects the target power line).
Shade the "adequately powered" region (to the right of the MDE).

9.3 Power Contour Plot (Power as a Function of $N$ and Effect Size)

The contour plot provides the most comprehensive two-dimensional view of how power depends on both sample size and effect size. Contour lines connect combinations of $(N, ES)$ that yield equal power.

Reading the contour plot:

The study's planned $(N, ES)$ combination is marked.
Power target contour (e.g., 0.80) divides the plot into adequate and inadequate power regions.
Researchers can identify the trade-off: because the required $N$ scales roughly with $1/ES^2$ , a larger assumed effect size permits a smaller $N$ at the same power (doubling the effect size cuts the required $N$ to about one quarter).

9.4 Error Rate Trade-Off Plot

The error rate trade-off plot visualises the relationship between $\alpha$ (Type I error rate) and $\beta$ (Type II error rate = $1 -$ power) for a fixed $N$ and effect size:

As $\alpha$ decreases (stricter threshold), $\beta$ increases (lower power).
The optimal trade-off depends on the relative costs of Type I and Type II errors in the specific application.

Useful for:

Choosing between $\alpha = .05$ and $\alpha = .01$ given limited sample size.
Demonstrating to reviewers the implications of changing the significance threshold.
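
The trade-off can be tabulated directly. The sketch below uses the normal approximation for a two-sample comparison and ignores the negligible opposite-tail rejection probability; $d = 0.5$ and $n = 64$ per group are hypothetical inputs.

```python
import math
from statistics import NormalDist

def beta_for_alpha(alpha, d, n_per_group):
    """Type II error rate at a given two-tailed alpha (normal approximation)."""
    nd = NormalDist()
    delta = d * math.sqrt(n_per_group / 2)
    power = nd.cdf(delta - nd.inv_cdf(1 - alpha / 2))
    return 1 - power

for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: beta = {beta_for_alpha(alpha, 0.5, 64):.3f}")
```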

9.5 G*Power-Style Distribution Plot

DataStatPro generates the classic two-distribution diagram showing:

The central distribution of the test statistic under $H_0$ (blue curve).
The non-central distribution of the test statistic under $H_1$ (orange curve).
The critical value ( $t_{crit}$ ) as a vertical dashed line.
The $\alpha$ region (critical region under $H_0$ , right tail of blue curve).
The power ( $1 - \beta$ ) region (area under the orange curve beyond $t_{crit}$ ).
The $\beta$ region (area under the orange curve to the left of $t_{crit}$ ).

This plot is highly effective for teaching the concept of power and for communicating results to non-statistician audiences.

9.6 Sample Size Comparison Table Plot

For multiple scenarios (e.g., small/medium/large effect; power = 0.80/0.90/0.95), DataStatPro generates a bubble chart or heatmap where:

Rows represent power targets.
Columns represent effect sizes.
Cell values (or bubble sizes) represent the required $N$ .

This provides a rapid overview of how sample size requirements vary across the range of plausible inputs, supporting scenario planning.

9.7 Attrition-Adjusted Recruitment Funnel

For longitudinal studies or clinical trials, DataStatPro generates a funnel diagram showing:

Enrollment target (accounting for attrition).
Expected completers at each wave.
Final analytic sample.
Required sample vs. expected completers — highlighting any shortfall.

10. Sensitivity Analysis and Robustness Checks

10.1 What Is a Sensitivity Analysis in Power Planning?

A sensitivity analysis for power examines how the required sample size (or achieved power) changes as input parameters vary within plausible ranges. It answers: "How robust is my power calculation to uncertainty in the assumed effect size, standard deviation, or other inputs?"

10.2 Varying the Effect Size

The most important sensitivity analysis varies the effect size across a range defined by:

The SESOI (lower bound — the smallest effect that matters).
The expected effect from prior literature (central estimate).
A larger, optimistic effect (upper bound).

Report $N_{required}$ for each scenario:

Scenario	Effect Size	$N_{required}$ (power = 0.80)	$N_{required}$ (power = 0.90)
Pessimistic (SESOI)	Small	Largest	Largest
Most likely	Medium	Target	Target
Optimistic	Large	Smallest	Smallest

Decision rule: Plan for the scenario that produces the largest $N$ to ensure adequate power across all plausible effect sizes.

10.3 Varying the Standard Deviation

For mean-based tests, the effect size $d = \Delta\mu / \sigma$ depends on $\sigma$ . If $\sigma$ is estimated from a pilot study or literature with uncertainty, sensitivity analysis should vary $\sigma$ across a plausible range (e.g., $\sigma \pm 20\%$ ):

$d_{lower} = \frac{\Delta\mu}{\sigma_{upper}}, \qquad d_{upper} = \frac{\Delta\mu}{\sigma_{lower}}$

Report $N_{required}$ for $d_{lower}$ and $d_{upper}$ as the worst and best cases.
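
As an illustration, the following sketch varies $\sigma$ by $\pm 20\%$ around a hypothetical planning value ( $\Delta\mu = 5$ , $\sigma = 15$ ) and reports the normal-approximation per-group $n$ for each case.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Equal-groups, two-tailed, normal-approximation sample size."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

delta_mu, sigma = 5.0, 15.0                 # hypothetical planning values
results = {}
for s in (0.8 * sigma, sigma, 1.2 * sigma):
    d = delta_mu / s
    results[round(s)] = n_per_group(d)
    print(f"sigma = {s:.0f}: d = {d:.3f}, n per group = {results[round(s)]}")
```

Because $n$ scales with $\sigma^2$ , a 20% underestimate of the SD inflates the required sample by roughly 44%.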

10.4 The "What If" Power Table

A comprehensive "What If" power table reports power for a grid of $N$ values and effect sizes, enabling researchers and reviewers to assess robustness:

$N$ per group	$d = 0.20$	$d = 0.30$	$d = 0.40$	$d = 0.50$	$d = 0.80$
20	.09	.15	.23	.34	.69
30	.12	.21	.33	.48	.86
50	.17	.32	.51	.70	.98
80	.24	.47	.71	.88	$>.99$
100	.29	.56	.80	.94	$>.99$
150	.41	.74	.93	.99	$>.99$
200	.51	.85	.98	$>.99$	$>.99$

(Two-sample independent t-test, $\alpha = .05$ , two-tailed)
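
A grid of this kind can be regenerated for any set of $N$ values and effect sizes. The sketch below evaluates the non-central t-distribution by numerical integration with the standard library only; the helper names and integration settings are assumptions of this sketch rather than the method DataStatPro uses.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def upper_tail(t, delta, nu, steps=2000):
    """P(T' > t) for a non-central t (nu df, non-centrality delta), via
    Simpson integration over S = sqrt(chi2_nu / nu)."""
    sd = 1.0 / math.sqrt(2.0 * nu)
    lo, hi = max(1e-9, 1 - 12 * sd), 1 + 12 * sd
    h = (hi - lo) / steps
    log_norm = (nu / 2) * math.log(2) + math.lgamma(nu / 2)
    total = 0.0
    for i in range(steps + 1):
        s = lo + i * h
        v = nu * s * s
        dens = math.exp((nu / 2 - 1) * math.log(v) - v / 2 - log_norm) * 2 * nu * s
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * PHI(delta - t * s) * dens
    return total * h / 3

def t_crit(alpha, nu):
    """Two-tailed critical value found by bisection on the central t."""
    lo, hi = 0.0, 20.0
    for _ in range(40):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if upper_tail(mid, 0.0, nu) > alpha / 2 else (lo, mid)
    return (lo + hi) / 2

def power_grid(ns, ds, alpha=0.05):
    """Two-tailed two-sample t-test power for each (n per group, d) pair."""
    grid = {}
    for n in ns:
        nu = 2 * n - 2
        tc = t_crit(alpha, nu)               # one critical value per row
        for d in ds:
            delta = d * math.sqrt(n / 2)
            grid[(n, d)] = upper_tail(tc, delta, nu) + upper_tail(tc, -delta, nu)
    return grid

ns = (20, 30, 50, 80, 100, 150, 200)
ds = (0.20, 0.30, 0.40, 0.50, 0.80)
grid = power_grid(ns, ds)
for n in ns:
    print(n, "  ".join(f"{grid[(n, d)]:.2f}" for d in ds))
```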

10.5 Bayesian Power Analysis

Classical power analysis assumes a fixed, known effect size. Bayesian power analysis incorporates uncertainty about the effect size by averaging power over a prior distribution of effect sizes:

$\overline{Power} = \int_0^\infty (1 - \beta(d)) \times p(d) \; dd$

Where $p(d)$ is the prior distribution on the effect size $d$ (e.g., a half-normal or truncated normal distribution).
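
The integral can be approximated numerically. The following sketch assumes a normal prior truncated at zero, a normal-approximation power function for the two-sample case, and hypothetical inputs ( $n = 64$ per group, prior mean $0.5$ , prior SD $0.2$ ); it is an illustration of the idea, not DataStatPro's algorithm.

```python
import math
from statistics import NormalDist

def power_normal(d, n_per_group, alpha=0.05):
    """Two-tailed power for a two-sample comparison (normal approximation)."""
    nd = NormalDist()
    delta = d * math.sqrt(n_per_group / 2)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

def average_power(n_per_group, prior_mean, prior_sd, alpha=0.05,
                  upper=3.0, steps=3000):
    """Power averaged over a normal prior on d, truncated to [0, upper].

    Trapezoidal rule; dividing by the prior mass renormalises the
    truncated prior so that it integrates to one.
    """
    prior = NormalDist(prior_mean, prior_sd)
    h = upper / steps
    num = den = 0.0
    for i in range(steps + 1):
        d = i * h
        w = 0.5 if i in (0, steps) else 1.0
        p = prior.pdf(d)
        num += w * power_normal(d, n_per_group, alpha) * p
        den += w * p
    return num / den

point = power_normal(0.5, 64)
avg = average_power(64, prior_mean=0.5, prior_sd=0.2)
print(f"power at d = 0.5: {point:.3f}; average power: {avg:.3f}")
```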

Average power is typically lower than the power at the expected effect size when the design targets high power (e.g., 0.80), because the power curve flattens above the expected effect but falls steeply below it. If the prior is wide (high uncertainty), average power can be substantially lower than the nominal target. DataStatPro supports average power calculations under normal, half-normal, and uniform prior distributions.

10.6 Sequential and Adaptive Designs

Traditional power analysis assumes a fixed sample size collected before any analysis. Sequential designs allow interim analyses with pre-specified stopping rules, which can reduce the expected sample size while controlling error rates.

Key concepts:

Concept	Description
Group sequential design	Planned interim analyses with O'Brien-Fleming or Pocock stopping boundaries
Alpha spending	Controls FWER across all interim and final analyses
Expected sample size	Average $N$ under $H_0$ and $H_1$ ; may be less than fixed design
Inflation factor	Required $N$ is larger than fixed design to preserve power after early stopping

DataStatPro supports group sequential design power analysis with O'Brien-Fleming, Pocock, and Kim-DeMets (power family) alpha spending functions.

10.7 Equivalence and Non-Inferiority Tests

Standard power analysis targets superiority — detecting that an effect is non-zero. Equivalence tests (TOST) and non-inferiority tests have different frameworks:

Equivalence (TOST — Two One-Sided Tests):

$H_0$ : $|\mu_1 - \mu_2| \geq \Delta_E$ (effect is outside equivalence bounds)
$H_1$ : $|\mu_1 - \mu_2| < \Delta_E$ (effect is within equivalence bounds)

Sample size for TOST (per group):

$n = \frac{2(z_\alpha + z_{1-\beta})^2 \sigma^2}{(\Delta_E - |\delta|)^2}$

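Where $\Delta_E$ is the equivalence margin and $\delta$ is the assumed true difference. When $\delta = 0$ is assumed, replace $z_{1-\beta}$ with $z_{1-\beta/2}$ , because both one-sided tests must reject for equivalence to be declared.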

Non-inferiority:

$H_0$ : $\mu_1 - \mu_2 \leq -\Delta_{NI}$ (treatment is inferior by more than the margin)
$H_1$ : $\mu_1 - \mu_2 > -\Delta_{NI}$ (treatment is not inferior)

Sample size (per group):

$n = \frac{2(z_\alpha + z_{1-\beta})^2 \sigma^2}{(\delta + \Delta_{NI})^2}$

Where $\Delta_{NI}$ is the non-inferiority margin and $\delta$ is the expected true difference ( $\delta = 0$ for a conservative assumption).
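
A minimal sketch of both calculations for a two-group comparison of means follows. Two assumptions are built in and should be checked against the design at hand: both functions carry the factor of 2 that applies to per-group $n$ with equal allocation, and the TOST function switches to $z_{1-\beta/2}$ when the true difference is assumed to be exactly zero. The inputs ( $\sigma = 10$ , margin $= 5$ ) are hypothetical.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf

def n_tost(sigma, margin, true_diff=0.0, alpha=0.05, power=0.80):
    """Per-group n for a TOST equivalence test (normal approximation)."""
    beta = 1 - power
    # With a true difference of zero, both one-sided tests share beta.
    z_beta = z(1 - beta / 2) if true_diff == 0 else z(power)
    return math.ceil(2 * (z(1 - alpha) + z_beta) ** 2 * sigma ** 2
                     / (margin - abs(true_diff)) ** 2)

def n_noninferiority(sigma, margin, true_diff=0.0, alpha=0.05, power=0.80):
    """Per-group n for a one-sided non-inferiority test."""
    return math.ceil(2 * (z(1 - alpha) + z(power)) ** 2 * sigma ** 2
                     / (true_diff + margin) ** 2)

print(n_tost(10, 5), n_tost(10, 5, true_diff=1), n_noninferiority(10, 5))
```

Even a small assumed true difference shrinks the distance to the margin and raises the required $n$ noticeably.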

11. Advanced Topics

11.1 Effect Size Conversion

It is often necessary to convert between effect size measures. DataStatPro's built-in converter handles all common transformations:

From	To	Formula
$r$ (correlation)	$d$	$d = \frac{2r}{\sqrt{1-r^2}}$
$d$	$r$	$r = \frac{d}{\sqrt{d^2 + 4}}$
$d$	$\eta^2$	$\eta^2 = \frac{d^2}{d^2 + 4}$
$f$	$\eta^2$	$\eta^2 = \frac{f^2}{1 + f^2}$
$f^2$	$R^2$	$R^2 = \frac{f^2}{1 + f^2}$
$OR$	$d$	$d = \frac{\ln(OR)}{\pi/\sqrt{3}} \approx \frac{\ln(OR)}{1.814}$
$\phi$	$d$	$d = \frac{2\phi}{\sqrt{1-\phi^2}}$
$h$	$\phi$	$\phi = \sin(h/2)$ (approximately, for small $h$ )
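
Several of the conversions in the table can be written as one-line functions and checked by round trip. The function names are illustrative and do not correspond to DataStatPro's converter API.

```python
import math

def r_to_d(r):
    return 2 * r / math.sqrt(1 - r * r)

def d_to_r(d):
    return d / math.sqrt(d * d + 4)

def d_to_eta2(d):
    return d * d / (d * d + 4)

def f_to_eta2(f):
    return f * f / (1 + f * f)

def or_to_d(odds_ratio):
    """Logistic-distribution conversion: ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

print(f"r = 0.30 -> d = {r_to_d(0.30):.3f} -> r = {d_to_r(r_to_d(0.30)):.3f}")
print(f"d = 0.50 -> eta^2 = {d_to_eta2(0.50):.4f}")
print(f"OR = 2.0 -> d = {or_to_d(2.0):.3f}")
```

The $d$ to $r$ and $d$ to $\eta^2$ formulas assume two groups of equal size, so $\eta^2$ equals $r^2$ in that case.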

11.2 The Winner's Curse and Effect Size Inflation

Studies with low power that happen to produce a significant result tend to produce inflated effect size estimates. This phenomenon — the "Winner's Curse" — occurs because a small- $n$ study can only reach significance when the observed effect happens to be larger than the true effect by chance.

Consequences:

Effect sizes from small, significant studies overestimate the true population effect.
Replication studies using these inflated effect sizes are often underpowered.
The "replication crisis" in psychology and other sciences is partly driven by this phenomenon.

Mitigation:

Base power calculations on conservative (smaller) effect size estimates.
Use effect sizes from meta-analyses rather than individual significant studies.
Apply a shrinkage factor (e.g., $d_{planned} = 0.75 \times d_{published}$ ) as a conservative hedge.

11.3 Power Analysis for Multilevel Models

For multilevel (hierarchical) models with data nested within clusters (students within schools, patients within clinics):

The effective sample size for a cluster-randomised trial depends on both the number of clusters $J$ and the cluster size $m$ :

$N_{eff} = \frac{J \times m}{1 + (m-1) \times ICC}$

Power depends primarily on the number of clusters (not the number of individuals per cluster) when the ICC is high. Doubling the number of individuals per cluster has diminishing returns once $m > 1/ICC$ .

Optimal allocation: Add more clusters (not more individuals per cluster) when the ICC is high or when between-cluster variance is the limiting factor.

11.4 Power for Survival Analysis (Log-Rank Test)

For survival outcomes (time to event), the log-rank test's power depends on the number of events (not the sample size):

Required number of events (for two-group comparison):

$E = \frac{4(z_{\alpha/2} + z_{1-\beta})^2}{(\ln HR)^2}$

Where $HR$ is the hypothesised hazard ratio under $H_1$ .

Required total $N$ (accounting for censoring rate $c$ ):

$N = \frac{E}{1 - c}$

The key insight is that studies with high censoring rates need larger $N$ to accumulate enough events — extending the follow-up period is often more efficient than increasing $N$ .
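
The two formulas chain together as in the sketch below, which assumes 1:1 allocation; the hazard ratio of 0.70 and the 40% censoring rate are hypothetical inputs.

```python
import math
from statistics import NormalDist

def events_logrank(hr, alpha=0.05, power=0.80):
    """Required number of events for a two-group log-rank test (1:1 allocation)."""
    z = NormalDist().inv_cdf
    return math.ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / math.log(hr) ** 2)

def n_from_events(events, censoring):
    """Total N so that the expected number of observed events is reached."""
    return math.ceil(events / (1 - censoring))

events = events_logrank(0.70)
print(events, n_from_events(events, censoring=0.40))
```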

11.5 Precision Analysis: Planning for Confidence Interval Width

An alternative to power analysis is precision analysis — planning $N$ to achieve a desired confidence interval width, rather than a desired power level. This is consistent with an estimation-focused approach and does not require specifying the effect size under $H_1$ .

Required $N$ for a 95% CI of width $\pm \delta$ for the mean:

$n = \left(\frac{1.96 \times \sigma}{\delta}\right)^2$

Required $N$ for a 95% CI of width $\pm \delta$ for a proportion:

$n = \frac{1.96^2 \times \hat{p}(1-\hat{p})}{\delta^2}$

Using $\hat{p} = 0.50$ gives the most conservative (largest) $n$ .
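
Both precision formulas in a short sketch, with the confidence level as a parameter rather than a hard-coded 1.96. Here `half_width` corresponds to $\delta$ above, and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_mean_ci(sigma, half_width, conf=0.95):
    """n so that the CI for a mean has the given half-width (known sigma)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / half_width) ** 2)

def n_for_proportion_ci(p, half_width, conf=0.95):
    """n so that the Wald CI for a proportion has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)

print(n_for_mean_ci(sigma=15, half_width=3))
print(n_for_proportion_ci(p=0.50, half_width=0.05))
```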

11.6 Prospective Power Analysis for Replication Studies

When planning a replication study of a previously published finding:

Extract the original study's effect size and its SE (or CI).
Apply the shrinkage factor: $d_{rep} = 0.75 \times d_{original}$ (conservative hedge).
Compute required $N$ for $d_{rep}$ at power $= 0.90$ (higher than 0.80 to account for uncertainty).
Report both the nominal power (if $d_{rep}$ is correct) and the power at $d = 0.50 \times d_{original}$ (robustness check).

11.7 Negative Findings and Equivalence: Planning for Both

A study designed to test for superiority may fail to reject $H_0$ but not demonstrate equivalence. Planning for both outcomes requires pre-specifying:

Equivalence margin $\Delta_E$ : The largest effect that would be practically negligible.
A TOST equivalence test as a secondary analysis alongside the primary superiority test.
Sufficient power for both: The sample size is the maximum of $N_{superiority}$ and $N_{equivalence}$ .

11.8 Reporting Power in Pre-Registration

Pre-registration of power analyses on platforms such as the Open Science Framework (OSF), ClinicalTrials.gov, or AsPredicted.org requires:

Element	Required Detail
Research question and primary hypothesis	Specific and testable
Primary outcome and statistical test	Named explicitly
Effect size and justification	Value, measure, and source
$\alpha$ , power target, directionality	All three specified
Computed $N$	Total and per group
Attrition and design adjustments	If applicable
Software used	Name and version
Deviation policy	What will happen if $N$ cannot be reached

Pre-registration creates a public record of the planned analysis and protects against post-hoc power manipulation and researcher degrees of freedom.

12. Worked Examples

Example 1: A Priori — Two-Sample Independent t-Test

A clinical researcher plans to compare the effectiveness of a new cognitive training programme (Group A) vs. standard care (Group B) on memory scores. Based on a published meta-analysis, the expected Cohen's $d = 0.45$ . The researcher wants $\alpha = .05$ (two-tailed) and power $= 0.80$ .

Step 1 — Effect size: $d = 0.45$ (from meta-analysis).

Step 2 — Look up constants:

$z_{\alpha/2} = z_{0.025} = 1.960$
$z_{1-\beta} = z_{0.80} = 0.842$

Step 3 — Apply formula:

$n_{per\;group} = \frac{2(1.960 + 0.842)^2}{0.45^2} = \frac{2 \times 7.851}{0.2025} = \frac{15.702}{0.2025} = 77.54$

Round up: $n_{per\;group} = 78$ , $N_{total} = 156$ (normal-approximation starting value).

Step 4 — Verify with exact non-central t:

At $n = 78$ per group: $\lambda = 0.45 \times \sqrt{78/2} = 0.45 \times 6.245 = 2.810$

$1 - \beta = P(t_{154} > t_{crit} \mid \lambda = 2.810)$

Using the non-central t-distribution: $1 - \beta \approx 0.797$ , marginally below the 0.80 target, because the normal-approximation formula slightly understates the $n$ required by the t-test.

At $n = 79$ per group: $\lambda = 0.45 \times \sqrt{79/2} = 0.45 \times 6.285 = 2.828$ , giving $1 - \beta \approx 0.803$ ✅ (meets the 0.80 target). Use $n_{per\;group} = 79$ , $N_{total} = 158$ .

Step 5 — Attrition adjustment (expecting 12% dropout):

$N_{enroll} = \lceil 158 / (1 - 0.12) \rceil = \lceil 158 / 0.88 \rceil = \lceil 179.5 \rceil = 180$

Step 6 — MDE at $n = 79$ per group:

$d_{MDE} = \frac{1.960 + 0.842}{\sqrt{79/2}} = \frac{2.802}{6.285} = 0.446$

The study is designed to detect effects of $d \geq 0.45$ with 80% power.

Summary:

Parameter	Value
Test	Two-sample independent t-test (two-tailed)
Effect size	$d = 0.45$ (meta-analysis)
$\alpha$	$.05$
Power target	$0.80$
$N$ per group (analysis)	79
$N$ total (analysis)	158
Achieved power	$\approx 0.803$
$N$ total (enrollment; 12% attrition)	180
MDE	$d = 0.446$

APA write-up: "An a priori power analysis conducted in DataStatPro indicated that 79 participants per group (total $N = 158$ ) were required to detect an effect of $d = 0.45$ with 80% power at a two-tailed $\alpha = .05$ (achieved power = 0.80). The effect size was based on a published meta-analysis. Assuming 12% attrition, 90 participants per group (total $N = 180$ ) will be recruited."

Example 2: A Priori — One-Way ANOVA (Three Groups)

An education researcher compares three teaching methods on test performance. Literature suggests group means of 65, 70, and 68 with a common within-group SD of 12. $\alpha = .05$ , power target $= 0.80$ .

Step 1 — Compute Cohen's $f$ :

Grand mean: $\mu = (65 + 70 + 68)/3 = 67.67$

$\sigma_m = \sqrt{\frac{(65-67.67)^2 + (70-67.67)^2 + (68-67.67)^2}{3}} = \sqrt{\frac{7.11 + 5.44 + 0.11}{3}} = \sqrt{4.22} = 2.054$

$f = \frac{\sigma_m}{\sigma_{within}} = \frac{2.054}{12} = 0.171$

This is between Cohen's small ( $f = 0.10$ ) and medium ( $f = 0.25$ ) benchmarks.

Step 2 — Compute $\eta^2$ equivalent:

$\eta^2 = \frac{f^2}{1 + f^2} = \frac{0.0293}{1.0293} = 0.0284$
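
Steps 1 and 2 can be reproduced with a few lines of code. The sketch assumes equal group sizes, so the grand mean is the simple average of the group means; the function name is illustrative.

```python
import math

def cohens_f(means, sd_within):
    """Cohen's f from expected group means and a common within-group SD.

    Assumes equal group sizes, so sigma_m divides by the number of groups.
    """
    grand = sum(means) / len(means)
    sigma_m = math.sqrt(sum((m - grand) ** 2 for m in means) / len(means))
    return sigma_m / sd_within

f = cohens_f([65, 70, 68], 12)
eta2 = f ** 2 / (1 + f ** 2)
print(f"f = {f:.3f}, eta^2 = {eta2:.4f}")
```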

Step 3 — Required $N$ (iterative, via DataStatPro):

Using non-central F with $df_1 = 2$ , $df_2 = N - 3$ , and $\lambda = f^2 \times N$ :

DataStatPro iterates: at $n = 111$ per group ( $N = 333$ ): $\lambda = 0.0293 \times 333 \approx 9.76$ , power $\approx 0.80$ ✅

Step 4 — Attrition adjustment (8%):

$N_{enroll} = \lceil 333 / 0.92 \rceil = \lceil 361.96 \rceil = 362$ , rounded up to $363$ so that the three groups are equal (121 per group).

Summary:

Parameter	Value
Test	One-way ANOVA ( $k = 3$ ), omnibus F-test
Effect size	$f = 0.171$ ; $\eta^2 = .028$
$\alpha$	$.05$
Power target	$0.80$
$n$ per group (analysis)	111
$N$ total (analysis)	333
Achieved power	$\approx 0.80$
$N$ total (enrollment; 8% attrition)	363

APA write-up: "A priori power analysis for a one-way ANOVA with three groups indicated that 111 participants per group (total $N = 333$ ) were required to detect $f = 0.17$ ( $\eta^2 = .03$ ) with 80% power at $\alpha = .05$ (achieved power = 0.80). The expected group means ( $M = 65$ , $70$ , $68$ ; pooled $SD = 12$ ) were derived from the literature. With anticipated 8% attrition, 121 participants per group (total $N = 363$ ) will be recruited."

Example 3: A Priori — Pearson Correlation

A developmental psychologist hypothesises a moderate correlation ( $\rho = 0.35$ ) between parental involvement (hours/week) and child academic achievement. $\alpha = .05$ (two-tailed), power $= 0.90$ .

Step 1 — Fisher z-transformation:

$z_\rho = \text{arctanh}(0.35) = \frac{1}{2}\ln\!\left(\frac{1.35}{0.65}\right) = 0.3654$

Step 2 — Look up constants:

$z_{\alpha/2} = 1.960$
$z_{1-\beta} = z_{0.90} = 1.282$

Step 3 — Apply formula:

$n = \left(\frac{1.960 + 1.282}{0.3654}\right)^2 + 3 = \left(\frac{3.242}{0.3654}\right)^2 + 3 = (8.872)^2 + 3 = 78.71 + 3 = 81.71$

Round up: $n = 82$ .

Step 4 — MDE (minimum detectable correlation at $n = 82$ , power $= 0.90$ ):

$z_{\rho_{MDE}} = \frac{z_{\alpha/2} + z_{1-\beta}}{\sqrt{n-3}} = \frac{1.960 + 1.282}{\sqrt{79}} = \frac{3.242}{8.888} = 0.3648$

$\rho_{MDE} = \tanh(0.3648) = 0.349$
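
The Fisher-z calculation and its inverse (the minimum detectable correlation) can be sketched as follows, using the same approximation as the worked steps; the function names are illustrative.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf

def n_correlation(rho, alpha=0.05, power=0.80):
    """Required n for a two-tailed test of rho = 0 (Fisher z approximation)."""
    return math.ceil(((z(1 - alpha / 2) + z(power)) / math.atanh(rho)) ** 2 + 3)

def mde_correlation(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable with the given n and power."""
    return math.tanh((z(1 - alpha / 2) + z(power)) / math.sqrt(n - 3))

n = n_correlation(0.35, power=0.90)
print(n, round(mde_correlation(n, power=0.90), 3))
```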

Summary:

Parameter	Value
Test	Pearson correlation (two-tailed)
Effect size	$\rho = 0.35$ (literature)
$\alpha$	$.05$
Power target	$0.90$
Required $n$	82
Achieved power	$0.900$
MDE	$\rho = 0.349$

APA write-up: "Based on an expected correlation of $\rho = .35$ , a priori power analysis indicated that $n = 82$ participants were required to achieve 90% power at $\alpha = .05$ (two-tailed). Calculations were conducted using DataStatPro."

Example 4: A Priori — Chi-Square Test of Association (3 × 3 Table)

A sociologist examines the association between age group (18–34, 35–54, 55+) and preferred news source (online, print, broadcast). Based on the literature, the expected cell proportions are:

	Online	Print	Broadcast
18–34	.18	.04	.11
35–54	.10	.09	.14
55+	.06	.12	.16

$\alpha = .05$ , power $= 0.80$ .

Step 1 — Compute marginal proportions:

Row marginals: $P_{18-34} = .33$ , $P_{35-54} = .33$ , $P_{55+} = .34$

Column marginals: $P_{online} = .34$ , $P_{print} = .25$ , $P_{broadcast} = .41$

Step 2 — Compute Cohen's $w$ :

$w = \sqrt{\sum_{i}\sum_{j} \frac{(P_{ij} - P_{i\cdot}P_{\cdot j})^2}{P_{i\cdot}P_{\cdot j}}}$
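
The double sum is tedious by hand but short in code. The sketch below takes the expected cell proportion matrix from this example and also converts a required non-centrality parameter into $N$ via $N = \lambda / w^2$ ; the value of $\lambda$ is supplied as an input (here the tabulated value for $df = 4$ , $\alpha = .05$ , power $= 0.80$ ), and the function names are illustrative.

```python
import math

def cohens_w(table):
    """Cohen's w from a matrix of expected cell proportions (rows x columns)."""
    total = sum(sum(row) for row in table)
    cells = [[p / total for p in row] for row in table]   # normalise to sum to 1
    row_m = [sum(row) for row in cells]
    col_m = [sum(col) for col in zip(*cells)]
    chi = sum((cells[i][j] - row_m[i] * col_m[j]) ** 2 / (row_m[i] * col_m[j])
              for i in range(len(row_m)) for j in range(len(col_m)))
    return math.sqrt(chi)

def n_from_lambda(lambda_required, w):
    """Total N from the required non-centrality parameter."""
    return math.ceil(lambda_required / w ** 2)

table = [[0.18, 0.04, 0.11],
         [0.10, 0.09, 0.14],
         [0.06, 0.12, 0.16]]
w = cohens_w(table)
print(f"w = {w:.3f}, N = {n_from_lambda(11.94, w)}")
```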

Applying the formula to the full cell proportion matrix gives $\sum = 0.1140$ and $w = \sqrt{0.1140} = 0.338$ (DataStatPro performs this computation automatically).

Step 3 — Degrees of freedom:

$df = (3-1)(3-1) = 4$

Step 4 — Required $N$ (non-central chi-square, DataStatPro):

$N_{required} = \frac{\lambda_{required}}{w^2}$

At $df = 4$ , $\alpha = .05$ , power $= 0.80$ : $\lambda_{required} = 11.94$

$N = \frac{11.94}{0.338^2} = \frac{11.94}{0.1142} = 104.6$

Round up: $N = 105$ .

Summary:

Parameter	Value
Test	Chi-square test of association ( $3 \times 3$ table)
Effect size	$w = 0.338$ (from expected cell proportions)
$\alpha$	$.05$
Power target	$0.80$
Required $N$	105
Achieved power	$\approx 0.80$

APA write-up: "An a priori power analysis for a $3 \times 3$ chi-square test of association indicated that $N = 105$ participants were required to detect $w = 0.34$ with 80% power at $\alpha = .05$ . The expected cell proportions were derived from prior survey data."

Example 5: Sensitivity Analysis — Post-Hoc Assessment

A completed study of exam score differences between two teaching conditions found $\bar{x}_1 = 71.2$ , $\bar{x}_2 = 68.4$ , $s_1 = 11.8$ , $s_2 = 12.2$ , $n_1 = n_2 = 35$ . The result was non-significant ( $t(68) = 0.98$ , $p = .33$ ).

Observed effect size:

$d_{obs} = \frac{71.2 - 68.4}{\sqrt{(34 \times 11.8^2 + 34 \times 12.2^2)/68}} = \frac{2.8}{12.0} = 0.233$

Post-hoc power (observed $d$ , NOT recommended as standalone):

At $n = 35$ per group, $d = 0.233$ , $\alpha = .05$ : Power $= 0.16$ .

This is low, but that is mathematically expected given the non-significant result.

More useful: sensitivity analysis (power vs. effect size at $n = 35$ per group):

True $d$	Power at $n = 35$ per group
0.20	.13
0.30	.24
0.40	.38
0.50	.54
0.60	.70
0.80	.91

95% CI for observed $d$ : $[-0.24,\; 0.70]$ (computed via DataStatPro).

Interpretation: The study had sufficient power only for large effects ( $d \geq 0.80$ ). The non-significant result is uninformative about effects in the small-to-medium range. The 95% CI for $d$ is wide ( $[-0.24, 0.70]$ ), spanning from a small negative effect to a medium-to-large positive effect. A future study designed to detect $d = 0.30$ with 80% power would require $n = 176$ per group.

APA write-up: "The sample of $n = 35$ per group provided 54% power to detect a medium effect of $d = 0.50$ at $\alpha = .05$ (two-tailed), indicating the study was substantially underpowered for effects of practical interest. The 95% CI for Cohen's $d = [-.24, .70]$ spans a wide range. A sensitivity power analysis indicated that detecting $d = 0.30$ with 80% power at $\alpha = .05$ would require $n = 176$ per group. The non-significant result should therefore be interpreted with caution rather than as evidence of no effect."

13. Common Mistakes and How to Avoid Them

Mistake 1: Using Post-Hoc Power with the Observed Effect Size

Problem: Computing "observed power" using the effect size estimated from the completed study's data and presenting it as an independent finding. Observed power is a monotonically decreasing function of the p-value (a smaller p always implies higher observed power), and $p = \alpha$ corresponds to observed power of approximately $0.50$ . Observed power therefore adds no information beyond the p-value itself.

Solution: Replace post-hoc power with: (a) A 95% CI for the effect size, and (b) A sensitivity power analysis showing power for a range of plausible true effect sizes. This genuinely informs about what the study could and could not detect.

Mistake 2: Basing Effect Size on a Single Pilot Study

Problem: Running a pilot study ( $n = 20$ ), observing $d = 0.65$ , and using this value directly in a power calculation. Small pilot studies produce highly unstable effect size estimates. The true effect could easily be $d = 0.20$ — leading to a seriously underpowered main study.

Solution: Use pilot studies for feasibility and nuisance parameter estimation (SD, retention rate, ICC) only. Determine the target effect size from the SESOI, published literature, or meta-analyses. If a pilot effect size must be used, apply a conservative discount factor (e.g., multiply by 0.60–0.75).
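
To see how much the planned sample size depends on the assumed effect, the following sketch applies the normal-approximation formula for a two-sample t-test to the pilot estimate, to two discounted versions of it, and to a hypothetical true effect of $d = 0.20$ . The helper `n_per_group` is illustrative, and the approximation typically lands one or two participants below exact non-central results.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation n per group for a two-sided two-sample t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


pilot_d = 0.65  # effect size observed in the small pilot
scenarios = [
    ("pilot estimate", pilot_d),
    ("discount x 0.75", pilot_d * 0.75),
    ("discount x 0.60", pilot_d * 0.60),
    ("true effect 0.20", 0.20),
]
for label, d in scenarios:
    print(f"{label:<17} d = {d:.3f} -> n per group = {n_per_group(d)}")
```

A study sized on the raw pilot estimate would enrol only a fraction of the participants needed if the true effect were as small as $d = 0.20$ .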

Mistake 3: Confusing Total $N$ with Per-Group $n$

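Problem: A formula yields $n = 50$ per group, but the researcher enrolls 50 participants total (25 per group). This is only half the required sample size, and power falls well below target: a design with 80% power at $n = 50$ per group has only about 50% power at $n = 25$ per group.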

Solution: Always explicitly distinguish total $N$ from per-group $n$ in both calculations and reports. DataStatPro reports both total $N$ and the per-group breakdown on all output screens.
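
The cost of this mix-up can be checked numerically. The sketch below uses the normal approximation to two-sample t-test power and a hypothetical planning effect size chosen so that $n = 50$ per group gives exactly 80% power; `power_two_sample` is an illustrative helper, not a DataStatPro function.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided two-sample t-test."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)          # non-centrality for equal group sizes
    return (1 - Z.cdf(z_crit - ncp)) + Z.cdf(-z_crit - ncp)


# Hypothetical effect size for which n = 50 per group yields 80% power.
d_planned = (Z.inv_cdf(0.975) + Z.inv_cdf(0.80)) * sqrt(2 / 50)

power_correct = power_two_sample(d_planned, 50)   # 50 per group, as intended
power_mistake = power_two_sample(d_planned, 25)   # 50 in total, 25 per group

print(f"d = {d_planned:.3f}")
print(f"power with n = 50 per group: {power_correct:.2f}")
print(f"power with n = 25 per group: {power_mistake:.2f}")
```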

Mistake 4: Not Adjusting for Attrition

Problem: Calculating that 120 completers are needed and recruiting exactly 120 participants, then losing 18 to dropout — leaving 102 completers with power substantially below target.

Solution: Always calculate the attrition-adjusted enrollment target: $N_{enroll} = N_{analysis} / (1 - r_{attrition})$ . Obtain attrition estimates from the literature or previous studies in the same population. Be conservative (overestimate attrition rates).
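
As a small sketch of the adjustment, the function below (an illustrative name, not a DataStatPro API) applies the formula to the example in the problem statement, where 18 of 120 participants (15%) dropped out, and to a more conservative 25% assumption.

```python
from math import ceil


def enrollment_target(n_analysis: int, attrition_rate: float) -> int:
    """Number to enrol so that n_analysis completers remain after dropout."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return ceil(n_analysis / (1 - attrition_rate))


n_needed = 120
print(enrollment_target(n_needed, 18 / 120))  # attrition equal to the 15% actually observed
print(enrollment_target(n_needed, 0.25))      # conservative planning assumption
```

Rounding up matters here: enrolling the rounded-down figure would leave the study slightly short of the required number of completers.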

Mistake 5: Ignoring the Design Effect in Clustered Studies

Problem: Treating a clustered design (e.g., 20 students per class) as if observations were independent, which underestimates the required sample size (and therefore the number of clusters) by a factor of DEFF.

Solution: Always specify the expected ICC and average cluster size, and apply the design effect: $N_{cluster} = N_{simple} \times DEFF$ . Err on the side of overestimating the ICC. Use DataStatPro's clustered design module.
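
A minimal sketch of the design-effect adjustment follows. The inputs are assumptions chosen for illustration: a simple-random-sample requirement of 128 participants, classes of 20 students, and an ICC of 0.05. The function names are hypothetical.

```python
from math import ceil


def design_effect(cluster_size: int, icc: float) -> float:
    """DEFF = 1 + (m - 1) * ICC for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_simple: int, cluster_size: int, icc: float):
    """Inflate a simple-random-sample N and convert it to whole clusters."""
    deff = design_effect(cluster_size, icc)
    n_total = ceil(n_simple * deff)
    n_clusters = ceil(n_total / cluster_size)
    return deff, n_total, n_clusters


deff, n_total, n_clusters = clustered_sample_size(n_simple=128, cluster_size=20, icc=0.05)
print(f"DEFF = {deff:.2f}, total N = {n_total}, clusters needed = {n_clusters}")
```

Because clusters come in whole units, the final enrolment is the number of clusters times the cluster size, which can exceed the inflated $N$ .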

Mistake 6: Using Cohen's Conventions as the Default Effect Size

Problem: Entering $d = 0.50$ ("medium") into a power calculation simply because it is conventional, without any scientific justification. This produces a sample size that may be completely inappropriate for the specific research question. Because required $N$ scales with $1/d^2$ , a true effect of $d = 0.20$ would require about 6× as many participants, and a true effect of $d = 0.10$ about 25× as many.

Solution: Always justify the effect size from the SESOI, prior literature, or meta-analysis. Use Cohen's conventions only as an absolute last resort, and document that they were used in the absence of domain-specific information. Never present Cohen's conventions as though they represent the expected effect.

Mistake 7: Performing a Power Analysis for the Wrong Test

Problem: Computing power for a two-sample t-test when the actual analysis will be a mixed ANOVA (within × between), or computing power for a chi-square test when logistic regression will be used. Different tests have different power functions.

Solution: Identify the exact statistical test to be used (including model specification, covariates, and correction methods) before computing power. The power analysis must match the planned analysis.

Mistake 8: Conducting Multiple Tests but Powering Only for One

Problem: Planning 5 outcome variables but computing power only for the most important one, without applying a multiple testing correction. The familywise false-positive rate for 5 independent tests at $\alpha = .05$ is $\approx .23$ .

Solution: Clearly specify a single primary outcome and power the study for it. If there are several primary outcomes, apply a Bonferroni or Holm-Bonferroni correction (for Bonferroni, $\alpha' = .05/k$ where $k$ is the number of primary hypotheses) and compute power at $\alpha'$ for every primary outcome, or justify a less conservative correction. Apply the same correction to secondary outcomes when they are tested confirmatorily.
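
The following sketch illustrates both halves of the problem for five independent tests: the uncorrected familywise error rate, and the extra sample size needed once a Bonferroni-adjusted $\alpha'$ is used. It assumes a two-sample t-test with $d = 0.50$ and the normal approximation; the function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


def n_per_group(d: float, alpha: float, power: float = 0.80) -> int:
    """Normal-approximation n per group for a two-sided two-sample t-test."""
    z = NormalDist()
    return ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / d ** 2)


k = 5
alpha = 0.05
alpha_adjusted = alpha / k  # Bonferroni

print(f"FWER without correction: {familywise_error_rate(alpha, k):.3f}")
print(f"n per group at alpha  = {alpha}: {n_per_group(0.50, alpha)}")
print(f"n per group at alpha' = {alpha_adjusted}: {n_per_group(0.50, alpha_adjusted)}")
```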

Mistake 9: Treating Non-Significant Results as Evidence of No Effect

Problem: A study with $n = 30$ per group fails to reject $H_0$ ( $p = .34$ ) and concludes "the two conditions are equivalent". With $n = 30$ per group, power for a medium effect ( $d = 0.50$ ) is $\approx 50\%$ , so a true medium effect would have been missed about half the time. The non-significant result cannot distinguish a medium true effect from no effect.

Solution: Distinguish between "no evidence of an effect" and "evidence of no effect". To provide evidence of equivalence, use a TOST equivalence test with a pre-specified equivalence margin, or present the 95% CI for the effect size to show that meaningful effects can be ruled out. Power the study for equivalence, not just superiority, if equivalence is a potential conclusion.

Mistake 10: Reporting Sample Size Without Justification

Problem: Stating only "sample size was $N = 200$ " in a methods section with no reference to power, effect size, or target power. Readers (and reviewers) cannot assess whether the study was adequately powered.

Solution: Always include a complete power analysis justification in the methods section: test used, effect size with source, $\alpha$ , power target, computed $N$ , and software. Pre-register the power analysis before data collection.

14. Troubleshooting

Problem	Likely Cause	Solution
Required $N$ is extremely large (e.g., $> 10{,}000$ )	Effect size is very small; $\alpha$ is very small; power target is very high	Check whether the effect size is realistic; consider whether the study is feasible; explore precision analysis as an alternative
Required $N$ is smaller than expected	Effect size is large; one-tailed test used; power target is low (e.g., 0.70)	Verify inputs; confirm directionality; consider increasing power target
Power does not reach target even with very large $N$	Effect size effectively zero; test has an inherent power ceiling	Check whether $H_1$ is correctly specified; effect size of zero gives power $= \alpha$ regardless of $N$
Post-hoc power is very low (e.g., $< 0.20$ )	Study was substantially underpowered; effect is genuinely small	Expected when $p > \alpha$ ; replace post-hoc power with CI for effect size and sensitivity analysis
DataStatPro gives different $N$ than another power calculator	Different rounding conventions, approximation formulae, or non-central distribution methods	Both may be correct; use exact non-central distribution methods (DataStatPro default); difference is typically 0–2 participants
Design effect is very large ( $> 5$ )	Very high ICC or very large cluster size	Consider increasing number of clusters rather than cluster size; add cluster-level covariates to reduce ICC
Power is not improved by doubling $N$ (clustered design)	ICC is high; adding individuals within clusters is inefficient	Add more clusters, not more individuals per cluster; consult a biostatistician
Power for interaction effect is very low	Interaction effects are inherently smaller and harder to detect than main effects	Plan for 4× the $N$ needed for the main effect to detect a crossover interaction; report as a limitation
Cohen's $h$ is unusually large	Proportions are both near 0 or near 1; arcsine transformation stretches the scale	Verify $\pi_1$ and $\pi_2$ ; the arcsine transformation is mathematically correct; large $h$ reflects high sensitivity in that region
Achieved power slightly below target after rounding	$N$ formula gives a non-integer; rounding up gives target power; rounding down falls just below	Always round up, never down; add 1–2 participants as a buffer
Equivalence test requires much larger $N$ than superiority test	Equivalence requires showing the effect is within a narrow margin; inherently conservative	Use a realistic equivalence margin; consider whether the margin is defined appropriately
Sample size for ANOVA with many groups is surprisingly large	Many-group ANOVA has reduced power per group for fixed total $N$ ; each group has small $n$	Concentrate comparisons on the most important pairwise contrasts; consider a planned contrast rather than omnibus ANOVA
Attrition-adjusted $N$ is unrealistically large	Very high assumed attrition rate	Revisit attrition estimates; consider strategies to reduce dropout; report as a study limitation if $N$ is infeasible
Power analysis for regression gives very different $N$ from t-test	Different effect size frameworks ( $f^2$ vs. $d$ ); different $df$	Convert between effect sizes using DataStatPro's converter; confirm $u$ (predictors tested) is specified correctly
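
The row on Cohen's $h$ can be made concrete with a short sketch. It computes $h$ from the arcsine transformation for two hypothetical pairs of proportions that both differ by 0.04, one near the middle of the scale and one near zero.

```python
from math import asin, sqrt


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


h_mid = cohens_h(0.50, 0.46)   # difference of 0.04 near the centre of the scale
h_edge = cohens_h(0.05, 0.01)  # same raw difference near zero

print(f"h for 0.50 vs 0.46: {h_mid:.3f}")
print(f"h for 0.05 vs 0.01: {h_edge:.3f}")
```

The same raw difference yields an $h$ roughly three times larger near the boundary, which is the stretching effect described in the table.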

15. Quick Reference Cheat Sheet

The Four Elements of Power Analysis

$\text{Power} (1-\beta) \quad \longleftrightarrow \quad N \quad \longleftrightarrow \quad \text{Effect Size} \quad \longleftrightarrow \quad \alpha$

Specify any three → solve for the fourth.

Core Sample Size Formulae

Test	Effect Size	Per-Group $n$ or Total $N$
One-sample t	$d$	$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d}\right)^2$
Two-sample t (equal)	$d$	$n_{per} = \frac{2(z_{\alpha/2} + z_{1-\beta})^2}{d^2}$
Paired t	$d_z$	$n_{pairs} = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{d_z}\right)^2$
Correlation	$\rho$ ( $z_\rho = \text{arctanh}\rho$ )	$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{z_\rho}\right)^2 + 3$
Proportion (one)	$h$	$n = \left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$
Proportion (two)	$h$	$n_{per} = 2\left(\frac{z_{\alpha/2} + z_{1-\beta}}{h}\right)^2$
Chi-square	$w$	$N = \lambda_{req} / w^2$
Regression	$f^2$	$N = \lambda_{req} / f^2 + p + 1$

Key Z-Score Constants

$\alpha$ (two-tailed)	$z_{\alpha/2}$	Power ( $1-\beta$ )	$z_{1-\beta}$
$.10$	1.645	$0.70$	0.524
$.05$	1.960	$0.80$	0.842
$.025$	2.241	$0.85$	1.036
$.01$	2.576	$0.90$	1.282
$.001$	3.291	$0.95$	1.645
		$0.99$	2.326
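
The constants above need not be looked up. As a sketch, they can be generated from the standard normal quantile function and fed straight into the normal-approximation formulae, shown here for the one-sample t-test and the correlation test. Function names are illustrative, and results may differ by a participant or two from exact non-central calculations.

```python
from math import atanh, ceil
from statistics import NormalDist

Z = NormalDist()


def z_alpha(alpha: float, two_tailed: bool = True) -> float:
    """Critical z for the chosen significance level."""
    return Z.inv_cdf(1 - alpha / (2 if two_tailed else 1))


def z_power(power: float) -> float:
    """z-score corresponding to the target power (1 - beta)."""
    return Z.inv_cdf(power)


def n_one_sample(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """One-sample (or paired, with d_z) t-test, normal approximation."""
    return ceil(((z_alpha(alpha) + z_power(power)) / d) ** 2)


def n_correlation(rho: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Pearson correlation via the Fisher z (arctanh) transformation."""
    return ceil(((z_alpha(alpha) + z_power(power)) / atanh(rho)) ** 2 + 3)


print(f"z for alpha = .05: {z_alpha(0.05):.3f}; z for power = .80: {z_power(0.80):.3f}")
print(f"one-sample t, d = 0.50: n = {n_one_sample(0.50)}")
print(f"correlation, rho = .30: n = {n_correlation(0.30)}")
```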

Sample Size for Two-Sample t-Test ( $\alpha = .05$ , Two-Tailed)

$d$	Power = 0.70	Power = 0.80	Power = 0.90	Power = 0.95
0.20 (small)	310	394	526	650
0.30	139	176	234	290
0.50 (medium)	51	64	86	106
0.80 (large)	21	26	34	42
1.00	14	18	24	28
1.20	10	12	16	20

(Figures are per group; multiply by 2 for total $N$ .)

Sample Size for Correlation ( $\alpha = .05$ , Two-Tailed)

$\rho$	Power = 0.80	Power = 0.90
$.10$	782	1046
$.20$	194	259
$.30$	84	112
$.40$	46	61
$.50$	28	37
$.70$	12	16

Cohen's Effect Size Conventions

Test	Small	Medium	Large
t-test ( $d$ )	0.20	0.50	0.80
ANOVA ( $f$ )	0.10	0.25	0.40
Correlation ( $r$ )	0.10	0.30	0.50
Chi-square ( $w$ )	0.10	0.30	0.50
Regression ( $f^2$ )	0.02	0.15	0.35
Proportion ( $h$ )	0.20	0.50	0.80

Attrition Adjustment

$N_{enroll} = \left\lceil \frac{N_{analysis}}{1 - r_{attrition}} \right\rceil$

Attrition Rate	Inflation Factor
5%	× 1.053
10%	× 1.111
15%	× 1.176
20%	× 1.250
25%	× 1.333
30%	× 1.429

Design Effect for Clustered Studies

$DEFF = 1 + (m - 1) \times ICC, \qquad N_{cluster} = N_{simple} \times DEFF$

ICC	Cluster size $m = 10$	$m = 20$	$m = 30$
0.01	1.09	1.19	1.29
0.05	1.45	1.95	2.45
0.10	1.90	2.90	3.90
0.20	2.80	4.80	6.80

Effect Size Conversions

From	To	Formula
$r$	$d$	$d = 2r/\sqrt{1-r^2}$
$d$	$r$	$r = d/\sqrt{d^2+4}$
$f$	$\eta^2$	$\eta^2 = f^2/(1+f^2)$
$f^2$	$R^2$	$R^2 = f^2/(1+f^2)$
$OR$	$d$	$d = \ln(OR)/1.814$

Four Modes of Power Analysis Decision Guide

Goal	Analysis Mode	Fixed	Solved
Plan sample size before data collection	A priori	$\alpha$ , $1-\beta$ , $ES$	$N$
Assess power of a completed study	Post-hoc	$\alpha$ , $N$ , $ES$	$1-\beta$
Find smallest detectable effect	Sensitivity	$\alpha$ , $N$ , $1-\beta$	$ES_{min}$
Justify a non-standard $\alpha$	Criterion	$N$ , $1-\beta$ , $ES$	$\alpha$
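
The four modes are the same equation solved for a different unknown. A minimal sketch for the two-sample t-test, assuming equal groups, a two-tailed test, and the normal approximation (the negligible opposite-tail rejection probability is ignored), could look like this. All function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()


def a_priori_n(d: float, alpha: float, power: float) -> int:
    """Solve for n per group given alpha, power, and effect size."""
    return ceil(2 * (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) ** 2 / d ** 2)


def post_hoc_power(d: float, n: int, alpha: float) -> float:
    """Solve for power given alpha, n per group, and a specified effect size."""
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha / 2) - d * sqrt(n / 2))


def sensitivity_mde(n: int, alpha: float, power: float) -> float:
    """Solve for the minimum detectable effect given alpha, n per group, and power."""
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * sqrt(2 / n)


def criterion_alpha(d: float, n: int, power: float) -> float:
    """Solve for the two-tailed alpha given n per group, power, and effect size."""
    return 2 * (1 - Z.cdf(d * sqrt(n / 2) - Z.inv_cdf(power)))


print("a priori n per group:", a_priori_n(0.50, 0.05, 0.80))
print(f"post-hoc power:       {post_hoc_power(0.50, 64, 0.05):.3f}")
print(f"sensitivity MDE:      {sensitivity_mde(64, 0.05, 0.80):.3f}")
print(f"criterion alpha:      {criterion_alpha(0.50, 64, 0.80):.4f}")
```

Each function fixes three of the four elements and returns the fourth, mirroring the "Fixed" and "Solved" columns of the table.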

APA 7th Edition Power Analysis Reporting Templates

A priori (standard): "An a priori power analysis conducted in DataStatPro indicated that [N per group / total N] participants were required to detect [effect size measure] = [value] with [power]% power at a [one/two]-tailed $\alpha$ = [value] (achieved power = [value]). The effect size was based on [source/justification]."

A priori (with attrition): "[As above]. Assuming [X]% attrition, [inflated N] participants will be recruited."

A priori (clustered design): "[As above]. Assuming an ICC of [value] and an average cluster size of [m], the design effect was [DEFF], yielding a required [N clusters] clusters of [m] participants each (total $N$ = [value])."

Sensitivity analysis: "With [N] participants per group, the study had [power]% power to detect an effect of [ES measure] = [MDE value] at $\alpha$ = [value] (two-tailed). Power for a range of effect sizes is provided in [Table/Figure X]."

Non-significant result with sensitivity: "With [N] per group, the study had [power]% power to detect [ES measure] = [value] at $\alpha$ = [value]. The 95% CI for [effect size] = [[LB], [UB]], indicating that effects as large as [UB value] cannot be ruled out. A future study powered to detect [ES measure] = [target value] with 80% power would require [future N] per group."

Power Analysis Reporting Checklist

Element	Required
Analysis mode (a priori / post-hoc / sensitivity)	✅ Always
Statistical test named exactly	✅ Always
Effect size measure and value	✅ Always
Effect size source and justification	✅ Always
Significance level $\alpha$ and directionality	✅ Always
Power target ( $1 - \beta$ )	✅ Always
Computed $N$ total and per group	✅ Always
Achieved power at computed $N$	✅ Always
Software and version	✅ Always
Attrition rate and enrollment $N$	✅ When attrition is anticipated
Design effect, ICC, cluster size	✅ For clustered designs
Multiple testing correction and adjusted $\alpha'$	✅ When multiple primary outcomes
MDE in original units	✅ Recommended
Sensitivity power table or curve	✅ Recommended
Equivalence margin (for TOST)	✅ For equivalence studies
Pre-registration reference	✅ When pre-registered
Discussion of feasibility	✅ When $N$ is large or constrained
Bayesian / average power	✅ When prior uncertainty about effect size is substantial

This tutorial provides a comprehensive foundation for understanding, conducting, interpreting, visualising, and reporting sample size and power analyses within the DataStatPro application. For further reading, consult Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988) for foundational theory and conventions; Lakens, Scheel & Isager's "Equivalence Testing for Psychological Research: A Tutorial" (2018) for TOST methods; Gelman & Carlin's "Beyond Power Calculations" (2014) for design analysis and Type M/S errors; Faul, Erdfelder, Lang & Buchner's "G*Power 3" (2007) for computational methods; and Zar's "Biostatistical Analysis" (5th ed., 2010) for biological and health science applications. For feature requests or support, contact the DataStatPro team.
