
Effect Size Calculator

Comprehensive reference guide for effect size calculations and interpretation.

Effect Size Calculator: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of Effect Sizes all the way through advanced estimation, interpretation, reporting, and practical usage within the DataStatPro application. Whether you are encountering effect sizes for the first time or looking to deepen your understanding of practical significance in research, this guide builds your knowledge systematically from the ground up.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is an Effect Size?
  3. The Mathematics Behind Effect Sizes
  4. Assumptions of Effect Size Estimation
  5. Types of Effect Sizes
  6. Using the Effect Size Calculator Component
  7. Effect Sizes for Mean Differences
  8. Effect Sizes for Variance Explained
  9. Effect Sizes for Associations and Categorical Data
  10. Model Fit and Evaluation
  11. Advanced Topics
  12. Worked Examples
  13. Common Mistakes and How to Avoid Them
  14. Troubleshooting
  15. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into effect sizes, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 Statistical Significance vs. Practical Significance

A p-value answers the question: "If the null hypothesis were true, how likely is it that we would observe data at least as extreme as what we actually observed?"

A small p-value tells us the result is unlikely under $H_0$ — but it does not tell us how large the effect is or whether it matters in practice.

Consider two studies, both with $p < .001$:

Study A has a highly significant but trivially small effect. Study B has a large, practically meaningful effect. Effect sizes quantify the magnitude of an effect independently of sample size — they answer the question: "How big is the effect?"

1.2 Standard Deviation and Variance

The standard deviation $\sigma$ (population) or $s$ (sample) measures the spread of a distribution:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Most effect sizes for mean differences are standardised by dividing the raw difference by a standard deviation. This makes the effect size unit-free and comparable across studies using different measurement scales.

1.3 The Normal Distribution

Many effect size formulas assume that data come from normally distributed populations. The standard normal distribution $Z \sim \mathcal{N}(0, 1)$ is used to convert effect sizes into probabilities such as the common language effect size and probability of superiority.

The relationship between an effect size $d$ and the area of non-overlap between two normal distributions is fundamental to interpreting effect sizes in terms of real-world probabilities.

1.4 Correlation and Covariance

The Pearson correlation coefficient $r$ is a standardised measure of linear association:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)s_x s_y}$$

It ranges from $-1$ to $+1$ and is itself an effect size for the strength of a linear relationship between two continuous variables.

1.5 Variance Decomposition

Many effect sizes for ANOVA and regression are ratios of variances:

$$\eta^2 = \frac{SS_{effect}}{SS_{total}}$$

Understanding the decomposition of variance into between-group (explained) and within-group (unexplained) components is essential for interpreting these effect sizes.

1.6 Confidence Intervals

A confidence interval (CI) for an effect size gives a range of plausible values for the true population effect, given the sample data. A 95% CI means that if we repeated the study 100 times, approximately 95 of the resulting intervals would contain the true population effect size.

Always report effect sizes with confidence intervals — a point estimate alone is insufficient because it conveys no information about precision or uncertainty.

1.7 The Non-Central Distributions

Effect sizes such as Cohen's $d$ and $\eta^2$ follow non-central distributions in finite samples — their sampling distributions are not symmetric, especially when the true population effect is non-zero.

Confidence intervals for these effect sizes must account for the non-centrality of the sampling distribution, which is why exact CIs require iterative numerical methods rather than simple $\pm z \times SE$ formulas.


2. What is an Effect Size?

2.1 The Core Idea

An effect size is a standardised, scale-free numerical index that quantifies the magnitude of a phenomenon — how large a difference is, how strong an association is, or how much variance is explained. Effect sizes are:

- independent of sample size, so they separate magnitude from statistical significance;
- unit-free, and therefore comparable across studies and measurement scales;
- the raw material for power analysis and meta-analysis.

2.2 Why Effect Sizes Are Essential

The limitations of p-values alone:

  1. With large samples, even trivially small effects produce significant p-values.
  2. With small samples, even large effects may be non-significant.
  3. A p-value carries no information about the size or direction of an effect.
  4. p-values cannot be meaningfully compared across studies with different sample sizes.

What effect sizes add:

- the magnitude of the effect, on a standardised scale;
- the direction of the effect;
- comparability across studies with different samples and measures;
- the inputs needed for power analysis and meta-analysis.

2.3 The Effect Size Framework

Every effect size belongs to one of three broad families:

| Family | What It Measures | Examples |
|---|---|---|
| $d$-family | Standardised mean differences | Cohen's $d$, Hedges' $g$, Glass's $\Delta$ |
| $r$-family | Strength of association | Pearson $r$, $r^2$, $\eta^2$, $\omega^2$, $\varepsilon^2$ |
| Risk/Odds family | Probability-based contrasts | Odds ratio, Risk ratio, NNT, ARD |

2.4 Real-World Applications

| Field | Effect Size Application | Common Measure |
|---|---|---|
| Clinical Psychology | Effectiveness of CBT vs. control on depression | Cohen's $d$, Hedges' $g$ |
| Medicine | Drug vs. placebo on blood pressure | Cohen's $d$, Risk ratio, NNT |
| Education | Effect of tutoring on exam scores | Cohen's $d$, $\eta^2$ |
| Marketing | Brand A vs. B on purchase intent | Cohen's $d$, Cramér's $V$ |
| Neuroscience | Brain region activation between groups | Cohen's $d$, $\eta_p^2$ |
| Genetics | SNP association with disease risk | Odds ratio, $R^2$ |
| Organisational Psychology | Leadership training on productivity | Cohen's $d$, $f^2$ |
| Public Health | Vaccination programme on infection rate | Risk ratio, ARD, NNT |
| Ecology | Species richness across habitats | Cohen's $d$, $\eta^2$ |

2.5 Statistical Significance vs. Effect Size: A Unified View

The relationship between sample size, effect size, and statistical significance can be summarised by the power equation. For a $t$-test:

$$t = d \cdot \sqrt{\frac{n}{2}} \quad \text{(independent samples)}$$

This shows that the $t$-statistic (and therefore the p-value) is a joint function of both the effect size $d$ AND the sample size $n$. A non-significant result could mean:

- the effect is genuinely small, or
- the sample was too small to detect a real effect.

A significant result could mean:

- the effect is large, or
- the sample was large enough to detect even a trivial effect.

Effect sizes disentangle magnitude from sample size.


3. The Mathematics Behind Effect Sizes

3.1 Cohen's $d$ — The Fundamental Standardised Mean Difference

Cohen's $d$ is the cornerstone effect size for comparing two means. It expresses the difference between two means in standard deviation units.

For two independent groups:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}$$

Where the pooled standard deviation is:

$$s_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

For a one-sample design (comparing a sample mean to a known population value $\mu_0$):

$$d = \frac{\bar{x} - \mu_0}{s}$$

For a paired/repeated-measures design:

$$d_{paired} = \frac{\bar{d}}{s_d}$$

Where $\bar{d}$ is the mean of the difference scores and $s_d$ is the standard deviation of the difference scores.

Interpretation: $d = 1.0$ means the two group means are 1 standard deviation apart — for example, a group with mean 50 and a group with mean 60 differ by $d = 1.0$ if both have $s = 10$.
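The formulas above translate directly into code. The following is a minimal Python sketch (illustrative only, not DataStatPro's implementation; the function names are ours):

```python
import math

def cohens_d_independent(x1, x2):
    """Cohen's d for two independent groups, standardised by the pooled SD."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)   # sample variance, group 1
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)   # sample variance, group 2
    s_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

def cohens_d_paired(diffs):
    """d for paired designs: mean difference score over the SD of differences."""
    n = len(diffs)
    m = sum(diffs) / n
    s_d = math.sqrt(sum((x - m) ** 2 for x in diffs) / (n - 1))
    return m / s_d
```

For example, two groups with means 60 and 50 and a common SD of 10 give $d = 1.0$.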

3.2 Hedges' $g$ — Bias-Corrected Cohen's $d$

Cohen's $d$ is slightly positively biased in small samples — it overestimates the true population effect size. Hedges' $g$ applies a correction factor $J$ to remove this bias:

$$g = d \times J$$

Where the correction factor is:

$$J = 1 - \frac{3}{4\nu - 1}$$

With $\nu = n_1 + n_2 - 2$ degrees of freedom (for independent samples) or $\nu = n - 1$ (for one-sample or paired designs).

A more precise version uses the gamma function:

$$J = \frac{\Gamma(\nu/2)}{\sqrt{\nu/2} \cdot \Gamma((\nu-1)/2)}$$

The bias is negligible for $n > 20$ per group but can be substantial ($> 5\%$) for very small samples ($n < 10$).
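The small-sample correction is one line of arithmetic. A sketch using the simple approximation to $J$ rather than the gamma-function form (illustrative function name):

```python
def hedges_g(d, nu):
    """Hedges' g from Cohen's d, with nu = degrees of freedom.

    Uses the approximation J = 1 - 3/(4*nu - 1).
    """
    return d * (1 - 3 / (4 * nu - 1))
```

With $\nu = 38$ (two groups of 20), the correction shrinks $d$ by about 2%.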

3.3 Glass's $\Delta$ — Using the Control Group SD

When the two groups have different variances (especially in pre-post or treatment-control designs where the treatment may change variability), Glass's $\Delta$ standardises by the control group standard deviation only:

$$\Delta = \frac{\bar{x}_{treatment} - \bar{x}_{control}}{s_{control}}$$

This makes the effect size interpretable as "how many standard deviation units above (or below) the control group distribution is the average treatment participant?"

3.4 Confidence Intervals for $d$ Using the Non-Central $t$-Distribution

The exact 95% CI for Cohen's $d$ uses the non-central $t$-distribution. The observed $t$-statistic has a non-central $t$-distribution with non-centrality parameter:

$$\lambda = d \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

The confidence limits for $\lambda$ are found by solving:

$$P(t_\nu(\lambda_L) \geq t_{obs}) = 0.025 \quad \text{and} \quad P(t_\nu(\lambda_U) \leq t_{obs}) = 0.025$$

Then converting back to $d$:

$$d_{L} = \lambda_L \sqrt{\frac{n_1 + n_2}{n_1 n_2}}, \quad d_{U} = \lambda_U \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$

This requires numerical iteration (no closed form) and is computed automatically by DataStatPro.

An approximate 95% CI (adequate for $n > 20$ per group) uses:

$$d \pm 1.96 \times SE_d, \quad SE_d \approx \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}}$$
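The approximate interval is easy to compute directly. A sketch under the same large-sample assumption (the exact non-central interval still requires iterative root-finding; the function name is illustrative):

```python
import math

def d_ci_approx(d, n1, n2, z=1.96):
    """Approximate CI for Cohen's d; adequate for n > 20 per group."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se
```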

3.5 Eta Squared ($\eta^2$) — Proportion of Variance Explained

Eta squared is the proportion of total variance in the dependent variable attributable to the independent variable (group membership in ANOVA):

$$\eta^2 = \frac{SS_{effect}}{SS_{total}}$$

For a one-way ANOVA:

$$SS_{total} = SS_{between} + SS_{within}$$

$$\eta^2 = \frac{SS_{between}}{SS_{between} + SS_{within}}$$

Relationship to Cohen's $d$ (two groups only):

$$\eta^2 = \frac{d^2}{d^2 + 4}, \quad d = \frac{2\sqrt{\eta^2}}{\sqrt{1 - \eta^2}}$$

Limitation: $\eta^2$ is biased upward — it overestimates the population effect because it uses the total sum of squares from the sample. It should not be reported for multi-factor ANOVA (use partial $\eta^2$ or $\omega^2$ instead).

3.6 Partial Eta Squared ($\eta_p^2$) — Controlling for Other Effects

In factorial ANOVA (multiple IVs), partial eta squared estimates the proportion of variance explained by one effect after removing the variance attributable to other effects:

$$\eta_p^2 = \frac{SS_{effect}}{SS_{effect} + SS_{error}}$$

Note that in a one-way ANOVA (single IV), $\eta_p^2 = \eta^2$. In multi-factor ANOVA, $\eta_p^2 \geq \eta^2$ for every effect, and the sum of all partial $\eta^2$ values can exceed 1.0.

⚠️ Because partial $\eta^2$ values can sum to more than 1.0 across all effects in a factorial design, they should never be interpreted as the "proportion of total variance explained" — that interpretation applies only to $\eta^2$, not $\eta_p^2$.

3.7 Omega Squared ($\omega^2$) — Unbiased Variance-Explained Effect Size

Omega squared ($\omega^2$) is a bias-corrected version of $\eta^2$ that better estimates the population proportion of variance explained.

For one-way ANOVA:

$$\omega^2 = \frac{SS_{between} - (K-1) \cdot MS_{within}}{SS_{total} + MS_{within}}$$

Where:

- $K$ is the number of groups;
- $MS_{within}$ is the within-group (error) mean square.

Partial omega squared for factorial designs:

$$\omega_p^2 = \frac{SS_{effect} - df_{effect} \cdot MS_{error}}{SS_{total} + MS_{error}}$$

$\omega^2$ is generally preferred over $\eta^2$ because it does not inflate with small samples and provides a less biased estimate of the population effect.

3.8 Epsilon Squared ($\varepsilon^2$) — Another Unbiased Estimate

Epsilon squared is a computationally simpler alternative to omega squared:

$$\varepsilon^2 = \frac{SS_{between} - (K-1) \cdot MS_{within}}{SS_{total}}$$

Like $\omega^2$, $\varepsilon^2$ corrects for positive bias and can be slightly negative in small samples when the true population effect is near zero.
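The three variance-explained indices differ only in their bias corrections, which a short sketch makes concrete (illustrative helper computing all three from one-way ANOVA sums of squares):

```python
def variance_explained(ss_between, ss_within, k, n):
    """eta^2, omega^2 and epsilon^2 for a one-way ANOVA.

    k = number of groups, n = total number of observations.
    """
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n - k)          # error mean square
    eta2 = ss_between / ss_total
    omega2 = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    eps2 = (ss_between - (k - 1) * ms_within) / ss_total
    return eta2, omega2, eps2
```

For positive effects the ordering $\omega^2 \leq \varepsilon^2 \leq \eta^2$ holds, as stated in Section 8.1.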

3.9 Cohen's $f$ and $f^2$ — Effect Size for ANOVA and Regression

Cohen's $f$ converts variance-explained effect sizes into a ratio suitable for power analysis:

$$f = \sqrt{\frac{\eta^2}{1 - \eta^2}}$$

Or from $\omega^2$:

$$f = \sqrt{\frac{\omega^2}{1 - \omega^2}}$$

Cohen's $f^2$ is used for multiple regression and includes several variants.

Global $f^2$ (overall model fit):

$$f^2_{global} = \frac{R^2}{1 - R^2}$$

Local $f^2$ (effect of a specific predictor or set of predictors, controlling for others):

$$f^2_{local} = \frac{R^2_{full} - R^2_{reduced}}{1 - R^2_{full}} = \frac{\Delta R^2}{1 - R^2_{full}}$$

3.10 Pearson's $r$ and $r^2$ — Correlation and Coefficient of Determination

Pearson's $r$ is the effect size for the linear relationship between two continuous variables:

$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2 \cdot \sum_{i=1}^n(y_i-\bar{y})^2}}$$

$r^2$ (the coefficient of determination) is the proportion of variance in $Y$ explained by $X$:

$$r^2 = 1 - \frac{SS_{residual}}{SS_{total}}$$

Confidence interval for $r$ using Fisher's $z$-transformation:

$$z_r = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) = \operatorname{arctanh}(r)$$

$$SE_{z_r} = \frac{1}{\sqrt{n-3}}$$

95% CI for $z_r$:

$$z_r \pm 1.96 \cdot \frac{1}{\sqrt{n-3}}$$

Converting CI bounds back to $r$:

$$r = \frac{e^{2z_r} - 1}{e^{2z_r} + 1} = \tanh(z_r)$$
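These steps chain into a short routine. A sketch using Python's built-in `math.atanh`/`math.tanh` (the function name is illustrative):

```python
import math

def r_ci_fisher(r, n, z=1.96):
    """CI for Pearson r via Fisher's z-transformation (n = sample size)."""
    zr = math.atanh(r)              # z_r = arctanh(r)
    se = 1 / math.sqrt(n - 3)       # SE of z_r
    return math.tanh(zr - z * se), math.tanh(zr + z * se)
```

The back-transformed interval is asymmetric around $r$, which is exactly what the bounded $[-1, 1]$ scale requires.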

3.11 Odds Ratio, Risk Ratio, and Number Needed to Treat

For binary outcomes (event vs. no event) in two groups, the primary effect sizes are built from the $2 \times 2$ contingency table:

| | Event | No Event | Total |
|---|---|---|---|
| Group 1 (Treatment) | $a$ | $b$ | $a+b$ |
| Group 2 (Control) | $c$ | $d$ | $c+d$ |

Risk (probability of the event) in each group:

$$p_1 = \frac{a}{a+b}, \quad p_2 = \frac{c}{c+d}$$

Absolute Risk Difference (ARD):

$$\text{ARD} = p_1 - p_2$$

Risk Ratio (Relative Risk, RR):

$$\text{RR} = \frac{p_1}{p_2} = \frac{a/(a+b)}{c/(c+d)}$$

Odds Ratio (OR):

$$\text{OR} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{a/b}{c/d} = \frac{ad}{bc}$$

Number Needed to Treat (NNT):

$$\text{NNT} = \frac{1}{\lvert \text{ARD} \rvert} = \frac{1}{\lvert p_1 - p_2 \rvert}$$

NNT is the number of patients who must receive the treatment for one additional patient to benefit (or be harmed, in which case it is expressed as NNH — Number Needed to Harm).

95% CI for the log Odds Ratio:

$$\ln(\widehat{\text{OR}}) \pm 1.96 \times SE_{\ln(OR)}$$

Where:

$$SE_{\ln(OR)} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

Back-transforming:

$$\text{OR}_{95\%\text{CI}} = \left[e^{\ln(\widehat{\text{OR}}) - 1.96 \cdot SE}, \; e^{\ln(\widehat{\text{OR}}) + 1.96 \cdot SE}\right]$$
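All of the $2 \times 2$ quantities above come from the four cell counts. A compact sketch (illustrative function name; `a, b, c, d` follow the table's layout):

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """ARD, RR, OR with 95% CI, and NNT from a 2x2 table.

    a/b = events/non-events in group 1; c/d = the same in group 2.
    """
    p1, p2 = a / (a + b), c / (c + d)
    ard = p1 - p2
    rr = p1 / p2
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
          math.exp(math.log(odds_ratio) + z * se_log_or))
    return {"ARD": ard, "RR": rr, "OR": odds_ratio, "OR_CI": ci,
            "NNT": 1 / abs(ard)}
```

For $a = 20$, $b = 80$, $c = 10$, $d = 90$ this gives ARD = 0.10, RR = 2.0, OR = 2.25 and NNT = 10.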

3.12 Effect Sizes for Categorical Association

Phi coefficient ($\phi$) for $2 \times 2$ tables:

$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$

Equivalent to Pearson $r$ for two binary variables. Ranges from $-1$ to $+1$.

Cramér's $V$ for $r \times c$ contingency tables ($r$ rows, $c$ columns):

$$V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1, c-1)}}$$

Ranges from 0 (no association) to 1 (perfect association).

Cohen's $w$ for goodness-of-fit and $\chi^2$ tests:

$$w = \sqrt{\sum_{i=1}^{k} \frac{(P_{0i} - P_{1i})^2}{P_{0i}}}$$

Where $P_{0i}$ are the null (expected) proportions and $P_{1i}$ are the alternative (observed/hypothesised) proportions.

3.13 Rank-Biserial Correlation

The rank-biserial correlation ($r_{rb}$) is the effect size for the Mann-Whitney U test (the non-parametric alternative to Cohen's $d$ when normality is not assumed):

$$r_{rb} = 1 - \frac{2U}{n_1 n_2}$$

(with the other group's statistic $U' = n_1 n_2 - U$, the equivalent form is $r_{rb} = \frac{2U'}{n_1 n_2} - 1$). Or equivalently, from mean ranks:

$$r_{rb} = \frac{\bar{R}_1 - \bar{R}_2}{n/2}$$

Where $\bar{R}_1$ and $\bar{R}_2$ are the mean ranks of the two groups, and $n = n_1 + n_2$.

Ranges from $-1$ to $+1$. $r_{rb} = 0.5$ means that 75% of pairwise comparisons favour Group 1 over Group 2.
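Converting a reported Mann-Whitney $U$ into $r_{rb}$ is a one-liner. A sketch (illustrative name; pass the $U$ computed for Group 1):

```python
def rank_biserial_from_u(u, n1, n2):
    """Rank-biserial correlation from the Mann-Whitney U of group 1."""
    return 1 - 2 * u / (n1 * n2)
```

$U = 0$ (every Group 1 observation outranks every Group 2 observation) gives $r_{rb} = 1$; $U = n_1 n_2 / 2$ gives $r_{rb} = 0$.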


4. Assumptions of Effect Size Estimation

4.1 Correct Scale and Direction of Variables

Effect sizes are only meaningful when variables are measured on an appropriate scale and when the direction of differences is clearly defined.

Why it matters: Reversing the direction of scoring (e.g., higher score = worse outcome vs. higher score = better outcome) changes the sign of the effect size. Ambiguous scoring leads to misinterpretation.

How to check: Before computing any effect size, clearly state:

- which direction of scoring represents the "better" (or higher) outcome;
- which group or condition is the reference, so that the sign of the effect is unambiguous.

4.2 Normally Distributed Populations (for the $d$-family)

Cohen's $d$, Hedges' $g$, and Glass's $\Delta$ assume that the observed scores come from normally distributed populations. Violations of normality can distort the pooled standard deviation and produce misleading effect size estimates.

How to check: Inspect histograms and Q-Q plots for each group, and apply a formal normality test (e.g., Shapiro-Wilk) if desired.

When violated: Use the rank-biserial correlation ($r_{rb}$) for the Mann-Whitney test or the common language effect size (CL) as alternatives that do not assume normality.

4.3 Equal or Known Population Variances

Cohen's $d$ uses the pooled standard deviation, which implicitly assumes homogeneity of variance. When population variances differ substantially:

- the pooled SD no longer estimates a single, common population SD;
- the resulting $d$ depends on the relative group sizes and becomes hard to interpret.

Variance ratio rule of thumb: If $s^2_{larger}/s^2_{smaller} > 4$, consider using Glass's $\Delta$ rather than Cohen's $d$.

4.4 Independence of Observations

Effect sizes based on means (Cohen's $d$), correlations ($r$), and variance-explained measures ($\eta^2$) all assume that observations are independent of each other.

When violated: Clustered or nested data (e.g., students within classrooms, repeated measurements within people) break this assumption; effect sizes should then be computed within an appropriate multilevel or repeated-measures framework rather than from pooled raw scores.

4.5 Adequate Sample Size for Stable Estimates

Effect size estimates are very unstable in small samples. The sampling variability of $d$ can be enormous with $n < 10$ per group:

| $n$ per group | $SE_d$ (for true $d = 0.5$) | 95% CI width |
|---|---|---|
| 5 | 0.70 | 2.74 |
| 10 | 0.46 | 1.80 |
| 20 | 0.33 | 1.28 |
| 50 | 0.21 | 0.81 |
| 100 | 0.15 | 0.57 |
| 200 | 0.10 | 0.40 |

This table shows that with only 5 observations per group, the 95% CI for $d = 0.5$ spans nearly 3 standard deviation units — essentially uninformative. Effect sizes require adequate sample sizes to be interpretable.

4.6 No Selective Reporting (Publication Bias)

When effect sizes are extracted from published literature, they are subject to publication bias — studies with larger, significant effects are more likely to be published than those with small, non-significant effects. This means that the average published effect size overestimates the true population effect.

Remedies:

- inspect funnel plots for asymmetry when meta-analysing published effects;
- apply bias-adjustment methods such as trim-and-fill;
- give greater weight to pre-registered studies and registered reports.


5. Types of Effect Sizes

5.1 The Three Families of Effect Sizes

The $d$-Family (Standardised Mean Differences)

These effect sizes express the difference between means in standard deviation units.

| Effect Size | Formula | Standardiser | Best For |
|---|---|---|---|
| Cohen's $d$ | $(\bar{x}_1-\bar{x}_2)/s_{pooled}$ | Pooled SD | Independent samples, equal variances |
| Hedges' $g$ | $d \times J$ | Pooled SD (bias-corrected) | Small samples ($n < 20$ per group) |
| Glass's $\Delta$ | $(\bar{x}_1-\bar{x}_2)/s_{control}$ | Control group SD | Unequal variances; treatment-control |
| $d_{paired}$ | $\bar{d}/s_d$ | SD of differences | Paired/repeated measures |
| $d_{av}$ | $\bar{d}/s_{av}$ | Average of group SDs | Paired; avoids population assumption |
| $d_z$ | $t/\sqrt{n}$ | SD of differences | Paired; directly from the $t$-test |

The $r$-Family (Variance-Explained and Correlation)

These effect sizes express how much of the total variance is explained by the effect.

| Effect Size | Formula | Range | Best For |
|---|---|---|---|
| Pearson $r$ | $\text{Cov}(X,Y)/(s_X s_Y)$ | $[-1, 1]$ | Linear correlation |
| $r^2$ | $SS_{explained}/SS_{total}$ | $[0, 1]$ | Simple regression |
| $R^2$ (multiple) | $1 - SS_{res}/SS_{tot}$ | $[0, 1]$ | Multiple regression |
| $\eta^2$ | $SS_{effect}/SS_{total}$ | $[0, 1]$ | One-way ANOVA |
| $\eta_p^2$ | $SS_{effect}/(SS_{effect}+SS_{error})$ | $[0, 1]$ | Factorial ANOVA |
| $\omega^2$ | Bias-corrected $\eta^2$ | $(-\infty, 1]$ | ANOVA (preferred) |
| $\omega_p^2$ | Bias-corrected $\eta_p^2$ | $(-\infty, 1]$ | Factorial ANOVA (preferred) |
| $\varepsilon^2$ | Alternative bias correction | $(-\infty, 1]$ | One-way ANOVA |
| Cohen's $f$ | $\sqrt{\eta^2/(1-\eta^2)}$ | $[0, \infty)$ | Power analysis for ANOVA |
| Cohen's $f^2$ | $R^2/(1-R^2)$ | $[0, \infty)$ | Power analysis for regression |
| Rank-biserial $r$ | $1 - 2U/(n_1 n_2)$ | $[-1, 1]$ | Mann-Whitney U test |

The Risk/Odds Family

These effect sizes are appropriate for binary outcomes.

| Effect Size | Formula | Range | Best For |
|---|---|---|---|
| Absolute Risk Difference | $p_1 - p_2$ | $[-1, 1]$ | Clinical decision-making |
| Risk Ratio (RR) | $p_1/p_2$ | $(0, \infty)$ | Prospective / cohort studies |
| Odds Ratio (OR) | $(p_1/(1-p_1))/(p_2/(1-p_2))$ | $(0, \infty)$ | Case-control studies |
| Number Needed to Treat | $1/\lvert p_1 - p_2 \rvert$ | $(1, \infty)$ | Clinical applicability |
| Phi ($\phi$) | $\sqrt{\chi^2/n}$ (for $2\times 2$) | $[-1, 1]$ | $2\times 2$ tables |
| Cramér's $V$ | $\sqrt{\chi^2/(n \cdot \min(r-1,c-1))}$ | $[0, 1]$ | $r \times c$ tables |
| Cohen's $w$ | $\sqrt{\sum(P_0-P_1)^2/P_0}$ | $[0, \infty)$ | $\chi^2$ goodness-of-fit |

5.2 Choosing the Right Effect Size

The table below provides a quick reference for selecting the appropriate effect size measure based on your statistical test:

| Statistical Test | Effect Size to Report | Notes |
|---|---|---|
| t-test (independent samples) | Cohen's $d$ or Hedges' $g$ | Hedges' $g$ preferred for small samples ($n < 20$) |
| t-test (one sample or paired) | Cohen's $d_{paired}$ or $d_z$ | Use $d_z$ when comparing to a known parameter |
| One-way ANOVA | $\omega^2$ (preferred), $\eta^2$ (common) | $\omega^2$ is less biased; $\eta^2$ tends to overestimate |
| Factorial ANOVA | $\omega_p^2$ (preferred), $\eta_p^2$ (common) | Use partial versions for factorial designs |
| Multiple regression | $R^2$, adjusted $R^2$, Cohen's $f^2$ | Cohen's $f^2$ for local/effect-size-specific measures |
| Correlation | Pearson $r$, $r^2$ | $r^2$ shows variance explained |
| Chi-squared ($2 \times 2$) | Phi ($\phi$) | Special case of Pearson $r$ for $2 \times 2$ tables |
| Chi-squared ($r \times c$) | Cramér's $V$ | Generalized version of Phi for larger tables |
| Risk comparison (binary, two groups) | Risk Ratio + ARD + NNT | ARD = Absolute Risk Difference; NNT = Number Needed to Treat |
| Case-control study (binary) | Odds Ratio | Standard measure for case-control studies |
| Mann-Whitney U / Wilcoxon | Rank-biserial correlation ($r_{rb}$) | Non-parametric alternative to $r$ |

6. Using the Effect Size Calculator Component

The Effect Size Calculator component in DataStatPro provides a comprehensive tool for computing, visualising, and interpreting effect sizes across all major statistical designs.

Step-by-Step Guide

Step 1 — Select the Effect Size Family

Choose from the "Effect Size Type" dropdown:

- Mean differences ($d$-family)
- Variance explained ($r$-family)
- Associations and categorical data (risk/odds family)

Step 2 — Select the Specific Effect Size

Based on your design, select the specific effect size (for example, Cohen's $d$, Hedges' $g$, $\omega^2$, or the odds ratio).

💡 Recommendation: For ANOVA, always compute and report $\omega^2$ (or $\omega_p^2$ for factorial designs) in addition to or instead of $\eta^2$. Omega squared is less biased and is increasingly required by journals.

Step 3 — Input Method

Choose how to provide the data: raw data, summary statistics, or reported test statistics.

💡 Tip: When computing effect sizes from published papers that only report test statistics, use the "From test statistics" input method. For example, $d = t\sqrt{1/n_1 + 1/n_2}$ and $\eta^2 = F \cdot df_{between} / (F \cdot df_{between} + df_{within})$.

Step 4 — Specify Design Details

Step 5 — Select Confidence Level

Choose the confidence level for intervals (default: 95%). The application computes exact (non-central) intervals where available, alongside large-sample approximations.

Step 6 — Select Benchmarks

Choose the benchmark system for interpreting the magnitude:

- Cohen (1988) conventional benchmarks
- Sawilowsky (2009) extended benchmarks
- Funder & Ozer (2019) benchmarks for social and behavioural research

⚠️ Important: Cohen's benchmarks were intended as rough conventions when no better information is available. Always prioritise domain-specific benchmarks and contextual interpretation over generic small/medium/large labels.

Step 7 — Display Options

Select which outputs and visualisations to display.

Step 8 — Run the Calculation

Click "Calculate Effect Size". The application will:

  1. Compute the requested effect size(s) from the provided data or statistics.
  2. Construct confidence intervals using the appropriate method.
  3. Classify the magnitude using the selected benchmark system.
  4. Generate all selected visualisations.
  5. Provide an interpretation paragraph in plain language.

7. Effect Sizes for Mean Differences

7.1 Cohen's $d$ for Independent Samples — Full Procedure

Step 1 — Compute group means and standard deviations

$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}, \quad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}$$

$$s_1 = \sqrt{\frac{1}{n_1-1}\sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2}, \quad s_2 = \sqrt{\frac{1}{n_2-1}\sum_{i=1}^{n_2}(x_{2i}-\bar{x}_2)^2}$$

Step 2 — Compute the pooled standard deviation

$$s_{pooled} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$$

Step 3 — Compute Cohen's $d$

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}$$

Step 4 — Apply Hedges' correction (especially if $n < 20$ per group)

$$g = d \times \left(1 - \frac{3}{4(n_1 + n_2 - 2) - 1}\right)$$

Step 5 — Compute the 95% CI (exact, via the non-central $t$)

The exact CI is computed numerically. The approximate 95% CI is:

$$d \pm 1.96 \times \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2 - 2)}}$$

Step 6 — Compute the Common Language Effect Size (CL)

The Common Language Effect Size (McGraw & Wong, 1992) is the probability that a randomly selected person from Group 1 scores higher than a randomly selected person from Group 2:

$$CL = \Phi\left(\frac{d}{\sqrt{2}}\right)$$

Where $\Phi$ is the standard normal CDF.

$CL = 0.50$ means a 50% probability of superiority (no effect); $CL = 0.75$ means that 75% of the time, a person from Group 1 outscores a person from Group 2.

7.2 Cohen's Benchmark Classification for $d$ and $g$

Cohen (1988) proposed the following conventions, intended as rough guides only:

| Cohen's $d$ | Verbal Label | Equivalent $r$ | Overlap (%) |
|---|---|---|---|
| 0.00 | No effect | 0.00 | 100% |
| 0.20 | Small | 0.10 | 85% |
| 0.50 | Medium | 0.24 | 67% |
| 0.80 | Large | 0.37 | 53% |
| 1.20 | Very large | 0.51 | 40% |
| 2.00 | Huge | 0.71 | 18% |

⚠️ Cohen himself warned against mechanical application of these benchmarks. He stated: "The effect size conventions are offered as conventions of last resort, to be used only when no better basis for setting the ES is available." Always contextualise effect sizes within your specific research domain.

Extended benchmarks (Sawilowsky, 2009):

| Label | $d$ |
|---|---|
| Tiny | $< 0.10$ |
| Very small | $0.10 - 0.19$ |
| Small | $0.20 - 0.49$ |
| Medium | $0.50 - 0.79$ |
| Large | $0.80 - 1.19$ |
| Very large | $1.20 - 1.99$ |
| Huge | $\geq 2.00$ |

7.3 Variance Overlap Statistics (Cohen's $U$)

To complement Cohen's $d$, three overlap statistics provide intuitive, probabilistic interpretations of the separation between two normal distributions:

$U_1$: The proportion of the combined distributions that is NOT overlapping:

$$U_1 = \frac{2\Phi(\lvert d \rvert/2) - 1}{\Phi(\lvert d \rvert/2)}$$

$U_2$: The proportion of one distribution that exceeds the same proportion in the other distribution:

$$U_2 = \Phi\left(\frac{\lvert d \rvert}{2}\right)$$

$U_3$ (Cohen's $U_3$): The proportion of the treatment distribution that exceeds the median of the control distribution:

$$U_3 = \Phi(\lvert d \rvert)$$

Example for $d = 0.50$:

$$U_3 = \Phi(0.50) = 0.691 \to 69.1\%$$

Interpretation: 69.1% of the treatment group scores above the median of the control group.

| $d$ | $U_3$ | CL (%) | Overlap (%) |
|---|---|---|---|
| 0.20 | 57.9% | 55.6% | 85.3% |
| 0.50 | 69.1% | 63.8% | 66.9% |
| 0.80 | 78.8% | 71.4% | 52.5% |
| 1.00 | 84.1% | 76.0% | 44.8% |
| 1.50 | 93.3% | 85.6% | 28.1% |
| 2.00 | 97.7% | 92.1% | 16.9% |
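The table's columns can all be derived from $\Phi$ alone. A sketch using `math.erf` for the normal CDF (illustrative function names; overlap is reported as $1 - U_1$):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def overlap_stats(d):
    """Cohen's U1, U2, U3, the common language effect size, and overlap."""
    d = abs(d)
    u2 = norm_cdf(d / 2)
    u1 = (2 * u2 - 1) / u2           # Cohen's U1: proportion of non-overlap
    u3 = norm_cdf(d)                 # proportion above the control median
    cl = norm_cdf(d / math.sqrt(2))  # probability of superiority
    return u1, u2, u3, cl, 1 - u1
```

For $d = 0.50$ this reproduces the table row: $U_3 \approx 69.1\%$, CL $\approx 63.8\%$, overlap $\approx 67\%$.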

7.4 Computing $d$ from Common Test Statistics

When raw data are unavailable, $d$ can be computed from reported test statistics.

From an independent-samples $t$-test:

$$d = t \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \frac{t\sqrt{n_1 + n_2}}{\sqrt{n_1 n_2}}$$

From a one-sample or paired $t$-test:

$$d_z = \frac{t}{\sqrt{n}}$$

From an $F$-ratio (two-group ANOVA, $df_1 = 1$):

$$d = \sqrt{F \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

From a $\chi^2$ statistic (for $\phi$ or Cramér's $V$):

$$\phi = \sqrt{\frac{\chi^2}{n}}, \quad V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1,c-1)}}$$

Converting between effect size families:

$$r = \frac{d}{\sqrt{d^2 + \frac{(n_1+n_2)^2}{n_1 n_2}}}, \quad d = \frac{2r}{\sqrt{1-r^2}} \quad \text{(the latter for equal group sizes)}$$

$$\eta^2 = \frac{d^2}{d^2+4} \quad \text{(for equal group sizes)}$$
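These recovery formulas are handy when extracting effect sizes from published results. A sketch (illustrative function names):

```python
import math

def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def eta2_from_f(f, df_between, df_within):
    """eta^2 from an F ratio and its degrees of freedom."""
    return f * df_between / (f * df_between + df_within)
```

For example, $t = 2.5$ with 50 per group gives $d = 0.50$; $F(2, 57) = 4.0$ gives $\eta^2 \approx 0.123$.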


8. Effect Sizes for Variance Explained

8.1 $\eta^2$, $\omega^2$, and $\varepsilon^2$ Comparison

For a one-way ANOVA with $K$ groups and $n$ total observations, given the ANOVA table:

| Source | SS | df | MS |
|---|---|---|---|
| Between (Effect) | $SS_B$ | $K-1$ | $MS_B = SS_B/(K-1)$ |
| Within (Error) | $SS_W$ | $n-K$ | $MS_W = SS_W/(n-K)$ |
| Total | $SS_T$ | $n-1$ | |

Eta squared:

$$\eta^2 = \frac{SS_B}{SS_T}$$

Omega squared (preferred):

$$\omega^2 = \frac{SS_B - (K-1)MS_W}{SS_T + MS_W}$$

Epsilon squared:

$$\varepsilon^2 = \frac{SS_B - (K-1)MS_W}{SS_T}$$

Relationship: $\omega^2 \leq \varepsilon^2 \leq \eta^2$

All three measure the same quantity (proportion of variance explained), but $\omega^2$ and $\varepsilon^2$ are corrected for the positive bias of $\eta^2$ in finite samples.

8.2 Benchmark Interpretations for Variance-Explained Effect Sizes

Cohen (1988) benchmarks:

| Label | $\eta^2$ or $\omega^2$ | $f$ | $f^2$ | $r$ or $R$ |
|---|---|---|---|---|
| Small | 0.01 | 0.10 | 0.02 | 0.10 |
| Medium | 0.06 | 0.25 | 0.15 | 0.30 |
| Large | 0.14 | 0.40 | 0.35 | 0.50 |

Note on $\eta^2$ benchmarks: These were established when $\eta^2$ was the standard report. Since $\omega^2$ and $\varepsilon^2$ are systematically smaller than $\eta^2$, the same verbal benchmarks do not transfer directly. Use the $f$ or $f^2$ conversions for power analysis regardless of which variance-explained index you report.

8.3 Generalised Eta Squared ($\eta_G^2$)

Generalised eta squared ($\eta_G^2$; Olejnik & Algina, 2003) is designed for comparison across studies with different designs by distinguishing between manipulated and measured sources of variance:

$$\eta_G^2 = \frac{SS_{effect}}{SS_{effect} + \sum_{m} SS_{measured} + SS_{error}}$$

Where the summation is over all measured (non-manipulated) variables in the design.

$\eta_G^2$ is more comparable across different experimental designs (between-subjects, within-subjects, mixed) than either $\eta^2$ or $\eta_p^2$, and it is increasingly recommended for factorial and mixed ANOVA designs.

8.4 R2R^2 and Adjusted R2R^2 for Regression

The coefficient of determination R2R^2 is the proportion of variance in YY explained by the regression model:

R2=1SSresidualSStotal=SSregressionSStotalR^2 = 1 - \frac{SS_{residual}}{SS_{total}} = \frac{SS_{regression}}{SS_{total}}

Adjusted R2R^2 corrects for the number of predictors pp in the model:

Radj2=1(1R2)n1np1R^2_{adj} = 1 - (1-R^2)\frac{n-1}{n-p-1}

Adjusted R2R^2 can be negative when the model fits worse than a horizontal line.

R2R^2 change (ΔR2\Delta R^2) for evaluating the increment from adding predictors:

ΔR2=Rfull2Rreduced2\Delta R^2 = R^2_{full} - R^2_{reduced}

Cohen's f2f^2 for the increment:

flocal2=ΔR21Rfull2f^2_{local} = \frac{\Delta R^2}{1 - R^2_{full}}
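As a quick numeric check of the formulas above, here is a minimal Python sketch; the $R^2$ values and the $n = 100$, $p = 5$ model are hypothetical illustrations, not output from DataStatPro:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def f2_local(r2_full, r2_reduced):
    """Cohen's f^2 for the increment of the full model over the reduced model."""
    return (r2_full - r2_reduced) / (1 - r2_full)

# Hypothetical values: a 5-predictor model with R^2 = .30 in n = 100 cases,
# where dropping one predictor block leaves R^2 = .25.
print(round(adjusted_r2(0.30, 100, 5), 4))  # → 0.2628
print(round(f2_local(0.30, 0.25), 4))       # → 0.0714
```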

8.5 Confidence Intervals for η2\eta^2 and ω2\omega^2

CIs for variance-explained effect sizes use the non-central F-distribution. The observed FF-ratio has a non-central FF distribution:

FF(dfbetween,dfwithin,λ)F \sim F'(df_{between}, df_{within}, \lambda)

Where λ\lambda is the non-centrality parameter:

λ=η2n1η2\lambda = \frac{\eta^2 \cdot n}{1 - \eta^2}

The CI bounds for λ\lambda are found numerically, then converted to η2\eta^2:

ηL2=λLλL+n,ηU2=λUλU+n\eta^2_L = \frac{\lambda_L}{\lambda_L + n}, \quad \eta^2_U = \frac{\lambda_U}{\lambda_U + n}

For ω2\omega^2 and ε2\varepsilon^2, a transformation approach is used: first compute the CI for η2\eta^2 (or ff), then convert to the desired effect size.
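The numerical search for the $\lambda$ bounds can be sketched as follows. This is an illustration only, assuming SciPy is available; it is not the DataStatPro implementation:

```python
from scipy.stats import ncf          # non-central F distribution
from scipy.optimize import brentq    # 1-D root finder

def eta2_ci(f_obs, df1, df2, n, conf=0.95):
    """CI for eta^2 via the non-central F pivot (sketch)."""
    alpha = 1 - conf

    def lam_bound(p):
        # Find lambda such that P(F' <= f_obs | lambda) = p.
        g = lambda lam: ncf.cdf(f_obs, df1, df2, lam) - p
        if g(0) < 0:      # bound pinned at zero (observed F too small)
            return 0.0
        return brentq(g, 0, 10 * n)

    lam_lo = lam_bound(1 - alpha / 2)   # lower bound: f_obs at the 97.5th pct
    lam_hi = lam_bound(alpha / 2)       # upper bound: f_obs at the 2.5th pct
    return lam_lo / (lam_lo + n), lam_hi / (lam_hi + n)

# Example 2 of this guide: F(2, 87) = 8.94, n = 90.
lo, hi = eta2_ci(8.94, 2, 87, 90)
```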


9. Effect Sizes for Associations and Categorical Data

9.1 Pearson rr — Correlation Effect Size

The Pearson correlation rr is simultaneously a descriptive statistic and an effect size. It requires no additional calculation — the correlation coefficient itself IS the standardised effect size for the strength of a linear relationship.

Benchmarks for rr:

| $\lvert r \rvert$ | Cohen (1988) | Funder & Ozer (2019) |
|---|---|---|
| $< 0.10$ | Negligible | Very small (potentially negligible) |
| $0.10 - 0.29$ | Small | Small |
| $0.30 - 0.49$ | Medium | Medium / large |
| $\geq 0.50$ | Large | Very large |

💡 Funder & Ozer (2019) argued that Cohen's benchmarks are too conservative for social/behavioural science, where r=0.30r = 0.30 is actually a large effect in practice. Consider the base rates in your field when applying benchmarks.

9.2 Interpreting the Odds Ratio

The Odds Ratio (OR) is the most common effect size in case-control studies and logistic regression.

| OR | Interpretation |
|---|---|
| 1.0 | No difference in odds between groups |
| $> 1.0$ | Increased odds of the event in Group 1 vs. Group 2 |
| $< 1.0$ | Decreased odds of the event in Group 1 vs. Group 2 |
| 2.0 | Twice the odds |
| 0.5 | Half the odds (equivalent to OR = 2.0 in the opposite direction) |

Benchmark (Chen et al., 2010 for medical research):

| $\lvert \ln(\text{OR}) \rvert$ | OR | Label |
|---|---|---|
| 0.20 | 1.22 | Small |
| 0.50 | 1.65 | Medium |
| 0.80 | 2.23 | Large |

Converting OR to Cohen's dd (for meta-analytic purposes):

d=ln(OR)3πln(OR)×0.5513d = \frac{\ln(OR) \cdot \sqrt{3}}{\pi} \approx \ln(OR) \times 0.5513

Converting OR to rr:

r=dd2+4r = \frac{d}{\sqrt{d^2 + 4}}
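Both conversions can be sketched in a few lines of Python (OR = 3.0 is a hypothetical input chosen for illustration):

```python
import math

def or_to_d(odds_ratio):
    """Convert an odds ratio to Cohen's d via the logit method."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

def d_to_r(d):
    """Convert d to r (equal group sizes assumed)."""
    return d / math.sqrt(d * d + 4)

d = or_to_d(3.0)
print(round(d, 3), round(d_to_r(d), 3))  # → 0.606 0.29
```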

9.3 Risk Ratio vs. Odds Ratio — When to Use Which

| Situation | Recommended Effect Size | Why |
|---|---|---|
| Prospective/cohort study | Risk Ratio (RR) | Probabilities are directly estimable |
| Case-control study | Odds Ratio (OR) | Incidence not estimable; OR is invariant |
| Clinical trial (binary outcome) | RR + ARD + NNT | All provide different, complementary information |
| Rare events ($p < 0.10$) | Either (OR $\approx$ RR when rare) | OR approximates RR for rare outcomes |
| Common events ($p > 0.10$) | RR preferred | OR exaggerates the effect vs. RR |
| Logistic regression | OR | Natural output of the logistic model |

⚠️ A common mistake is interpreting an Odds Ratio as a Risk Ratio when the event is common (p>0.10p > 0.10). The OR always exaggerates the RR when the event is common. For example, OR = 3.0 may correspond to RR = 2.0 when the control event rate is 30%. Always report the ARD and NNT alongside OR or RR for clinical interpretability.

9.4 Number Needed to Treat (NNT)

The NNT is one of the most clinically interpretable effect sizes:

NNT=1ptreatmentpcontrol=1ARD\text{NNT} = \frac{1}{\lvert p_{treatment} - p_{control} \rvert} = \frac{1}{\lvert \text{ARD} \rvert}

A treatment with NNT = 5 means that on average, 5 patients must be treated for 1 additional patient to benefit compared to control.

| NNT | Clinical Interpretation |
|---|---|
| 1 | Every treated patient benefits (perfect) |
| 2 – 5 | Excellent; highly effective treatment |
| 5 – 10 | Good; meaningful clinical benefit |
| 10 – 50 | Moderate; benefit for a minority of treated patients |
| > 50 | Small; many patients treated for little benefit |
| $\infty$ | No treatment benefit (ARD = 0) |

95% CI for NNT (Altman method):

NNTCI=1ARD±1.96×SEARD\text{NNT}_{CI} = \frac{1}{\text{ARD} \pm 1.96 \times SE_{ARD}}

Where SEARD=p1(1p1)n1+p2(1p2)n2SE_{ARD} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}.

⚠️ NNT CIs can be awkward when the CI for ARD includes 0, producing a CI that spans from a negative NNT (Needed to Harm, NNH) to a positive NNT through an infinite discontinuity. In this case, report both sides of the CI as NNT and NNH separately.
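The calculation can be sketched in Python, using the vaccine numbers from Example 3 later in this guide (20/1000 vs. 80/1000 events); note the sketch assumes the ARD CI excludes zero:

```python
import math

def nnt_with_ci(p1, n1, p2, n2, z=1.96):
    """ARD, NNT, and an Altman-style 95% CI for the NNT (sketch)."""
    ard = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    lo, hi = ard - z * se, ard + z * se
    nnt = 1 / abs(ard)
    # Only valid when the ARD CI excludes zero; otherwise report NNT and NNH separately.
    nnt_ci = sorted((1 / abs(lo), 1 / abs(hi)))
    return ard, nnt, nnt_ci

# Example 3 of this guide: 2% event rate (treated) vs. 8% (control), n = 1000 each.
ard, nnt, ci = nnt_with_ci(0.02, 1000, 0.08, 1000)
```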

9.5 Cramér's VV for Multi-Way Tables

Cramér's VV is the standard effect size for χ2\chi^2 tests of independence in tables larger than 2×22 \times 2:

V=χ2nmin(r1,c1)V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1, c-1)}}

Benchmarks (Cohen, 1988, adjusted by min(r1,c1)\min(r-1,c-1)):

| $\min(r-1, c-1)$ | Small | Medium | Large |
|---|---|---|---|
| 1 ($2 \times 2$) | 0.10 | 0.30 | 0.50 |
| 2 ($3 \times 3$) | 0.07 | 0.21 | 0.35 |
| 3 ($4 \times 4$) | 0.06 | 0.17 | 0.29 |
| 4 ($5 \times 5$) | 0.05 | 0.15 | 0.25 |

Corrected Cramér's $\tilde{V}$ (Bergsma, 2013 correction for small samples and sparse tables):

$\tilde{V} = \sqrt{\frac{\tilde{\phi}^2}{\min(\tilde{r}-1,\ \tilde{c}-1)}}$

Where $\tilde{\phi}^2 = \max\!\left(0,\ \frac{\chi^2}{n} - \frac{(r-1)(c-1)}{n-1}\right)$, $\tilde{r} = r - \frac{(r-1)^2}{n-1}$, and $\tilde{c} = c - \frac{(c-1)^2}{n-1}$.

The correction removes the positive bias of $V$ and is recommended for small samples or sparse tables.
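Both the uncorrected and the Bergsma-corrected statistics can be sketched in Python. The sketch takes $\chi^2$ as given rather than computing it from raw counts, and the corrected variant follows Bergsma's (2013) formulas:

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V from a chi-squared statistic and table dimensions."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

def cramers_v_corrected(chi2, n, r, c):
    """Bias-corrected Cramér's V (Bergsma, 2013) -- a sketch."""
    phi2_t = max(0.0, chi2 / n - (r - 1) * (c - 1) / (n - 1))
    r_t = r - (r - 1) ** 2 / (n - 1)   # bias-adjusted row dimension
    c_t = c - (c - 1) ** 2 / (n - 1)   # bias-adjusted column dimension
    return math.sqrt(phi2_t / min(r_t - 1, c_t - 1))

# Example 4 of this guide: chi^2(4) = 22.8, n = 300, 3x3 table.
print(round(cramers_v(22.8, 300, 3, 3), 3))  # → 0.195
```

The corrected value for the same inputs is slightly smaller, as expected for a bias correction.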


10. Model Fit and Evaluation

10.1 Evaluating Effect Size Precision — The Confidence Interval

The primary evaluation criterion for an effect size is its confidence interval (CI). The CI communicates both the direction and magnitude of the effect AND the uncertainty around the estimate.

Rules for interpreting effect size CIs:

| CI property | Interpretation |
|---|---|
| CI entirely above zero (or the positive null) | Effect is significantly positive |
| CI entirely below zero (or the negative null) | Effect is significantly negative |
| CI contains zero | Effect is not statistically significant |
| Narrow CI | Precise estimate (large $n$) |
| Wide CI | Imprecise estimate (small $n$); interpret the point estimate cautiously |
| CI lies entirely within the "small" range | Effect is definitely small |
| CI spans from small to large | Effect magnitude is uncertain |

10.2 Precision as a Function of Sample Size

The width of the 95% CI for Cohen's dd decreases as nn increases:

CI Width2×1.96×SEd2×1.96×n1+n2n1n2+d22(n1+n2)\text{CI Width} \approx 2 \times 1.96 \times SE_d \approx 2 \times 1.96 \times \sqrt{\frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2(n_1+n_2)}}

For equal group sizes (n1=n2=nn_1 = n_2 = n) and d=0.5d = 0.5:

| $n$ per group | Approx. CI width | Interpretation |
|---|---|---|
| 10 | 1.86 | Very imprecise |
| 20 | 1.28 | Imprecise |
| 50 | 0.80 | Moderate precision |
| 100 | 0.57 | Good precision |
| 200 | 0.40 | High precision |
| 500 | 0.25 | Very high precision |

10.3 The Minimal Effect Size of Interest (MESI) — Equivalence Testing

For many applications, researchers are not just interested in whether an effect is non-zero, but in whether it exceeds a minimum meaningful threshold. The minimal effect size of interest (MESI) defines the smallest effect that would be practically or clinically important.

Two One-Sided Tests (TOST) equivalence testing:

Define bounds Δ-\Delta and +Δ+\Delta as the MESI (e.g., Δ=0.20\Delta = 0.20 for a "trivially small" effect). The null hypothesis of the equivalence test is:

$H_0: \lvert d \rvert \geq \Delta$ (the effect is NOT negligible)

$H_1: \lvert d \rvert < \Delta$ (the effect IS negligible)

The equivalence is supported when both one-sided tests reject their respective nulls. Practically, the effect is declared equivalent to zero (negligible) when the 90% CI for dd falls entirely within (Δ,+Δ)(-\Delta, +\Delta).

💡 Equivalence testing is increasingly important for null results. A study that fails to reject H0:d=0H_0: d = 0 does not establish that the effect is zero or negligible — only equivalence testing can establish negligibility.

10.4 Power Analysis Based on Effect Size

Effect sizes are the primary input to a priori power analysis — determining the required sample size before conducting a study.

Required sample size for a two-sample tt-test at power 1β1-\beta and significance α\alpha:

nper  group=2(z1α/2+z1β)2d2n_{per\;group} = \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}

For α=.05\alpha = .05 and power = 0.800.80 (z.975=1.96z_{.975} = 1.96, z.80=0.84z_{.80} = 0.84):

nper  group=2(1.96+0.84)2d2=15.68d2n_{per\;group} = \frac{2(1.96 + 0.84)^2}{d^2} = \frac{15.68}{d^2}

| Cohen's $d$ | $n$ per group (power = 0.80) | $n$ per group (power = 0.90) |
|---|---|---|
| 0.20 (small) | 394 | 527 |
| 0.50 (medium) | 64 | 85 |
| 0.80 (large) | 26 | 34 |
| 1.00 (large) | 17 | 22 |
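The shortcut formula can be sketched in Python using exact normal quantiles. Note that this z-approximation gives slightly smaller $n$ than the exact non-central-t calculation behind the table above (e.g., 63 vs. 64 per group for $d = 0.5$):

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sample t-test (z-approximation sketch)."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2
    return math.ceil(n)

print(n_per_group(0.5))  # → 63
print(n_per_group(0.2))  # → 393
```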

10.5 Sensitivity Analysis — Detectable Effect for a Given nn

A sensitivity analysis asks: given the sample size already collected, what is the smallest effect size that could be detected with (say) 80% power?

dmin=2(z1α/2+z1β)2nper  groupd_{min} = \sqrt{\frac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{n_{per\;group}}}

For n=30n = 30 per group:

dmin=15.6830=0.523=0.72d_{min} = \sqrt{\frac{15.68}{30}} = \sqrt{0.523} = 0.72

This means a study with n=30n = 30 per group can only reliably detect effects of d0.72d \geq 0.72 (close to Cohen's "large" threshold). Effects smaller than this may exist but will often be missed.
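The same z-approximation can be inverted in one line of Python (a sketch using the shortcut formula above, not DataStatPro output):

```python
import math
from statistics import NormalDist

def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest d detectable at the given power with this per-group n (sketch)."""
    z = NormalDist().inv_cdf
    return math.sqrt(2 * (z(1 - alpha / 2) + z(power)) ** 2 / n_per_group)

print(round(min_detectable_d(30), 2))  # → 0.72
```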

⚠️ Sensitivity analysis should not be confused with post-hoc ("observed") power, which merely restates the p-value and should never be used to "explain" a non-significant result; that reasoning is circular. Sensitivity analysis is valuable for communicating what magnitudes of effect could have been detected, but it does not address whether a true effect exists.

10.6 Comparing Effect Sizes Across Studies

When comparing effect sizes across studies, ensure:

  1. Same family: dd and rr are different families; convert to a common metric.
  2. Same design: Paired dd and independent dd are not directly comparable.
  3. Same sample type: Clinical vs. community samples may have systematically different effect sizes.
  4. Bias correction: Use Hedges' gg (not Cohen's dd) when comparing across studies with different sample sizes.

Converting between families for comparison:

r=dd2+(n1+n2)2n1n2r = \frac{d}{\sqrt{d^2 + \frac{(n_1+n_2)^2}{n_1 n_2}}} (exact)

rdd2+4r \approx \frac{d}{\sqrt{d^2 + 4}} (equal group sizes)

d2r1r2d \approx \frac{2r}{\sqrt{1-r^2}} (equal group sizes)
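These conversions can be sketched in Python; the round-trip below uses hypothetical equal groups of $n = 20$:

```python
import math

def d_to_r_exact(d, n1, n2):
    """Exact d -> r conversion for possibly unequal group sizes."""
    a = (n1 + n2) ** 2 / (n1 * n2)   # reduces to 4 when n1 == n2
    return d / math.sqrt(d * d + a)

def r_to_d(r):
    """Approximate r -> d conversion (equal group sizes)."""
    return 2 * r / math.sqrt(1 - r * r)

r = d_to_r_exact(1.0, 20, 20)            # a = 4, so r = 1/sqrt(5)
print(round(r, 4), round(r_to_d(r), 4))  # → 0.4472 1.0
```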


11. Advanced Topics

11.1 Meta-Analytic Pooling of Effect Sizes

Meta-analysis combines effect sizes from multiple independent studies using weighted averaging. The weight of each study is the inverse of its variance.

Fixed-effects model (assumes all studies estimate the same true effect θ\theta):

θ^FE=i=1kwiθ^ii=1kwi\hat{\theta}_{FE} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}

Where wi=1/viw_i = 1/v_i and vi=SEi2v_i = SE_i^2 is the variance of the effect size in study ii.

Random-effects model (allows true effects to vary across studies, θiN(μ,τ2)\theta_i \sim \mathcal{N}(\mu, \tau^2)):

wi=1vi+τ^2w_i^* = \frac{1}{v_i + \hat{\tau}^2}

μ^RE=i=1kwiθ^ii=1kwi\hat{\mu}_{RE} = \frac{\sum_{i=1}^{k} w_i^* \hat{\theta}_i}{\sum_{i=1}^{k} w_i^*}

Where τ^2\hat{\tau}^2 is the estimated between-study variance (heterogeneity), computed using the DerSimonian-Laird estimator.

Heterogeneity statistics:

Q=i=1kwi(θ^iθ^FE)2χ2(k1)Q = \sum_{i=1}^k w_i(\hat{\theta}_i - \hat{\theta}_{FE})^2 \sim \chi^2(k-1)

I2=max(0,Q(k1)Q)×100%I^2 = \max\left(0, \frac{Q - (k-1)}{Q}\right) \times 100\%

| $I^2$ | Heterogeneity |
|---|---|
| 0 – 25% | Low |
| 25 – 50% | Moderate |
| 50 – 75% | Substantial |
| > 75% | Considerable |
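The fixed-effects, DerSimonian-Laird, and heterogeneity computations above can be sketched together in Python; the two input studies are hypothetical:

```python
def pool_effects(thetas, variances):
    """Fixed- and random-effects (DerSimonian-Laird) pooling -- a sketch."""
    k = len(thetas)
    w = [1 / v for v in variances]                       # inverse-variance weights
    theta_fe = sum(wi * t for wi, t in zip(w, thetas)) / sum(w)
    q = sum(wi * (t - theta_fe) ** 2 for wi, t in zip(w, thetas))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                   # DL between-study variance
    w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
    theta_re = sum(wi * t for wi, t in zip(w_star, thetas)) / sum(w_star)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return theta_fe, theta_re, tau2, i2

# Two hypothetical studies: d = 0.5 (v = .01) and d = 0.3 (v = .02).
fe, re, tau2, i2 = pool_effects([0.5, 0.3], [0.01, 0.02])
```

Note how the random-effects estimate sits between the fixed-effects estimate and the unweighted mean, because $\tau^2$ flattens the weights.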

11.2 Effect Sizes for Multilevel and Longitudinal Designs

In multilevel models (e.g., students within schools, patients within hospitals), effect sizes must account for the nested structure.

ICC-based dd for between-cluster effects:

dcluster=μ1μ2σtotal=μ1μ2σbetween2+σwithin2d_{cluster} = \frac{\mu_1 - \mu_2}{\sigma_{total}} = \frac{\mu_1 - \mu_2}{\sqrt{\sigma^2_{between} + \sigma^2_{within}}}

R2R^2 for multilevel models (Nakagawa & Schielzeth, 2013):

Rmarginal2=σf2σf2+σu2+σe2R^2_{marginal} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_u + \sigma^2_e} (fixed effects only)

Rconditional2=σf2+σu2σf2+σu2+σe2R^2_{conditional} = \frac{\sigma^2_f + \sigma^2_u}{\sigma^2_f + \sigma^2_u + \sigma^2_e} (fixed + random effects)

Where σf2\sigma^2_f = variance explained by fixed effects, σu2\sigma^2_u = random effects variance, σe2\sigma^2_e = residual variance.

11.3 Standardised vs. Unstandardised Effect Sizes

Not all effect size applications require standardisation. Unstandardised effect sizes (raw mean differences, regression coefficients in original units) are often more informative and actionable than standardised counterparts.

When to use unstandardised effects:

  - The outcome is measured in inherently meaningful units (e.g., days, mm Hg, currency, symptom counts)
  - Communicating results to practitioners, patients, or policymakers

When to use standardised effects:

  - The measurement scale is arbitrary (e.g., questionnaire sum scores)
  - Comparing or pooling effects across studies that used different instruments

The "point of controversy": Some methodologists (Lenth, 2001; Tukey, 1991) argue that standardised effect sizes are frequently misinterpreted and that the denominator (which SD is used) is itself a critical and often overlooked choice.

11.4 Effect Size for Interaction Effects in Factorial ANOVA

Interaction effect sizes in factorial designs require special treatment:

Partial omega squared for interaction:

$\omega^2_{p,A \times B} = \frac{SS_{A \times B} - df_{A \times B} \cdot MS_{error}}{SS_{A \times B} + (N - df_{A \times B}) \cdot MS_{error}}$

Where $N$ is the total sample size.

Generalised eta squared for the interaction:

ηG,AxB2=SSAxBSSAxB+SSerror+mSSmeasured\eta^2_{G,AxB} = \frac{SS_{AxB}}{SS_{AxB} + SS_{error} + \sum_{m}SS_{measured}}

💡 For interaction effects, always compute and report the simple effects (main effects at each level of the other factor) alongside the overall interaction effect size. The interaction effect size alone does not communicate the direction or pattern of the interaction.

11.5 Rank-Based Effect Sizes for Non-Parametric Tests

When parametric assumptions are violated, rank-based effect sizes should be used:

Wilcoxon Signed-Rank test (paired or one-sample):

rW=Znr_W = \frac{Z}{\sqrt{n}}

Where ZZ is the standardised Wilcoxon test statistic.

Kruskal-Wallis test (non-parametric ANOVA equivalent):

ηH2=HK+1nK\eta^2_H = \frac{H - K + 1}{n - K}

Where HH is the Kruskal-Wallis HH statistic, KK is the number of groups, and nn is the total sample size.

Spearman's ρ\rho (non-parametric correlation, itself an effect size):

ρs=16i=1ndi2n(n21)\rho_s = 1 - \frac{6\sum_{i=1}^n d_i^2}{n(n^2-1)}

Where did_i is the difference between ranks of the ii-th observation on XX and YY.
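Two of the rank-based statistics above can be sketched directly in Python; the $H$, $K$, $n$ values and the rank vectors are hypothetical illustrations:

```python
def eta2_h(h_stat, k_groups, n_total):
    """Eta-squared effect size for the Kruskal-Wallis H statistic."""
    return (h_stat - k_groups + 1) / (n_total - k_groups)

def spearman_rho(rank_x, rank_y):
    """Spearman's rho from ranks (no-ties formula)."""
    n = len(rank_x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(round(eta2_h(10.0, 3, 60), 3))                   # → 0.14
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # → 0.8
```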

11.6 Reporting Effect Sizes According to APA and Journal Standards

The APA Publication Manual (7th ed.) and major journals increasingly require reporting effect sizes with confidence intervals for all primary analyses. Best practice:

Minimum reporting requirements:

  - The effect size point estimate, with its direction
  - A 95% confidence interval for the effect size
  - The type of effect size and, for standardised mean differences, which SD was used as the standardiser
  - The sample sizes on which the estimate is based

Example APA-compliant report: "The CBT group showed significantly lower depression scores than the control group (t(118)=4.21t(118) = 4.21, p<.001p < .001, d=0.77d = 0.77, 95% CI [0.40, 1.14]), indicating a large treatment effect."


12. Worked Examples

Example 1: Cohen's dd and Hedges' gg — CBT vs. Control for Depression

A clinical trial randomises n1=35n_1 = 35 participants to CBT and n2=35n_2 = 35 to a waitlist control. Depression is measured on the PHQ-9 (0–27 scale, lower = less depression).

Summary statistics:

| Group | $n$ | Mean PHQ-9 | SD |
|---|---|---|---|
| CBT | 35 | 8.4 | 4.2 |
| Control | 35 | 13.1 | 4.8 |

Step 1 — Pooled SD:

spooled=(351)(4.2)2+(351)(4.8)235+352=34(17.64)+34(23.04)68=599.76+783.3668=1383.1268=20.34=4.51s_{pooled} = \sqrt{\frac{(35-1)(4.2)^2 + (35-1)(4.8)^2}{35+35-2}} = \sqrt{\frac{34(17.64) + 34(23.04)}{68}} = \sqrt{\frac{599.76 + 783.36}{68}} = \sqrt{\frac{1383.12}{68}} = \sqrt{20.34} = 4.51

Step 2 — Cohen's dd:

d=8.413.14.51=4.74.51=1.042d = \frac{8.4 - 13.1}{4.51} = \frac{-4.7}{4.51} = -1.042

The negative sign indicates CBT has lower (better) depression scores. By convention, report the absolute value with direction: d=1.042\lvert d \rvert = 1.042.

Step 3 — Hedges' gg (bias correction):

ν=35+352=68\nu = 35 + 35 - 2 = 68

J=134(68)1=13271=10.0111=0.9889J = 1 - \frac{3}{4(68) - 1} = 1 - \frac{3}{271} = 1 - 0.0111 = 0.9889

g=1.042×0.9889=1.031g = 1.042 \times 0.9889 = 1.031

(Minimal correction since nn is moderate.)

Step 4 — Approximate 95% CI:

SEd=35+3535×35+1.04222(35+352)=701225+1.086136=0.0571+0.0080=0.0651=0.255SE_d = \sqrt{\frac{35+35}{35 \times 35} + \frac{1.042^2}{2(35+35-2)}} = \sqrt{\frac{70}{1225} + \frac{1.086}{136}} = \sqrt{0.0571 + 0.0080} = \sqrt{0.0651} = 0.255

95% CI:1.042±1.96(0.255)=[0.542,1.542]95\% \text{ CI}: 1.042 \pm 1.96(0.255) = [0.542, 1.542]

Step 5 — Common Language Effect Size:

CL=Φ(1.0422)=Φ(0.737)=0.770CL = \Phi\left(\frac{1.042}{\sqrt{2}}\right) = \Phi(0.737) = 0.770

77.0% of CBT participants score lower (better) than the average control participant.

Step 6 — U3U_3 Statistic:

U3=Φ(1.042)=0.851U_3 = \Phi(1.042) = 0.851

85.1% of CBT participants have PHQ-9 scores below the mean of the control group.

Summary:

| Statistic | Value | Interpretation |
|---|---|---|
| Cohen's $d$ | 1.042 | Large effect (Cohen's benchmark: large $\geq 0.80$) |
| Hedges' $g$ | 1.031 | Large effect (negligible bias correction) |
| 95% CI for $d$ | [0.542, 1.542] | Entirely above zero; significant effect |
| CL | 77.0% | 77% of CBT patients score better than the average control |
| $U_3$ | 85.1% | 85% of CBT patients below the control mean |

Conclusion: CBT produced a large, statistically significant reduction in depression compared to waitlist control (d=1.04d = 1.04, 95% CI [0.54, 1.54]). The effect size indicates that approximately 85% of CBT participants had depression scores below the average control participant. This is a clinically meaningful and large treatment effect.
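The entire example can be reproduced in a few lines of Python (a verification sketch, not DataStatPro output):

```python
import math
from statistics import NormalDist

# Example 1: CBT (n=35, M=8.4, SD=4.2) vs. control (n=35, M=13.1, SD=4.8).
n1 = n2 = 35
s_pooled = math.sqrt(((n1 - 1) * 4.2**2 + (n2 - 1) * 4.8**2) / (n1 + n2 - 2))
d = abs(8.4 - 13.1) / s_pooled
j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)   # Hedges' small-sample correction factor
g = d * j
phi = NormalDist().cdf
cl = phi(d / math.sqrt(2))            # Common Language effect size
u3 = phi(d)                           # Cohen's U3
print(round(d, 3), round(g, 3), round(cl, 2), round(u3, 2))  # → 1.042 1.031 0.77 0.85
```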


Example 2: ω2\omega^2 and η2\eta^2 — Effect of Teaching Method on Exam Scores

An educational researcher tests three teaching methods (Lecture, Flipped Classroom, Project-Based) on exam performance ($Y$, %). Total sample: $n = 90$ (30 per group).

ANOVA table:

| Source | SS | df | MS | $F$ | $p$ |
|---|---|---|---|---|---|
| Between (Method) | 2840 | 2 | 1420 | 8.94 | < .001 |
| Within (Error) | 13800 | 87 | 158.6 | | |
| Total | 16640 | 89 | | | |

Step 1 — Eta squared:

η2=SSbetweenSStotal=284016640=0.171\eta^2 = \frac{SS_{between}}{SS_{total}} = \frac{2840}{16640} = 0.171

Step 2 — Omega squared (bias-corrected):

ω2=SSbetween(K1)MSwithinSStotal+MSwithin=2840(2)(158.6)16640+158.6=2840317.216798.6=2522.816798.6=0.150\omega^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total} + MS_{within}} = \frac{2840 - (2)(158.6)}{16640 + 158.6} = \frac{2840 - 317.2}{16798.6} = \frac{2522.8}{16798.6} = 0.150

Step 3 — Epsilon squared:

ε2=SSbetween(K1)MSwithinSStotal=2840317.216640=2522.816640=0.152\varepsilon^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total}} = \frac{2840 - 317.2}{16640} = \frac{2522.8}{16640} = 0.152

Step 4 — Cohen's ff:

f=ω21ω2=0.1500.850=0.176=0.420f = \sqrt{\frac{\omega^2}{1-\omega^2}} = \sqrt{\frac{0.150}{0.850}} = \sqrt{0.176} = 0.420

Step 5 — 95% CI for ω2\omega^2 (via non-central FF):

Using the non-central FF approach with Fobs=8.94F_{obs} = 8.94, df1=2df_1 = 2, df2=87df_2 = 87:

Non-centrality parameter λ^=F×df1=8.94×2=17.88\hat{\lambda} = F \times df_1 = 8.94 \times 2 = 17.88

95% CI for λ\lambda: [7.20,32.18][7.20, 32.18] (numerical)

Converting: ωL2=7.20/(7.20+90)=0.074\omega^2_{L} = 7.20/(7.20+90) = 0.074, ωU2=32.18/(32.18+90)=0.263\omega^2_U = 32.18/(32.18+90) = 0.263

ω2=0.150\omega^2 = 0.150, 95% CI [0.074,0.263][0.074, 0.263]

Comparison of estimates:

| Measure | Value | Benchmark (Cohen) | Label |
|---|---|---|---|
| $\eta^2$ | 0.171 | Large ($\geq 0.14$) | Large (biased) |
| $\omega^2$ | 0.150 | Large ($\geq 0.14$) | Large (unbiased) |
| $\varepsilon^2$ | 0.152 | Large ($\geq 0.14$) | Large (unbiased) |
| Cohen's $f$ | 0.420 | Large ($\geq 0.40$) | Large |

Conclusion: Teaching method has a large effect on exam performance (ω2=0.150\omega^2 = 0.150, 95% CI [0.074, 0.263]). The bias-corrected estimate (ω2=0.150\omega^2 = 0.150) is slightly smaller than η2=0.171\eta^2 = 0.171, as expected. Approximately 15% of the variance in exam scores is attributable to teaching method. Note that η2=0.171\eta^2 = 0.171 slightly overestimates the true population effect due to sampling bias, illustrating why ω2\omega^2 is preferred.
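Steps 1–4 can be verified with a short Python sketch driven directly by the ANOVA table:

```python
import math

def anova_effect_sizes(ss_between, ss_total, df_between, ms_within):
    """eta^2, omega^2, epsilon^2, and Cohen's f from a one-way ANOVA table."""
    eta2 = ss_between / ss_total
    omega2 = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    eps2 = (ss_between - df_between * ms_within) / ss_total
    f = math.sqrt(omega2 / (1 - omega2))
    return eta2, omega2, eps2, f

# Example 2 of this guide: SS_between = 2840, SS_total = 16640, df = 2, MS_within = 158.6.
eta2, omega2, eps2, f = anova_effect_sizes(2840, 16640, 2, 158.6)
print(round(eta2, 3), round(omega2, 3), round(eps2, 3), round(f, 3))  # → 0.171 0.15 0.152 0.42
```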


Example 3: Odds Ratio, Risk Ratio, and NNT — Vaccine Effectiveness

A clinical trial evaluates a new vaccine. Among n1=1,000n_1 = 1{,}000 vaccinated participants, a=20a = 20 develop the disease. Among n2=1,000n_2 = 1{,}000 control participants, c=80c = 80 develop the disease.

2×22 \times 2 Table:

| | Disease | No Disease | Total |
|---|---|---|---|
| Vaccinated | 20 | 980 | 1000 |
| Control | 80 | 920 | 1000 |

Step 1 — Risks:

p1=201000=0.020p_1 = \frac{20}{1000} = 0.020 (vaccinated)

p2=801000=0.080p_2 = \frac{80}{1000} = 0.080 (control)

Step 2 — Absolute Risk Difference:

ARD=p1p2=0.0200.080=0.060\text{ARD} = p_1 - p_2 = 0.020 - 0.080 = -0.060

Vaccination reduces disease risk by 6 percentage points.

Step 3 — Risk Ratio:

RR=0.0200.080=0.25\text{RR} = \frac{0.020}{0.080} = 0.25

Vaccinated participants have 25% the risk of unvaccinated (a 75% risk reduction).

Step 4 — Odds Ratio:

OR=20×92080×980=1840078400=0.235\text{OR} = \frac{20 \times 920}{80 \times 980} = \frac{18400}{78400} = 0.235

ln(OR)=ln(0.235)=1.449\ln(\text{OR}) = \ln(0.235) = -1.449

SEln(OR)=120+1980+180+1920=0.0500+0.0010+0.0125+0.0011=0.0646=0.254SE_{\ln(OR)} = \sqrt{\frac{1}{20} + \frac{1}{980} + \frac{1}{80} + \frac{1}{920}} = \sqrt{0.0500 + 0.0010 + 0.0125 + 0.0011} = \sqrt{0.0646} = 0.254

95% CI for OR:

e1.449±1.96(0.254)=e[1.947,0.951]=[0.143,0.387]e^{-1.449 \pm 1.96(0.254)} = e^{[-1.947, -0.951]} = [0.143, 0.387]

Step 5 — NNT:

$\text{NNT} = \frac{1}{\lvert -0.060 \rvert} = \frac{1}{0.060} = 16.7 \approx 17$

Step 6 — Vaccine Effectiveness (VE):

VE=(1RR)×100%=(10.25)×100%=75%\text{VE} = (1 - \text{RR}) \times 100\% = (1 - 0.25) \times 100\% = 75\%

SEARD=0.02(0.98)1000+0.08(0.92)1000=0.0000196+0.0000736=0.0000932=0.00965SE_{\text{ARD}} = \sqrt{\frac{0.02(0.98)}{1000} + \frac{0.08(0.92)}{1000}} = \sqrt{0.0000196 + 0.0000736} = \sqrt{0.0000932} = 0.00965

95% CI for ARD: 0.060±1.96(0.00965)=[0.079,0.041]-0.060 \pm 1.96(0.00965) = [-0.079, -0.041]

95% CI for NNT: [1/0.079,1/0.041]=[12.7,24.4][13,25][1/0.079, 1/0.041] = [12.7, 24.4] \approx [13, 25]

Summary:

| Effect Size | Value | 95% CI | Interpretation |
|---|---|---|---|
| ARD | $-0.060$ | $[-0.079, -0.041]$ | Vaccine reduces risk by 6 percentage points |
| Risk Ratio | 0.250 | $[0.154, 0.405]$ | 75% risk reduction |
| Odds Ratio | 0.235 | $[0.143, 0.387]$ | Significantly protective |
| NNT | 17 | $[13, 25]$ | 17 vaccinated to prevent 1 case |
| Vaccine Effectiveness | 75% | $[60\%, 85\%]$ | High effectiveness |

Conclusion: The vaccine is highly effective, with a risk ratio of 0.25 (75% risk reduction) and an NNT of 17 (13–25). For every 17 people vaccinated, one additional case of disease is prevented compared to no vaccination. All three complementary effect sizes (ARD, RR, NNT) consistently demonstrate a clinically important and statistically significant protective effect of the vaccine.
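Steps 3 and 4 can be reproduced from the 2×2 counts with a short Python sketch (log-scale CI for the OR):

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """OR, RR, and a log-scale 95% CI for the OR from a 2x2 table
    (rows: exposed/unexposed; columns: event/no event) -- a sketch."""
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    lo = math.exp(math.log(odds_ratio) - z * se)
    hi = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, risk_ratio, (lo, hi)

# Example 3 of this guide: 20/980 vaccinated vs. 80/920 control.
odds_ratio, risk_ratio, ci = two_by_two(20, 980, 80, 920)
print(round(odds_ratio, 3), round(risk_ratio, 2))  # → 0.235 0.25
```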


Example 4: Cramér's VV — Association Between Study Method and Grade

A researcher surveys n=300n = 300 students on their primary study method (Flashcards, Practice Tests, Re-Reading) and their final grade (A/B, C, D/F). The chi-squared test yields χ2(4)=22.8\chi^2(4) = 22.8, p<.001p < .001.

Cramér's VV:

V=χ2nmin(r1,c1)=22.8300×min(31,31)=22.8300×2=22.8600=0.038=0.195V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1, c-1)}} = \sqrt{\frac{22.8}{300 \times \min(3-1, 3-1)}} = \sqrt{\frac{22.8}{300 \times 2}} = \sqrt{\frac{22.8}{600}} = \sqrt{0.038} = 0.195

95% CI for VV (using non-central χ2\chi^2 approach):

Non-centrality parameter λ^χ2=χ2df=22.84=18.8\hat{\lambda}_{\chi^2} = \chi^2 - df = 22.8 - 4 = 18.8

95% CI for λ\lambda: [8.2,33.4][8.2, 33.4] (numerical iteration)

VL=8.2/(300×2)=0.01367=0.117V_L = \sqrt{8.2/(300 \times 2)} = \sqrt{0.01367} = 0.117

VU=33.4/(300×2)=0.05567=0.236V_U = \sqrt{33.4/(300 \times 2)} = \sqrt{0.05567} = 0.236

Benchmark (for min(r1,c1)=2\min(r-1,c-1) = 2): Small = 0.07, Medium = 0.21, Large = 0.35.

V=0.195V = 0.195 falls just below the medium threshold.

Conclusion: There is a small-to-medium association between study method and grade (V=0.195V = 0.195, 95% CI [0.117, 0.236], p<.001p < .001). Study method explains approximately V2=0.038V^2 = 0.038 (3.8%) of the variance in grade outcomes, indicating a modest but statistically significant relationship. Practice testing and flashcard use appear to produce better grade distributions than re-reading, consistent with retrieval practice research.


13. Common Mistakes and How to Avoid Them

Mistake 1: Conflating Statistical Significance with Effect Size

Problem: Concluding that because p<.05p < .05, the effect is "large" or "important." Conversely, concluding that because p>.05p > .05, the effect is "zero" or "negligible." Statistical significance is entirely about the strength of evidence against H0H_0, not about the magnitude of the effect.
Solution: Always report BOTH the p-value AND the effect size with its CI. A significant result with d=0.08d = 0.08 (tiny effect) and a non-significant result with d=0.60d = 0.60 (large effect, underpowered study) tell very different stories.

Mistake 2: Using η2\eta^2 Instead of ω2\omega^2 for ANOVA Effect Sizes

Problem: η2\eta^2 is systematically biased upward — it always overestimates the true population effect size, especially in small samples with few groups. Many researchers report η2\eta^2 simply because it is the default output of SPSS.
Solution: Always report ω2\omega^2 (or ωp2\omega_p^2 for factorial designs) as the primary ANOVA effect size. Report η2\eta^2 only if explicitly required by a journal, and clearly label it as biased. In many cases, the difference is small but the correct labelling matters.

Mistake 3: Using Cohen's dd When Glass's Δ\Delta is Appropriate

Problem: When group variances differ substantially (variance ratio >4> 4), pooling the standard deviations to compute Cohen's dd produces a denominator that reflects neither group well and leads to a misleading effect size.
Solution: When s1/s2>2s_1/s_2 > 2 (or <0.5< 0.5), report Glass's Δ\Delta (standardising by the control group SD) alongside Cohen's dd. Clearly state which SD was used as the standardiser.

Mistake 4: Reporting Effect Sizes Without Confidence Intervals

Problem: A point estimate of d=0.50d = 0.50 from a study of n=20n = 20 per group has a 95% CI of approximately [0.12, 0.88] — a range spanning from small to large. Reporting only d=0.50d = 0.50 without the CI gives a false sense of precision.
Solution: Always report the 95% CI alongside every effect size. DataStatPro automatically computes exact CIs for all effect sizes using non-central distributions. This is increasingly required by APA and major journals.

Mistake 5: Applying Cohen's Benchmarks Without Context

Problem: Mechanically classifying d=0.21d = 0.21 as "small" based on Cohen's benchmarks regardless of the research context. In some fields (e.g., cognitive neuroscience or social psychology in field settings), d=0.21d = 0.21 is a large, practically important effect.
Solution: Use Cohen's benchmarks only as a last resort. Prioritise domain-specific benchmarks, compare to average effect sizes in your field (e.g., from meta-analyses), and consider the practical or clinical implications of the effect size given the context.

Mistake 6: Interpreting the OR as the RR

Problem: When the event is common (p>0.10p > 0.10), the Odds Ratio is numerically larger (more extreme) than the Risk Ratio. For example, if p1=0.40p_1 = 0.40 and p2=0.20p_2 = 0.20, then RR = 2.0 but OR = 2.67. Reporting "the odds of the event are 2.67 times higher" and implying that "the risk is 2.67 times higher" substantially overstates the effect.
Solution: Always report the RR (not OR) when the outcome is common (p>0.10p > 0.10) and absolute probabilities are estimable (prospective study). Clearly distinguish between "odds" (OR) and "risk" (RR) in all reporting. Always accompany OR with the ARD for context.

Mistake 7: Computing Paired dd as Independent Samples dd

Problem: Using the independent samples formula (with pooled SD) for paired or repeated measures data ignores the correlation between the two measurements, dramatically underestimating the true within-person effect size (because $s_{pooled}$ includes between-person variability, whereas $s_d$ does not).
Solution: For paired designs, always use dpaired=dˉ/sdd_{paired} = \bar{d}/s_d where dˉ\bar{d} is the mean of the difference scores and sds_d is the SD of the difference scores. The paired dd will typically be larger than the independent dd for the same data when the pre-post correlation is positive.

Mistake 8: Reporting ηp2\eta_p^2 Values as Proportions of "Total Variance"

Problem: Partial eta squared (ηp2\eta_p^2) in factorial ANOVA is NOT the proportion of total variance. In a 2×22 \times 2 ANOVA with interaction, the values of ηp2\eta_p^2 for the two main effects and interaction can sum to well over 1.0 — clearly impossible if they were proportions of total variance.
Solution: When reporting ηp2\eta_p^2, state explicitly that it is "the proportion of variance in the DV attributable to this effect after removing variance associated with other effects." Use η2\eta^2 (not ηp2\eta_p^2) if you want to convey what fraction of total variance each effect explains.

Mistake 9: Using the Wrong nn for Computing dd from tt

Problem: When computing dd from a reported tt-statistic, researchers sometimes use the total NN instead of the per-group nn in the formula d=t1/n1+1/n2d = t\sqrt{1/n_1 + 1/n_2}, or confuse the sample sizes when groups are unequal.
Solution: For independent samples: d=t(n1+n2)/(n1n2)d = t\sqrt{(n_1+n_2)/(n_1 n_2)}. For paired or one-sample: dz=t/nd_z = t/\sqrt{n}. Always double-check which tt-test formula was used by the original authors.
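Both recovery formulas as a Python sketch; the $n = 60$ per group below is an assumption chosen only to match $df = 118$ in the APA example earlier in this guide:

```python
import math

def d_from_t_independent(t, n1, n2):
    """Cohen's d recovered from an independent-samples t statistic."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))

def dz_from_t_paired(t, n):
    """d_z recovered from a paired or one-sample t statistic."""
    return t / math.sqrt(n)

# e.g. t(118) = 4.21, assuming equal groups of n = 60:
print(round(d_from_t_independent(4.21, 60, 60), 2))  # → 0.77
```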

Mistake 10: Reporting NNT Without Specifying the Time Horizon and Base Rate

Problem: An NNT of 20 means very different things depending on whether the outcome is "prevent a heart attack over 5 years" vs. "cure a headache in 2 hours." Without specifying the comparison condition (vs. what?), the time horizon, the baseline event rate, and the population, NNT is not interpretable.
Solution: Always specify the NNT with: (1) the comparison condition (treatment vs. placebo/control), (2) the outcome, (3) the time horizon, and (4) the baseline event rate (control group risk). Example: "NNT = 17 (95% CI: 13–25) to prevent one case of disease in vaccinated vs. unvaccinated adults over 12 months, given a baseline risk of 8%."


14. Troubleshooting

| Problem | Likely Cause | Solution |
| --- | --- | --- |
| $d$ is extremely large ($> 3.0$) | Data entry error; outlier dominating; very small SD | Check raw data for errors; screen for outliers; verify SD calculation |
| $\omega^2$ or $\varepsilon^2$ is negative | True effect is near zero; small sample; $MS_{between} < MS_{error}$ | Report negative $\omega^2$ as 0 (convention); increase sample size |
| $\eta_p^2$ values sum to more than 1.0 | Expected in factorial ANOVA; $\eta_p^2$ is not a proportion of total variance | Switch to $\eta^2$ or $\omega^2$ if total-variance proportions are needed |
| OR and RR give very different conclusions | Common event (base rate $> 10\%$): OR exaggerates relative to RR | Report RR (or ARD + NNT) for common outcomes; OR is appropriate for case-control |
| 95% CI for NNT includes infinity | ARD CI includes zero (non-significant result) | Report NNT from each bound of the ARD CI separately; report the negative bound as NNH |
| $d$ from test statistic differs from $d$ from summary statistics | Different formula used; unequal group sizes | For unequal $n$: use $d = t\sqrt{(n_1+n_2)/(n_1 n_2)}$; verify which formula applies |
| CL (Common Language) effect size close to 0.50 despite large $d$ | Likely a formula error; CL near 0.50 implies $d$ near 0 | Use CL $= \Phi(d/\sqrt{2})$, not $\Phi(d)$; CL of 0.50 corresponds to $d = 0$ |
| Fisher's $z$ CI for $r$ extends beyond $[-1, 1]$ | Very small $n$ or $r$ close to $\pm 1$ | Check that $n \geq 4$; for $r = \pm 1$ the CI is degenerate; consider a Bayesian credible interval |
| Cramér's $V$ is larger than expected for a sparse table | Small-sample bias in $V$ | Use bias-corrected Cramér's $\tilde{V}$ (Bergsma, 2013) |
| Paired $d$ is larger than independent-samples $d$ for the same data | Expected: paired $d$ removes between-person variance | Both are correct but measure different things; report paired $d$ for paired designs |
| Power calculation requires larger $n$ than resources allow | Effect size is small or the power requirement is high | Accept lower power (state this as a limitation); use a one-tailed test if directional; consider a sequential design |
| $r$ and $d$ conversions give inconsistent results | Unequal group sizes affecting the conversion formula | Use the exact formula $r = d/\sqrt{d^2 + (n_1+n_2)^2/(n_1 n_2)}$, not the equal-$n$ approximation |
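Two of the fixes above can be made concrete in code: clamping a negative one-way $\omega^2$ to zero, and recovering $d$ from an independent-samples $t$ with unequal group sizes. This is an illustrative sketch in Python (function names are ours, not DataStatPro's API):

```python
from math import sqrt

def omega_squared(ss_between, ss_total, ms_within, k):
    """One-way omega^2 = (SS_B - (K-1)*MS_W) / (SS_T + MS_W).
    Negative estimates (which occur when MS_between < MS_error)
    are reported as 0 by convention."""
    est = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return max(0.0, est)

def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic,
    valid for unequal group sizes: d = t * sqrt((n1+n2)/(n1*n2))."""
    return t * sqrt((n1 + n2) / (n1 * n2))
```

With `ss_between = 1.0`, `ss_total = 100.0`, `ms_within = 2.0`, `k = 3`, the raw estimate is negative, so the function returns 0.0 rather than a meaningless negative proportion of variance.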

15. Quick Reference Cheat Sheet

Core Equations

| Formula | Description |
| --- | --- |
| $d = (\bar{x}_1 - \bar{x}_2)/s_{pooled}$ | Cohen's $d$ (independent samples) |
| $s_{pooled} = \sqrt{[(n_1-1)s_1^2+(n_2-1)s_2^2]/(n_1+n_2-2)}$ | Pooled standard deviation |
| $g = d \times (1 - 3/(4\nu-1))$ | Hedges' $g$ (bias-corrected $d$; $\nu = n_1+n_2-2$) |
| $\Delta = (\bar{x}_1 - \bar{x}_2)/s_{control}$ | Glass's $\Delta$ |
| $d_{paired} = \bar{d}/s_d$ | Cohen's $d$ for paired designs |
| $SE_d \approx \sqrt{(n_1+n_2)/(n_1 n_2) + d^2/(2(n_1+n_2))}$ | SE of Cohen's $d$ |
| $CL = \Phi(d/\sqrt{2})$ | Common Language Effect Size |
| $U_3 = \Phi(\lvert d \rvert)$ | Cohen's $U_3$ |
| $\eta^2 = SS_{effect}/SS_{total}$ | Eta squared |
| $\eta_p^2 = SS_{effect}/(SS_{effect}+SS_{error})$ | Partial eta squared |
| $\omega^2 = (SS_B - (K-1)MS_W)/(SS_T + MS_W)$ | Omega squared (one-way) |
| $\omega_p^2 = (SS_{eff} - df_{eff} \cdot MS_{err})/(SS_T + MS_{err})$ | Partial omega squared |
| $f = \sqrt{\eta^2/(1-\eta^2)}$ | Cohen's $f$ |
| $f^2 = R^2/(1-R^2)$ | Cohen's $f^2$ (global) |
| $f^2_{local} = \Delta R^2/(1-R^2_{full})$ | Cohen's $f^2$ (local/incremental) |
| $z_r = \text{arctanh}(r)$, $SE_{z_r} = 1/\sqrt{n-3}$ | Fisher's $z$ for $r$ CI |
| $\text{OR} = (p_1/(1-p_1))/(p_2/(1-p_2)) = ad/bc$ | Odds Ratio |
| $\text{RR} = p_1/p_2$ | Risk Ratio |
| $\text{NNT} = 1/\lvert p_1-p_2 \rvert$ | Number Needed to Treat |
| $V = \sqrt{\chi^2/(n \cdot \min(r-1,\,c-1))}$ | Cramér's $V$ |
| $r_{rb} = 1 - 2U/(n_1 n_2)$ | Rank-biserial correlation |
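The $d$-family equations above translate directly into code. The sketch below is illustrative (these are our helper names, not DataStatPro's implementation), using only the Python standard library:

```python
from math import sqrt
from statistics import NormalDist

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d for independent samples, using the pooled SD."""
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

def hedges_g(d, n1, n2):
    """Small-sample bias correction: g = d * (1 - 3/(4*nu - 1)), nu = n1+n2-2."""
    nu = n1 + n2 - 2
    return d * (1 - 3 / (4 * nu - 1))

def se_d(d, n1, n2):
    """Approximate standard error of Cohen's d."""
    return sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

def cl_effect_size(d):
    """Common Language effect size: CL = Phi(d / sqrt(2))."""
    return NormalDist().cdf(d / sqrt(2))
```

For example, two groups of $n = 30$ with means 25 and 20 and equal SDs of 10 give $d = 0.5$, which the bias correction shrinks to $g \approx 0.494$.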

Effect Size Family Selection Guide

| Test | Effect Size | Notes |
| --- | --- | --- |
| One-sample $t$-test | $d = (\bar{x}-\mu_0)/s$ | Compared to known value |
| Independent $t$-test | Cohen's $d$ or Hedges' $g$ | $g$ for small $n$ |
| Paired $t$-test | $d_{paired} = \bar{d}/s_d$ | Uses difference scores |
| One-way ANOVA | $\omega^2$ or $\varepsilon^2$ | NOT $\eta^2$ (biased) |
| Factorial ANOVA | $\omega_p^2$ or $\eta_p^2$ | Partial versions |
| ANCOVA | $\omega_p^2$ (adjusted) | After covariate removal |
| Simple regression | $r$, $r^2$ | Both informative |
| Multiple regression | $R^2$, $f^2_{global/local}$ | Report adjusted $R^2$ |
| $\chi^2$ (2×2) | $\phi$ | Same as $r$ for binary |
| $\chi^2$ ($r \times c$) | Cramér's $V$ | Use corrected $V$ if small $n$ |
| Binary, prospective | ARD, RR, NNT | All three recommended |
| Binary, case-control | OR | RR not estimable |
| Mann-Whitney $U$ | $r_{rb}$ | Non-parametric |
| Wilcoxon signed-rank | $r_W = Z/\sqrt{n}$ | Non-parametric |
| Kruskal-Wallis | $\eta^2_H$ | Non-parametric ANOVA |
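The two non-parametric entries in the guide reduce to one-line formulas. As a quick illustrative sketch (helper names are ours):

```python
from math import sqrt

def rank_biserial(u, n1, n2):
    """Rank-biserial correlation from Mann-Whitney U: r_rb = 1 - 2U/(n1*n2)."""
    return 1 - 2 * u / (n1 * n2)

def wilcoxon_r(z, n):
    """Effect size for the Wilcoxon signed-rank test: r_W = Z / sqrt(n)."""
    return z / sqrt(n)
```

Note that $U = n_1 n_2 / 2$ (the null expectation) yields $r_{rb} = 0$, as it should.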

Cohen's Benchmarks (1988) — All Families

| Label | $d$ | $r$ | $\eta^2/\omega^2$ | $f$ | $f^2$ | $V$ | OR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Small | 0.20 | 0.10 | 0.01 | 0.10 | 0.02 | 0.10 | 1.22 |
| Medium | 0.50 | 0.30 | 0.06 | 0.25 | 0.15 | 0.30 | 1.65 |
| Large | 0.80 | 0.50 | 0.14 | 0.40 | 0.35 | 0.50 | 2.23 |

Conversion Formulas

| From | To | Formula |
| --- | --- | --- |
| $d$ (equal $n$) | $r$ | $r = d/\sqrt{d^2+4}$ |
| $r$ (equal $n$) | $d$ | $d = 2r/\sqrt{1-r^2}$ |
| $d$ (unequal $n$) | $r$ | $r = d/\sqrt{d^2 + (n_1+n_2)^2/(n_1 n_2)}$ |
| $d$ | $\eta^2$ (2 groups) | $\eta^2 = d^2/(d^2+4)$ |
| $\eta^2$ | $d$ (2 groups) | $d = 2\sqrt{\eta^2/(1-\eta^2)}$ |
| $\eta^2$ | $f$ | $f = \sqrt{\eta^2/(1-\eta^2)}$ |
| $f$ | $\eta^2$ | $\eta^2 = f^2/(1+f^2)$ |
| OR | $d$ | $d = \ln(\text{OR}) \times \sqrt{3}/\pi \approx \ln(\text{OR}) \times 0.5513$ |
| $d$ | OR | $\text{OR} = e^{d\pi/\sqrt{3}}$ |
| $t$ (independent) | $d$ | $d = t\sqrt{(n_1+n_2)/(n_1 n_2)}$ |
| $t$ (paired/one-sample) | $d$ | $d_z = t/\sqrt{n}$ |
| $F$ (2 groups, $df_1=1$) | $d$ | $d = \sqrt{F(1/n_1+1/n_2)}$ |
| $F$ | $\eta^2$ | $\eta^2 = F \cdot df_1/(F \cdot df_1 + df_2)$ |
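A useful sanity check on these conversions is that each pair should round-trip: converting $d \to r \to d$ (or $d \to \text{OR} \to d$) must recover the original value. An illustrative Python sketch (our function names, not a library API):

```python
from math import sqrt, log, exp, pi

def d_to_r(d, n1=None, n2=None):
    """d -> r. Uses the exact unequal-n formula when group sizes are
    supplied; otherwise the equal-n approximation r = d/sqrt(d^2 + 4)."""
    if n1 is None or n2 is None:
        return d / sqrt(d**2 + 4)
    a = (n1 + n2)**2 / (n1 * n2)   # reduces to 4 when n1 == n2
    return d / sqrt(d**2 + a)

def r_to_d(r):
    """r -> d (equal-n inverse): d = 2r / sqrt(1 - r^2)."""
    return 2 * r / sqrt(1 - r**2)

def d_to_or(d):
    """d -> OR via the logistic scaling ln(OR) = d * pi / sqrt(3)."""
    return exp(d * pi / sqrt(3))

def or_to_d(odds_ratio):
    """OR -> d: d = ln(OR) * sqrt(3) / pi (approx. ln(OR) * 0.5513)."""
    return log(odds_ratio) * sqrt(3) / pi
```

Because the equal-$n$ term $(n_1+n_2)^2/(n_1 n_2)$ equals exactly 4 when $n_1 = n_2$, `d_to_r(d, n, n)` and `d_to_r(d)` agree for balanced designs.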

NNT Interpretation Guide

| NNT | Clinical Impact |
| --- | --- |
| 1–2 | Extraordinary benefit |
| 3–5 | Excellent |
| 6–10 | Good |
| 11–50 | Moderate |
| 51–100 | Small |
| $> 100$ | Minimal |
| $\infty$ | No benefit |
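Computing the NNT from two event rates is a one-liner plus two conventions: NNT is undefined (infinite) when the absolute risk difference is zero, and it is conventionally rounded up to the next whole patient. An illustrative sketch:

```python
from math import ceil, inf

def nnt(p_control, p_treatment):
    """NNT = 1/|ARD|, where ARD = |p_control - p_treatment|.
    Returns infinity when the risks are equal; otherwise rounds up,
    by convention, to a whole number of patients."""
    ard = abs(p_control - p_treatment)
    return inf if ard == 0 else ceil(1 / ard)
```

For a control-group risk of 8% and a treated-group risk of 2%, the ARD is 0.06 and NNT $= \lceil 1/0.06 \rceil = 17$, matching the vaccination example earlier in this section.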

Required Sample Size for 80% Power (Two-Sided α=.05\alpha = .05)

| $d$ | $n$ per group | $r$ | $n$ total | $\omega^2$ | $n$ total (3 groups) |
| --- | --- | --- | --- | --- | --- |
| 0.20 | 394 | 0.10 | 783 | 0.01 | 969 |
| 0.35 | 130 | 0.20 | 193 | 0.04 | 279 |
| 0.50 | 64 | 0.30 | 84 | 0.06 | 159 |
| 0.65 | 38 | 0.40 | 46 | 0.10 | 90 |
| 0.80 | 26 | 0.50 | 29 | 0.14 | 66 |
| 1.00 | 17 | 0.60 | 19 | 0.25 | 36 |
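The $d$ column of this table can be approximated with the normal-approximation formula $n \approx 2(z_{\alpha/2} + z_{\beta})^2/d^2$ per group. This sketch (our code, not DataStatPro's power module) slightly underestimates the exact $t$-based values tabulated above, typically by one or two participants per group:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided independent-samples t-test,
    via the normal approximation n = 2 * (z_{alpha/2} + z_beta)^2 / d^2.
    Slightly below the exact t-based requirement."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # two-sided critical value
    z_beta = z(power)            # power quantile
    return ceil(2 * (z_alpha + z_beta)**2 / d**2)
```

For $d = 0.50$ this gives 63 per group versus the tabulated 64; for $d = 0.20$ it gives 393 versus 394. Use exact software (e.g., a noncentral-$t$ routine) for final planning.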

Effect Size Reporting Checklist

| Item | Required |
| --- | --- |
| Point estimate of effect size | ✅ Always |
| 95% CI for effect size | ✅ Always |
| Which specific effect size (e.g., $\omega^2$, not just "effect size") | ✅ Always |
| Which benchmark system was used | ✅ Always |
| Sample sizes for each group/condition | ✅ Always |
| Direction of effect (which group is higher) | ✅ Always |
| Whether bias correction was applied ($g$ vs. $d$) | ✅ When $n < 30$ |
| ARD + NNT for binary outcomes | ✅ For clinical/applied work |
| Power analysis or sensitivity analysis | ✅ For null results |
| Domain-specific context for the benchmark | ✅ Recommended |

This tutorial provides a comprehensive foundation for understanding, computing, and interpreting Effect Sizes using the DataStatPro application. For further reading, consult Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988), Ellis's "The Essential Guide to Effect Sizes" (2010), Cumming's "Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis" (2012), and Lakens's "Calculating and Reporting Effect Sizes to Facilitate Cumulative Science: A Practical Primer for t-Tests and ANOVAs" (Frontiers in Psychology, 2013). For feature requests or support, contact the DataStatPro team.