
One-Way ANOVA

Step-by-step guide to conducting one-way ANOVA using DataStatPro.

One-Way ANOVA: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of variance decomposition all the way through the mathematics, assumptions, effect sizes, post-hoc testing, non-parametric alternatives, interpretation, reporting, and practical usage of the One-Way ANOVA within the DataStatPro application. Whether you are encountering ANOVA for the first time or seeking a rigorous, unified understanding of between-groups inference, this guide builds your knowledge systematically from the ground up.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is a One-Way ANOVA?
  3. The Mathematics Behind One-Way ANOVA
  4. Assumptions of One-Way ANOVA
  5. Variants of One-Way ANOVA
  6. Using the One-Way ANOVA Calculator Component
  7. Full Step-by-Step Procedure
  8. Effect Sizes for One-Way ANOVA
  9. Post-Hoc Tests and Planned Contrasts
  10. Confidence Intervals
  11. Power Analysis and Sample Size Planning
  12. Non-Parametric Alternative: Kruskal-Wallis Test
  13. Advanced Topics
  14. Worked Examples
  15. Common Mistakes and How to Avoid Them
  16. Troubleshooting
  17. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into One-Way ANOVA, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 The Logic of Comparing Groups

When we measure a continuous outcome across three or more independent groups, we ask: "Are the observed differences in group means larger than what we would expect from random sampling alone?" One-Way ANOVA answers this question by comparing two sources of variability:

  1. Between-groups variability: How much do the group means differ from each other? If groups have truly different population means, this variability should be large.
  2. Within-groups variability: How much do observations within each group vary around their own group mean? This reflects pure random (sampling) error.

If the between-groups variability is substantially larger than the within-groups variability, we conclude that the groups differ beyond what chance alone would produce.

1.2 Why Not Multiple t-Tests?

With K groups, one could run all K(K-1)/2 pairwise t-tests. However, this inflates the familywise error rate (FWER):

FWER = 1 - (1-\alpha)^m

Where m is the number of tests. For K = 4 groups (m = 6 tests) at α = .05:

FWER = 1 - (0.95)^6 = .265

Over 26% chance of at least one false positive. The one-way ANOVA omnibus test maintains the FWER at exactly α for the simultaneous test that all group means are equal.
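The inflation is easy to verify numerically. A minimal sketch in Python (the group count K is illustrative):

```python
# Familywise error rate for m independent tests, each at level alpha.
def fwer(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

K = 4                    # illustrative number of groups
m = K * (K - 1) // 2     # number of pairwise comparisons
print(f"m = {m}, FWER = {fwer(m):.3f}")
```

For K = 4 this prints m = 6 and a FWER of about 0.265, matching the calculation above.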

1.3 Variance and Its Decomposition

The variance of a dataset measures the average squared deviation from the mean:

s^2 = \frac{\sum_{i=1}^n(x_i - \bar{x})^2}{n-1}

The key insight behind ANOVA is that total variance can be partitioned into meaningful components. For a one-way design:

SS_{total} = SS_{between} + SS_{within}

Each sum of squares (SS), when divided by its degrees of freedom, becomes a mean square (MS) — a variance estimate. The ratio of these variance estimates is the F-statistic.

1.4 The F-Distribution

The F-distribution arises from the ratio of two independent chi-squared variates divided by their degrees of freedom. In ANOVA:

F = \frac{MS_{between}}{MS_{within}} \sim F_{K-1,\;N-K} \quad \text{under } H_0

Properties of the F-distribution:

  - F is non-negative (it is a ratio of two variances).
  - It is right-skewed, with the skew decreasing as the degrees of freedom increase.
  - Under H_0 its mean is df_within/(df_within - 2) for df_within > 2, slightly above 1.

1.5 The Expected Mean Squares

Understanding why the F-ratio works requires the expected values under H_0 and H_1:

E[MS_{within}] = \sigma^2 (always)

E[MS_{between}] = \sigma^2 + \frac{\sum_{j=1}^K n_j(\mu_j - \mu)^2}{K-1}

Under H_0 (all μ_j equal): E[MS_between] = σ², so E[F] ≈ 1.

Under H_1: the second term is positive, so E[MS_between] > σ², giving E[F] > 1.

The non-centrality parameter:

\lambda = \frac{\sum_{j=1}^K n_j(\mu_j - \mu)^2}{\sigma^2}

links the true population effect to the power of the test.

1.6 Statistical Significance vs. Practical Significance

Like the t-test, the F-test answers: "Is the result unlikely under H_0?" It does not answer: "How large is the effect?"

With very large N, even trivially small group differences produce significant F-values. A study with N = 1,000 participants across five groups might find F(4, 995) = 3.50, p = .008, while ω² = 0.010 — a statistically significant but practically negligible effect.

Always report:

  1. The F-statistic, degrees of freedom, and p-value.
  2. ω² or ε² with 95% CI (practical effect size).
  3. Group means and SDs.
  4. Post-hoc comparisons with individual effect sizes.

1.7 The Relationship Between ANOVA and the t-Test

When K = 2, the one-way ANOVA F-statistic is exactly the square of the independent samples t-statistic:

F_{1,\;N-2} = t_{N-2}^2

The p-values are identical (both two-tailed). ANOVA generalises the independent samples t-test to K ≥ 3 groups.
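The identity can be checked directly. The sketch below computes a pooled-variance t and a two-group ANOVA F from the same illustrative data and confirms they satisfy F = t²:

```python
from statistics import mean, variance

g1 = [4.1, 5.0, 6.2, 5.5, 4.8]   # illustrative data
g2 = [6.0, 7.1, 6.8, 7.5, 6.4]
n1, n2 = len(g1), len(g2)

# Pooled-variance independent-samples t-statistic
sp2 = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
t = (mean(g1) - mean(g2)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# One-way ANOVA F for the same two groups (K = 2, so df_between = 1)
grand = mean(g1 + g2)
ss_b = n1 * (mean(g1) - grand) ** 2 + n2 * (mean(g2) - grand) ** 2
ss_w = (n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)
F = (ss_b / 1) / (ss_w / (n1 + n2 - 2))

print(F, t ** 2)   # the two values agree
```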

1.8 The Relationship Between ANOVA and Regression

ANOVA is a special case of the General Linear Model (GLM):

Y_{ij} = \mu + \tau_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

Where τ_j is the effect of group j and \sum_j \tau_j = 0 (sum-to-zero constraint). In the regression framework, group membership is represented by dummy or effect-coded predictors. This equivalence is important because:

  - the design extends naturally to covariates (ANCOVA) and additional factors;
  - ANOVA effect sizes connect directly to the regression R²;
  - residual-based assumption checks carry over from regression diagnostics.


2. What is a One-Way ANOVA?

2.1 The Core Idea

One-Way Analysis of Variance (ANOVA) is a parametric inferential procedure for testing whether the means of three or more independent groups are simultaneously equal. It is called "one-way" because there is exactly one independent variable (IV) with K ≥ 3 levels, and "between-subjects" because different participants appear in each group.

The general omnibus null hypothesis:

H_0: \mu_1 = \mu_2 = \cdots = \mu_K

H_1: \text{At least one } \mu_j \text{ differs from the others}

2.2 What One-Way ANOVA Tests and Does Not Test

One-Way ANOVA tells you:

  - whether at least one group mean differs from the others (the omnibus test);
  - how much of the variance in the DV is attributable to group membership (via effect sizes).

One-Way ANOVA does NOT tell you:

  - which specific groups differ (that requires post-hoc tests or planned contrasts, Section 9);
  - the direction or size of any particular pairwise difference;
  - anything causal, unless group assignment was randomised.

2.3 Design Requirements

For one-way between-subjects ANOVA, the design must satisfy:

  - one categorical IV with K ≥ 3 levels;
  - a continuous (at least interval-scale) DV;
  - independent groups, with each participant contributing exactly one observation to exactly one group.

2.4 One-Way ANOVA in Context

Situation | Test
K = 2 groups, independent, normal | Independent t-test (Welch's)
K ≥ 3 groups, independent, normal, equal variances | One-Way ANOVA
K ≥ 3 groups, independent, normal, unequal variances | Welch's One-Way ANOVA
K ≥ 3 groups, independent, non-normal or ordinal | Kruskal-Wallis test
K ≥ 3 conditions, same participants, normal | One-Way Repeated Measures ANOVA
K ≥ 3 conditions, same participants, non-normal | Friedman test
K ≥ 3 groups + covariate | ANCOVA
≥ 2 IVs, independent groups | Factorial between-subjects ANOVA

2.5 Real-World Applications

Field | Example Application | IV (Levels) | DV
Clinical Psychology | CBT vs. BA vs. Waitlist on depression | 3 therapy conditions | PHQ-9
Education | Lecture vs. Flipped vs. Project-based on scores | 3 teaching methods | Exam %
Medicine | 4 drug dosages on blood pressure | 4 doses | Systolic BP
Marketing | 5 ad formats on purchase intent | 5 formats | Intent (0–100)
Neuroscience | 3 sleep conditions on cognitive performance | 3 conditions | Reaction time
Ecology | 4 habitats on species richness | 4 habitat types | Species count
HR/OB | 3 leadership styles on productivity | 3 styles | Units/hour
Nutrition | 5 diets on weight loss | 5 diet types | kg lost

3. The Mathematics Behind One-Way ANOVA

3.1 Notation

Symbol | Meaning
K | Number of groups
n_j | Sample size in group j
N = \sum_{j=1}^K n_j | Total sample size
x_{ij} | i-th observation in group j
\bar{x}_j = \frac{1}{n_j}\sum_i x_{ij} | Mean of group j
\bar{x}_{..} = \frac{1}{N}\sum_j\sum_i x_{ij} | Grand mean
s_j^2 = \frac{1}{n_j-1}\sum_i(x_{ij}-\bar{x}_j)^2 | Variance of group j

3.2 Sum of Squares Decomposition

Total Sum of Squares — total variability in the data:

SS_{total} = \sum_{j=1}^K\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2

df_{total} = N - 1

Between-Groups Sum of Squares — variability due to group membership:

SS_{between} = \sum_{j=1}^K n_j(\bar{x}_j - \bar{x}_{..})^2

df_{between} = K - 1

Within-Groups Sum of Squares — variability within each group (pure error):

SS_{within} = \sum_{j=1}^K\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2 = \sum_{j=1}^K(n_j-1)s_j^2

df_{within} = N - K

Verification: SS_{total} = SS_{between} + SS_{within} and df_{total} = df_{between} + df_{within}.

3.3 Mean Squares and the F-Ratio

Between-groups mean square:

MS_{between} = \frac{SS_{between}}{K-1}

Within-groups mean square (pooled error variance):

MS_{within} = \frac{SS_{within}}{N-K}

Note: MS_within is the pooled estimate of the common population variance σ², assuming homogeneity of variance across groups.

The F-statistic:

F = \frac{MS_{between}}{MS_{within}}

Under H_0: F \sim F_{K-1,\;N-K}

p-value:

p = P(F_{K-1,\;N-K} \geq F_{obs})
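The computations of Sections 3.2–3.3 can be sketched in a few lines of Python (the three groups are illustrative; the final p-value needs the F CDF, which DataStatPro supplies, or scipy.stats.f.sf):

```python
from statistics import mean, variance

groups = {   # illustrative raw data: K = 3 groups, n_j = 5 each
    "A": [23, 25, 28, 30, 26],
    "B": [31, 33, 29, 35, 32],
    "C": [27, 26, 30, 28, 29],
}

K = len(groups)
N = sum(len(g) for g in groups.values())
grand = mean(x for g in groups.values() for x in g)

ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
ss_within = sum((len(g) - 1) * variance(g) for g in groups.values())
ss_total = ss_between + ss_within          # decomposition check

df_b, df_w = K - 1, N - K
ms_b, ms_w = ss_between / df_b, ss_within / df_w
F = ms_b / ms_w
print(f"SS_B = {ss_between:.1f}, SS_W = {ss_within:.1f}, F({df_b}, {df_w}) = {F:.3f}")
```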

3.4 The ANOVA Source Table

Source | SS | df | MS | F | p
Between groups | SS_B | K - 1 | MS_B = SS_B/(K-1) | MS_B/MS_W | P(F ≥ F_obs)
Within groups (Error) | SS_W | N - K | MS_W = SS_W/(N-K) | |
Total | SS_T | N - 1 | | |

3.5 Computing the Grand Mean and Group Means

Grand mean (weighted by group sizes):

\bar{x}_{..} = \frac{\sum_{j=1}^K n_j\bar{x}_j}{N} = \frac{\sum_{j=1}^K\sum_{i=1}^{n_j} x_{ij}}{N}

For balanced designs (equal n_j = n):

\bar{x}_{..} = \frac{1}{K}\sum_{j=1}^K \bar{x}_j

3.6 The Pooled Standard Deviation

The pooled within-groups standard deviation s_pooled is used for computing effect sizes for pairwise comparisons:

s_{pooled} = \sqrt{MS_{within}} = \sqrt{\frac{\sum_{j=1}^K(n_j-1)s_j^2}{N-K}}

This is the square root of a weighted average of the group variances, using degrees of freedom as weights.

3.7 Computing SS from Summary Statistics

When only group means, SDs, and n_j are available:

SS_{between} = \sum_{j=1}^K n_j(\bar{x}_j - \bar{x}_{..})^2

SS_{within} = \sum_{j=1}^K(n_j-1)s_j^2

SS_{total} = SS_{between} + SS_{within}
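As a sketch, the full F-ratio reconstructed from summary statistics alone (the means, SDs, and group sizes below are illustrative values):

```python
# One-way ANOVA from summary statistics only.
means = [26.4, 32.0, 28.0]     # illustrative group means
sds   = [2.70, 2.24, 1.58]     # illustrative group SDs
ns    = [5, 5, 5]              # group sizes

K, N = len(means), sum(ns)
grand = sum(n * m for n, m in zip(ns, means)) / N   # weighted grand mean

ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
ss_within  = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
F = (ss_between / (K - 1)) / (ss_within / (N - K))
print(f"F({K - 1}, {N - K}) = {F:.2f}")
```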

3.8 Computing ω² and η² from F

When only the ANOVA table is reported (useful for computing effect sizes from published results):

Eta squared from F:

\eta^2 = \frac{F \cdot df_B}{F \cdot df_B + df_W}

Omega squared from F (approximate):

\omega^2 \approx \frac{(F-1) \cdot df_B}{F \cdot df_B + df_W + 1}

Exact omega squared from SS (preferred):

\omega^2 = \frac{SS_B - (K-1)MS_W}{SS_T + MS_W}
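These conversions from a reported F-ratio are one-liners; a sketch (F, df_B, df_W are whatever the published result reports):

```python
def eta_sq(F, df_b, df_w):
    """Eta squared recovered from a reported F-ratio."""
    return (F * df_b) / (F * df_b + df_w)

def omega_sq(F, df_b, df_w):
    """Approximate omega squared recovered from a reported F-ratio."""
    return ((F - 1) * df_b) / (F * df_b + df_w + 1)

# Example: F(2, 42) = 8.50 (the worked values in Section 8.5)
print(round(eta_sq(8.50, 2, 42), 3), round(omega_sq(8.50, 2, 42), 3))
```

For F(2, 42) = 8.50 this gives η² = 0.288 and ω² = 0.250, matching the worked example in Section 8.5.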

3.9 The Non-Central F-Distribution and Exact CIs

Under H_1, the F-statistic follows a non-central F-distribution with non-centrality parameter λ:

\lambda = \frac{\sum_{j=1}^K n_j(\mu_j - \mu)^2}{\sigma^2} = \frac{\eta^2 \cdot N}{1-\eta^2}

Exact 95% CI for ω² (via non-central F):

Find λ_L and λ_U such that:

P(F_{K-1,\;N-K}(\lambda_L) \geq F_{obs}) = 0.025

P(F_{K-1,\;N-K}(\lambda_U) \leq F_{obs}) = 0.025

Then convert to η²: \eta^2_L = \lambda_L/(\lambda_L+N), \quad \eta^2_U = \lambda_U/(\lambda_U+N)

And then to ω² using the bias correction. DataStatPro performs this numerical computation automatically.
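The numerical search can be sketched with scipy's non-central F distribution (scipy.stats.ncf) and a root-finder; F_obs, the dfs, and N below are illustrative values:

```python
from scipy.optimize import brentq
from scipy.stats import ncf

F_obs, df_b, df_w, N = 8.50, 2, 42, 45   # illustrative values

# Lower bound: P(F(lam_L) >= F_obs) = .025 (sf increases with lam)
g = lambda lam: ncf.sf(F_obs, df_b, df_w, lam) - 0.025
lam_L = 0.0 if g(0) > 0 else brentq(g, 0, 1000)

# Upper bound: P(F(lam_U) <= F_obs) = .025 (cdf decreases with lam)
h = lambda lam: ncf.cdf(F_obs, df_b, df_w, lam) - 0.025
lam_U = brentq(h, 0, 1000)

eta_L, eta_U = lam_L / (lam_L + N), lam_U / (lam_U + N)
print(f"95% CI for eta^2: [{eta_L:.3f}, {eta_U:.3f}]")
```

The interval brackets the point estimate (η² ≈ .288 for these values); DataStatPro then applies the ω² bias correction to each endpoint.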


4. Assumptions of One-Way ANOVA

4.1 Normality of Residuals (Within-Group Normality)

One-Way ANOVA assumes that within each population, the observations are normally distributed. Equivalently, the residuals e_{ij} = x_{ij} - \bar{x}_j should be normally distributed.

How to check:

  - Shapiro-Wilk test on the residuals;
  - Q-Q plot of the residuals;
  - histograms of each group for gross departures.

Robustness: ANOVA is remarkably robust to mild non-normality, especially when:

  - group sizes are roughly equal; and
  - each n_j is reasonably large (around 20 or more per group).

When violated: Use the Kruskal-Wallis test as a non-parametric alternative (Section 12). Consider a variance-stabilising transformation (log, square root, or Box-Cox) for right-skewed data. Use trimmed-mean ANOVA for heavy-tailed distributions.

4.2 Homogeneity of Variance (Homoscedasticity)

One-Way ANOVA assumes all K population variances are equal:

\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2

This assumption is required for MS_within to serve as a valid pooled estimate of the common error variance σ².

How to check:

  - Levene's test, or the Brown-Forsythe variant (deviations from the median);
  - the ratio of the largest to the smallest group SD (a ratio above roughly 2 is a warning sign);
  - side-by-side boxplots.

Robustness: ANOVA is relatively robust to heteroscedasticity when:

  - group sizes are equal or nearly equal; and
  - the variance ratio across groups is modest.

When n_j are unequal AND variances are unequal, ANOVA Type I error can be severely inflated (or deflated, depending on the pattern).

When violated: Use Welch's one-way ANOVA (Section 5), which does not assume equal variances. Follow with Games-Howell pairwise comparisons.

4.3 Independence of Observations

All observations must be independent of each other, both within and across groups. This is a design assumption — it cannot be tested statistically from the data.

Common violations:

When violated: Use multilevel models (participants nested within clusters), repeated measures ANOVA (repeated observations within participants), or time-series methods.

4.4 Interval Scale of Measurement

The dependent variable must be measured on at least an interval scale — equal spacing between values, so that differences and means are meaningful.

When violated: Use the Kruskal-Wallis test (for ordinal data), or ordinal regression for ordered categorical outcomes.

4.5 Absence of Influential Outliers

Extreme outliers distort both SS_between and SS_within, producing unreliable F-statistics.

How to check:

  - boxplots per group (points beyond 1.5 × IQR flagged);
  - standardised residuals (|z| > 3 suggests an extreme outlier).

When outliers present: Investigate the cause (data entry error? legitimate extreme score?). Report analyses with and without outliers. Consider the Kruskal-Wallis test or trimmed mean ANOVA as robust alternatives.

4.6 Balanced vs. Unbalanced Designs

While equal group sizes (n_j all equal, a balanced design) are not formally required, they are strongly preferred because:

  - the F-test is most robust to variance heterogeneity when the design is balanced;
  - power is maximised for a fixed total N;
  - Tukey's HSD and orthogonal-contrast results are exact for balanced designs.

Unbalanced designs (unequal n_j) are common in observational research. They require careful attention to variance heterogeneity and post-hoc test selection.

4.7 Assumption Summary Table

Assumption | Description | How to Check | Remedy if Violated
Normality | Residuals ~ N(0, σ²) within groups | Shapiro-Wilk, Q-Q plot | Kruskal-Wallis; transform
Homoscedasticity | σ₁² = ⋯ = σ_K² | Levene's, Brown-Forsythe | Welch's ANOVA + Games-Howell
Independence | Observations independent within and across groups | Design review | Multilevel model
Interval scale | DV has equal-interval properties | Measurement theory | Kruskal-Wallis
No outliers | No extreme influential values | Boxplots, standardised residuals | Investigate; Kruskal-Wallis

5. Variants of One-Way ANOVA

5.1 Standard One-Way ANOVA (Student's F-test)

The default one-way ANOVA assuming equal variances across groups. Uses the pooled MS_within as the error term. This is appropriate when Levene's test is non-significant AND group sizes are approximately equal.

5.2 Welch's One-Way ANOVA

Welch's F-test (1951) is the recommended default for one-way between-subjects ANOVA. It does not assume homogeneity of variance. The statistic:

F_W = \frac{\sum_{j=1}^K w_j(\bar{x}_j - \tilde{x})^2/(K-1)}{1 + \frac{2(K-2)}{K^2-1}\sum_{j=1}^K\frac{(1-w_j/\sum w_j)^2}{n_j-1}}

Where w_j = n_j/s_j^2 and \tilde{x} = \sum w_j\bar{x}_j/\sum w_j (the weighted grand mean).

Welch-Satterthwaite df:

\nu_W = \frac{K^2-1}{3\sum_{j=1}^K\frac{(1-w_j/\sum w_j)^2}{n_j-1}}
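A direct transcription of these formulas, as a sketch (the data are illustrative):

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's one-way F-test: returns (F_W, df1, df2)."""
    K = len(groups)
    ns = [len(g) for g in groups]
    xbars = [mean(g) for g in groups]
    w = [n / variance(g) for n, g in zip(ns, groups)]   # w_j = n_j / s_j^2
    W = sum(w)
    x_tilde = sum(wj * xb for wj, xb in zip(w, xbars)) / W
    A = sum(wj * (xb - x_tilde) ** 2 for wj, xb in zip(w, xbars)) / (K - 1)
    S = sum((1 - wj / W) ** 2 / (n - 1) for wj, n in zip(w, ns))
    F_W = A / (1 + 2 * (K - 2) / (K ** 2 - 1) * S)
    df2 = (K ** 2 - 1) / (3 * S)                        # Welch-Satterthwaite
    return F_W, K - 1, df2

F_W, df1, df2 = welch_anova([[23, 25, 28, 30, 26],
                             [31, 33, 29, 35, 32],
                             [27, 26, 30, 28, 29]])
print(f"F_W({df1}, {df2:.1f}) = {F_W:.2f}")
```

Note how the denominator df is fractional; the p-value is then taken from the F distribution with (df1, df2).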

Post-hoc: Use Games-Howell pairwise comparisons when Welch's ANOVA is significant.

💡 Just as Welch's t-test is the recommended default for two groups, Welch's one-way ANOVA is increasingly recommended as the default for K ≥ 3 independent groups. The power loss when variances are truly equal is negligible, while the Type I error protection when variances differ is substantial. DataStatPro reports both standard and Welch's ANOVA by default.

5.3 Trimmed Mean ANOVA (Robust)

Trimmed mean ANOVA (Wilcox, 2017) replaces standard means with α-trimmed means (e.g., 20% trimming from each tail). It is substantially more powerful than the Kruskal-Wallis test for symmetric heavy-tailed distributions while controlling Type I error under non-normality. Available in DataStatPro under "Robust ANOVA."

5.4 Brown-Forsythe F-test

An alternative to Welch's ANOVA that is more robust to certain distributional departures. Uses median-centred deviations for variance estimation. DataStatPro provides this as an additional output alongside Welch's F.

5.5 Choosing Between Variants

Condition | Recommended Test
Normal data, equal variances, balanced | Standard ANOVA (or Welch's — nearly identical)
Normal data, unequal variances OR unequal n_j | Welch's ANOVA (recommended default)
Mildly non-normal, large n_j ≥ 20 | Either standard or Welch's
Non-normal, small n_j | Kruskal-Wallis or Trimmed Mean ANOVA
Severely non-normal with outliers | Kruskal-Wallis
Ordinal DV | Kruskal-Wallis

6. Using the One-Way ANOVA Calculator Component

The One-Way ANOVA Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, visualising, and reporting one-way ANOVA and its alternatives.

Step-by-Step Guide

Step 1 — Select "One-Way Between-Subjects ANOVA"

From the "Test Type" dropdown, choose "One-Way Between-Subjects ANOVA".

Step 2 — Input Method

Choose how to provide the data:

  - Raw data: enter or paste each group's observations.
  - Summary statistics: enter each group's mean, SD, and n_j (see Section 3.7).

Step 3 — Specify Group Labels

Enter descriptive names for each group (e.g., "CBT," "BA," "Waitlist"). These labels appear in all output tables, plots, and the auto-generated APA paragraph.

Step 4 — Select Assumption Checks

DataStatPro automatically runs and displays:

  - Shapiro-Wilk normality tests on the residuals;
  - Levene's test for homogeneity of variance;
  - boxplots and outlier diagnostics.

Step 5 — Select Post-Hoc Tests

When the omnibus F is significant, select post-hoc tests:

  - Tukey HSD (equal variances, all pairwise comparisons);
  - Games-Howell (unequal variances);
  - Bonferroni or Holm-Bonferroni corrections;
  - Dunnett's test (all groups vs. one control).

Step 6 — Select Effect Sizes

Available effect sizes include ω² and ε² (recommended), η², Cohen's f, and pairwise Cohen's d (see Section 8).

Step 7 — Select Display Options

Step 8 — Run the Analysis

Click "Run One-Way ANOVA". DataStatPro will:

  1. Compute the full ANOVA source table.
  2. Run all assumption tests and display colour-coded warnings.
  3. Automatically switch to Welch's ANOVA if Levene's test is significant (when "Auto" selected).
  4. Compute all effect sizes with exact non-central F-based CIs.
  5. Run all selected post-hoc tests with adjusted p-values and individual d_jk.
  6. Generate all visualisations.
  7. Auto-generate the APA-compliant results paragraph.

7. Full Step-by-Step Procedure

7.1 Complete Computational Procedure

This section walks through every computational step for one-way ANOVA, from raw data to a complete APA-style conclusion.

Given: K groups with observations x_ij for i = 1, …, n_j and j = 1, …, K. Total N = Σ_j n_j.


Step 1 — State the Hypotheses

H_0: \mu_1 = \mu_2 = \cdots = \mu_K

H_1: \text{At least one pair } \mu_j \neq \mu_k \text{ for } j \neq k

Choose α (default: .05).


Step 2 — Compute Descriptive Statistics per Group

For each group j:

\bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j}x_{ij}

s_j = \sqrt{\frac{\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2}{n_j-1}}

SE_j = s_j/\sqrt{n_j}


Step 3 — Compute the Grand Mean

\bar{x}_{..} = \frac{\sum_{j=1}^K n_j\bar{x}_j}{N}


Step 4 — Check Assumptions

Normality: Run Shapiro-Wilk on the residuals e_{ij} = x_{ij} - \bar{x}_j. If p_SW < .05 and n_j < 30: consider Kruskal-Wallis.

Homoscedasticity: Run Levene's test. If p_Levene < .05: use Welch's ANOVA.

Outliers: Inspect boxplots and standardised residuals.


Step 5 — Compute Sums of Squares

SS_{between} = \sum_{j=1}^K n_j(\bar{x}_j - \bar{x}_{..})^2

SS_{within} = \sum_{j=1}^K(n_j-1)s_j^2

SS_{total} = SS_{between} + SS_{within}

Verification: SS_total can also be computed directly as \sum_j\sum_i(x_{ij}-\bar{x}_{..})^2 — both must agree.


Step 6 — Compute Degrees of Freedom

df_{between} = K-1, \quad df_{within} = N-K, \quad df_{total} = N-1


Step 7 — Compute Mean Squares

MS_{between} = SS_{between}/(K-1)

MS_{within} = SS_{within}/(N-K)


Step 8 — Compute the F-Statistic and p-value

F = MS_{between}/MS_{within}

p = P(F_{K-1,\;N-K} \geq F_{obs})

Reject H_0 if p ≤ α.


Step 9 — Compute Effect Sizes

Eta squared (biased):

\eta^2 = SS_{between}/SS_{total}

Omega squared (preferred, bias-corrected):

\omega^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total} + MS_{within}}

Epsilon squared (alternative correction):

\varepsilon^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total}}

Cohen's f:

f = \sqrt{\omega^2/(1-\omega^2)}


Step 10 — Compute 95% CI for ω²

Using the non-central F-distribution (computed numerically by DataStatPro).

Non-centrality parameter estimate: \hat{\lambda} = (F-1) \times df_{between}

Find λ_L and λ_U such that P(F_{df_B,df_W}(\lambda_L) \geq F_{obs}) = 0.025 and P(F_{df_B,df_W}(\lambda_U) \leq F_{obs}) = 0.025.

\omega^2_L = \frac{\lambda_L - df_B}{(\lambda_L - df_B) + N} (approximate)

\omega^2_U = \frac{\lambda_U - df_B}{(\lambda_U - df_B) + N} (approximate)


Step 11 — Conduct Post-Hoc Tests (if F significant)

Select the appropriate post-hoc test (Section 9). Compute pairwise differences, standard errors, adjusted p-values, and individual Cohen's d_jk for each pair.


Step 12 — Interpret and Report

Combine all results into an APA-compliant report (Section 13.7).


8. Effect Sizes for One-Way ANOVA

8.1 Eta Squared (η²) — Common but Biased

\eta^2 = \frac{SS_{between}}{SS_{total}}

η² is the proportion of total sample variance explained by group membership. It is the most commonly reported ANOVA effect size and appears as default output in SPSS.

Critical limitation: η² is positively biased — it systematically overestimates the true population effect size, especially in small samples and when K is large relative to N. The bias magnitude:

\text{Bias} = \eta^2 - \omega^2 \approx \frac{(K-1)(1-\eta^2)}{N-1}

For K = 3, N = 30: Bias ≈ 2(1−η²)/29 — can be several percentage points. For K = 3, N = 300: Bias ≈ 2(1−η²)/299 — negligible.

⚠️ Report η² only when explicitly required by a journal or for historical comparison. Always report ω² (or ε²) as the primary effect size and label η² as "biased" in your manuscript.

8.2 Omega Squared (ω²) — Preferred

\omega^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total} + MS_{within}}

ω² is a bias-corrected estimate of the population proportion of variance explained by the IV. It is the recommended primary effect size for one-way ANOVA.

Properties:

  - ω² ≤ η² in every sample; the two converge as N grows.
  - The estimate can be slightly negative when F < 1; report it as 0 in that case.
  - Common benchmarks: ω² ≈ .01 small, .06 medium, .14 large.

From F-statistic and df (approximate):

\omega^2 \approx \frac{(F-1)(K-1)}{(F-1)(K-1) + N}

8.3 Epsilon Squared (ε²) — Alternative Correction

\varepsilon^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total}}

ε² uses the same numerator as ω² but divides by SS_total instead of SS_total + MS_within.

Properties:

  - ε² lies between ω² and η² (ω² ≤ ε² ≤ η²).
  - Like ω², it is far less biased than η² and can be slightly negative when F < 1.
  - In large samples, η², ε², and ω² converge.

8.4 Cohen's f — For Power Analysis

f = \sqrt{\frac{\eta^2}{1-\eta^2}} \quad \text{or} \quad f = \sqrt{\frac{\omega^2}{1-\omega^2}}

Cohen's f is used as the effect size input for ANOVA power analysis. It represents the ratio of between-groups SD to within-groups SD.

From group means and σ (when population parameters are known):

f = \frac{\sigma_{\mu}}{\sigma}

Where \sigma_{\mu} = \sqrt{\sum_j n_j(\mu_j-\mu)^2/N} is the SD of the group means.

Benchmarks: Small = 0.10, Medium = 0.25, Large = 0.40 (Cohen, 1988).

8.5 Comparison of Effect Size Estimates

For a dataset with K = 3, N = 45 (15 per group), and F(2, 42) = 8.50:

\eta^2 = \frac{8.50 \times 2}{8.50 \times 2 + 42} = \frac{17.00}{59.00} = 0.288

\omega^2 \approx \frac{(8.50-1)\times 2}{8.50\times 2 + 42 + 1} = \frac{15.00}{60.00} = 0.250

\varepsilon^2 = \frac{SS_B - (K-1)MS_W}{SS_T} (requires the full SS values)

The difference η² − ω² = 0.038 (almost 4 percentage points of overestimation from η²) — substantial and worth correcting.

8.6 Effect Sizes for Pairwise Comparisons

After the omnibus F-test, report individual effect sizes for each significant pairwise comparison using \sqrt{MS_{within}} as the standardiser:

Cohen's d_jk (using the pooled within-groups SD):

d_{jk} = \frac{\bar{x}_j - \bar{x}_k}{\sqrt{MS_{within}}}

Hedges' g_jk (bias-corrected):

g_{jk} = d_{jk} \times \left(1 - \frac{3}{4(N-K)-1}\right)

Using \sqrt{MS_{within}} from the full ANOVA model (rather than just the two-group pooled SD) is preferred because it is based on all K groups and is therefore a more stable estimate of the common population SD.

95% CI for the pairwise mean difference:

(\bar{x}_j - \bar{x}_k) \pm t_{\alpha/2,\;N-K} \times \sqrt{MS_{within}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}
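As a sketch with two illustrative groups (ms_within, N, and K are assumed to come from a full three-group ANOVA table):

```python
from statistics import mean
from scipy.stats import t

g_j = [23, 25, 28, 30, 26]        # illustrative group j
g_k = [31, 33, 29, 35, 32]        # illustrative group k
ms_within, N, K = 4.9333, 15, 3   # assumed full-model values

nj, nk = len(g_j), len(g_k)
diff = mean(g_j) - mean(g_k)

d_jk = diff / ms_within ** 0.5               # Cohen's d, MS_within standardiser
g_jk = d_jk * (1 - 3 / (4 * (N - K) - 1))    # Hedges' small-sample correction

se = (ms_within * (1 / nj + 1 / nk)) ** 0.5
tcrit = t.ppf(0.975, N - K)                  # t critical on N - K df
ci = (diff - tcrit * se, diff + tcrit * se)
print(f"d = {d_jk:.2f}, g = {g_jk:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```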

8.7 Omega Squared vs. Partial Omega Squared

In one-way ANOVA with a single IV:

\omega^2 = \omega_p^2 (they are identical for one-factor designs)

\eta^2 = \eta_p^2 (they are identical for one-factor designs)

The partial and non-partial versions diverge only in factorial (multi-factor) designs.


9. Post-Hoc Tests and Planned Contrasts

9.1 The Need for Post-Hoc Testing

A significant omnibus F-test establishes that at least one group mean differs from the others. Post-hoc tests (also called multiple comparison procedures) identify which specific pairs of groups differ, while controlling the FWER.

The key trade-off: Controlling the FWER requires more conservative critical values, which reduces power for individual comparisons. The choice of post-hoc test involves balancing Type I error control and statistical power.

9.2 Tukey's HSD — Standard Pairwise Comparisons

Tukey's Honestly Significant Difference (HSD) is the most widely used post-hoc test for balanced designs with equal variances. It controls the FWER at exactly α for all pairwise comparisons simultaneously.

Critical value: the studentised range distribution q_{K,\;N-K,\;\alpha}.

Minimum Significant Difference:

\text{HSD} = q_{K,\;N-K,\;\alpha} \times \sqrt{\frac{MS_{within}}{n}} \quad \text{(balanced)}

For unequal group sizes (Tukey-Kramer):

\text{HSD}_{jk} = q_{K,\;N-K,\;\alpha} \times \sqrt{\frac{MS_{within}}{2}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}

Declare groups j and k significantly different if |\bar{x}_j - \bar{x}_k| > \text{HSD}_{jk}.

95% CI for the pairwise difference μ_j − μ_k:

(\bar{x}_j-\bar{x}_k) \pm \frac{q_{K,\;N-K,\;\alpha}}{\sqrt{2}} \times \sqrt{MS_{within}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}
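Since scipy 1.8, scipy.stats.tukey_hsd runs the Tukey-Kramer procedure directly; a sketch with illustrative data:

```python
from scipy.stats import tukey_hsd

a = [23, 25, 28, 30, 26]   # illustrative groups
b = [31, 33, 29, 35, 32]
c = [27, 26, 30, 28, 29]

res = tukey_hsd(a, b, c)
print(res)                         # pairwise differences and adjusted p-values
ci = res.confidence_interval()     # simultaneous 95% CIs for all pairs
print(ci.low, ci.high)
```

res.pvalue is a K × K matrix of adjusted p-values; entry [0, 1] compares group a with group b.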

9.3 Games-Howell — Unequal Variances

When Levene's test is significant or Welch's ANOVA is used, Games-Howell is the recommended post-hoc procedure. It uses separate variance estimates per pair:

t_{jk} = \frac{\bar{x}_j-\bar{x}_k}{\sqrt{s_j^2/n_j+s_k^2/n_k}}

Compared against the studentised range distribution with Welch-Satterthwaite df:

\nu_{jk} = \frac{(s_j^2/n_j+s_k^2/n_k)^2}{(s_j^2/n_j)^2/(n_j-1)+(s_k^2/n_k)^2/(n_k-1)}

9.4 Bonferroni and Holm-Bonferroni Corrections

Bonferroni correction (simplest, most conservative):

Compare each pairwise p-value to \alpha^* = \alpha/m, where m = K(K-1)/2.

Holm-Bonferroni sequential procedure (less conservative, same FWER control):

  1. Sort the m p-values: p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(m)}.
  2. Compare p_{(i)} to \alpha/(m-i+1).
  3. Reject all H_{0(i)} for which p_{(j)} \leq \alpha/(m-j+1) for all j \leq i.

Holm-Bonferroni is uniformly more powerful than Bonferroni and should be preferred in all cases.
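The step-down procedure is only a few lines of code; a sketch (the p-values are illustrative):

```python
def holm(pvals, alpha=0.05):
    """Holm-Bonferroni: reject/retain decisions, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # ascending p-values
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):   # alpha/(m - i + 1), rank from 0
            reject[i] = True
        else:
            break                            # step-down: stop at first failure
    return reject

pvals = [0.001, 0.012, 0.034, 0.020, 0.300, 0.008]
print(holm(pvals))
```

With these six p-values Holm rejects three hypotheses (p = .001, .008, .012), whereas plain Bonferroni (α* = .05/6 ≈ .0083) would reject only two.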

9.5 Dunnett's Test — All Groups vs. One Control

When comparing K − 1 experimental groups to a single control group (and not making comparisons among experimental groups), Dunnett's test provides optimal power while controlling FWER.

m = K − 1 comparisons (each experimental group vs. control)

The statistics are compared against Dunnett's distribution (not the studentised range) with parameters (K − 1, N − K).

9.6 Planned Contrasts — A Priori Comparisons

Planned contrasts are specific, theoretically motivated comparisons formulated before data collection. They are more powerful than post-hoc tests because:

  - each contrast concentrates the hypothesis into a single degree of freedom;
  - only a small, pre-specified number of comparisons is made, so little or no multiplicity correction is needed (none for a set of orthogonal contrasts, see below).

Contrast specification: a contrast is a linear combination \psi = \sum_{j=1}^K c_j\mu_j with the constraint \sum_{j=1}^K c_j = 0.

Contrast SS and F:

SS_\psi = \frac{\left(\sum_j c_j\bar{x}_j\right)^2}{\sum_j c_j^2/n_j}

F_\psi = SS_\psi/MS_{within}, \quad \text{compared to } F_{1,\;N-K}

Orthogonality condition (two contrasts c and c′ are orthogonal if):

\sum_{j=1}^K \frac{c_j c_j'}{n_j} = 0

A set of K − 1 mutually orthogonal contrasts fully partitions SS_between:

\sum_{\psi=1}^{K-1} SS_\psi = SS_{between}

Example for K = 4 groups (Control, Drug A, Drug B, Drug C):

Contrast | c_1 | c_2 | c_3 | c_4 | Comparison
ψ_1 | 3 | −1 | −1 | −1 | Control vs. all drugs
ψ_2 | 0 | 2 | −1 | −1 | Drug A vs. Drugs B and C
ψ_3 | 0 | 0 | 1 | −1 | Drug B vs. Drug C

These three contrasts are mutually orthogonal (for equal n_j) and decompose SS_between into three orthogonal components — no FWER correction needed.
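The decomposition can be verified numerically; a sketch with illustrative data for four such groups:

```python
from statistics import mean

groups = [                       # Control, Drug A, Drug B, Drug C
    [10, 12, 11, 13, 14],
    [15, 17, 16, 18, 14],
    [16, 18, 17, 19, 20],
    [14, 15, 16, 13, 17],
]
contrasts = [[3, -1, -1, -1],    # Control vs. all drugs
             [0, 2, -1, -1],     # Drug A vs. Drugs B and C
             [0, 0, 1, -1]]      # Drug B vs. Drug C

ns = [len(g) for g in groups]
xbars = [mean(g) for g in groups]
grand = mean(x for g in groups for x in g)
ss_between = sum(n * (xb - grand) ** 2 for n, xb in zip(ns, xbars))

def ss_contrast(c):
    """SS for one contrast: (sum c_j * xbar_j)^2 / sum(c_j^2 / n_j)."""
    num = sum(cj * xb for cj, xb in zip(c, xbars)) ** 2
    return num / sum(cj ** 2 / n for cj, n in zip(c, ns))

parts = [ss_contrast(c) for c in contrasts]
print(parts, sum(parts), ss_between)   # the parts sum to SS_between
```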


10. Confidence Intervals

10.1 95% CI for Each Group Mean

The 95% CI for the population mean μ_j:

\bar{x}_j \pm t_{\alpha/2,\;N-K} \times \sqrt{MS_{within}/n_j}

Note: This CI uses MS_within from the full ANOVA model (not s_j), producing a more stable estimate that borrows strength from all groups (valid under homoscedasticity).

10.2 95% CI for Pairwise Mean Differences

The 95% CI for μ_j − μ_k:

(\bar{x}_j-\bar{x}_k) \pm t_{\alpha/2,\;N-K} \times \sqrt{MS_{within}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}

Using Tukey-adjusted critical values (simultaneous 95% CI for all pairs):

(\bar{x}_j-\bar{x}_k) \pm \frac{q_{K,\;N-K,\;\alpha}}{\sqrt{2}} \times \sqrt{MS_{within}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}

Tukey-adjusted CIs are wider but simultaneously valid for all K(K−1)/2 pairs.

10.3 95% CI for ω²

The exact CI uses the non-central F-distribution (DataStatPro computes this numerically). The CI communicates the precision of the effect size estimate and is required for complete reporting.

CI width as a function of N and K (for ω² = 0.10):

N (K = 3) | Approx. CI Width for ω² | Precision
30 | 0.24 | Very low
60 | 0.17 | Low
90 | 0.14 | Moderate
150 | 0.11 | Good
300 | 0.08 | High
600 | 0.05 | Very high

⚠️ With only 30 participants (n = 10/group), the 95% CI for ω² = 0.10 spans approximately [0.00, 0.24] — essentially uninformative about the true effect magnitude. Always report the CI alongside the point estimate.


11. Power Analysis and Sample Size Planning

11.1 A Priori Power Analysis

A priori power analysis determines the required sample size before data collection to achieve desired power $1-\beta$ at significance level $\alpha$ for a hypothesised effect of size $f$.

Non-centrality parameter: $\lambda = f^2 \times N = f^2 \times Kn$ (balanced design)

Power computation (exact, using non-central F):

$$\text{Power} = P\!\left(F_{K-1,\;K(n-1)}(\lambda) > F_{crit}\right)$$

Where $F_{crit} = F_{\alpha,\;K-1,\;K(n-1)}$ and $\lambda = f^2 \times Kn$.

No closed form exists — DataStatPro uses numerical integration of the non-central F-distribution.
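The exact computation is straightforward with SciPy's non-central F distribution. This sketch is an independent illustration of the formula above, not DataStatPro's internal code:

```python
from scipy.stats import f, ncf

def anova_power(f_effect, n_per_group, K, alpha=0.05):
    """Exact power of the omnibus one-way ANOVA F-test via the
    non-central F distribution, with lambda = f^2 * K * n."""
    df1, df2 = K - 1, K * (n_per_group - 1)
    nc = f_effect ** 2 * K * n_per_group
    f_crit = f.ppf(1 - alpha, df1, df2)
    return ncf.sf(f_crit, df1, df2, nc)

# f = 0.25, K = 3, n = 52 per group: should land close to 0.80,
# matching the sample-size table below
print(round(anova_power(0.25, 52, 3), 3))
```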

Approximate $n$ per group:

$$n \approx \frac{(z_{1-\alpha/(K-1)} + z_{1-\beta})^2}{f^2} + 1$$

Required $n$ per group for 80% power ($\alpha = .05$, two-sided):

| $f$ | $\omega^2$ | $K=3$ | $K=4$ | $K=5$ | $K=6$ |
|---|---|---|---|---|---|
| 0.10 | 0.010 | 322 | 274 | 240 | 215 |
| 0.15 | 0.022 | 144 | 123 | 107 | 96 |
| 0.20 | 0.038 | 82 | 70 | 61 | 55 |
| 0.25 | 0.059 | 52 | 45 | 39 | 35 |
| 0.30 | 0.082 | 37 | 32 | 28 | 25 |
| 0.40 | 0.138 | 21 | 18 | 16 | 14 |
| 0.50 | 0.200 | 14 | 12 | 11 | 10 |
| 0.60 | 0.265 | 10 | 9 | 8 | 7 |

11.2 Determining $f$ from Prior Literature

When prior studies report $\eta^2$ or $\omega^2$:

$$f = \sqrt{\frac{\omega^2_{prior}}{1-\omega^2_{prior}}}$$

When prior studies report group means and a common SD estimate:

$$f = \frac{\text{SD of group means}}{\text{estimated within-group SD}} = \frac{\sqrt{\sum_j(\mu_j-\mu)^2/K}}{\sigma}$$

When only a t-statistic from a pilot study with two groups is available:

$$f = \frac{|d|}{2}$$ (approximate, for a two-group pilot)

11.3 Sensitivity Analysis

The minimum detectable $f$ for a given $N$, $K$, and target power:

$$f_{min} \approx \sqrt{\frac{(z_{.975}+z_{.80})^2}{n \cdot K}} \approx \sqrt{\frac{7.849}{N}}$$

For total $N = 90$ ($n = 30$ per group, $K = 3$):

$$f_{min} \approx \sqrt{7.849/90} = \sqrt{0.0872} = 0.295$$

This study can reliably detect only medium-to-large effects ($f \geq 0.30$, $\omega^2 \geq 0.082$). Smaller effects may exist but would be missed with 80% power.

⚠️ Report sensitivity analysis for null or inconclusive results. Do not use "observed power" (power computed from the observed effect size) — this is circular and provides no additional information beyond the p-value.

11.4 Planning for Specific Group Contrasts

When the primary research interest is in a specific planned contrast (rather than the omnibus F-test), power analysis should target that contrast:

For a contrast $\psi = \sum_j c_j\mu_j$ with $n_j = n$:

$$\lambda_\psi = \frac{(\sum_j c_j\mu_j)^2}{\sigma^2\sum_j c_j^2/n}$$

Power for this contrast uses $F_{1,\;N-K}$ and non-centrality $\lambda_\psi$. For the same data, a planned contrast that matches the true pattern of means typically has higher power than the omnibus F-test.


12. Non-Parametric Alternative: Kruskal-Wallis Test

12.1 When to Use the Kruskal-Wallis Test

The Kruskal-Wallis H test is the appropriate alternative to one-way ANOVA when:

- The dependent variable is ordinal (e.g., rating-scale data).
- The distributions are markedly non-normal and the group sizes are too small for the CLT to protect the F-test.
- Extreme outliers or heavy-tailed distributions are present and cannot justifiably be removed.

12.2 The Kruskal-Wallis Procedure

Step 1 — Rank all observations:

Combine all $N$ observations and rank from 1 (smallest) to $N$ (largest). Assign average (mid)ranks for tied values.

Step 2 — Compute rank sums per group:

$$W_j = \sum_{i=1}^{n_j}R_{ij}$$ (sum of ranks for group $j$)

Step 3 — Compute the H statistic:

$$H = \frac{12}{N(N+1)}\sum_{j=1}^K\frac{W_j^2}{n_j} - 3(N+1)$$

Tie correction:

$$H_c = \frac{H}{1 - \sum_m(t_m^3-t_m)/(N^3-N)}$$

Where $t_m$ = number of observations in the $m$-th tied group.

Step 4 — p-value:

For small samples with few groups: use exact tables. For $n_j \geq 5$: $H_c \sim \chi^2_{K-1}$ approximately.

$$p = P(\chi^2_{K-1} \geq H_c)$$

Step 5 — Effect size ($\eta^2_H$):

$$\eta^2_H = \frac{H_c - K + 1}{N - K}$$

A related rank-based effect size is $\varepsilon^2_R = H_c/(N-1)$; the two are close but not identical, although they are sometimes conflated in the literature.

Cohen's benchmarks apply: small = .01, medium = .06, large = .14.
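Steps 1–4 can be verified against SciPy, which applies the same tie correction. A sketch with small made-up samples:

```python
import numpy as np
from scipy.stats import rankdata, kruskal, chi2

# small illustrative samples (assumed data, ties included)
g1 = [3, 5, 4, 6, 2, 5]
g2 = [6, 7, 5, 8, 7, 6]
g3 = [8, 9, 7, 9, 10, 8]

data = np.concatenate([g1, g2, g3])
ranks = rankdata(data)                      # midranks for ties (Step 1)
sizes = [len(g1), len(g2), len(g3)]
N, K = len(data), 3

# rank sums per group (Step 2), then H and the tie correction (Step 3)
splits = np.split(ranks, np.cumsum(sizes)[:-1])
H = 12 / (N * (N + 1)) * sum(W.sum() ** 2 / n for W, n in zip(splits, sizes)) - 3 * (N + 1)
_, counts = np.unique(data, return_counts=True)
Hc = H / (1 - np.sum(counts ** 3 - counts) / (N ** 3 - N))
p = chi2.sf(Hc, K - 1)                      # chi-square approximation (Step 4)

H_scipy, p_scipy = kruskal(g1, g2, g3)      # scipy applies the same tie correction
print(Hc, H_scipy)
```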

12.3 Post-Hoc Tests for Kruskal-Wallis

When $H$ is significant, conduct pairwise comparisons using the Dunn test with Holm-Bonferroni correction:

$$z_{jk} = \frac{\bar{R}_j - \bar{R}_k}{\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}}$$

Where $\bar{R}_j$ and $\bar{R}_k$ are the mean ranks for groups $j$ and $k$.

Effect size for each pairwise comparison (rank-biserial $r_{rb}$):

$$r_{rb,jk} = \frac{z_{jk}}{\sqrt{n_j+n_k}}$$
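SciPy has no built-in Dunn test, so the sketch below implements the $z_{jk}$ statistic plus a Holm step-down adjustment directly. The data are illustrative, `dunn_holm` is a hypothetical helper name, and the SE omits the tie correction for brevity:

```python
import itertools
import numpy as np
from scipy.stats import rankdata, norm

def dunn_holm(groups):
    """Dunn pairwise z-tests on pooled ranks with Holm-adjusted
    two-sided p-values (sketch; no tie correction in the SE)."""
    data = np.concatenate(groups)
    ranks = rankdata(data)
    sizes = [len(g) for g in groups]
    N = len(data)
    splits = np.split(ranks, np.cumsum(sizes)[:-1])
    mean_ranks = [s.mean() for s in splits]

    results = []
    for j, k in itertools.combinations(range(len(groups)), 2):
        se = np.sqrt(N * (N + 1) / 12 * (1 / sizes[j] + 1 / sizes[k]))
        z = (mean_ranks[j] - mean_ranks[k]) / se
        results.append((j, k, z, 2 * norm.sf(abs(z))))

    # Holm step-down: multiply the i-th smallest p by (m - i), enforce monotonicity
    order = np.argsort([p for *_, p in results])
    m, adj, running = len(results), {}, 0.0
    for rank_i, idx in enumerate(order):
        running = max(running, min(1.0, (m - rank_i) * results[idx][3]))
        adj[idx] = running
    return [(j, k, z, adj[i]) for i, (j, k, z, _) in enumerate(results)]

for j, k, z, p_adj in dunn_holm([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [7, 8, 9, 10, 11]]):
    print(j, k, round(z, 2), round(p_adj, 4))
```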

12.4 Asymptotic Relative Efficiency

For normal data, the Kruskal-Wallis test has ARE $= 3/\pi \approx 0.955$ relative to the one-way ANOVA — a negligible efficiency loss. For non-normal data (especially heavy-tailed distributions), the Kruskal-Wallis test can be substantially more powerful than the F-test.


13. Advanced Topics

13.1 ANOVA as a Linear Model

One-way ANOVA is a special case of linear regression with effect-coded (or dummy-coded) predictors. For $K = 3$ groups using effect coding:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

Where:

- $X_{1i} = 1$ for Group 1, $-1$ for Group 3, and $0$ otherwise; $X_{2i} = 1$ for Group 2, $-1$ for Group 3, and $0$ otherwise.
- $\beta_0$ is the unweighted grand mean of the group means (in a balanced design).
- $\beta_1$ and $\beta_2$ are the deviations of Groups 1 and 2 from that grand mean; Group 3's deviation is $-\beta_1-\beta_2$.

The $F$-statistic for the regression model equals the ANOVA $F$-statistic. This equivalence allows ANOVA to be computed using any regression software.
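The equivalence is easy to demonstrate: fit the effect-coded regression by ordinary least squares and compare its model F to `scipy.stats.f_oneway` (synthetic data; any full-rank coding of the groups yields the same F):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(7)
groups = [rng.normal(mu, 2.0, 15) for mu in (10, 12, 15)]   # K = 3 illustrative groups
y = np.concatenate(groups)
labels = np.repeat([0, 1, 2], 15)

# effect coding: Group 3 is coded (-1, -1)
X1 = np.where(labels == 0, 1.0, np.where(labels == 2, -1.0, 0.0))
X2 = np.where(labels == 1, 1.0, np.where(labels == 2, -1.0, 0.0))
X = np.column_stack([np.ones_like(y), X1, X2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
df_model, df_resid = 2, len(y) - 3
F_reg = ((ss_tot - ss_res) / df_model) / (ss_res / df_resid)

F_anova, p = f_oneway(*groups)
print(F_reg, F_anova)  # identical up to floating-point error
```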

13.2 Trend Analysis for Ordered Groups

When the $K$ levels represent an ordered quantitative variable (e.g., dose: 0, 10, 20, 40 mg), polynomial trend analysis is more informative than the omnibus F and pairwise comparisons. Orthogonal polynomial contrasts test:

- Linear trend: does the outcome rise (or fall) steadily across the ordered levels?
- Quadratic trend: is there a single bend (acceleration or levelling off)?
- Cubic trend: is there a double bend? (Up to $K-1$ polynomial orders can be tested.)

Orthogonal polynomial coefficients for $K = 4$ equally spaced groups:

| Trend | $c_1$ | $c_2$ | $c_3$ | $c_4$ |
|---|---|---|---|---|
| Linear | -3 | -1 | 1 | 3 |
| Quadratic | 1 | -1 | -1 | 1 |
| Cubic | -1 | 3 | -3 | 1 |

Each trend contrast has $df = 1$, with $\sum_j c_j^2 = 20$, $4$, and $20$ for the linear, quadratic, and cubic contrasts respectively.

The three trend SS sum to $SS_{between}$, fully partitioning the between-groups variance.

13.3 Dealing with Unequal Sample Sizes

In unbalanced designs, the grand mean is the weighted (not simple) average of group means. Several practical considerations:

- Use the weighted grand mean $\bar{x}_{..} = \sum_j n_j\bar{x}_j/N$ in all SS computations.
- Use Tukey-Kramer (rather than plain Tukey HSD) for pairwise comparisons.
- Variance heterogeneity is far more damaging when $n_j$ are unequal; check Levene's test and switch to Welch's ANOVA (with Games-Howell post-hoc tests) when it is significant.

13.4 Bayesian One-Way ANOVA

Bayesian ANOVA (Rouder et al., 2012) computes Bayes Factors comparing models:

$$BF_{10} = \frac{P(\text{data} \mid H_1: \text{group effect present})}{P(\text{data} \mid H_0: \text{all groups equal})}$$

The prior on standardised group effects under $H_1$ uses a Cauchy distribution with scale $r = \sqrt{2}/2$ (default "medium" effect prior). DataStatPro computes this via the BayesFactor method.

Advantages:

- Can quantify evidence for the null hypothesis, not merely fail to reject it.
- Provides a graded measure of evidence rather than a binary significance decision.
- Evidence can be monitored as data accumulate, without the stopping-rule problems of repeated significance testing.

Reporting: "A Bayesian one-way ANOVA (Cauchy prior, $r = \sqrt{2}/2$) provided [strong / moderate / anecdotal] evidence for [the group effect / the null hypothesis], $BF_{10} =$ [value]."

13.5 Equivalence Testing for ANOVA

To positively establish that group means are negligibly different (equivalence), extend the TOST framework to ANOVA:

Step 1: Specify equivalence bounds $\Delta$ for all pairwise differences (e.g., $\Delta = 0.20 \times s_{pooled}$ corresponding to Cohen's $d = 0.20$).

Step 2: For each pair $(j, k)$, conduct two one-sided tests:

- Test 1 ($H_{01}$: $\mu_j - \mu_k \leq -\Delta$): reject when $t_1 = (\bar{x}_j-\bar{x}_k+\Delta)/SE_{jk}$ exceeds $t_{1-\alpha,\;N-K}$.
- Test 2 ($H_{02}$: $\mu_j - \mu_k \geq \Delta$): reject when $t_2 = (\bar{x}_j-\bar{x}_k-\Delta)/SE_{jk}$ falls below $-t_{1-\alpha,\;N-K}$.

Step 3: Declare equivalence for pair $(j, k)$ when the 90% CI for $\mu_j - \mu_k$ falls entirely within $(-\Delta, \Delta)$.

Apply Bonferroni correction across all $\binom{K}{2}$ pairs.
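The procedure can be sketched for a single pair as follows (`tost_pair` is a hypothetical helper; the numeric inputs are illustrative, not taken from the worked examples):

```python
from math import sqrt
from scipy.stats import t

def tost_pair(mean_j, mean_k, ms_within, n_j, n_k, df, delta, alpha=0.05):
    """Two one-sided tests for equivalence of one pair of group means,
    using the pooled error term MS_within (sketch; Bonferroni across
    pairs is applied by the caller)."""
    se = sqrt(ms_within * (1 / n_j + 1 / n_k))
    diff = mean_j - mean_k
    t_lower = (diff + delta) / se          # H01: diff <= -delta
    t_upper = (diff - delta) / se          # H02: diff >= +delta
    p_lower = t.sf(t_lower, df)
    p_upper = t.cdf(t_upper, df)
    equivalent = max(p_lower, p_upper) < alpha
    # equivalently: the 90% CI falls inside (-delta, +delta)
    margin = t.ppf(1 - alpha, df) * se
    ci90 = (diff - margin, diff + margin)
    return equivalent, ci90

print(tost_pair(10.1, 10.0, 4.0, 50, 50, 147, delta=1.0))
```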

13.6 Robust ANOVA: Trimmed Means

Yuen's trimmed mean F-test (one-way version) uses α\alpha-trimmed means and Winsorised variances for each group:

hj=nj2αnjh_j = n_j - 2\lfloor\alpha n_j\rfloor (effective group size after trimming)

Ftrim=jhj(xˉt,jxˉt,..)2/(K1)jWj/(hj(hj1))F_{trim} = \frac{\sum_j h_j(\bar{x}_{t,j}-\bar{x}_{t,..})^2/(K-1)}{\sum_j W_j/(h_j(h_j-1))}

Where xˉt,j\bar{x}_{t,j} is the 20%-trimmed mean for group jj and WjW_j is the Winsorised sum of squared deviations.

This test is substantially more powerful than Kruskal-Wallis for symmetric heavy-tailed distributions while maintaining nominal Type I error control.

13.7 Reporting One-Way ANOVA According to APA 7th Edition

Full minimum reporting set (APA 7th ed.):

  1. Statement of which test (standard ANOVA or Welch's) and why.
  2. Levene's test result.
  3. $F(df_B, df_W) =$ [value], $p =$ [value].
  4. $\omega^2 =$ [value] [95% CI: LB, UB].
  5. Which effect size was computed ($\omega^2$, not just "effect size").
  6. Group means and SDs for all $K$ groups.
  7. Post-hoc test results with adjusted p-values and $d_{jk}$ per pair.
  8. 95% CI for each significant pairwise mean difference.

14. Worked Examples

Example 1: Therapy Type on Depression — Standard One-Way ANOVA

A clinical researcher randomly assigns $N = 90$ participants to three therapy conditions: CBT ($n_1 = 30$), Behavioural Activation (BA; $n_2 = 30$), or Waitlist Control (WL; $n_3 = 30$). Post-treatment PHQ-9 depression scores (0–27; lower = less depression) are the dependent variable.

Descriptive statistics:

| Group | $n_j$ | $\bar{x}_j$ | $s_j$ | $SE_j$ |
|---|---|---|---|---|
| CBT | 30 | 9.80 | 4.20 | 0.767 |
| BA | 30 | 11.40 | 4.60 | 0.840 |
| WL | 30 | 16.30 | 5.10 | 0.931 |

Assumption checks:

Shapiro-Wilk (residuals): $W = 0.976$, $p = .342$ — normality not violated.

Levene's test: $F(2, 87) = 0.82$, $p = .443$ — homogeneity of variance holds.

→ Standard one-way ANOVA is appropriate.

Step 1 — Grand mean:

$$\bar{x}_{..} = (30\times9.80 + 30\times11.40 + 30\times16.30)/90 = (294+342+489)/90 = 1125/90 = 12.500$$

Step 2 — Sums of squares:

$$SS_{between} = 30[(9.80-12.50)^2 + (11.40-12.50)^2 + (16.30-12.50)^2]$$

$$= 30[7.290 + 1.210 + 14.440] = 30 \times 22.940 = 688.20$$

$$SS_{within} = \sum_j (n_j-1)s_j^2 = 29(17.640) + 29(21.160) + 29(26.010)$$

$$= 511.56 + 613.64 + 754.29 = 1879.49$$

$$SS_{total} = 688.20 + 1879.49 = 2567.69$$
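Steps 1 and 2 can be reproduced from the descriptive statistics alone; this sketch recomputes the sums of squares and the F-ratio (values agree with the hand calculation to rounding):

```python
import numpy as np

# descriptive statistics from the table above
n = np.array([30, 30, 30])
means = np.array([9.80, 11.40, 16.30])
sds = np.array([4.20, 4.60, 5.10])

grand = np.sum(n * means) / n.sum()                  # weighted grand mean
ss_between = np.sum(n * (means - grand) ** 2)
ss_within = np.sum((n - 1) * sds ** 2)
df_b, df_w = len(n) - 1, n.sum() - len(n)
F = (ss_between / df_b) / (ss_within / df_w)
print(round(ss_between, 2), round(ss_within, 2), round(F, 2))
```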

Step 3 — ANOVA source table:

| Source | SS | $df$ | MS | $F$ | $p$ |
|---|---|---|---|---|---|
| Between | 688.20 | 2 | 344.10 | 15.93 | $< .001$ |
| Within | 1879.49 | 87 | 21.60 | | |
| Total | 2567.69 | 89 | | | |

Step 4 — Effect sizes:

$$\eta^2 = 688.20/2567.69 = 0.268$$

$$\omega^2 = (688.20 - 2\times21.60)/(2567.69+21.60) = 645.00/2589.29 = 0.249$$

$$\varepsilon^2 = (688.20 - 2\times21.60)/2567.69 = 645.00/2567.69 = 0.251$$

$$f = \sqrt{0.249/0.751} = \sqrt{0.3316} = 0.576$$

95% CI for $\omega^2$ (non-central F, $F_{obs} = 15.93$, $df_1=2$, $df_2=87$):

$$\hat{\lambda} = (F-1)\times df_B = 14.93\times 2 = 29.86$$

95% CI for $\lambda$: $[13.78, 50.08]$ (DataStatPro numerical)

$\omega^2_L = 0.137$; $\omega^2_U = 0.360$

Step 5 — Post-hoc tests (Tukey HSD, balanced):

$q_{3,\;87,\;.05} = 3.369$ (studentised range)

$$\text{HSD} = 3.369 \times \sqrt{21.60/30} = 3.369 \times 0.849 = 2.860$$

| Comparison | Difference | SE | $q$ | $p_{adj}$ | Cohen's $d_{jk}$ | 95% CI |
|---|---|---|---|---|---|---|
| CBT vs. BA | 1.600 | 0.849 | 1.884 | .302 | 0.344 | $[-1.26, 4.46]$ |
| CBT vs. WL | 6.500 | 0.849 | 7.656 | $< .001$ | 1.399 | $[3.64, 9.36]$ |
| BA vs. WL | 4.900 | 0.849 | 5.772 | $< .001$ | 1.055 | $[2.04, 7.76]$ |

Where $d_{jk} = (\bar{x}_j-\bar{x}_k)/\sqrt{MS_W} = (\bar{x}_j-\bar{x}_k)/\sqrt{21.60} = (\bar{x}_j-\bar{x}_k)/4.648$

Interpretation:

Both active therapies (CBT and BA) significantly reduced depression compared to Waitlist Control ($p < .001$). CBT and BA did not differ from each other ($p = .302$).

APA write-up: "A one-way between-subjects ANOVA examined the effect of therapy type on PHQ-9 depression scores. Levene's test indicated homogeneity of variance ($F(2, 87) = 0.82$, $p = .443$). The ANOVA revealed a significant effect of therapy type, $F(2, 87) = 15.93$, $p < .001$, $\omega^2 = 0.249$ [95% CI: 0.137, 0.360], indicating a large effect. Tukey HSD post-hoc comparisons revealed that both CBT ($M = 9.80$, $SD = 4.20$) and Behavioural Activation ($M = 11.40$, $SD = 4.60$) produced significantly lower depression scores than the Waitlist Control ($M = 16.30$, $SD = 5.10$), $d_{CBT-WL} = 1.40$ [95% CI: 0.78, 2.01] and $d_{BA-WL} = 1.06$ [95% CI: 0.46, 1.65] (both $p < .001$). CBT and BA did not significantly differ, $d = 0.34$ [95% CI: -0.27, 0.95], $p = .302$."


Example 2: Welch's One-Way ANOVA — Reaction Time Across Sleep Conditions

A researcher compares simple reaction time (ms) across four sleep conditions: Normal (8h), Mild deprivation (6h), Moderate deprivation (4h), and Severe deprivation (2h), $n_j = 20$ per group ($N = 80$).

Descriptive statistics:

| Group | $n_j$ | $\bar{x}_j$ (ms) | $s_j$ (ms) |
|---|---|---|---|
| Normal (8h) | 20 | 241.3 | 18.4 |
| Mild (6h) | 20 | 268.7 | 24.1 |
| Moderate (4h) | 20 | 312.4 | 41.6 |
| Severe (2h) | 20 | 389.2 | 68.3 |

Assumption checks:

Shapiro-Wilk (residuals): $W = 0.958$, $p = .009$ — mild non-normality (but $n_j = 20$ per group; the CLT provides some protection).

Levene's test: $F(3, 76) = 12.84$, $p < .001$ — significant heteroscedasticity.

→ Welch's one-way ANOVA is required.

Welch's F computation:

$w_j = n_j/s_j^2$: $w_1 = 20/338.56 = 0.0591$; $w_2 = 20/580.81 = 0.0344$; $w_3 = 20/1730.56 = 0.0116$; $w_4 = 20/4664.89 = 0.0043$

$\sum w_j = 0.0591+0.0344+0.0116+0.0043 = 0.1094$

$$\tilde{x} = (0.0591\times241.3+0.0344\times268.7+0.0116\times312.4+0.0043\times389.2)/0.1094$$

$$= (14.261+9.243+3.624+1.674)/0.1094 = 28.802/0.1094 = 263.3$$

$$F_W = \frac{\sum_j w_j(\bar{x}_j-\tilde{x})^2/(K-1)}{1+(2(K-2)/(K^2-1))\sum_j(1-w_j/\sum w_j)^2/(n_j-1)}$$

Numerator: $\sum_j w_j(\bar{x}_j-\tilde{x})^2$:

$$= 0.0591(241.3-263.3)^2+0.0344(268.7-263.3)^2+0.0116(312.4-263.3)^2+0.0043(389.2-263.3)^2$$

$$= 0.0591(484)+0.0344(29.16)+0.0116(2410.81)+0.0043(15850.81)$$

$$= 28.604+1.003+27.965+68.158 = 125.730$$

Numerator $/(K-1) = 125.730/3 = 41.910$

Denominator correction: with $\sum_j(1-w_j/\sum w_j)^2/(n_j-1) = 0.1265$, the correction is $1 + (4/15)(0.1265) = 1.034$

$F_W = 41.910/1.034 = 40.5$; $\nu_W = (K^2-1)/(3\times0.1265) \approx 39.5$

$p < .001$

Effect size:

$$\eta^2 = F_W\times(K-1)/(F_W\times(K-1)+\nu_W) = 40.5\times3/(40.5\times3+39.5) = 121.5/161.0 = 0.755$$

$$\omega^2 \approx (F_W-1)(K-1)/(F_W(K-1)+\nu_W+1) = 39.5\times3/(40.5\times3+39.5+1) = 118.5/162.0 = 0.731$$

Games-Howell post-hoc tests:

| Comparison | Diff (ms) | $SE_{GH}$ | $p_{adj}$ | $d_{jk}$ |
|---|---|---|---|---|
| Normal vs. Mild | 27.4 | 7.29 | .002 | 1.27 |
| Normal vs. Mod | 71.1 | 11.43 | $< .001$ | 2.08 |
| Normal vs. Severe | 147.9 | 17.82 | $< .001$ | 3.72 |
| Mild vs. Mod | 43.7 | 12.18 | .004 | 1.28 |
| Mild vs. Severe | 120.5 | 18.39 | $< .001$ | 2.82 |
| Mod vs. Severe | 76.8 | 20.25 | .003 | 1.26 |

All six pairwise comparisons are statistically significant, with very large effect sizes. Reaction time increases substantially at each stage of sleep deprivation.

APA write-up: "Levene's test indicated significant heterogeneity of variance ($F(3, 76) = 12.84$, $p < .001$); therefore, Welch's one-way ANOVA was applied. The test revealed a significant effect of sleep deprivation on reaction time, $F_W(3, 39.5) = 40.5$, $p < .001$, $\omega^2 = 0.73$, indicating a very large effect. Games-Howell post-hoc comparisons revealed that every level of sleep deprivation produced significantly longer reaction times than all others (all $p \leq .004$), with effect sizes ranging from large ($d = 1.26$) to very large ($d = 3.72$)."
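As a cross-check, Welch's statistic can be recomputed from the summary table with a few lines of SciPy (a sketch; `welch_anova` is a hypothetical helper, and its output agrees with the hand calculation to rounding):

```python
import numpy as np
from scipy.stats import f as f_dist

def welch_anova(n, means, sds):
    """Welch's one-way ANOVA from summary statistics (sketch)."""
    n, means, var = np.asarray(n), np.asarray(means), np.asarray(sds) ** 2.0
    K = len(n)
    w = n / var                                   # weights w_j = n_j / s_j^2
    x_tilde = np.sum(w * means) / w.sum()         # weighted grand mean
    num = np.sum(w * (means - x_tilde) ** 2) / (K - 1)
    lam = np.sum((1 - w / w.sum()) ** 2 / (n - 1))
    F_w = num / (1 + 2 * (K - 2) / (K ** 2 - 1) * lam)
    df2 = (K ** 2 - 1) / (3 * lam)                # Welch-Satterthwaite df
    p = f_dist.sf(F_w, K - 1, df2)
    return F_w, df2, p

F_w, df2, p = welch_anova([20, 20, 20, 20],
                          [241.3, 268.7, 312.4, 389.2],
                          [18.4, 24.1, 41.6, 68.3])
print(round(F_w, 2), round(df2, 1))
```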


Example 3: Kruskal-Wallis — Pain Ratings Across Acupuncture Protocols

A researcher compares pain relief (NRS 0–10; ordinal) across five acupuncture protocol variants. Non-normality and ties make one-way ANOVA inappropriate.

$n_j = 12$ per group; $N = 60$; $K = 5$.

Given: $H_c(4) = 18.34$ (tie-corrected Kruskal-Wallis H statistic).

p-value: $P(\chi^2_4 \geq 18.34) = .001$

Effect size:

$$\eta^2_H = (18.34-5+1)/(60-5) = 14.34/55 = 0.261$$

Large effect — acupuncture protocol explains approximately 26% of the rank variability.

Dunn post-hoc (Holm-corrected):

After Holm correction, Protocols 1 and 2 differ significantly from Protocols 4 and 5 ($r_{rb}$ ranging from 0.48 to 0.73). Protocols 1 vs. 2 and 4 vs. 5 do not differ significantly.

APA write-up: "Due to ordinal measurement and non-normal distributions, a Kruskal-Wallis test was conducted. There was a significant difference in pain ratings across the five acupuncture protocols, $H(4) = 18.34$, $p = .001$, $\eta^2_H = 0.261$ [95% CI: 0.091, 0.421], indicating a large effect. Dunn's pairwise comparisons with Holm correction revealed that Protocols 1 and 2 produced significantly lower pain ratings than Protocols 4 and 5 (all $p < .05$, $r_{rb}$ = 0.48–0.73)."


Example 4: Non-Significant Result with Sensitivity Analysis

An educational researcher tests whether three homework formats (Written, Digital, No Homework) affect standardised test scores in $n_j = 25$ students per group ($N = 75$; $K = 3$).

Results: $F(2, 72) = 1.84$, $p = .166$, $\omega^2 = 0.021$ [95% CI: 0.000, 0.103].

Levene's test: $F(2, 72) = 0.61$, $p = .547$ — variances equal.

Sensitivity analysis:

$$f_{min} = \sqrt{7.849/75} = \sqrt{0.1047} = 0.323$$

Corresponding $\omega^2_{min} = 0.323^2/(1+0.323^2) = 0.104/(1+0.104) = 0.094$

This study had 80% power to detect only medium-to-large effects ($\omega^2 \geq 0.094$). The observed $\omega^2 = 0.021$ is a small effect well below this detection threshold.

APA write-up: "A one-way ANOVA revealed no significant effect of homework format on standardised test scores, $F(2, 72) = 1.84$, $p = .166$, $\omega^2 = 0.021$ [95% CI: 0.000, 0.103], indicating a very small and statistically non-significant effect. The study had 80% power to detect effects of $f \geq 0.32$ ($\omega^2 \geq 0.094$); smaller effects may exist but would remain undetected."


15. Common Mistakes and How to Avoid Them

Mistake 1: Reporting $\eta^2$ as the Only Effect Size and Calling It Unbiased

Problem: Reporting $\eta^2 = 0.23$ as the effect size and implying it represents the true population proportion of variance explained. $\eta^2$ overestimates the population effect, sometimes substantially in small samples with few groups.

Solution: Report $\omega^2$ (or $\varepsilon^2$) as the primary effect size. If journals require $\eta^2$ (some do), report it alongside $\omega^2$ and label $\eta^2$ as a biased estimate. Always compute the 95% CI for $\omega^2$ using DataStatPro.


Mistake 2: Interpreting the Omnibus F Without Post-Hoc Tests

Problem: Reporting $F(3, 76) = 8.42$, $p < .001$ and concluding "all groups differ significantly" or "Groups 1 and 4 differ based on their means" without conducting post-hoc tests. The omnibus F tells you only that at least one pair differs.

Solution: Always follow a significant omnibus F with appropriate post-hoc tests or planned contrasts. Report all pairwise comparisons with adjusted p-values and individual effect sizes $d_{jk}$.


Mistake 3: Using Standard ANOVA When Variances Are Unequal

Problem: Running standard ANOVA without checking Levene's test, or ignoring a significant Levene's result, when group sizes are unequal. This produces inflated or deflated Type I error rates and untrustworthy p-values.

Solution: Always run Levene's test before deciding which ANOVA variant to use. When Levene's is significant (especially with unequal $n_j$), use Welch's ANOVA with Games-Howell post-hoc tests. Setting DataStatPro to "Auto" mode applies Welch's ANOVA automatically whenever Levene's is significant.


Mistake 4: Running Multiple t-Tests Instead of ANOVA

Problem: Comparing three groups by running three separate pairwise t-tests without correction, inflating FWER to approximately 14%.

Solution: Use one-way ANOVA (or Welch's) for the omnibus test, followed by appropriate post-hoc tests or pre-planned contrasts. If pairwise comparisons are the primary interest, use Tukey HSD or Holm-Bonferroni corrections.


Mistake 5: Not Checking or Reporting Assumption Tests

Problem: Running ANOVA without checking normality and homoscedasticity, and reporting only the F-statistic and p-value without mentioning assumption checks. Readers cannot evaluate the validity of the results.

Solution: Always run and report Levene's test and Shapiro-Wilk (or Q-Q plot inspection). Report these results in the method or results section, and justify the test choice (standard vs. Welch's) based on the assumption check results.


Mistake 6: Using Fisher's LSD Without the Omnibus F Restriction

Problem: Applying Fisher's Least Significant Difference post-hoc test directly as a multiple comparison correction without first confirming the omnibus F is significant. Fisher's LSD does not adequately control FWER when $K > 3$.

Solution: For $K = 3$, Fisher's LSD is acceptable after a significant omnibus F (the "protected LSD"). For $K \geq 4$, always use a proper FWER-controlling procedure (Tukey, Holm, Games-Howell). Never report Fisher's LSD without the omnibus F protection.


Mistake 7: Reporting Effect Sizes Without Confidence Intervals

Problem: Reporting $\omega^2 = 0.15$ without a CI. With moderate sample sizes, the CI for $\omega^2$ can be extremely wide, making the point estimate essentially uninformative about the true effect magnitude.

Solution: Always report the 95% CI for $\omega^2$ (available in DataStatPro via the non-central F-distribution). A point estimate without a CI gives a false sense of precision.


Mistake 8: Applying Post-Hoc Tests When the Omnibus F is Non-Significant

Problem: Running all pairwise post-hoc comparisons regardless of the omnibus F result, and selectively reporting those that happen to be significant. This is p-hacking and inflates the FWER.

Solution: When the omnibus F is non-significant, do not run post-hoc pairwise tests (except for pre-planned contrasts specified before data collection). Report the non-significant omnibus F alongside the effect size and sensitivity analysis, and acknowledge the study's power limitations.


Mistake 9: Confusing "Equal Sample Sizes" with "Equal Variances"

Problem: Assuming that because all groups have equal $n_j$, the equal variances assumption is met. Equal sample sizes reduce the consequences of variance heterogeneity but do not eliminate them. Levene's test may still be significant with balanced designs.

Solution: Always run Levene's test regardless of balance. When Levene's is significant, use Welch's ANOVA even for balanced designs (the power loss is negligible).


Mistake 10: Neglecting to Report the Full Descriptive Statistics Table

Problem: Reporting only the ANOVA source table ($F$, df, $p$) without group means, SDs, and $n_j$. Without descriptive statistics, the F-statistic is uninterpretable — readers cannot evaluate the direction, magnitude, or pattern of group differences.

Solution: Always include a descriptive statistics table with $n_j$, $\bar{x}_j$, $SD_j$, and $SE_j$ (or 95% CI) for each group. Include a visualisation (raincloud plot or means plot with individual data) whenever possible.


16. Troubleshooting

| Problem | Likely Cause | Solution |
|---|---|---|
| $F < 1.0$ | $MS_{within} > MS_{between}$; group means very similar | Non-significant result; report $\omega^2 \approx 0$; inspect group means |
| $\omega^2$ or $\varepsilon^2$ is negative | True effect near zero; small sample; correction overshoots | Report as 0 (convention); note small effect; increase sample size |
| $\eta^2$ much larger than $\omega^2$ | Small $N$ with $K$ groups; large bias correction | This is expected; always report $\omega^2$ as primary |
| Levene's test significant but $n_j$ are equal | Unequal variances exist but balanced design is partially protective | Still use Welch's ANOVA; equal $n_j$ reduces but does not eliminate the problem |
| Post-hoc tests show no significant pairs despite significant $F$ | Effect is spread across many small pairwise differences | Report omnibus and acknowledge no single pair survives correction; consider planned contrasts |
| Shapiro-Wilk significant with large $n_j$ ($> 50$) | High power of normality test; minor deviations detected | With large $n_j$, the CLT protects the F-test; inspect the Q-Q plot for severity; ANOVA likely valid |
| Games-Howell and Tukey HSD give contradictory results | Heterogeneous variances affecting the inference | Use Games-Howell when variances are unequal; report both and note the discrepancy |
| Very large $F$ with very small $\omega^2$ | Very large $N$; trivially small differences are statistically significant | Report effect size prominently; statistical significance ≠ practical significance |
| Kruskal-Wallis significant but all Dunn pairwise tests non-significant | Effect is distributed; Holm correction too conservative | Report all pairwise $z$ and $r_{rb}$; consider reporting without correction for planned pairs |
| 95% CI for $\omega^2$ includes 0 despite significant $F$ | Wide CI due to small $N$ or $K$; possible when $p$ is marginally significant | Report the wide CI; both values are correct; note limited precision |
| Welch's df is very small | Extreme variance heterogeneity with small $n_j$ | Check data for errors; if genuine, use permutation ANOVA |
| One group has $n_j = 1$ | ANOVA cannot estimate $s_j^2$ from a single observation | Collect more data; exclude the singleton group; use a different design |
| ANOVA gives different result from equivalent regression | Coding scheme issue (dummy vs. effect coding affects only interpretation, not $F$) | Verify coding; F-statistics should match; intercept and slopes will differ by coding |
| Post-hoc p-values are all exactly 1.0 | Software error, or all group means identical | Verify data; check for data entry errors |

17. Quick Reference Cheat Sheet

Core One-Way ANOVA Formulas

| Formula | Description |
|---|---|
| $\bar{x}_{..} = \sum_j n_j\bar{x}_j/N$ | Grand mean (weighted) |
| $SS_B = \sum_j n_j(\bar{x}_j-\bar{x}_{..})^2$ | Between-groups SS |
| $SS_W = \sum_j(n_j-1)s_j^2$ | Within-groups SS |
| $SS_T = SS_B + SS_W$ | Total SS |
| $df_B = K-1$; $df_W = N-K$; $df_T = N-1$ | Degrees of freedom |
| $MS_B = SS_B/(K-1)$ | Between-groups mean square |
| $MS_W = SS_W/(N-K)$ | Within-groups mean square (error) |
| $F = MS_B/MS_W$ | F-ratio |
| $s_{pooled} = \sqrt{MS_W}$ | Pooled within-groups SD |
| $p = P(F_{K-1,\;N-K} \geq F_{obs})$ | p-value |

Effect Size Formulas

| Formula | Description |
|---|---|
| $\eta^2 = SS_B/SS_T$ | Eta squared (biased) |
| $\omega^2 = (SS_B-(K-1)MS_W)/(SS_T+MS_W)$ | Omega squared (preferred) |
| $\varepsilon^2 = (SS_B-(K-1)MS_W)/SS_T$ | Epsilon squared (alternative) |
| $f = \sqrt{\omega^2/(1-\omega^2)}$ | Cohen's $f$ (from $\omega^2$) |
| $\eta^2 = F\cdot df_B/(F\cdot df_B+df_W)$ | $\eta^2$ from $F$ |
| $\omega^2 \approx (F-1)df_B/(F\cdot df_B+df_W+1)$ | $\omega^2$ from $F$ (approx.) |
| $d_{jk} = (\bar{x}_j-\bar{x}_k)/\sqrt{MS_W}$ | Cohen's $d$ for pairwise |
| $g_{jk} = d_{jk}\times(1-3/(4(N-K)-1))$ | Hedges' $g$ for pairwise |

Welch's ANOVA Formulas

| Formula | Description |
|---|---|
| $w_j = n_j/s_j^2$ | Weight for group $j$ |
| $\tilde{x} = \sum w_j\bar{x}_j/\sum w_j$ | Weighted grand mean |
| $F_W = \text{(weighted SS)}/(K-1)/\text{correction}$ | Welch's F (see Section 5.2) |
| $\nu_W = (K^2-1)/(3\sum(1-w_j/\sum w_j)^2/(n_j-1))$ | Welch-Satterthwaite df |

Kruskal-Wallis Formulas

| Formula | Description |
|---|---|
| $W_j = \sum_{i=1}^{n_j} R_{ij}$ | Rank sum for group $j$ |
| $H = \frac{12}{N(N+1)}\sum_j W_j^2/n_j - 3(N+1)$ | Kruskal-Wallis $H$ |
| $H_c = H/(1-\sum_m(t_m^3-t_m)/(N^3-N))$ | Tie-corrected $H$ |
| $\eta^2_H = (H_c-K+1)/(N-K)$ | Kruskal-Wallis effect size |
| $z_{jk} = (\bar{R}_j-\bar{R}_k)/SE_{jk}$ | Dunn's test statistic |
| $r_{rb,jk} = z_{jk}/\sqrt{n_j+n_k}$ | Rank-biserial $r$ (pairwise) |

ANOVA Source Table Template

| Source | SS | $df$ | MS | $F$ | $p$ |
|---|---|---|---|---|---|
| Between groups | $SS_B$ | $K-1$ | $MS_B$ | $MS_B/MS_W$ | [value] |
| Within groups (Error) | $SS_W$ | $N-K$ | $MS_W$ | | |
| Total | $SS_T$ | $N-1$ | | | |

One-Way ANOVA Reporting Checklist

| Item | Required |
|---|---|
| $F$-statistic with both df | ✅ Always |
| Exact p-value (or $p < .001$) | ✅ Always |
| $\omega^2$ with 95% CI (primary effect size) | ✅ Always |
| $\eta^2$ (labelled as biased) | ✅ When journals require it |
| Which effect size was reported | ✅ Always |
| Group means and SDs for all groups | ✅ Always |
| Sample sizes per group | ✅ Always |
| Levene's test result | ✅ Always for independent designs |
| Whether standard or Welch's ANOVA was used | ✅ Always |
| Shapiro-Wilk result (normality) | ✅ When $n_j < 50$ |
| Post-hoc test name and correction method | ✅ When omnibus $F$ significant |
| All pairwise comparisons with adjusted $p$ and $d_{jk}$ | ✅ When omnibus $F$ significant |
| Planned contrast weights and rationale | ✅ When planned contrasts used |
| $\varepsilon^2$ alongside $\omega^2$ | ✅ Recommended |
| Cohen's $f$ for power analysis reference | ✅ When reporting power |
| 95% CI for $\omega^2$ (via non-central $F$) | ✅ Always |
| 95% CI for each pairwise mean difference | ✅ Recommended |
| Sensitivity analysis (min detectable effect) | ✅ For null results |
| Domain-specific benchmark context | ✅ Recommended |
| Raincloud or violin plot | ✅ Strongly recommended |
| Whether Games-Howell was used with Welch's | ✅ When variances unequal |
| Descriptive statistics table | ✅ Always |

APA 7th Edition Reporting Templates

Standard One-Way ANOVA (significant result):

"A one-way between-subjects ANOVA was conducted to examine the effect of [IV] on [DV]. Levene's test indicated [equal / unequal] variances ($F([df_1], [df_2]) =$ [value], $p =$ [value]). The ANOVA revealed a [significant / non-significant] effect of [IV], $F([df_B], [df_W]) =$ [value], $p =$ [value], $\omega^2 =$ [value] [95% CI: LB, UB], indicating a [small / medium / large] effect. [Post-hoc test name] pairwise comparisons revealed that [group pair(s)] differed significantly (all $p <$ [threshold after correction]). [Other pairs] did not differ significantly."

Welch's One-Way ANOVA:

"Due to significant heterogeneity of variance (Levene's $F([df_1], [df_2]) =$ [value], $p =$ [value]), Welch's one-way ANOVA was applied. The test revealed a [significant / non-significant] effect of [IV] on [DV], $F_W([K-1], [\nu_W]) =$ [value], $p =$ [value], $\omega^2 =$ [value] [95% CI: LB, UB]. Games-Howell post-hoc comparisons indicated that [describe pairwise results]."

Non-significant result with sensitivity analysis:

"A one-way ANOVA revealed no significant effect of [IV] on [DV], $F([df_B], [df_W]) =$ [value], $p =$ [value], $\omega^2 =$ [value] [95% CI: LB, UB]. Given the sample sizes ($n_j =$ [value] per group), this study had power to detect effects of $f \geq$ [value] ($\omega^2 \geq$ [value]) at 80% power. Effects smaller than this threshold may exist but remain undetected."

Kruskal-Wallis (non-parametric):

"Due to [non-normality / ordinal measurement], a Kruskal-Wallis test was conducted. The test revealed a [significant / non-significant] difference across groups, $H([K-1]) =$ [value], $p =$ [value], $\eta^2_H =$ [value]. Dunn's pairwise post-hoc comparisons with Holm correction indicated that [describe pairwise results]."

Conversion Formulas

| From | To | Formula |
|---|---|---|
| $F$, $df_B$, $df_W$ | $\eta^2$ | $\eta^2 = F \cdot df_B / (F \cdot df_B + df_W)$ |
| $F$, $df_B$, $df_W$ | $\omega^2$ (approx.) | $\omega^2 \approx (F-1)\cdot df_B/(F\cdot df_B+df_W+1)$ |
| $\eta^2$ | $f$ | $f = \sqrt{\eta^2/(1-\eta^2)}$ |
| $\omega^2$ | $f$ | $f = \sqrt{\omega^2/(1-\omega^2)}$ |
| $f$ | $\eta^2$ | $\eta^2 = f^2/(1+f^2)$ |
| $\omega^2$ | $\eta^2$ | $\eta^2 = \omega^2 + \varepsilon_{bias}$ (always $\eta^2 > \omega^2$) |
| $\eta^2$ (2 groups) | Cohen's $d$ | $d = 2\sqrt{\eta^2/(1-\eta^2)}$ |
| Cohen's $d$ (2 groups) | $\eta^2$ | $\eta^2 = d^2/(d^2+4)$ |
| $H$ (Kruskal-Wallis) | $\eta^2_H$ | $\eta^2_H = (H-K+1)/(N-K)$ |
| Pairwise $\bar{x}_j - \bar{x}_k$ | $d_{jk}$ | $d_{jk} = (\bar{x}_j-\bar{x}_k)/\sqrt{MS_{within}}$ |
| $d_{jk}$ | $g_{jk}$ (Hedges') | $g_{jk} = d_{jk} \times (1-3/(4(N-K)-1))$ |
| $r_{rb}$ (Dunn) | $d$ (approx.) | $d \approx 2r_{rb}/\sqrt{1-r_{rb}^2}$ |
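
The most commonly needed conversions in this table can be sketched as plain functions; the function names are our own and the worked values assume a hypothetical result of $F(2, 57) = 4.50$:

```python
import math

def eta2_from_f_stat(F, df_b, df_w):
    """eta^2 from an observed F statistic and its degrees of freedom."""
    return F * df_b / (F * df_b + df_w)

def omega2_from_f_stat(F, df_b, df_w):
    """Approximate omega^2 (corrects eta^2's upward bias)."""
    return (F - 1) * df_b / (F * df_b + df_w + 1)

def cohens_f_from_eta2(eta2):
    """Cohen's f from eta^2; the same formula applies to omega^2."""
    return math.sqrt(eta2 / (1 - eta2))

eta2 = eta2_from_f_stat(4.50, 2, 57)       # 9/66  ≈ 0.136
omega2 = omega2_from_f_stat(4.50, 2, 57)   # 7/67  ≈ 0.104
f = cohens_f_from_eta2(eta2)               # ≈ 0.397
```

Note that `omega2` comes out smaller than `eta2`, as the $\eta^2 > \omega^2$ row of the table guarantees.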

Required Sample Size per Group (80% Power, α=.05\alpha = .05, Two-Sided)

| Cohen's $f$ | Label | $K = 3$ | $K = 4$ | $K = 5$ | $K = 6$ | $K = 8$ |
|---|---|---|---|---|---|---|
| 0.10 | Small | 322 | 274 | 240 | 215 | 180 |
| 0.15 | Small-Med | 144 | 123 | 107 | 96 | 80 |
| 0.25 | Medium | 52 | 45 | 39 | 35 | 29 |
| 0.35 | Med-Large | 27 | 23 | 21 | 19 | 16 |
| 0.40 | Large | 21 | 18 | 16 | 14 | 12 |
| 0.50 | Large | 14 | 12 | 11 | 10 | 8 |
| 0.60 | Large | 10 | 9 | 8 | 7 | 6 |
| 0.80 | Very large | 6 | 6 | 5 | 5 | 4 |

All values are $n$ per group. Total $N = n \times K$.
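
Entries like these can be cross-checked with the noncentral $F$ distribution, using the noncentrality parameter $\lambda = f^2 N$. The sketch below assumes SciPy is available and `anova_power` is our own illustrative helper; running it on the table's medium-effect cell ($f = 0.25$, $K = 3$, $n = 52$) should return a power close to 0.80:

```python
from scipy import stats

def anova_power(f, k, n_per_group, alpha=0.05):
    """Power of a standard one-way ANOVA for Cohen's f, K groups,
    and n per group, via the noncentral F distribution."""
    n_total = k * n_per_group
    df_b, df_w = k - 1, n_total - k
    nc = f ** 2 * n_total                          # lambda = f^2 * N
    f_crit = stats.f.ppf(1 - alpha, df_b, df_w)    # critical value under H0
    return float(1 - stats.ncf.cdf(f_crit, df_b, df_w, nc))

power = anova_power(0.25, 3, 52)   # ≈ 0.80
```

Small discrepancies against other software are expected, since tables round $n$ up to the next integer per group.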

Cohen's Benchmarks — ANOVA Effect Sizes

| Label | $\omega^2$ | $\varepsilon^2$ | $\eta^2$ | $f$ |
|---|---|---|---|---|
| Small | 0.01 | 0.01 | 0.01 | 0.10 |
| Medium | 0.06 | 0.06 | 0.06 | 0.25 |
| Large | 0.14 | 0.14 | 0.14 | 0.40 |
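
These cut-offs can be encoded as a simple labelling rule for report text. This is a sketch under our own conventions: `label_omega2` is a hypothetical name, and the "negligible" label for values below 0.01 is our addition, not part of Cohen's benchmarks:

```python
def label_omega2(omega2):
    """Cohen's benchmark label for omega^2, using the cut-offs above."""
    if omega2 >= 0.14:
        return "large"
    if omega2 >= 0.06:
        return "medium"
    if omega2 >= 0.01:
        return "small"
    return "negligible"   # below Cohen's smallest benchmark (our convention)
```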

Post-Hoc Test Selection Guide

| Condition | Recommended Post-Hoc Test | Controls FWER |
|---|---|---|
| Balanced, equal variances | Tukey HSD | ✅ Exactly |
| Unbalanced, equal variances | Tukey-Kramer | ✅ Approximately |
| Unequal variances OR unequal $n_j$ | Games-Howell | ✅ Approximately |
| All groups vs. one control | Dunnett's | ✅ Optimal |
| Any design, conservative | Bonferroni | ✅ Conservative |
| Any design, less conservative | Holm-Bonferroni | ✅ Sequential |
| All contrasts (not just pairwise) | Scheffé | ✅ Most conservative |
| Non-parametric (Kruskal-Wallis) | Dunn + Holm | ✅ Sequential |
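
The design-driven rows of this guide condense into a small decision rule. The sketch below is our own simplification (`choose_posthoc` is a hypothetical name); it deliberately ignores the Bonferroni/Holm/Scheffé rows, which depend on which hypotheses you test rather than on the design:

```python
def choose_posthoc(equal_variances, balanced, vs_control_only=False):
    """Pick a post-hoc test from the design conditions in the table.
    vs_control_only assumes equal variances (Dunnett's standard setting)."""
    if vs_control_only:
        return "Dunnett's"
    if not equal_variances:
        return "Games-Howell"          # also covers unequal n_j
    return "Tukey HSD" if balanced else "Tukey-Kramer"
```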

Degrees of Freedom Reference

| Source | $df$ | Notes |
|---|---|---|
| Between groups | $K - 1$ | $K$ = number of groups |
| Within groups (Error) | $N - K$ | $N$ = total observations |
| Total | $N - 1$ | |
| Welch's numerator | $K - 1$ | Same as standard |
| Welch's denominator | $\nu_W$ | Always $\leq N - K$ |
| Planned contrast | $1$ | Per orthogonal contrast |
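
The standard (non-Welch) rows follow directly from the group sizes, as in this minimal sketch (`anova_dfs` is our own illustrative name):

```python
def anova_dfs(group_sizes):
    """Standard one-way ANOVA degrees of freedom from per-group sizes.
    Welch's denominator df additionally depends on the group variances,
    so it cannot be computed from sizes alone."""
    k = len(group_sizes)        # K = number of groups
    n = sum(group_sizes)        # N = total observations
    return {"between": k - 1, "within": n - k, "total": n - 1}
```

Note that `between + within == total`, a quick check that the decomposition is consistent.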

Assumption Checks Reference

| Assumption | Test | Action if Violated |
|---|---|---|
| Normality of residuals | Shapiro-Wilk, Q-Q | Kruskal-Wallis; transform |
| Homogeneity of variance | Levene's, Brown-Forsythe | Welch's ANOVA + Games-Howell |
| Independence | Design review | Multilevel model |
| Outliers | Boxplots, $\lvert z_i \rvert$ | |
| Interval scale | Measurement theory | Kruskal-Wallis |
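
The first two rows of this table can be scripted as a quick screen. This sketch assumes SciPy is available; `check_assumptions` is a hypothetical helper, and it only screens numerically — it does not replace inspecting Q-Q plots and boxplots:

```python
from scipy import stats

def check_assumptions(*groups, alpha=0.05):
    """Screen normality (Shapiro-Wilk per group) and homogeneity of
    variance (median-centred Levene's, i.e. Brown-Forsythe)."""
    normal_each = [stats.shapiro(g).pvalue > alpha for g in groups]
    levene_p = stats.levene(*groups, center="median").pvalue
    return {
        "all_normal": all(normal_each),
        "equal_variances": bool(levene_p > alpha),
    }

# Three identically shaped (hence equal-variance) illustrative groups
a = [2.8, 2.9, 3.0, 3.1, 3.5]
b = [x + 1 for x in a]
c = [x + 2 for x in a]
result = check_assumptions(a, b, c)
```

Because the three groups are exact shifted copies of one another, Brown-Forsythe cannot flag unequal variances here, which makes the example a useful self-check.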

This tutorial provides a comprehensive foundation for understanding, conducting, and reporting One-Way ANOVA and its alternatives within the DataStatPro application. For further reading, consult Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018) for applied coverage; Maxwell, Delaney & Kelley's "Designing Experiments and Analyzing Data" (3rd ed., 2018) for rigorous methodological depth; Wilcox's "Introduction to Robust Estimation and Hypothesis Testing" (4th ed., 2017) for robust alternatives including trimmed mean ANOVA; Lakens's "Calculating and Reporting Effect Sizes to Facilitate Cumulative Science" (Frontiers in Psychology, 2013) for the ω2\omega^2 vs. η2\eta^2 discussion; Olejnik & Algina (2003) for generalised effect sizes; and Delacre, Lakens & Leys (2017) in the International Review of Social Psychology for the recommendation to default to Welch's ANOVA. For Bayesian ANOVA, see Rouder et al. (2012) in the Journal of Mathematical Psychology. For feature requests or support, contact the DataStatPro team.