Knowledge Base / ANOVA Tests and Alternatives · Inferential Statistics · 70 min read

ANOVA Tests and Alternatives

Comprehensive reference guide for ANOVA tests and non-parametric alternatives.

ANOVA Tests and Alternatives: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of analysis of variance all the way through one-way, factorial, repeated measures, and mixed ANOVA designs, their non-parametric alternatives, post-hoc testing, effect sizes, and practical usage within the DataStatPro application. Whether you are encountering ANOVA for the first time or deepening your understanding of variance decomposition and group comparison, this guide builds your knowledge systematically from the ground up.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is ANOVA?
  3. The Mathematics Behind ANOVA
  4. Assumptions of ANOVA
  5. Types of ANOVA
  6. Using the ANOVA Calculator Component
  7. One-Way Between-Subjects ANOVA
  8. Factorial Between-Subjects ANOVA
  9. One-Way Repeated Measures ANOVA
  10. Mixed ANOVA
  11. Post-Hoc Tests and Planned Contrasts
  12. Non-Parametric Alternatives
  13. Advanced Topics
  14. Worked Examples
  15. Common Mistakes and How to Avoid Them
  16. Troubleshooting
  17. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into ANOVA, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 The Logic of Variance Decomposition

ANOVA is built upon one foundational insight: total variability in a dataset can be partitioned into components attributable to specific sources. For a one-way design:

$$SS_{total} = SS_{between} + SS_{within}$$

If the between-group variance is substantially larger than the within-group variance, it suggests that group membership explains a meaningful portion of the variability in scores — that is, the groups differ. This ratio of variances is the F-statistic.

1.2 The F-Distribution

The F-distribution arises from the ratio of two independent chi-squared distributions divided by their respective degrees of freedom:

$$F = \frac{\chi^2_1 / \nu_1}{\chi^2_2 / \nu_2}$$

In the ANOVA context, this becomes the ratio of two mean squares:

$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/df_{between}}{SS_{within}/df_{within}}$$

Under $H_0$ (all group means equal), this ratio is approximately 1. Values much greater than 1 provide evidence against $H_0$.

The F-distribution is characterised by two parameters: the numerator degrees of freedom ($\nu_1$) and the denominator degrees of freedom ($\nu_2$).

It is always non-negative and right-skewed.

1.3 Why Not Multiple t-Tests?

A natural question is: why not simply run multiple t-tests to compare groups? With $K$ groups, this would require $\binom{K}{2} = K(K-1)/2$ pairwise tests.

The familywise error rate (FWER) inflates:

$$FWER = 1 - (1-\alpha)^m$$

where $m$ is the number of tests. For $K = 4$ groups: $m = 6$ pairwise tests; $FWER = 1 - (0.95)^6 = .265$ — far above the nominal $.05$.

ANOVA maintains the Type I error at $\alpha$ for the omnibus test that all group means are simultaneously equal.
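To make the inflation concrete, here is a short Python sketch (illustrative only, not part of DataStatPro) that evaluates the FWER formula above:

```python
from math import comb

def fwer(m, alpha=0.05):
    """Familywise error rate for m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

# K = 4 groups -> comb(4, 2) = 6 pairwise tests
m = comb(4, 2)
print(round(fwer(m), 3))  # 0.265, far above the nominal 0.05
```

With 10 groups the 45 pairwise tests push the FWER above 0.90, which is why a single omnibus test is preferred.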

1.4 The Relationship Between ANOVA and Regression

ANOVA and regression are mathematically equivalent. Both are special cases of the General Linear Model (GLM):

$$Y_i = \mathbf{X}_i \boldsymbol{\beta} + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$

In ANOVA, the predictors $\mathbf{X}$ are categorical group membership variables (dummy or effect coded). Understanding this equivalence is essential for interpreting interactions and for moving to ANCOVA and mixed models.

1.5 Main Effects and Interactions

In factorial designs (multiple independent variables):

- A main effect is the effect of one IV averaged across the levels of the other IV(s).
- An interaction is present when the effect of one IV depends on the level of another IV.

⚠️ When a significant interaction is present, main effects must be interpreted with extreme caution — the "average" effect of a variable may be misleading when the effect differs substantially across levels of the other variable.

1.6 Fixed vs. Random Effects

Standard ANOVA assumes all factors are fixed. When factors are random, the denominator of the F-ratio changes.

1.7 Variance Explained: $\eta^2$, $\omega^2$, and $\varepsilon^2$

Effect sizes for ANOVA are variance-explained indices — they quantify what proportion of the total (or residual) variance is attributable to a given effect. These are reviewed in detail in the Mathematics section, but the key formulas are:

$$\eta^2 = \frac{SS_{effect}}{SS_{total}}, \quad \omega^2 = \frac{SS_{effect}-(df_{effect})MS_{error}}{SS_{total}+MS_{error}}, \quad \varepsilon^2 = \frac{SS_{effect}-(df_{effect})MS_{error}}{SS_{total}}$$

Both $\omega^2$ and $\varepsilon^2$ correct for the positive bias of $\eta^2$ in finite samples and are preferred for reporting.


2. What is ANOVA?

2.1 The Core Idea

Analysis of Variance (ANOVA) is a parametric inferential procedure for testing whether three or more population means differ simultaneously. Despite its name, ANOVA tests mean differences by comparing variances — specifically, by assessing whether the variability between groups is larger than expected given the variability within groups.

The general form of the F-statistic:

$$F = \frac{\text{Variance between groups (signal)}}{\text{Variance within groups (noise)}} = \frac{MS_{between}}{MS_{within}}$$

A large F indicates that the between-group differences are large relative to random sampling error — evidence that at least one group mean differs from the others.

2.2 What ANOVA Tests and Does Not Test

ANOVA tells you:

- whether at least one group mean differs from the others (the omnibus test);
- how much of the variability in the DV is attributable to group membership (via effect sizes).

ANOVA does NOT tell you:

- which specific groups differ (post-hoc tests are needed for that);
- the direction or magnitude of individual pairwise differences.

2.3 The ANOVA Family

| Design | Independent Variables | Participants | ANOVA Type |
| --- | --- | --- | --- |
| One factor, different participants per group | 1 (between) | Different | One-way between-subjects |
| One factor, same participants in all groups | 1 (within) | Same | One-way repeated measures |
| Two+ factors, different participants per cell | 2+ (between) | Different | Factorial between-subjects |
| Two factors: one between, one within | 1 between, 1 within | Mixed | Mixed (split-plot) ANOVA |
| Two+ factors, same participants in all cells | 2+ (within) | Same | Fully within-subjects factorial |

2.4 ANOVA in Context

The ANOVA test is one member of a broader family of procedures for comparing group means:

| Situation | Test |
| --- | --- |
| 2 groups, independent, normal | t-test (Welch's recommended) |
| 3+ groups, independent, normal, equal variances | One-way ANOVA |
| 3+ groups, independent, normal, unequal variances | Welch's one-way ANOVA |
| 3+ groups, independent, non-normal or ordinal | Kruskal-Wallis test |
| 3+ conditions, same participants, normal | Repeated measures ANOVA |
| 3+ conditions, same participants, non-normal | Friedman test |
| 2+ factors (between), normal | Factorial ANOVA |
| 1 between + 1 within factor | Mixed ANOVA |
| Controlling for a covariate | ANCOVA |
| Multiple dependent variables simultaneously | MANOVA |

3. The Mathematics Behind ANOVA

3.1 One-Way ANOVA: Sum of Squares Decomposition

Consider $K$ groups with $n_j$ observations in group $j$ and $N = \sum_{j=1}^K n_j$ total observations. Let $\bar{x}_j$ be the mean of group $j$ and $\bar{x}_{..}$ be the grand mean.

Grand mean:

$$\bar{x}_{..} = \frac{1}{N}\sum_{j=1}^K \sum_{i=1}^{n_j} x_{ij}$$

Total sum of squares ($SS_{total}$): Total variability in the data.

$$SS_{total} = \sum_{j=1}^K \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{..})^2$$

$$df_{total} = N - 1$$

Between-groups sum of squares ($SS_{between}$): Variability due to group differences.

$$SS_{between} = \sum_{j=1}^K n_j (\bar{x}_j - \bar{x}_{..})^2$$

$$df_{between} = K - 1$$

Within-groups sum of squares ($SS_{within}$): Variability within groups (error).

$$SS_{within} = \sum_{j=1}^K \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

$$df_{within} = N - K$$

Verification: $SS_{total} = SS_{between} + SS_{within}$

3.2 Mean Squares and the F-Statistic

Mean squares are sums of squares divided by their degrees of freedom — they are variance estimates:

$$MS_{between} = \frac{SS_{between}}{K-1}$$

$$MS_{within} = \frac{SS_{within}}{N-K}$$

The F-statistic:

$$F = \frac{MS_{between}}{MS_{within}}$$

Under $H_0: \mu_1 = \mu_2 = \cdots = \mu_K$, the F-statistic follows an F-distribution with $(K-1, N-K)$ degrees of freedom. The p-value is:

$$p = P(F_{K-1,\;N-K} \geq F_{obs})$$
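The computations in Sections 3.1 and 3.2 are easy to verify numerically. This Python sketch (using made-up data) builds the F-statistic from the sums of squares and cross-checks it against `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy import stats

# Three hypothetical groups of scores
groups = [np.array([23.0, 25.0, 28.0, 30.0]),
          np.array([31.0, 33.0, 35.0, 29.0]),
          np.array([40.0, 38.0, 41.0, 37.0])]
K = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_between = ss_between / (K - 1)
ms_within = ss_within / (N - K)
F = ms_between / ms_within
p = stats.f.sf(F, K - 1, N - K)       # upper-tail area of F(K-1, N-K)

F_ref, p_ref = stats.f_oneway(*groups)
assert np.isclose(F, F_ref) and np.isclose(p, p_ref)
```

The hand-computed statistic matches scipy's to floating-point precision, confirming the decomposition.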

3.3 The Expected Mean Squares

Understanding why the F-ratio works requires examining the expected values of the mean squares under $H_0$ and $H_1$:

$E[MS_{within}] = \sigma^2$ (always an unbiased estimate of $\sigma^2$)

$$E[MS_{between}] = \sigma^2 + \frac{\sum_{j=1}^K n_j(\mu_j - \mu)^2}{K-1}$$

Under $H_0$ (all $\mu_j$ equal): $E[MS_{between}] = \sigma^2$, so $E[F] \approx 1$.

Under $H_1$ (some $\mu_j$ differ): $E[MS_{between}] > \sigma^2$, so $E[F] > 1$.

The non-centrality parameter of the F-distribution:

$$\lambda = \frac{\sum_{j=1}^K n_j(\mu_j - \mu)^2}{\sigma^2}$$

This links the population effect size to the expected F-statistic and is used for power analysis.

3.4 The ANOVA Source Table

The standard ANOVA output is presented in a source table:

| Source | SS | df | MS | F | p |
| --- | --- | --- | --- | --- | --- |
| Between groups | $SS_B$ | $K-1$ | $MS_B = SS_B/(K-1)$ | $MS_B/MS_W$ | $P(F \geq F_{obs})$ |
| Within groups (Error) | $SS_W$ | $N-K$ | $MS_W = SS_W/(N-K)$ | | |
| Total | $SS_T$ | $N-1$ | | | |

3.5 Effect Sizes for One-Way ANOVA

Eta squared ($\eta^2$) — biased, but widely reported:

$$\eta^2 = \frac{SS_{between}}{SS_{total}}$$

Omega squared ($\omega^2$) — bias-corrected, preferred:

$$\omega^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total} + MS_{within}}$$

Epsilon squared ($\varepsilon^2$) — alternative bias correction:

$$\varepsilon^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total}}$$

Cohen's $f$ — for power analysis:

$$f = \sqrt{\frac{\eta^2}{1-\eta^2}} \quad \text{or} \quad f = \sqrt{\frac{\omega^2}{1-\omega^2}}$$

Relationship between effect sizes: $\omega^2 \leq \varepsilon^2 \leq \eta^2$

Cohen's (1988) benchmarks:

| Label | $\eta^2$ or $\omega^2$ | $f$ |
| --- | --- | --- |
| Small | 0.01 | 0.10 |
| Medium | 0.06 | 0.25 |
| Large | 0.14 | 0.40 |
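These formulas are simple to script. The following Python helper (our own, with illustrative numbers) computes all four indices from a one-way source table:

```python
def one_way_effect_sizes(ss_b, ss_w, K, N):
    """eta^2, omega^2, epsilon^2 and Cohen's f from a one-way ANOVA table."""
    ss_t = ss_b + ss_w
    ms_w = ss_w / (N - K)
    eta2 = ss_b / ss_t
    omega2 = (ss_b - (K - 1) * ms_w) / (ss_t + ms_w)
    eps2 = (ss_b - (K - 1) * ms_w) / ss_t
    f = (eta2 / (1 - eta2)) ** 0.5
    return eta2, omega2, eps2, f

# Example: SS_B = 42, SS_W = 6, K = 3 groups, N = 9 observations
eta2, omega2, eps2, f = one_way_effect_sizes(42.0, 6.0, 3, 9)
print(eta2)  # 0.875
```

Note that the output respects the ordering $\omega^2 \leq \varepsilon^2 \leq \eta^2$ stated above.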

3.6 Confidence Intervals for ANOVA Effect Sizes

CIs for $\eta^2$ and $\omega^2$ use the non-central F-distribution. The observed F-statistic follows a non-central F-distribution with non-centrality parameter $\lambda$ related to the population effect:

$$\lambda = \frac{\eta^2 \cdot N}{1-\eta^2}$$

The 95% CI bounds for $\lambda$ are found numerically (inverting the non-central F CDF), then converted to $\eta^2$:

$$\eta^2_L = \frac{\lambda_L}{\lambda_L + N}, \qquad \eta^2_U = \frac{\lambda_U}{\lambda_U + N}$$

DataStatPro computes these exact CIs automatically using numerical iteration.
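DataStatPro's internal routine is not shown here, but the inversion it describes can be sketched in Python with `scipy.stats.ncf` and a root-finder (the helper name `eta2_ci` and the search bracket are our own choices):

```python
from scipy import stats
from scipy.optimize import brentq

def eta2_ci(F_obs, df1, df2, N, conf=0.95):
    """CI for eta^2 by numerically inverting the non-central F CDF."""
    tail = (1 - conf) / 2
    cdf = lambda lam: stats.ncf.cdf(F_obs, df1, df2, lam)
    # Upper bound: the noncentrality under which F_obs sits in the lower tail
    lam_u = brentq(lambda l: cdf(l) - tail, 1e-9, 1e5)
    # Lower bound: 0 unless F_obs exceeds the upper tail of the central F
    lam_l = (brentq(lambda l: cdf(l) - (1 - tail), 1e-9, 1e5)
             if cdf(1e-9) > 1 - tail else 0.0)
    to_eta2 = lambda lam: lam / (lam + N)   # from lambda = eta^2 N / (1 - eta^2)
    return to_eta2(lam_l), to_eta2(lam_u)
```

For example, `eta2_ci(10.0, 2, 27, 30)` brackets the point estimate $\eta^2 = F \cdot df_1/(F \cdot df_1 + df_2) \approx .426$.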

3.7 Factorial ANOVA: Partitioning Variance for Multiple Factors

For a two-factor (A $\times$ B) between-subjects ANOVA with $a$ levels of A, $b$ levels of B, and $n$ observations per cell:

$$SS_{total} = SS_A + SS_B + SS_{A \times B} + SS_{within}$$

$$df_{total} = N - 1 = abn - 1$$

| Source | SS | df |
| --- | --- | --- |
| A (Main effect) | $SS_A$ | $a-1$ |
| B (Main effect) | $SS_B$ | $b-1$ |
| A$\times$B (Interaction) | $SS_{A\times B}$ | $(a-1)(b-1)$ |
| Within (Error) | $SS_{within}$ | $ab(n-1)$ |
| Total | $SS_{total}$ | $abn-1$ |

Computing each SS:

Let $\bar{x}_{j.}$ = mean of level $j$ of factor A, $\bar{x}_{.k}$ = mean of level $k$ of factor B, $\bar{x}_{jk}$ = cell mean, $\bar{x}_{..}$ = grand mean.

$$SS_A = bn\sum_{j=1}^a(\bar{x}_{j.} - \bar{x}_{..})^2$$

$$SS_B = an\sum_{k=1}^b(\bar{x}_{.k} - \bar{x}_{..})^2$$

$$SS_{A\times B} = n\sum_{j=1}^a\sum_{k=1}^b(\bar{x}_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2$$

$$SS_{within} = \sum_{j=1}^a\sum_{k=1}^b\sum_{i=1}^n(x_{ijk} - \bar{x}_{jk})^2$$

F-ratios (fixed effects model, all denominators are $MS_{within}$):

$$F_A = \frac{MS_A}{MS_{within}}, \quad F_B = \frac{MS_B}{MS_{within}}, \quad F_{A\times B} = \frac{MS_{A\times B}}{MS_{within}}$$
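For a balanced design, the four components can be computed directly from the cell structure. This numpy sketch (random illustrative data) verifies that they sum to $SS_{total}$:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(2, 3, 5))        # a=2, b=3, n=5 balanced design
a, b, n = X.shape
grand = X.mean()
mean_a = X.mean(axis=(1, 2))          # marginal means of factor A
mean_b = X.mean(axis=(0, 2))          # marginal means of factor B
cell = X.mean(axis=2)                 # cell means

ss_a = b * n * np.sum((mean_a - grand) ** 2)
ss_b = a * n * np.sum((mean_b - grand) ** 2)
ss_ab = n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
ss_within = np.sum((X - cell[:, :, None]) ** 2)
ss_total = np.sum((X - grand) ** 2)
assert np.isclose(ss_total, ss_a + ss_b + ss_ab + ss_within)
```

The assertion holds for any balanced dataset; in unbalanced designs the simple decomposition breaks down, which is why Type I/II/III sums of squares exist (Section 5.3).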

3.8 Partial Eta Squared ($\eta_p^2$) for Factorial Designs

In factorial ANOVA, partial eta squared isolates the effect of one factor after controlling for other effects:

$$\eta_p^2 = \frac{SS_{effect}}{SS_{effect} + SS_{error}}$$

⚠️ In factorial designs, the sum of all partial $\eta_p^2$ values can exceed 1.0. They are NOT proportions of total variance — only $\eta^2$ carries that interpretation. Always label the statistic precisely: write $\eta_p^2$, not $\eta^2$, for partial values.

Partial omega squared (preferred — bias-corrected):

$$\omega_p^2 = \frac{SS_{effect} - df_{effect} \cdot MS_{error}}{SS_{total} + MS_{error}}$$

3.9 Repeated Measures ANOVA: Within-Subjects Decomposition

For a one-way repeated measures design with $K$ conditions and $n$ participants:

$$SS_{total} = SS_{between\text{-}subjects} + SS_{within\text{-}subjects}$$

$$SS_{within\text{-}subjects} = SS_{conditions} + SS_{error}$$

The key feature: between-subjects variability is removed from the error term, which dramatically increases power when individual differences are large.

| Source | SS | df |
| --- | --- | --- |
| Between subjects | $SS_{bs}$ | $n-1$ |
| Conditions (Within) | $SS_{cond}$ | $K-1$ |
| Error (Residual) | $SS_{error}$ | $(n-1)(K-1)$ |
| Total | $SS_{total}$ | $nK-1$ |

$$F = \frac{MS_{conditions}}{MS_{error}} = \frac{SS_{cond}/(K-1)}{SS_{error}/((n-1)(K-1))}$$

Generalised eta squared ($\eta_G^2$; Olejnik & Algina, 2003) is recommended for repeated measures designs because it is comparable across between-subjects and within-subjects designs:

$$\eta_G^2 = \frac{SS_{conditions}}{SS_{conditions} + SS_{between\text{-}subjects} + SS_{error}}$$

3.10 Sphericity and the Mauchly Test

Repeated measures ANOVA requires the sphericity assumption: the variances of the differences between all pairs of conditions are equal. Formally, for all pairs $(j, k)$:

$$\text{Var}(x_j - x_k) = \text{constant}$$

Mauchly's test evaluates this assumption: a significant result ($p < .05$) indicates that sphericity has been violated and the degrees of freedom must be corrected.

Epsilon ($\varepsilon$) corrections adjust the degrees of freedom when sphericity is violated. Two commonly used corrections:

Greenhouse-Geisser (GG) correction:

$$\varepsilon_{GG} = \frac{\left(\sum_j \hat{\sigma}_{jj}\right)^2}{(K-1)\left(\sum_j\sum_k \hat{\sigma}_{jk}^2\right)}$$

$0 < \varepsilon_{GG} \leq 1$; $\varepsilon_{GG} = 1$ means sphericity holds exactly.

Huynh-Feldt (HF) correction (less conservative than GG, preferred when $\varepsilon > 0.75$):

$$\varepsilon_{HF} = \frac{n(K-1)\varepsilon_{GG} - 2}{(K-1)(n - 1 - (K-1)\varepsilon_{GG})}$$

Corrected degrees of freedom:

$$df_{conditions}^* = \varepsilon \cdot (K-1), \qquad df_{error}^* = \varepsilon \cdot (n-1)(K-1)$$

Decision rule for epsilon corrections:

| $\varepsilon_{GG}$ | Recommended Correction |
| --- | --- |
| $\approx 1.0$ (Mauchly $p > .05$) | None (uncorrected) |
| $0.75 < \varepsilon_{GG} < 1.0$ | Huynh-Feldt |
| $\varepsilon_{GG} \leq 0.75$ | Greenhouse-Geisser |
| Severe violation | Multivariate approach (MANOVA) |
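The GG epsilon can be estimated from the sample covariance matrix of the $K$ conditions. This sketch uses the equivalent double-centred form of the formula above (the helper `gg_epsilon` is our own, not DataStatPro's API):

```python
import numpy as np

def gg_epsilon(X):
    """Greenhouse-Geisser epsilon from an n-by-K repeated measures matrix."""
    K = X.shape[1]
    S = np.cov(X, rowvar=False)
    # double-centre the covariance matrix (remove row, column and grand means)
    Sc = (S - S.mean(axis=0, keepdims=True)
            - S.mean(axis=1, keepdims=True) + S.mean())
    return np.trace(Sc) ** 2 / ((K - 1) * np.sum(Sc ** 2))

rng = np.random.default_rng(7)
eps = gg_epsilon(rng.normal(size=(12, 4)))
# epsilon is bounded: 1/(K-1) <= eps <= 1
assert 1 / 3 - 1e-9 <= eps <= 1 + 1e-9
```

The bounds follow from the Cauchy-Schwarz inequality applied to the eigenvalues of the double-centred covariance matrix, so they hold for any dataset.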

3.11 Mixed ANOVA: Between + Within Factors

A mixed ANOVA (also split-plot ANOVA) includes both between-subjects and within-subjects factors. For a design with one between factor (A, $a$ levels), one within factor (B, $b$ levels), and $n$ participants per group:

| Source | SS | df | MS | F |
| --- | --- | --- | --- | --- |
| A (Between) | $SS_A$ | $a-1$ | $MS_A$ | $MS_A/MS_{S(A)}$ |
| S(A) — Subjects within A | $SS_{S(A)}$ | $a(n-1)$ | $MS_{S(A)}$ | |
| B (Within) | $SS_B$ | $b-1$ | $MS_B$ | $MS_B/MS_{B\times S(A)}$ |
| A$\times$B | $SS_{A\times B}$ | $(a-1)(b-1)$ | $MS_{A\times B}$ | $MS_{A\times B}/MS_{B\times S(A)}$ |
| B$\times$S(A) — Error | $SS_{B\times S(A)}$ | $a(n-1)(b-1)$ | $MS_{B\times S(A)}$ | |
| Total | $SS_{total}$ | $abn-1$ | | |

Note the two separate error terms:

- $MS_{S(A)}$ (subjects within A) is the denominator for the between-subjects effect A.
- $MS_{B\times S(A)}$ is the denominator for the within-subjects effect B and the A$\times$B interaction.


4. Assumptions of ANOVA

4.1 Normality of Residuals

ANOVA assumes that the residuals (differences between observed values and group means) are normally distributed within each population:

$$\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$$

How to check: Shapiro-Wilk test on the residuals; Q-Q plots of the residuals; histograms by group.

Robustness: ANOVA is robust to mild normality violations, particularly when group sizes are roughly equal and each group is reasonably large (the central limit theorem protects the F-test).

When violated: Use the Kruskal-Wallis test (independent groups) or the Friedman test (repeated measures) as non-parametric alternatives. Consider data transformations (log, square root, Box-Cox) for skewed distributions.

4.2 Homogeneity of Variance (Homoscedasticity)

Standard ANOVA assumes that all $K$ populations have equal variances:

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2$$

This is the homoscedasticity assumption and is required for $MS_{within}$ to serve as a valid pooled estimate of the common population variance $\sigma^2$.

How to check: Levene's test (a non-significant result supports equal variances); direct comparison of the group standard deviations.

Robustness: ANOVA is relatively robust to heterogeneity when group sizes are equal. When group sizes are unequal AND variances are unequal, ANOVA can have severely inflated or deflated Type I error rates.

When violated: Use Welch's one-way ANOVA (with Games-Howell post-hoc tests), which does not assume equal variances and is recommended as the default for independent designs.
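scipy does not ship a Welch one-way ANOVA, but the statistic is straightforward to compute. This sketch follows Welch's (1951) formulas (our own implementation, so treat it as illustrative):

```python
import numpy as np
from scipy import stats

def welch_anova(groups):
    """Welch's one-way ANOVA: F, degrees of freedom, and p-value."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    w = n / np.array([np.var(g, ddof=1) for g in groups])   # precision weights
    grand = np.sum(w * m) / np.sum(w)                       # weighted grand mean
    numer = np.sum(w * (m - grand) ** 2) / (k - 1)
    h = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    denom = 1 + 2 * (k - 2) * h / (k ** 2 - 1)
    F = numer / denom
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * h)
    return F, df1, df2, stats.f.sf(F, df1, df2)
```

A convenient sanity check: with exactly two groups, Welch's F equals the squared Welch t statistic from `scipy.stats.ttest_ind(..., equal_var=False)`.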

4.3 Independence of Observations

All observations must be independent of each other, both within and across groups. Dependence typically arises from:

- repeated measurements of the same participants,
- clustered sampling (e.g., students nested within classrooms),
- temporal or spatial proximity between observations.

When violated: Use repeated measures ANOVA (for within-subjects data), mixed models (for nested or clustered data), or multilevel ANOVA (for hierarchical designs).

4.4 Sphericity (Repeated Measures Only)

As described in Section 3.10, repeated measures ANOVA additionally requires sphericity — that the variances of all pairwise difference scores are equal. This is a stronger assumption than homogeneity of variance.

When violated: Apply Greenhouse-Geisser or Huynh-Feldt corrections to the degrees of freedom, or use the multivariate approach (MANOVA on the repeated measures).

4.5 Interval Scale of Measurement

The dependent variable must be measured on at least an interval scale (equal-spaced intervals). Ordinal data (e.g., Likert scales) technically violate this assumption.

When violated: Use non-parametric alternatives (Kruskal-Wallis, Friedman) or analyse using ordinal regression.

4.6 Absence of Significant Outliers

ANOVA is based on means and is sensitive to extreme outliers, particularly in small samples. Outliers inflate $SS_{within}$ and $SS_{between}$ unpredictably.

How to check: boxplots of each group; standardised residuals (values beyond roughly ±3 deserve scrutiny).

When outliers present: Investigate the cause. Report analyses with and without outliers. Consider trimmed mean ANOVA or Kruskal-Wallis as robust alternatives.

4.7 Assumption Summary Table

| Assumption | Applies To | How to Check | Remedy |
| --- | --- | --- | --- |
| Normality | All designs (residuals) | Shapiro-Wilk, Q-Q | Kruskal-Wallis / Friedman |
| Homogeneity of variance | Between-subjects factors | Levene's | Welch's ANOVA |
| Independence | Between subjects | Design review | Mixed models |
| Sphericity | Within-subjects part | Mauchly's test | GG/HF correction |
| Interval scale | All designs | Measurement theory | Non-parametric |
| No severe outliers | All designs | Boxplots, residuals | Trimmed means / robust |

5. Types of ANOVA

5.1 Classification by Design

By Number of Independent Variables

| IVs | Design Name | Example |
| --- | --- | --- |
| 1 | One-way ANOVA | Effect of teaching method (3 levels) on test scores |
| 2 | Two-way (factorial) ANOVA | Effect of drug (3 levels) and sex (2 levels) on pain |
| 3 | Three-way ANOVA | Drug $\times$ dose $\times$ time on response |
| $k$ | $k$-way factorial ANOVA | Generalisation of the above |

By Type of Factor

| Factor Type | Description | Design |
| --- | --- | --- |
| Between-subjects | Different participants per level | Standard ANOVA |
| Within-subjects | Same participants in all levels | Repeated measures ANOVA |
| Mixed | Combination of between and within | Mixed (split-plot) ANOVA |

5.2 Choosing the Correct ANOVA Design

What is the number of independent variables?
├── 1 IV
│   └── Is the same participant in all conditions?
│       ├── NO (between-subjects)  → One-way between-subjects ANOVA
│       └── YES (within-subjects)  → One-way repeated measures ANOVA
└── 2+ IVs
    └── What type are the IVs?
        ├── All between-subjects   → Factorial between-subjects ANOVA
        ├── All within-subjects    → Fully within-subjects factorial ANOVA
        └── Mixed (some between, some within) → Mixed ANOVA

5.3 Type I, II, and III Sums of Squares

In unbalanced designs (unequal cell sizes), the partition of SS depends on the order in which effects are entered. Three conventions exist:

| Type | Description | When to Use |
| --- | --- | --- |
| Type I (Sequential) | SS for each effect controlling for effects entered earlier | When the order of entry is theoretically meaningful |
| Type II (Hierarchical) | SS for each effect controlling for all other effects at the same level | When there is no significant interaction |
| Type III (Marginal) | SS for each effect controlling for all other effects including interactions | When there is a significant interaction; most common default in SPSS |

⚠️ For balanced designs (equal cell sizes), all three types give identical results. For unbalanced designs, Type III is the most commonly reported but requires full-rank parameterisation (effect coding or deviation coding, not dummy coding). Always specify which type was used when reporting factorial ANOVA results.


6. Using the ANOVA Calculator Component

The ANOVA Calculator component in DataStatPro provides a comprehensive tool for running, diagnosing, visualising, and reporting ANOVA designs and their alternatives.

Step-by-Step Guide

Step 1 — Select the ANOVA Design

Choose from the "ANOVA Type" dropdown:

Step 2 — Input Method

Step 3 — Specify the Design Structure

Step 4 — Select Assumption Tests

DataStatPro runs the selected assumption tests automatically and displays the results in a colour-coded panel.

Step 5 — Select Post-Hoc Tests

When the omnibus F is significant, specify post-hoc tests:

Step 6 — Select Effect Sizes

Step 7 — Select Display Options

Step 8 — Run the Analysis

Click "Run ANOVA". DataStatPro will:

  1. Compute the full ANOVA source table.
  2. Apply sphericity corrections automatically if Mauchly's test is significant.
  3. Run all selected post-hoc tests and planned contrasts.
  4. Compute effect sizes with exact CIs.
  5. Generate all visualisations.
  6. Output an APA-compliant results paragraph.

7. One-Way Between-Subjects ANOVA

7.1 Purpose and Design

The one-way between-subjects ANOVA tests whether the means of three or more independent groups differ significantly. It is the generalisation of the independent samples t-test (when $K = 2$, $F = t^2$) to $K \geq 3$ groups.

Common applications:

7.2 Full Procedure

Step 1 — State hypotheses

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$$

$$H_1: \text{At least one } \mu_j \text{ differs from the others}$$

Step 2 — Compute grand mean and group means

$$\bar{x}_{..} = \frac{\sum_{j=1}^K \sum_{i=1}^{n_j} x_{ij}}{N}, \qquad \bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j}x_{ij}$$

Step 3 — Compute sums of squares

$$SS_B = \sum_{j=1}^K n_j(\bar{x}_j - \bar{x}_{..})^2$$

$$SS_W = \sum_{j=1}^K \sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2$$

$$SS_T = SS_B + SS_W$$

Step 4 — Compute degrees of freedom

$$df_B = K-1, \quad df_W = N-K, \quad df_T = N-1$$

Step 5 — Compute mean squares and F

$$MS_B = SS_B/df_B, \quad MS_W = SS_W/df_W, \quad F = MS_B/MS_W$$

Step 6 — Compute p-value and make a decision

$$p = P(F_{K-1,\;N-K} \geq F_{obs})$$

Reject $H_0$ if $p \leq \alpha$.

Step 7 — Compute effect sizes

$$\eta^2 = SS_B/SS_T$$

$$\omega^2 = (SS_B - (K-1)MS_W)/(SS_T + MS_W)$$

$$f = \sqrt{\omega^2/(1-\omega^2)}$$

Step 8 — Conduct post-hoc tests or planned contrasts

If H0H_0 is rejected, identify which groups differ (Section 11).

7.3 Computing $\omega^2$ and $\eta^2$ from F

When only the F-statistic, df, and $N$ are reported:

$$\eta^2 = \frac{F \cdot df_B}{F \cdot df_B + df_W}$$

$$\omega^2 = \frac{(F-1) \cdot df_B}{F \cdot df_B + df_W + 1} \quad \text{(approximate)}$$
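These conversions are handy when re-analysing published results. A minimal sketch (our own helper names):

```python
def eta2_from_f(F, df_b, df_w):
    """Recover eta^2 from a reported F-statistic and its degrees of freedom."""
    return F * df_b / (F * df_b + df_w)

def omega2_from_f(F, df_b, df_w):
    """Approximate omega^2 from a reported F-statistic."""
    return (F - 1) * df_b / (F * df_b + df_w + 1)

# F(2, 6) = 21 corresponds to SS_B = 42, SS_W = 6, so eta^2 = 42/48
print(eta2_from_f(21, 2, 6))  # 0.875
```

For this case the F-based values agree exactly with the SS-based formulas of Section 7.2.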

7.4 Interpreting the Omnibus F-Test

The omnibus F-test is a global test. A significant result tells you only that:

  1. At least one pair of group means differs significantly.
  2. This difference is unlikely to have arisen by sampling error alone.

It does not tell you which groups differ, by how much, or in what direction. Post-hoc tests (Section 11) are required to answer these questions.

💡 When groups were theoretically predicted to differ in specific ways before data collection, use planned contrasts rather than (or in addition to) the omnibus F-test. Planned contrasts are more powerful and more informative than post-hoc tests.


8. Factorial Between-Subjects ANOVA

8.1 Purpose and Design

Factorial ANOVA simultaneously examines the effects of two or more IVs and their interactions on a continuous DV. It is more efficient than running separate one-way ANOVAs because it:

- tests all main effects and interactions within a single model,
- uses every observation to estimate every effect, increasing power,
- can detect interactions that separate one-way analyses would miss.

8.2 The Concept of Interaction

An interaction exists when the effect of one IV differs depending on the level of another IV. Interactions are the most important and often the most theoretically interesting finding in factorial designs.

Types of interactions:

| Type | Description | Pattern |
| --- | --- | --- |
| Ordinal | Lines in interaction plot do not cross; one group always higher | Parallelism violated but ranking preserved |
| Disordinal (crossover) | Lines cross; one group higher at some levels, lower at others | Ranking reverses |
| Spreading | Effect of A increases (or decreases) with level of B | Lines fan out |

Interpreting an interaction:

When a significant A$\times$B interaction is found:

  1. Do not interpret main effects in isolation — they are averages that may be misleading when the interaction is substantial.
  2. Probe the interaction with simple effects analysis: test the effect of A separately at each level of B (or vice versa).
  3. Plot the interaction with a line plot: this is essential for understanding the pattern.

8.3 Simple Effects Analysis

Simple effects decompose the interaction by examining the effect of one IV at each level of the other IV. For a 2$\times$3 design, this means testing the effect of the first IV at each of the three levels of the second, or the effect of the second IV at each of the two levels of the first.

Simple effects use $MS_{within}$ from the full factorial model as the error term (pooled error), which is more stable than separate-group estimates.

8.4 Effect Sizes in Factorial ANOVA

For factorial designs, report partial effect sizes for each effect:

Partial eta squared (common, biased):

$$\eta_{p,A}^2 = \frac{SS_A}{SS_A + SS_{within}}$$

$$\eta_{p,B}^2 = \frac{SS_B}{SS_B + SS_{within}}$$

$$\eta_{p,A\times B}^2 = \frac{SS_{A\times B}}{SS_{A\times B} + SS_{within}}$$

Partial omega squared (preferred, bias-corrected):

$$\omega_{p,A}^2 = \frac{SS_A - df_A \cdot MS_{within}}{SS_{total} + MS_{within}}$$

Generalised eta squared (recommended for between-subjects factorial designs, Olejnik & Algina, 2003):

$$\eta_{G}^2 = \frac{SS_{effect}}{SS_{total}}$$

For purely between-subjects designs, $\eta_G^2 = \eta^2$ for each effect.


9. One-Way Repeated Measures ANOVA

9.1 Purpose and Design

One-way repeated measures ANOVA tests whether means differ across $K$ conditions when the same participants are measured in all conditions. It is the generalisation of the paired t-test to $K \geq 3$ conditions.

Common applications:

Advantages over one-way between-subjects ANOVA: individual differences are removed from the error term, which increases power, and fewer participants are needed for the same number of observations.

Disadvantages: order and carryover effects (counterbalancing is required), the additional sphericity assumption, and sensitivity to participant attrition.

9.2 Full Procedure

Step 1 — Compute condition means and participant means

$\bar{x}_{.k}$ = mean of condition $k$; $\bar{x}_{i.}$ = mean of participant $i$; $\bar{x}_{..}$ = grand mean.

Step 2 — Compute sums of squares

$$SS_{total} = \sum_{i=1}^n\sum_{k=1}^K (x_{ik} - \bar{x}_{..})^2$$

$$SS_{conditions} = n\sum_{k=1}^K(\bar{x}_{.k} - \bar{x}_{..})^2$$

$$SS_{subjects} = K\sum_{i=1}^n(\bar{x}_{i.} - \bar{x}_{..})^2$$

$$SS_{error} = SS_{total} - SS_{conditions} - SS_{subjects}$$

Step 3 — Degrees of freedom

$$df_{conditions} = K-1, \quad df_{subjects} = n-1, \quad df_{error} = (K-1)(n-1)$$

Step 4 — Mean squares and F

$$MS_{conditions} = SS_{conditions}/(K-1)$$

$$MS_{error} = SS_{error}/((K-1)(n-1))$$

$$F = MS_{conditions}/MS_{error}$$
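Steps 1 through 4 fit in a few lines of numpy. Here they are applied to a small made-up dataset of 4 participants measured in 3 conditions:

```python
import numpy as np

X = np.array([[1, 2, 3],
              [2, 3, 5],
              [3, 4, 5],
              [2, 3, 4]], dtype=float)   # rows: participants, cols: conditions
n, K = X.shape
grand = X.mean()
ss_total = np.sum((X - grand) ** 2)
ss_cond = n * np.sum((X.mean(axis=0) - grand) ** 2)
ss_subj = K * np.sum((X.mean(axis=1) - grand) ** 2)
ss_error = ss_total - ss_cond - ss_subj

F = (ss_cond / (K - 1)) / (ss_error / ((n - 1) * (K - 1)))
print(F)  # 61.0 for this dataset, on (2, 6) degrees of freedom
```

Removing the large subject-to-subject variability ($SS_{subjects} = 6.25$ here) from the error term is exactly what gives the repeated measures design its power advantage.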

Step 5 — Sphericity check and correction

Run Mauchly's test. If violated, apply GG or HF correction:

$$F_{corrected} = \frac{SS_{conditions}/(\varepsilon \cdot (K-1))}{SS_{error}/(\varepsilon \cdot (n-1)(K-1))} = \frac{MS_{conditions}}{MS_{error}}$$

The F-statistic is unchanged; only the reference distribution (via corrected df) changes.

Step 6 — Effect size

Generalised eta squared ($\eta_G^2$) — recommended for repeated measures:

$$\eta_G^2 = \frac{SS_{conditions}}{SS_{conditions} + SS_{subjects} + SS_{error}}$$

Partial eta squared ($\eta_p^2$) — common but inflated:

$$\eta_p^2 = \frac{SS_{conditions}}{SS_{conditions} + SS_{error}}$$

Partial omega squared ($\omega_p^2$) — bias-corrected:

$$\omega_p^2 = \frac{SS_{conditions} - (K-1)MS_{error}}{SS_{total} + MS_{error}}$$

💡 Use $\eta_G^2$ for repeated measures when comparing effect sizes across studies using different designs (between-subjects vs. within-subjects), as it is the most design-comparable measure. For purely within-design comparisons, $\omega_p^2$ is the least-biased choice.


10. Mixed ANOVA

10.1 Purpose and Design

Mixed ANOVA combines at least one between-subjects factor and at least one within-subjects factor. It is among the most commonly used designs in psychology, medicine, and education because most longitudinal experiments involve a grouping factor (e.g., treatment vs. control) whose members are measured repeatedly over time.

The mixed ANOVA tests:

  1. Main effect of the between-subjects factor (e.g., treatment vs. control, collapsed across time).
  2. Main effect of the within-subjects factor (e.g., change over time, collapsed across groups).
  3. Interaction (e.g., does the pattern of change over time differ between treatment and control?). This interaction is typically the primary research question.

10.2 The Primary Interaction: Time $\times$ Group

In a treatment evaluation study, the Time $\times$ Group interaction answers: "Does the treatment group change differently over time compared to the control group?" This is typically the most important test in a mixed ANOVA.

10.3 Probing a Significant Interaction

When the Group $\times$ Time interaction is significant:

Option 1 — Simple effects of Time within each Group: Conduct one-way repeated measures ANOVA (or paired t-tests with correction) separately for each group. This answers: "Did each group change significantly over time?"

Option 2 — Simple effects of Group at each Time point: Conduct independent t-tests (or one-way ANOVA) separately at each time point with Bonferroni correction. This answers: "At which time points do the groups differ?"

10.4 Sphericity in Mixed ANOVA

The sphericity assumption applies to the within-subjects factor and any interaction involving the within-subjects factor. Mauchly's test and GG/HF corrections apply specifically to the within-subjects main effect and to interactions that include it.

The between-subjects main effect (A) does not require sphericity but does require homogeneity of variance across groups (Levene's test).

10.5 Effect Sizes for Mixed ANOVA

For mixed ANOVA, generalised eta squared (η_G²) is strongly recommended for all effects because it accounts for the different variance structures of between-subjects and within-subjects components:

\eta_G^2 = \frac{SS_{effect}}{SS_{effect} + SS_{subjects} + SS_{error(between)} + SS_{error(within)}}

This allows direct comparison of effect sizes from mixed designs with purely between-subjects or purely within-subjects designs.
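As a quick numeric illustration, the formula above can be computed directly from the four SS components (the SS values below are invented for illustration):

```python
# Numeric illustration of the generalised eta squared formula above.
# The SS values are invented for illustration only.

def generalized_eta_squared(ss_effect, ss_subjects, ss_err_between, ss_err_within):
    return ss_effect / (ss_effect + ss_subjects + ss_err_between + ss_err_within)

eta_g2 = generalized_eta_squared(ss_effect=120.0, ss_subjects=800.0,
                                 ss_err_between=300.0, ss_err_within=180.0)
print(round(eta_g2, 3))
```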


11. Post-Hoc Tests and Planned Contrasts

11.1 The Need for Post-Hoc Testing

A significant omnibus F-test tells you only that some group means differ. Post-hoc tests are pairwise comparisons conducted after a significant F-test to determine which specific groups differ, while controlling the familywise error rate.

The key trade-off: Controlling the FWER requires more conservative critical values, which reduces power for individual comparisons. Choosing a post-hoc test involves balancing Type I and Type II error control.

11.2 Overview of Post-Hoc Tests

| Test | FWER Control | Assumes Equal Variances | Best For |
|---|---|---|---|
| Tukey HSD | ✅ Exact for balanced | ✅ Yes | Balanced designs, all pairwise |
| Tukey-Kramer | ✅ Approximate | ✅ Yes | Unbalanced designs, all pairwise |
| Bonferroni | ✅ Conservative | ❌ No | Any design, any set of comparisons |
| Holm-Bonferroni | ✅ Less conservative | ❌ No | Any design; preferred over Bonferroni |
| Scheffé | ✅ Most conservative | ✅ Yes | All possible contrasts (not just pairwise) |
| Games-Howell | ✅ Approximate | ❌ No | Unequal variances or unequal n |
| Dunnett | ✅ Optimal | ✅ Yes | All groups vs. one control group |
| Fisher LSD | ❌ No control | ✅ Yes | Exploratory only; requires significant F |

11.3 Tukey's HSD — Full Procedure

Tukey's Honestly Significant Difference (HSD) is the most commonly used post-hoc test for balanced designs with equal group variances. It controls the FWER at exactly α for all pairwise comparisons.

Critical value: The studentised range statistic q_{K,\;N-K,\;\alpha}, where K is the number of groups.

Minimum significant difference (MSD):

\text{HSD} = q_{K,\;N-K,\;\alpha} \times \sqrt{\frac{MS_{within}}{n}}

For unequal group sizes (Tukey-Kramer method):

\text{HSD}_{jk} = q_{K,\;N-K,\;\alpha} \times \sqrt{\frac{MS_{within}}{2}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}

Declare groups j and k significantly different if |\bar{x}_j - \bar{x}_k| > \text{HSD}_{jk}.

95% CI for the pairwise difference \mu_j - \mu_k:

(\bar{x}_j - \bar{x}_k) \pm q_{K,\;N-K,\;\alpha} \times \sqrt{\frac{MS_{within}}{2}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}
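A minimal sketch of the Tukey-Kramer procedure, implementing the MSD formula above with SciPy's studentised range distribution (the data and helper name are illustrative):

```python
# Tukey / Tukey-Kramer sketch using the MSD formula above and SciPy's
# studentised range distribution. Groups and the helper name are illustrative.
import numpy as np
from scipy.stats import studentized_range

def tukey_kramer(groups, alpha=0.05):
    """Return (i, j, |diff|, HSD_jk, significant?) for every pair."""
    k = len(groups)
    means = [np.mean(g) for g in groups]
    df_within = sum(len(g) for g in groups) - k
    ss_within = sum(np.sum((np.asarray(g) - m) ** 2)
                    for g, m in zip(groups, means))
    ms_within = ss_within / df_within
    q_crit = studentized_range.ppf(1 - alpha, k, df_within)
    out = []
    for i in range(k):
        for j in range(i + 1, k):
            diff = abs(means[i] - means[j])
            hsd = q_crit * np.sqrt(ms_within / 2 *
                                   (1 / len(groups[i]) + 1 / len(groups[j])))
            out.append((i, j, diff, hsd, diff > hsd))
    return out

g1 = [9, 11, 10, 12, 10, 11]
g2 = [10, 12, 11, 13, 11, 12]
g3 = [16, 18, 17, 19, 17, 18]
for row in tukey_kramer([g1, g2, g3]):
    print(row)
```

With these data, the first two groups fall inside the HSD while the third clearly exceeds it.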

11.4 Games-Howell Test — Unequal Variances

Games-Howell is the recommended post-hoc test when variances are unequal (Levene's significant) or group sizes differ substantially. It uses Welch-Satterthwaite df for each pairwise comparison:

Standard error for pair (j, k):

SE_{jk} = \sqrt{\frac{s_j^2}{n_j} + \frac{s_k^2}{n_k}}

Test statistic:

q_{jk} = \frac{|\bar{x}_j - \bar{x}_k|}{SE_{jk}}

Degrees of freedom (Welch-Satterthwaite):

\nu_{jk} = \frac{(s_j^2/n_j + s_k^2/n_k)^2}{(s_j^2/n_j)^2/(n_j-1) + (s_k^2/n_k)^2/(n_k-1)}

Significance assessed against the studentised range distribution q_{K,\;\nu_{jk},\;\alpha}.
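A sketch following the SE and Welch-df formulas above. One assumption in this sketch: the p-value evaluates the studentised range at q·√2, the conventional scaling when the standard error is not divided by 2.

```python
# Games-Howell sketch following the SE and Welch-df formulas above.
# Assumption: the studentised range is evaluated at q * sqrt(2), the
# usual scaling when SE is not halved. Data are illustrative.
import numpy as np
from scipy.stats import studentized_range

def games_howell(groups):
    k = len(groups)
    stats = [(np.mean(g), np.var(g, ddof=1), len(g)) for g in groups]
    out = []
    for i in range(k):
        for j in range(i + 1, k):
            mi, vi, ni = stats[i]
            mj, vj, nj = stats[j]
            se = np.sqrt(vi / ni + vj / nj)            # SE_jk
            q = abs(mi - mj) / se                      # q_jk from the text
            # Welch-Satterthwaite degrees of freedom
            df = (vi / ni + vj / nj) ** 2 / (
                (vi / ni) ** 2 / (ni - 1) + (vj / nj) ** 2 / (nj - 1))
            p = studentized_range.sf(q * np.sqrt(2), k, df)
            out.append((i, j, q, df, p))
    return out

a = [10, 12, 11, 13, 12, 11]
b = [10, 14, 9, 15, 12, 20, 8, 16]    # much larger variance
c = [20, 22, 21, 23, 22, 21]
for row in games_howell([a, b, c]):
    print(row)
```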

11.5 Planned Contrasts — A Priori Comparisons

Planned contrasts (a priori comparisons) are specific, theoretically motivated comparisons formulated before data collection. They are more powerful than post-hoc tests because they test only a few focused hypotheses rather than every pairwise difference, so little or no multiplicity correction is needed.

Contrast coefficients: A contrast is a weighted sum of group means \psi = \sum_j c_j\mu_j, where \sum_j c_j = 0.

Examples for K = 4 groups (Control, Drug A, Drug B, Drug C):

| Contrast | c_1 | c_2 | c_3 | c_4 | Comparison |
|---|---|---|---|---|---|
| Control vs. all treatments | 3 | -1 | -1 | -1 | Control vs. treatments |
| Drug A vs. B and C | 0 | 2 | -1 | -1 | Drug A vs. B and C |
| Drug B vs. C | 0 | 0 | 1 | -1 | Drug B vs. C |

Orthogonal contrasts are statistically independent (\sum_j c_{1j}c_{2j}/n_j = 0). A set of K − 1 orthogonal contrasts fully partitions SS_between and does not require FWER correction.

Contrast F-statistic:

SS_\psi = \frac{\left(\sum_j c_j \bar{x}_j\right)^2}{\sum_j c_j^2/n_j}, \qquad F_\psi = \frac{SS_\psi}{MS_{within}}, \quad \nu = (1, N-K)
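The contrast SS and F formulas above can be sketched as follows (illustrative four-group data; the helper name `contrast_f` is ours):

```python
# Contrast SS and F sketch (Control vs. the average of three treatments).
# Data and the helper name are illustrative.
import numpy as np
from scipy.stats import f as f_dist

def contrast_f(groups, coeffs):
    means = np.array([np.mean(g) for g in groups])
    ns = np.array([len(g) for g in groups])
    c = np.asarray(coeffs, dtype=float)
    assert abs(c.sum()) < 1e-9, "contrast coefficients must sum to zero"
    df_w = ns.sum() - len(groups)
    ss_w = sum(np.sum((np.asarray(g) - m) ** 2) for g, m in zip(groups, means))
    ms_within = ss_w / df_w
    ss_psi = (c @ means) ** 2 / np.sum(c ** 2 / ns)   # SS_psi
    F = ss_psi / ms_within                            # df = (1, N - K)
    return F, f_dist.sf(F, 1, df_w)

control = [12, 14, 13, 15, 13]
drug_a  = [10, 11, 10, 12, 11]
drug_b  = [9, 10, 9, 11, 10]
drug_c  = [8, 9, 9, 10, 9]
F, p = contrast_f([control, drug_a, drug_b, drug_c], [3, -1, -1, -1])
print(round(F, 2), p)
```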

11.6 Effect Sizes for Pairwise Comparisons

After identifying which pairs of groups differ, report effect sizes for each significant pairwise comparison:

Cohen's dd for the pairwise comparison (j,k)(j, k):

djk=xˉjxˉkspooled,jkd_{jk} = \frac{\bar{x}_j - \bar{x}_k}{s_{pooled,jk}}

Where spooled,jks_{pooled,jk} can be either the two-group pooled SD or MSwithin\sqrt{MS_{within}} from the full ANOVA model (recommended — more stable estimate).

Using MSwithinMS_{within} as the standardiser:

djk=xˉjxˉkMSwithind_{jk} = \frac{\bar{x}_j - \bar{x}_k}{\sqrt{MS_{within}}}

Hedges' gg (bias-corrected):

gjk=djk×(134(NK)1)g_{jk} = d_{jk} \times \left(1 - \frac{3}{4(N-K)-1}\right)
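A small helper applying the two formulas above (the numeric inputs are illustrative):

```python
# Pairwise Cohen's d and Hedges' g using sqrt(MS_within) as the
# standardiser, per the formulas above. Inputs are illustrative.
import math

def pairwise_d_and_g(mean_j, mean_k, ms_within, n_total, k_groups):
    d = (mean_j - mean_k) / math.sqrt(ms_within)
    g = d * (1 - 3 / (4 * (n_total - k_groups) - 1))   # small-sample correction
    return d, g

d, g = pairwise_d_and_g(16.3, 9.8, 21.60, n_total=90, k_groups=3)
print(round(d, 3), round(g, 3))
```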


12. Non-Parametric Alternatives

12.1 When Non-Parametric Tests Are Appropriate

Non-parametric ANOVA alternatives are appropriate when:

  1. The DV is ordinal (e.g., Likert-type ratings) rather than interval.
  2. Normality is clearly violated and group sizes are small.
  3. Outliers or heavy-tailed distributions distort group means.
  4. Transformations fail to restore the parametric assumptions.

12.2 Kruskal-Wallis Test — Non-Parametric One-Way ANOVA

The Kruskal-Wallis H test is the non-parametric alternative to one-way between-subjects ANOVA. It tests whether the population distributions of K ≥ 3 independent groups are identical (or equivalently, under the location-shift assumption, whether the groups have equal medians).

Procedure:

Step 1 — Rank all observations

Combine all N observations across groups and assign ranks from 1 to N. Assign average ranks for ties.

Step 2 — Compute rank sums per group

W_j = \sum_{i=1}^{n_j} R_{ij} (sum of ranks for group j)

Step 3 — Compute the H statistic

H = \frac{12}{N(N+1)}\sum_{j=1}^K \frac{W_j^2}{n_j} - 3(N+1)

Tie correction:

H_c = \frac{H}{1 - \dfrac{\sum_m(t_m^3 - t_m)}{N^3-N}}

Where t_m is the number of observations in the m-th tied group.

Step 4 — p-value

For K = 3 and n_j ≤ 5 per group: use exact tables. For larger samples: H \sim \chi^2_{K-1} approximately.

p = P(\chi^2_{K-1} \geq H_c)

Step 5 — Effect size: η²_H

\eta^2_H = \frac{H - K + 1}{N - K}

An alternative rank-based effect size is epsilon squared, \varepsilon^2_R = \dfrac{H}{(N^2-1)/(N+1)}.

Cohen's benchmarks for η²_H (same as ANOVA η²): small = .01, medium = .06, large = .14.

Step 6 — Post-hoc tests for Kruskal-Wallis

When H is significant, pairwise comparisons use the Dunn test with Bonferroni or Holm correction:

z_{jk} = \frac{\bar{R}_j - \bar{R}_k}{\sqrt{\dfrac{N(N+1)}{12}\left(\dfrac{1}{n_j}+\dfrac{1}{n_k}\right)}}

Where \bar{R}_j and \bar{R}_k are the mean ranks for groups j and k.

Effect size for each pairwise comparison: the rank-biserial correlation, computed from the corresponding Mann-Whitney U statistic,

r_{rb,jk} = 1 - \frac{2U_{jk}}{n_j n_k}

12.3 Friedman Test — Non-Parametric Repeated Measures ANOVA

The Friedman test is the non-parametric alternative to one-way repeated measures ANOVA. It tests whether K related conditions (measured on the same participants) have equal population distributions.

Procedure:

Step 1 — Rank within each participant

For each participant i, rank their K scores from 1 (lowest) to K (highest). Assign average ranks for ties within a participant.

Step 2 — Compute column rank sums

R_k = \sum_{i=1}^n r_{ik} (sum of ranks in condition k across all n participants)

Step 3 — Compute Friedman's χ²_r statistic

\chi^2_r = \frac{12}{nK(K+1)}\sum_{k=1}^K R_k^2 - 3n(K+1)

Iman-Davenport F correction (more accurate for small samples), expressed via Kendall's W:

W = \frac{\chi^2_r}{n(K-1)}

F_F = \frac{(n-1)W}{1-W}, compared to F_{K-1,\;(n-1)(K-1)}

Step 4 — p-value

p = P(\chi^2_{K-1} \geq \chi^2_r) (large-sample approximation)

Step 5 — Effect size: Kendall's W

W = \frac{\chi^2_r}{n(K-1)}

W ranges from 0 (no agreement across participants) to 1 (perfect agreement):

| W | Interpretation |
|---|---|
| 0.00 – 0.10 | Very weak concordance |
| 0.10 – 0.30 | Weak concordance |
| 0.30 – 0.50 | Moderate concordance |
| > 0.50 | Strong concordance |

Or report η²_F:

\eta^2_F = \frac{\chi^2_r}{n(K-1)} = W

Step 6 — Post-hoc tests for Friedman

Pairwise comparisons using Wilcoxon signed-rank tests with Bonferroni or Holm correction, or the Conover test (more powerful):

t_{jk} = z_{jk}\sqrt{\frac{N-1-\chi^2_r}{N(1-W)}}, \quad \nu = (n-1)(K-1)

Effect size for each pairwise comparison (matched-pairs rank-biserial correlation), computed from the positive and negative rank sums T⁺ and T⁻ of the Wilcoxon signed-rank test for the pair:

r_{rb,jk} = \frac{T^{+} - T^{-}}{n(n+1)/2}
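A quick sketch with SciPy's implementation, plus Kendall's W from Step 5 (illustrative data, n = 8 participants, K = 3 conditions):

```python
# Friedman test via SciPy, plus Kendall's W as in Step 5.
# Data are illustrative (n = 8 participants, K = 3 conditions).
from scipy.stats import friedmanchisquare

cond1 = [5, 6, 4, 7, 5, 6, 5, 6]
cond2 = [7, 8, 6, 9, 7, 8, 7, 8]
cond3 = [6, 7, 5, 8, 6, 7, 6, 7]

chi2_r, p = friedmanchisquare(cond1, cond2, cond3)
n_subj, k = 8, 3
W = chi2_r / (n_subj * (k - 1))          # Kendall's W
print(round(chi2_r, 2), round(p, 4), round(W, 3))
```

Here every participant ranks the conditions identically, so W comes out at its maximum of 1.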

12.4 Welch's One-Way ANOVA — Robust to Heteroscedasticity

Welch's F-test (Welch, 1951) is a parametric ANOVA variant that does not assume homogeneity of variance. It is the recommended default for one-way between-subjects ANOVA when group variances may differ.

Weighted group means:

w_j = \frac{n_j}{s_j^2}, \quad \tilde{x} = \frac{\sum_j w_j \bar{x}_j}{\sum_j w_j}

Welch's F-statistic:

F_W = \frac{\displaystyle\sum_{j=1}^K w_j(\bar{x}_j - \tilde{x})^2/(K-1)}{1 + \dfrac{2(K-2)}{K^2-1}\displaystyle\sum_{j=1}^K \dfrac{(1-w_j/\sum_j w_j)^2}{n_j-1}}

Degrees of freedom (approximate):

\nu_W = \frac{K^2-1}{3\displaystyle\sum_{j=1}^K \dfrac{(1-w_j/\sum_j w_j)^2}{n_j-1}}

p = P(F_{K-1,\;\nu_W} \geq F_W)

Post-hoc: Use Games-Howell pairwise tests when Welch's ANOVA is significant.

💡 Just as Welch's t-test is the recommended default over Student's t-test for two groups, Welch's one-way ANOVA is increasingly recommended as the default over classical ANOVA for three or more independent groups. The loss of power when variances are truly equal is negligible.
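Welch's F is straightforward to implement from the formulas above; a sketch with classical ANOVA (`scipy.stats.f_oneway`) computed alongside for comparison (data illustrative):

```python
# Welch's one-way ANOVA from the formulas above, with the classical F
# alongside for comparison. Data are illustrative; group b has a much
# larger variance than the others.
import numpy as np
from scipy.stats import f as f_dist, f_oneway

def welch_anova(*groups):
    k = len(groups)
    n = np.array([len(g) for g in groups])
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                  # w_j = n_j / s_j^2
    x_tilde = np.sum(w * m) / np.sum(w)        # variance-weighted grand mean
    num = np.sum(w * (m - x_tilde) ** 2) / (k - 1)
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    df2 = (k ** 2 - 1) / (3 * tmp)             # approximate error df
    F_w = num / den
    return F_w, df2, f_dist.sf(F_w, k - 1, df2)

a = [23, 25, 24, 26, 25, 24]
b = [30, 28, 35, 25, 40, 22, 33, 27]
c = [45, 47, 46, 48, 46, 47]
F_w, df2, p_w = welch_anova(a, b, c)
F_c, p_c = f_oneway(a, b, c)
print(round(F_w, 1), round(df2, 1), p_w, p_c)
```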


13. Advanced Topics

13.1 ANCOVA — Analysis of Covariance

ANCOVA extends ANOVA by including one or more continuous covariates in the model. It serves two purposes:

  1. Reduce error variance by partialling out variability explained by the covariate, thereby increasing power to detect group differences.
  2. Adjust group means for pre-existing differences in the covariate (important for quasi-experimental designs).

The ANCOVA model:

Y_{ij} = \mu + \alpha_j + \beta(X_{ij} - \bar{X}) + \varepsilon_{ij}

Where α_j is the group effect, β is the regression coefficient for covariate X, and \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).

Additional assumptions for ANCOVA:

  1. Homogeneity of regression slopes: β is the same in every group (no group × covariate interaction).
  2. Linearity: the covariate relates linearly to the DV within each group.
  3. Covariate independence: the covariate is measured before treatment and is unaffected by group assignment.
  4. The covariate is measured reliably.

Adjusted means (estimated marginal means):

\bar{y}_{adj,j} = \bar{y}_j - \hat{\beta}(\bar{x}_j - \bar{x}_{..})

These are the group means estimated at the grand mean of the covariate.
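The adjusted-means formula can be sketched with a pooled within-group slope estimate (the data and the helper name `adjusted_means` are illustrative):

```python
# Covariate-adjusted (estimated marginal) means using the formula above
# with a pooled within-group slope. Data and helper name are illustrative.
import numpy as np

def adjusted_means(xs, ys):
    """xs, ys: per-group lists of covariate and outcome values."""
    gx = [np.asarray(x, float) for x in xs]
    gy = [np.asarray(y, float) for y in ys]
    # Pooled within-group regression slope (beta-hat)
    sxy = sum(np.sum((x - x.mean()) * (y - y.mean())) for x, y in zip(gx, gy))
    sxx = sum(np.sum((x - x.mean()) ** 2) for x in gx)
    beta = sxy / sxx
    grand_x = np.concatenate(gx).mean()
    # y-bar_adj,j = y-bar_j - beta * (x-bar_j - grand x-bar)
    return [y.mean() - beta * (x.mean() - grand_x) for x, y in zip(gx, gy)]

pre  = [[10, 12, 11, 13], [14, 16, 15, 17]]   # covariate (baseline scores)
post = [[20, 23, 22, 24], [26, 29, 28, 30]]   # outcome
print([round(m, 2) for m in adjusted_means(pre, post)])
```

Note how the raw group difference shrinks once the baseline difference is partialled out.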

Effect size for ANCOVA:

\omega_p^2 = \frac{SS_{group} - df_{group} \cdot MS_{error(ANCOVA)}}{SS_{total} + MS_{error(ANCOVA)}}

13.2 Trend Analysis for Ordered Groups

When the levels of the IV represent an ordered quantitative variable (e.g., drug dose: 0, 10, 20, 40 mg), polynomial trend analysis (orthogonal polynomials) is more informative than post-hoc pairwise tests.

Linear trend: Tests whether the means increase or decrease monotonically.

SS_{linear} = \frac{n\left(\sum_j c_j^{(1)}\bar{x}_j\right)^2}{\sum_j \left(c_j^{(1)}\right)^2}

Quadratic trend: Tests whether the means follow a U-shape (accelerating or decelerating pattern).

SS_{quadratic} = \frac{n\left(\sum_j c_j^{(2)}\bar{x}_j\right)^2}{\sum_j \left(c_j^{(2)}\right)^2}

Standard orthogonal polynomial coefficients for K = 3, 4, 5 groups:

| K | Linear (c⁽¹⁾) | Quadratic (c⁽²⁾) | Cubic (c⁽³⁾) |
|---|---|---|---|
| 3 | -1, 0, 1 | 1, -2, 1 | |
| 4 | -3, -1, 1, 3 | 1, -1, -1, 1 | -1, 3, -3, 1 |
| 5 | -2, -1, 0, 1, 2 | 2, -1, -2, -1, 2 | -1, 2, 0, -2, 1 |
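A sketch applying the trend-SS formula with the tabulated coefficients (K = 4 equally spaced dose groups, equal n; the means are illustrative):

```python
# Trend SS from the orthogonal polynomial coefficients tabulated above
# (K = 4, equal n per group). The group means are illustrative.
import numpy as np

def trend_ss(group_means, coeffs, n_per_group):
    c = np.asarray(coeffs, float)
    m = np.asarray(group_means, float)
    # SS = n * (sum c_j * mean_j)^2 / sum c_j^2
    return n_per_group * (c @ m) ** 2 / np.sum(c ** 2)

means = [10.0, 13.0, 15.0, 16.0]
n = 12
ss_lin  = trend_ss(means, [-3, -1, 1, 3], n)   # linear coefficients, K = 4
ss_quad = trend_ss(means, [1, -1, -1, 1], n)   # quadratic coefficients
print(ss_lin, ss_quad)
```

Each trend SS has 1 df and is tested against MS_within, exactly as for a planned contrast.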

13.3 Power Analysis for ANOVA

A priori power analysis determines the required sample size before data collection. The primary input is Cohen's f (not f² — that is for regression):

f = \sqrt{\frac{\omega^2}{1-\omega^2}}

For one-way ANOVA (equal group sizes), the non-centrality parameter:

\lambda = f^2 \cdot N = f^2 \cdot Kn

Required n per group for power 1 − β at two-sided level α:

Iteratively solve: Power = P(F_{K-1,\;K(n-1)} \geq F_{crit} \mid \lambda = f^2 Kn) \geq 1-\beta

No closed form exists — DataStatPro uses numerical methods for exact power calculations.

Approximate n per group using the χ² approximation:

n \approx \frac{(z_{1-\alpha/K} + z_{1-\beta})^2}{f^2}

Required n per group for common scenarios (80% power, α = .05, one-way):

| Cohen's f | Label | K=3 | K=4 | K=5 | K=6 |
|---|---|---|---|---|---|
| 0.10 | Small | 322 | 274 | 240 | 215 |
| 0.25 | Medium | 52 | 45 | 39 | 35 |
| 0.40 | Large | 21 | 18 | 16 | 14 |
| 0.50 | Large | 14 | 12 | 11 | 10 |
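The iterative solve described above can be sketched with SciPy's non-central F distribution; the function name and search loop below are an illustration, not DataStatPro's implementation:

```python
# Illustrative a priori sample-size search using the non-central F
# distribution, mirroring the iterative solve described above.
from scipy.stats import f as f_dist, ncf

def n_per_group(f_effect, k, power=0.80, alpha=0.05, n_max=2000):
    for n in range(2, n_max):
        df1, df2 = k - 1, k * (n - 1)
        f_crit = f_dist.ppf(1 - alpha, df1, df2)
        lam = f_effect ** 2 * k * n          # non-centrality lambda = f^2 * K * n
        if ncf.sf(f_crit, df1, df2, lam) >= power:
            return n
    return None

print(n_per_group(0.25, 4))   # medium effect, K = 4 (cf. the table above)
```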

For repeated measures ANOVA, power also depends on the within-subjects correlation ρ (average correlation among repeated measures). For the within-subjects effect, the effective effect size is approximately:

f_{rm} = f \times \sqrt{\frac{K}{1-\rho}}

Higher ρ\rho → more power benefit from the repeated measures design.

13.4 Dealing with Violations: Transformation Strategies

When the normality or homoscedasticity assumptions are violated, data transformations can sometimes restore assumption validity before applying ANOVA:

| Distribution Shape | Suggested Transformation | Formula |
|---|---|---|
| Right skew (positive) | Log | Y' = \ln(Y) or \log_{10}(Y) |
| Moderate right skew | Square root | Y' = \sqrt{Y} |
| Severe right skew | Reciprocal | Y' = 1/Y |
| Proportion data | Arcsine | Y' = \arcsin(\sqrt{Y}) |
| Count data | Square root | Y' = \sqrt{Y + 0.5} |

⚠️ Transformed means cannot be back-transformed directly to obtain the mean of the original variable. Back-transforming estimates the median (for log), not the mean. Always report descriptive statistics in the original scale alongside transformed results.

13.5 Robust ANOVA: Trimmed Means

Trimmed mean ANOVA (Wilcox, 2017) replaces standard means with α-trimmed means, dramatically reducing sensitivity to outliers and non-normality while maintaining reasonable power.

Yuen's trimmed mean F-test for one-way ANOVA uses:

\bar{x}_{t,j} = 20%-trimmed mean for group j

h_j = n_j - 2\lfloor 0.2\,n_j \rfloor (the effective sample size after trimming)

d_j = \frac{(n_j-1)\,s_{w,j}^2}{h_j(h_j-1)}, \qquad w_j = \frac{1}{d_j}

Where s_{w,j}^2 is the 20% Winsorised variance for group j; the weights w_j replace n_j/s_j^2 in a Welch-style F-statistic computed on the trimmed means.

The test statistic is compared to an F-distribution with adjusted degrees of freedom.

13.6 Bayesian ANOVA

Bayesian ANOVA (Rouder et al., 2012; implemented via the BayesFactor R package) quantifies evidence for and against each effect using Bayes Factors. For each effect:

BF_{10} = \frac{P(\text{data} \mid H_1: \text{effect present})}{P(\text{data} \mid H_0: \text{effect absent})}

The prior on effect sizes under H₁ is typically a Cauchy distribution:

\delta \sim \text{Cauchy}(0, r), \quad r = \sqrt{2}/2 (default "medium" prior)

Bayesian ANOVA advantages:

  1. Can quantify evidence for the null hypothesis (BF₀₁), not just against it.
  2. Evidence can be monitored as data accumulate, without the error-rate inflation of repeated frequentist testing.
  3. Competing models (e.g., main-effects-only vs. interaction models) can be compared directly.

13.7 Reporting ANOVA According to APA 7th Edition

APA Publication Manual (7th ed.) requirements for ANOVA:

  1. Report the F-statistic, both degrees of freedom, and exact p-value: F(df₁, df₂) = [value], p = [value]
  2. Report effect size with 95% CI: ω² = [value] [95% CI: LB, UB] or η_p² = [value] [95% CI: LB, UB]
  3. Specify which effect size was used (state ω² vs. η² explicitly).
  4. Report the sphericity correction used (GG or HF) and the ε value.
  5. Report group means and standard deviations.
  6. Report post-hoc test results with adjusted p-values and effect sizes per comparison.
  7. Specify whether equal variances were assumed and which SS Type was used (factorial).

14. Worked Examples

Example 1: One-Way Between-Subjects ANOVA — Effect of Therapy Type on Depression

A clinical researcher assigns n = 30 participants per group to one of three therapy conditions (CBT, Behavioural Activation, Waitlist Control). Post-treatment depression scores (PHQ-9; lower = less depression) are measured.

Group summary statistics:

| Group | n | Mean PHQ-9 | SD |
|---|---|---|---|
| CBT | 30 | 9.8 | 4.2 |
| Behavioural Activation (BA) | 30 | 11.4 | 4.6 |
| Waitlist Control (WL) | 30 | 16.3 | 5.1 |

N = 90, K = 3

Grand mean:

\bar{x}_{..} = (9.8 + 11.4 + 16.3)/3 = 37.5/3 = 12.500

Step 1 — Between-groups SS:

SS_B = 30[(9.8-12.5)^2 + (11.4-12.5)^2 + (16.3-12.5)^2] = 30[7.29 + 1.21 + 14.44] = 30 \times 22.94 = 688.2

Step 2 — Within-groups SS:

SS_W = \sum_j (n_j-1)s_j^2 = 29(4.2^2) + 29(4.6^2) + 29(5.1^2) = 29(17.64) + 29(21.16) + 29(26.01) = 511.56 + 613.64 + 754.29 = 1879.49

Step 3 — Total SS: SS_T = 688.2 + 1879.49 = 2567.69

Step 4 — ANOVA source table:

| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between | 688.20 | 2 | 344.10 | 15.93 | < .001 |
| Within | 1879.49 | 87 | 21.60 | | |
| Total | 2567.69 | 89 | | | |

(F = MS_B/MS_W = 344.10/21.60 = 15.93)

Step 5 — Levene's test: F(2, 87) = 0.82, p = .44 — homogeneity of variance holds; standard ANOVA is appropriate.

Step 6 — Effect sizes:

\eta^2 = 688.20/2567.69 = 0.268

\omega^2 = (688.20 - 2 \times 21.60)/(2567.69 + 21.60) = (688.20 - 43.20)/2589.29 = 645.00/2589.29 = 0.249

f = \sqrt{0.249/(1-0.249)} = \sqrt{0.249/0.751} = \sqrt{0.3315} = 0.576

95% CI for ω² (via non-central F, F_obs = 15.93, df₁ = 2, df₂ = 87):

Non-centrality \hat{\lambda} = F \times df_1 = 15.93 \times 2 = 31.86

95% CI for λ: [15.84, 54.26] (numerical)

\omega^2_L = 15.84/(15.84+90) = 0.150, \quad \omega^2_U = 54.26/(54.26+90) = 0.376
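The computation in Steps 1–6 can be reproduced from the summary statistics alone:

```python
# Reproducing Example 1's Steps 1-6 from the group means, SDs and n.
means = [9.8, 11.4, 16.3]
sds   = [4.2, 4.6, 5.1]
n, k  = 30, 3
N     = n * k

grand  = sum(means) / k
ss_b   = n * sum((m - grand) ** 2 for m in means)   # between-groups SS
ss_w   = sum((n - 1) * s ** 2 for s in sds)         # within-groups SS
ms_b   = ss_b / (k - 1)
ms_w   = ss_w / (N - k)
F      = ms_b / ms_w
eta2   = ss_b / (ss_b + ss_w)
omega2 = (ss_b - (k - 1) * ms_w) / (ss_b + ss_w + ms_w)
print(round(F, 2), round(eta2, 3), round(omega2, 3))
```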

Step 7 — Post-hoc tests (Tukey HSD):

MS_within = 21.60; Tukey critical value q_{3,87,0.05} = 3.37

\text{HSD} = 3.37 \times \sqrt{21.60/30} = 3.37 \times 0.849 = 2.860

| Comparison | Difference | HSD | Significant? | Cohen's d |
|---|---|---|---|---|
| CBT vs. BA | 1.600 | 2.860 | No, p = .310 | 0.344 |
| CBT vs. WL | 6.500 | 2.860 | Yes, p < .001 | 1.400 |
| BA vs. WL | 4.900 | 2.860 | Yes, p < .001 | 1.055 |

Cohen's d for each pair using \sqrt{MS_{within}} = \sqrt{21.60} = 4.648:

d_{CBT-WL} = 6.5/4.648 = 1.399; \quad d_{BA-WL} = 4.9/4.648 = 1.054; \quad d_{CBT-BA} = 1.6/4.648 = 0.344

Summary:

| Statistic | Value |
|---|---|
| F(2, 87) | 15.93 |
| p | < .001 |
| η² | 0.268 (Large) |
| ω² | 0.249 [95% CI: 0.150, 0.376] (Large) |
| Cohen's f | 0.576 |
| CBT vs. Control | d = 1.40 (Large) |
| BA vs. Control | d = 1.05 (Large) |
| CBT vs. BA | d = 0.34 (Small; ns) |

APA write-up: "A one-way between-subjects ANOVA revealed a significant effect of therapy type on post-treatment depression, F(2, 87) = 15.93, p < .001, ω² = 0.249 [95% CI: 0.150, 0.376], indicating a large effect. Tukey HSD post-hoc tests showed that both CBT (M = 9.8, SD = 4.2) and Behavioural Activation (M = 11.4, SD = 4.6) produced significantly lower depression scores than the Waitlist Control (M = 16.3, SD = 5.1), d_CBT-WL = 1.40 and d_BA-WL = 1.05 respectively (both p < .001). CBT and BA did not differ significantly from each other, d = 0.34, p = .310."


Example 2: Two-Way Factorial ANOVA — Drug ×\times Exercise on Anxiety

A researcher uses a 2 × 3 between-subjects design: Drug (Drug A vs. Placebo) × Exercise (None, Moderate, High). n = 10 per cell; DV = anxiety score (lower = less anxious). Total N = 60.

Cell means:

| | No Exercise | Moderate | High | Row Mean |
|---|---|---|---|---|
| Drug A | 24.1 | 18.3 | 14.7 | 19.033 |
| Placebo | 27.4 | 23.8 | 22.1 | 24.433 |
| Col Mean | 25.75 | 21.05 | 18.40 | 21.733 |

Grand mean: \bar{x}_{..} = 21.733

Step 1 — Compute SS (all cells balanced, n = 10):

SS_{Drug} = 3 \times 10 \times [(19.033-21.733)^2 + (24.433-21.733)^2] = 30 \times [7.290 + 7.290] = 30 \times 14.580 = 437.40

SS_{Exercise} = 2 \times 10 \times [(25.75-21.733)^2 + (21.05-21.733)^2 + (18.40-21.733)^2] = 20 \times [16.136 + 0.467 + 11.109] = 20 \times 27.712 = 554.24

Cell means for interaction SS:

SS_{D\times E} = 10\sum_j\sum_k(\bar{x}_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2

Deviations from additive model (each deviation = \bar{x}_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + 21.733):

| Cell | \bar{x}_{jk} | \bar{x}_{j.} | \bar{x}_{.k} | Deviation |
|---|---|---|---|---|
| Drug A, None | 24.1 | 19.033 | 25.750 | +1.050 |
| Drug A, Mod | 18.3 | 19.033 | 21.050 | -0.050 |
| Drug A, High | 14.7 | 19.033 | 18.400 | -1.000 |
| Placebo, None | 27.4 | 24.433 | 25.750 | -1.050 |
| Placebo, Mod | 23.8 | 24.433 | 21.050 | +0.050 |
| Placebo, High | 22.1 | 24.433 | 18.400 | +1.000 |

SS_{D\times E} = 10[(1.050)^2 + (-0.050)^2 + (-1.000)^2 + (-1.050)^2 + (0.050)^2 + (1.000)^2] = 10[1.1025 + 0.0025 + 1.000 + 1.1025 + 0.0025 + 1.000] = 10 \times 4.210 = 42.10

Pooled within-cells SS_within: Assume s^2_{pooled} = 16.4 (given), so:

SS_{within} = (n-1) \times s^2_{pooled} \times \text{cells} = 9 \times 16.4 \times 6 = 885.60

Step 2 — ANOVA source table:

| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Drug (D) | 437.40 | 1 | 437.40 | 26.67 | < .001 |
| Exercise (E) | 554.24 | 2 | 277.12 | 16.90 | < .001 |
| D × E | 42.10 | 2 | 21.05 | 1.28 | .285 |
| Within (Error) | 885.60 | 54 | 16.40 | | |
| Total | 1919.34 | 59 | | | |

Step 3 — Partial omega squared for each effect:

\omega_{p,D}^2 = (437.40 - 1 \times 16.40)/(1919.34 + 16.40) = 421.00/1935.74 = 0.218

\omega_{p,E}^2 = (554.24 - 2 \times 16.40)/(1919.34 + 16.40) = 521.44/1935.74 = 0.269

\omega_{p,D\times E}^2 = (42.10 - 2 \times 16.40)/(1919.34 + 16.40) = 9.30/1935.74 = 0.005
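The SS components in Steps 1–2 can be recomputed from the cell means (balanced design, n = 10 per cell):

```python
# Recomputing the factorial SS components of Example 2 from the cell
# means (balanced 2 x 3 design, n = 10 per cell).
import numpy as np

cells = np.array([[24.1, 18.3, 14.7],    # Drug A across exercise levels
                  [27.4, 23.8, 22.1]])   # Placebo
n = 10
grand = cells.mean()
row = cells.mean(axis=1)                 # Drug marginal means
col = cells.mean(axis=0)                 # Exercise marginal means

ss_drug = cells.shape[1] * n * np.sum((row - grand) ** 2)
ss_ex   = cells.shape[0] * n * np.sum((col - grand) ** 2)
dev     = cells - row[:, None] - col[None, :] + grand   # interaction deviations
ss_int  = n * np.sum(dev ** 2)
print(round(ss_drug, 2), round(ss_ex, 2), round(ss_int, 2))
```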

Step 4 — Interpretation:

The interaction is not significant (p = .285) — the effect of Drug is consistent across all exercise levels. Interpret main effects: Drug A produced lower anxiety than Placebo (19.03 vs. 24.43), and anxiety decreased as exercise level increased (25.75 → 21.05 → 18.40).

APA write-up: "A 2 × 3 between-subjects ANOVA examined the effects of Drug (Drug A vs. Placebo) and Exercise level (None, Moderate, High) on anxiety scores. The interaction was not significant, F(2, 54) = 1.28, p = .285, ω_p² = 0.005. There were significant main effects of Drug, F(1, 54) = 26.67, p < .001, ω_p² = 0.218 [95% CI: 0.102, 0.340], and Exercise, F(2, 54) = 16.90, p < .001, ω_p² = 0.269 [95% CI: 0.136, 0.395]. Both effects were large."


Example 3: One-Way Repeated Measures ANOVA — Memory Scores Across Four Time Points

A cognitive psychologist measures word recall at four time points (immediate recall, 5-minute delay, 30-minute delay, 24-hour delay) in n = 20 participants.

Condition means and SDs:

| Time Point | Mean Recall | SD |
|---|---|---|
| Immediate | 18.4 | 3.1 |
| 5 minutes | 15.7 | 3.4 |
| 30 minutes | 12.3 | 3.8 |
| 24 hours | 9.1 | 4.2 |

ANOVA results (given):

| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between subjects | 1842.60 | 19 | 96.98 | | |
| Time | 1104.80 | 3 | 368.27 | 42.14 | < .001 |
| Error | 498.20 | 57 | 8.74 | | |
| Total | 3445.60 | 79 | | | |

Mauchly's test: W = 0.81, χ²(5) = 4.12, p = .532 — sphericity holds; no correction needed.

Effect sizes:

\eta_p^2 = 1104.80/(1104.80 + 498.20) = 1104.80/1603.00 = 0.689

\eta_G^2 = 1104.80/(1104.80 + 1842.60 + 498.20) = 1104.80/3445.60 = 0.321

\omega_p^2 = (1104.80 - 3 \times 8.74)/(3445.60 + 8.74) = (1104.80 - 26.22)/3454.34 = 1078.58/3454.34 = 0.312

Post-hoc tests (Bonferroni-corrected pairwise comparisons):

6 comparisons; adjusted α = .05/6 = .0083

Using paired t-tests on each pair (or use RM ANOVA contrast framework):

| Comparison | Mean Diff | t(19) | p_adj | d_z |
|---|---|---|---|---|
| Imm vs. 5 min | 2.7 | 4.23 | .004 | 0.95 |
| Imm vs. 30 min | 6.1 | 7.81 | < .001 | 1.75 |
| Imm vs. 24 hr | 9.3 | 10.42 | < .001 | 2.33 |
| 5 min vs. 30 min | 3.4 | 5.17 | < .001 | 1.16 |
| 5 min vs. 24 hr | 6.6 | 8.20 | < .001 | 1.83 |
| 30 min vs. 24 hr | 3.2 | 4.89 | < .001 | 1.09 |

All pairwise comparisons are significant — recall declines significantly at every delay interval.

APA write-up: "A one-way repeated measures ANOVA examined word recall across four time points. Mauchly's test indicated that the sphericity assumption was met, W = 0.81, p = .532. There was a significant effect of time, F(3, 57) = 42.14, p < .001, ω_p² = 0.312 [95% CI: 0.198, 0.421], η_G² = 0.321, indicating a large effect. Bonferroni-corrected pairwise comparisons revealed that recall declined significantly at each subsequent time point (all p < .008), with effect sizes ranging from d_z = 0.95 (immediate vs. 5-min) to d_z = 2.33 (immediate vs. 24-hr)."


Example 4: Kruskal-Wallis Test — Non-Parametric Comparison of Pain Ratings

A pain researcher compares pain ratings (0–10 VAS scale, ordinal) across three acupuncture protocols. Shapiro-Wilk tests indicate non-normality in all groups.

Data:

| Protocol A (n₁ = 8) | Protocol B (n₂ = 7) | Protocol C (n₃ = 6) |
|---|---|---|
| 3, 5, 2, 6, 4, 5, 3, 4 | 7, 8, 6, 9, 7, 8, 7 | 5, 6, 4, 7, 5, 6 |

N = 21

Step 1 — Combined ranks:

Combined sorted values (N = 21): 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9

| Value | Count | Ranks | Avg Rank |
|---|---|---|---|
| 2 | 1 | 1 | 1.0 |
| 3 | 2 | 2–3 | 2.5 |
| 4 | 3 | 4–6 | 5.0 |
| 5 | 4 | 7–10 | 8.5 |
| 6 | 4 | 11–14 | 12.5 |
| 7 | 4 | 15–18 | 16.5 |
| 8 | 2 | 19–20 | 19.5 |
| 9 | 1 | 21 | 21.0 |

Rank sums per group:

W_A = 2.5 + 8.5 + 1 + 12.5 + 5 + 8.5 + 2.5 + 5 = 45.5
W_B = 16.5 + 19.5 + 12.5 + 21 + 16.5 + 19.5 + 16.5 = 122.0
W_C = 8.5 + 12.5 + 5 + 16.5 + 8.5 + 12.5 = 63.5

Check: 45.5 + 122.0 + 63.5 = 231 = 21 × 22/2

Step 2 — H statistic:

H = \frac{12}{21 \times 22}\left(\frac{45.5^2}{8} + \frac{122.0^2}{7} + \frac{63.5^2}{6}\right) - 3(22)

= \frac{12}{462}\left(\frac{2070.25}{8} + \frac{14884}{7} + \frac{4032.25}{6}\right) - 66

= 0.02597\left(258.78 + 2126.29 + 672.04\right) - 66

= 0.02597 \times 3057.11 - 66 = 79.39 - 66 = 13.39

Tie correction factor:

\sum_m(t_m^3-t_m) = (1^3-1)+(2^3-2)+(3^3-3)+(4^3-4)+(4^3-4)+(4^3-4)+(2^3-2)+(1^3-1) = 0+6+24+60+60+60+6+0 = 216

H_c = 13.39 / \left(1 - \frac{216}{21^3-21}\right) = 13.39 / \left(1 - \frac{216}{9240}\right) = 13.39 / 0.9766 = 13.71

Step 3 — p-value:

p = P(\chi^2_2 \geq 13.71) = .001

Step 4 — Effect size:

\eta^2_H = (13.71 - 3 + 1)/(21 - 3) = 11.71/18 = 0.651

This is a very large effect — protocol membership explains approximately 65% of the rank variability in pain ratings.
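As a check, SciPy's Kruskal-Wallis implementation (which applies the tie correction automatically) reproduces the result; the small difference from the value above reflects intermediate rounding in the hand steps:

```python
# Checking the hand computation with SciPy's Kruskal-Wallis test. The
# tiny difference from the rounded hand value (13.71) comes from
# intermediate rounding in the steps above.
from scipy.stats import kruskal

protocol_a = [3, 5, 2, 6, 4, 5, 3, 4]
protocol_b = [7, 8, 6, 9, 7, 8, 7]
protocol_c = [5, 6, 4, 7, 5, 6]

H, p = kruskal(protocol_a, protocol_b, protocol_c)
print(round(H, 2), round(p, 4))
```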

Dunn post-hoc tests (Holm-corrected):

\bar{R}_A = 45.5/8 = 5.69, \quad \bar{R}_B = 122.0/7 = 17.43, \quad \bar{R}_C = 63.5/6 = 10.58

SE_{AB} = \sqrt{21 \times 22/12 \times (1/8+1/7)} = \sqrt{38.5 \times 0.2679} = \sqrt{10.31} = 3.211

z_{AB} = (5.69-17.43)/3.211 = -11.74/3.211 = -3.657

Pairwise effect sizes are reported as rank-biserial correlations, which are bounded in [-1, 1]:

| Comparison | z | p (Holm-adj) | r_rb |
|---|---|---|---|
| A vs. B | -3.657 | < .001 | -0.895 |
| A vs. C | -1.842 | .065 | -0.532 |
| B vs. C | 1.914 | .056 | 0.497 |

APA write-up: "Due to significant non-normality, a Kruskal-Wallis H test was conducted to compare pain ratings across three acupuncture protocols. The test revealed a significant difference, H(2) = 13.71, p = .001, η²_H = 0.651, indicating a very large effect of protocol. Holm-corrected Dunn post-hoc tests revealed that Protocol A (Mdn = 4.5) produced significantly lower pain ratings than Protocol B (Mdn = 7.0), z = -3.66, p < .001, r_rb = -.895. Differences between A and C and between B and C did not survive correction (p = .065 and p = .056 respectively)."


15. Common Mistakes and How to Avoid Them

Mistake 1: Interpreting the Omnibus F Without Post-Hoc Tests

Problem: Reporting a significant F and concluding that all groups differ from each other, or that a specific pair of groups differs, without conducting post-hoc tests. The omnibus F tells you only that at least one difference exists.

Solution: Always follow a significant omnibus F with appropriate post-hoc tests or planned contrasts. Specify which test was used and apply the correct FWER correction. Report all pairwise comparisons with adjusted p-values and individual effect sizes.


Mistake 2: Reporting η² as if It Were ω²

Problem: Reporting η² (e.g., η² = 0.23) and labelling it simply as "effect size" or, worse, confusing it with the less-biased ω². η² is consistently biased upward and overestimates the population effect, sometimes substantially in small samples with few groups.

Solution: Always report ω² (or ω_p² for factorial designs) as the primary effect size, and label all effect sizes precisely. If η² is reported (e.g., for software compatibility), clearly note that it is biased and report ω² alongside.


Mistake 3: Confusing η² and η_p² in Factorial Designs

Problem: In factorial ANOVA with two or more factors, η_p² values can sum to more than 1.0 across all effects. Reporting η_p² and describing it as "the proportion of total variance explained" is incorrect — it is the proportion of variance explained after removing the other effects.

Solution: Use η² for total-variance proportions and η_p² for partial proportions, always labelling them distinctly. Preferably, use ω_p² or η_G² and state which was used.


Mistake 4: Ignoring Significant Interactions and Interpreting Main Effects Alone

Problem: When a significant A × B interaction is present, reporting and interpreting main effects as if the interaction did not exist. The main effect of A is the average effect across all levels of B — if the interaction is disordinal (crossover), this average is actively misleading.

Solution: Test for interactions before interpreting main effects. When an interaction is significant, probe it with simple effects analysis and interaction plots. Describe the pattern of the interaction rather than (or in addition to) the main effects.


Mistake 5: Using One-Way ANOVA When Repeated Measures ANOVA Is Needed

Problem: Treating pre-post data from the same participants as independent groups and running a between-subjects one-way ANOVA. This inflates the error term with between-person variability, severely reduces power, and violates the independence assumption.

Solution: Identify whether data come from different participants (between-subjects) or the same participants (within-subjects). Use repeated measures ANOVA when each participant contributes more than one score. If in doubt, check whether the data file has one row per participant.


Mistake 6: Not Checking or Correcting for Sphericity Violations

Problem: Running repeated measures ANOVA in SPSS or R and not checking Mauchly's test, or checking it but ignoring a significant result and reporting uncorrected values.

Solution: Always report Mauchly's test result. When $p < .05$, report the Greenhouse-Geisser (or Huynh-Feldt if $\varepsilon > 0.75$) corrected results. Report both $\varepsilon$ and the corrected df alongside the $F$-statistic.


Mistake 7: Applying Standard ANOVA When Variances Are Unequal

Problem: Using classical ANOVA with unequal group sizes and markedly different group variances ($s^2_{max}/s^2_{min} > 4$). This produces inflated Type I error rates and untrustworthy p-values.

Solution: When Levene's test is significant (especially with unequal $n$), use Welch's one-way ANOVA with Games-Howell post-hoc tests. Report Levene's test result in the method section and justify the choice of test.


Mistake 8: Running Multiple Pairwise t-Tests After ANOVA Without Correction

Problem: After a significant $F$, running all pairwise t-tests without applying a multiple-comparisons correction, effectively using $\alpha = .05$ per comparison and inflating the FWER.

Solution: Use a proper post-hoc procedure (Tukey HSD, Games-Howell, Holm-Bonferroni) that controls the FWER. Fisher's LSD (uncorrected pairwise tests) is not appropriate as a standalone post-hoc procedure unless there are only $K = 3$ groups.
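
The step-down logic of the Holm-Bonferroni procedure is easy to verify by hand. Below is a minimal Python sketch (the function name and p-values are ours, purely for illustration):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure: sort the m p-values ascending and
    compare the i-th smallest (0-indexed) against alpha / (m - i)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Six pairwise comparisons after a significant omnibus F (hypothetical p-values):
print(holm_bonferroni([0.001, 0.012, 0.020, 0.034, 0.210, 0.450]))
```

Here only the smallest p-value survives: at the second step, 0.012 is compared against $0.05/5 = 0.01$ and fails, so the procedure stops and all remaining comparisons are retained as non-significant.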


Mistake 9: Interpreting Non-Significant Interactions as Absence of Interaction

Problem: Concluding that "there is no interaction" based solely on $p > .05$ for the interaction term. A non-significant interaction test only indicates insufficient evidence for an interaction, not evidence of its absence. Underpowered studies routinely fail to detect real interactions.

Solution: Report the effect size for the interaction ($\omega_p^2$, $\eta_G^2$) and its 95% CI alongside the p-value. If the CI is wide, acknowledge low precision. Consider equivalence testing for the interaction if absence of interaction is the primary claim.


Mistake 10: Failing to Report Descriptive Statistics and Visualisations for Factorial Designs

Problem: In factorial and repeated measures ANOVA, reporting only the omnibus $F$-statistics without cell means, standard deviations, and interaction plots. Statistical significance alone is uninterpretable without the pattern of means.

Solution: Always report means and standard deviations (or standard errors) for every cell. For factorial designs, always include an interaction plot. For repeated measures, include a profile plot. Raincloud plots (half violin + box + scatter) are increasingly recommended for transparent reporting of individual data.


16. Troubleshooting

| Problem | Likely Cause | Solution |
|---|---|---|
| $F < 1.0$ | $MS_{within} > MS_{between}$; no treatment effect or large within-group variability | Report as non-significant; consider power; inspect within-group variability |
| $\omega^2$ or $\varepsilon^2$ is negative | True effect near zero; $SS_{between} < (K-1)MS_{within}$ (equivalently, $F < 1$) | Report as 0 (convention); increase sample size; note the small effect |
| $\eta_p^2$ values sum to $> 1.0$ | Expected in factorial ANOVA; $\eta_p^2$ is not a total-variance proportion | Switch to $\eta^2$ or $\eta_G^2$ for a total-variance interpretation |
| Mauchly's test is significant | Sphericity violated (common with $K \geq 3$ repeated measures) | Apply the GG correction (if $\varepsilon \leq 0.75$) or HF (if $\varepsilon > 0.75$); report $\varepsilon$ |
| Levene's test is significant | Heterogeneous variances across groups | Use Welch's ANOVA with Games-Howell post-hoc tests |
| Interaction is significant but the interaction plot looks parallel | Scaling issue on the plot axes; small but real interaction | Rescale the y-axis to start at the true minimum; report $\omega_p^2$ for the interaction |
| Post-hoc tests reveal no significant pairs despite a significant $F$ | Effect is driven by small differences across many pairs; no single large pair | Report the omnibus result and note that no individual pair survives correction; reduce the FWER burden with planned contrasts |
| Planned contrasts do not sum to zero | Contrast coding error | Re-specify: $\sum_j c_j = 0$ for every contrast |
| ANOVA gives a different result to multiple t-tests | ANOVA uses a pooled error term; t-tests use only two groups | Trust the ANOVA; the pooled error is more stable |
| Repeated measures ANOVA has very different $\eta_p^2$ vs. $\eta_G^2$ | Large between-subjects variance inflating the $\eta_G^2$ denominator | Report both; $\eta_G^2$ is preferred for cross-design comparison |
| Very large $F$ with very small $\omega^2$ | Large $N$; even tiny mean differences are statistically significant | Report the effect size; statistical significance does not imply practical significance |
| Cell size is 0 for some factorial cells | Empty cells in the design | Empty cells break standard ANOVA; use a regression approach or multilevel modelling |
| Significant ANCOVA result changes after adding the covariate $\times$ group interaction | Homogeneity of regression slopes violated | Standard ANCOVA is inappropriate; use moderated regression instead |
| Kruskal-Wallis is significant but Dunn tests show no significant pairs | Conservative Bonferroni correction; effect spread across many pairs | Use the Holm correction instead; report Dunn tests without correction only if all were planned |
| Friedman test statistic is 0 | Identical rankings across all participants | Verify the data; check for data entry errors or insufficient variability |

17. Quick Reference Cheat Sheet

Core ANOVA Equations

| Formula | Description |
|---|---|
| $SS_B = \sum_j n_j(\bar{x}_j - \bar{x}_{..})^2$ | Between-groups SS (one-way) |
| $SS_W = \sum_j\sum_i(x_{ij}-\bar{x}_j)^2$ | Within-groups SS (one-way) |
| $SS_T = SS_B + SS_W$ | Total SS decomposition |
| $MS_B = SS_B/(K-1)$ | Between-groups mean square |
| $MS_W = SS_W/(N-K)$ | Within-groups mean square (error) |
| $F = MS_B/MS_W$ | F-ratio (one-way ANOVA) |
| $p = P(F_{K-1,\;N-K} \geq F_{obs})$ | One-way ANOVA p-value |
| $SS_A = bn\sum_j(\bar{x}_{j.}-\bar{x}_{..})^2$ | Factor A SS (factorial, balanced) |
| $SS_{A\times B} = SS_{cells} - SS_A - SS_B$ | Interaction SS |
| $F_A = MS_A/MS_{within}$ | Factorial F for main effect A |
| $SS_{cond} = n\sum_k(\bar{x}_{.k}-\bar{x}_{..})^2$ | Conditions SS (repeated measures) |
| $SS_{subj} = K\sum_i(\bar{x}_{i.}-\bar{x}_{..})^2$ | Subjects SS (repeated measures) |
| $SS_{error} = SS_T - SS_{cond} - SS_{subj}$ | Error SS (repeated measures) |
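
The one-way decomposition above is straightforward to verify numerically. The following is a minimal Python sketch (the function name and data are ours, invented for illustration, not part of DataStatPro):

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SS_between, SS_within, F) for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    K, N = len(groups), len(scores)
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between-groups SS
    ss_w = sum((x - mean(g)) ** 2 for g in groups for x in g)    # within-groups SS
    return ss_b, ss_w, (ss_b / (K - 1)) / (ss_w / (N - K))

# Three made-up groups with means 5, 8, 11:
ssb, ssw, F = one_way_anova([[4, 5, 6], [7, 8, 9], [10, 11, 12]])
```

For these data $SS_B = 54$, $SS_W = 6$, so $MS_B = 27$, $MS_W = 1$, and $F(2, 6) = 27$.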

Effect Size Formulas

| Formula | Description |
|---|---|
| $\eta^2 = SS_{effect}/SS_{total}$ | Eta squared (one-way; biased) |
| $\eta_p^2 = SS_{effect}/(SS_{effect}+SS_{error})$ | Partial eta squared (factorial) |
| $\eta_G^2 = SS_{cond}/(SS_{cond}+SS_{subj}+SS_{error})$ | Generalised eta squared (RM/mixed) |
| $\omega^2 = (SS_B-(K-1)MS_W)/(SS_T+MS_W)$ | Omega squared (one-way; preferred) |
| $\omega_p^2 = (SS_{eff}-df_{eff}\,MS_{err})/(SS_{eff}+(N-df_{eff})MS_{err})$ | Partial omega squared (factorial) |
| $\varepsilon^2 = (SS_B-(K-1)MS_W)/SS_T$ | Epsilon squared (one-way) |
| $f = \sqrt{\eta^2/(1-\eta^2)}$ | Cohen's $f$ (from $\eta^2$) |
| $f = \sqrt{\omega^2/(1-\omega^2)}$ | Cohen's $f$ (from $\omega^2$; preferred) |
| $\eta^2 = F\cdot df_B/(F\cdot df_B + df_W)$ | $\eta^2$ from the $F$-statistic |
| $\omega^2 \approx (F-1)\cdot df_B/(F\cdot df_B+df_W+1)$ | $\omega^2$ from the $F$-statistic (approx.) |
| $d_{jk} = (\bar{x}_j-\bar{x}_k)/\sqrt{MS_{within}}$ | Cohen's $d$ for post-hoc pairwise comparisons |
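
The one-way effect sizes can be computed directly from the ANOVA table. A small Python helper (names and inputs are ours, for illustration only):

```python
import math

def effect_sizes(ss_between, ss_within, K, N):
    """Eta squared (biased), omega squared (bias-corrected), and Cohen's f
    for a one-way between-subjects ANOVA."""
    ss_total = ss_between + ss_within
    ms_within = ss_within / (N - K)
    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - (K - 1) * ms_within) / (ss_total + ms_within)
    omega_sq = max(omega_sq, 0.0)  # negative estimates are reported as 0 by convention
    f = math.sqrt(eta_sq / (1 - eta_sq))
    return eta_sq, omega_sq, f

# SS values from a small worked example (K = 3 groups, N = 9 total):
eta, omega, f = effect_sizes(54, 6, K=3, N=9)
```

Here $\eta^2 = .90$ but $\omega^2 \approx .85$: the bias correction matters most in small samples, which is exactly why $\omega^2$ is preferred.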

Non-Parametric Formulas

| Formula | Description |
|---|---|
| $H = \frac{12}{N(N+1)}\sum_j R_j^2/n_j - 3(N+1)$ | Kruskal-Wallis $H$ ($R_j$ = rank sum of group $j$) |
| $\eta^2_H = (H-K+1)/(N-K)$ | Effect size for Kruskal-Wallis |
| $\chi^2_r = \frac{12}{nK(K+1)}\sum_k R_k^2 - 3n(K+1)$ | Friedman $\chi^2_r$ |
| $W = \chi^2_r/(n(K-1))$ | Kendall's $W$ (Friedman effect size) |
| $z_{jk} = (\bar{R}_j-\bar{R}_k)/SE_{jk}$ | Dunn's test statistic |
| $r_{jk} = z_{jk}/\sqrt{n_j+n_k}$ | Pairwise effect size $r$ from Dunn's $z$ |
| $F_W = \text{weighted }SS_{between}/\text{correction term}$ | Welch's one-way ANOVA (schematic) |
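
The Kruskal-Wallis formula above can be checked with a short sketch. This minimal version (names and data are ours) assumes no tied scores; real data with ties need midranks and the tie-correction factor:

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H without a tie correction: rank all scores jointly,
    then compare squared rank sums across groups. Assumes no tied scores."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}  # joint ranks 1..N
    N = len(pooled)
    rank_term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (N * (N + 1)) * rank_term - 3 * (N + 1)

# Three maximally separated toy groups:
h = kruskal_wallis_h([[1, 2], [3, 4], [5, 6]])
```

For this toy example $H = 32/7 \approx 4.57$, which would be referred to a $\chi^2$ distribution with $K - 1 = 2$ df.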

Sphericity Corrections

| Formula | Description |
|---|---|
| $\varepsilon_{GG} = \left(\sum_j\hat\sigma^*_{jj}\right)^2/\left((K-1)\sum_j\sum_k(\hat\sigma^*_{jk})^2\right)$ | Greenhouse-Geisser epsilon ($\hat\sigma^*_{jk}$: elements of the double-centered covariance matrix) |
| $\varepsilon_{HF} = \dfrac{n(K-1)\varepsilon_{GG}-2}{(K-1)(n-1-(K-1)\varepsilon_{GG})}$ | Huynh-Feldt epsilon |
| $df^* = \varepsilon \cdot df_{uncorrected}$ | Corrected degrees of freedom |
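
A minimal pure-Python sketch of the Greenhouse-Geisser epsilon, assuming the $K \times K$ covariance matrix of the repeated measures is supplied as a list of lists (the function name is ours, for illustration):

```python
def gg_epsilon(S):
    """Greenhouse-Geisser epsilon from a K x K covariance matrix S.
    Double-centers S first: s*_jk = s_jk - rowmean_j - colmean_k + grandmean."""
    K = len(S)
    row = [sum(r) / K for r in S]                                   # row means
    col = [sum(S[i][j] for i in range(K)) / K for j in range(K)]    # column means
    grand = sum(row) / K                                            # grand mean
    Ss = [[S[j][k] - row[j] - col[k] + grand for k in range(K)] for j in range(K)]
    trace = sum(Ss[j][j] for j in range(K))
    sum_sq = sum(Ss[j][k] ** 2 for j in range(K) for k in range(K))
    return trace ** 2 / ((K - 1) * sum_sq)

# Perfect sphericity (equal variances, zero covariances) gives epsilon = 1:
identity = [[1.0 if j == k else 0.0 for k in range(4)] for j in range(4)]
eps = gg_epsilon(identity)
```

$\varepsilon$ ranges from its lower bound $1/(K-1)$ (maximal violation) to 1 (sphericity holds exactly), matching the correction rule in the table above.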

Post-Hoc Test Selection Guide

| Condition | Recommended Post-Hoc Test |
|---|---|
| Balanced design, equal variances | Tukey HSD |
| Unbalanced design, equal variances | Tukey-Kramer |
| Unequal variances or group sizes | Games-Howell |
| All groups vs. one control | Dunnett's test |
| All possible contrasts (not just pairwise) | Scheffé |
| Any design, conservative | Bonferroni |
| Any design, less conservative than Bonferroni | Holm-Bonferroni |
| Non-parametric (Kruskal-Wallis) | Dunn test with Holm correction |
| Non-parametric (Friedman) | Wilcoxon signed-rank + Holm, or Conover |

APA 7th Edition Reporting Templates

One-Way Between-Subjects ANOVA: "A one-way between-subjects ANOVA revealed [a significant / no significant] effect of [IV] on [DV], $F(df_B, df_W) =$ [value], $p =$ [value], $\omega^2 =$ [value] [95% CI: LB, UB]. [Post-hoc results if significant.]"

Factorial Between-Subjects ANOVA: "An $a \times b$ between-subjects ANOVA was conducted. The [IV$_A$ $\times$ IV$_B$] interaction [was / was not] significant, $F(df_{A\times B}, df_W) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB]. [Describe the interaction pattern or, if not significant, the main effects:] There was a significant main effect of [IV$_A$], $F(df_A, df_W) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB], and of [IV$_B$], $F(df_B, df_W) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB]."

One-Way Repeated Measures ANOVA: "A one-way repeated measures ANOVA was conducted. Mauchly's test [indicated / did not indicate] a violation of sphericity, $W =$ [value], $p =$ [value][; consequently, Greenhouse-Geisser / Huynh-Feldt corrected values are reported, $\varepsilon =$ [value]]. There was a significant effect of [condition], $F(\varepsilon\cdot df_{cond},\; \varepsilon\cdot df_{error}) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB], $\eta_G^2 =$ [value]."

Mixed ANOVA: "An $a$-level (between) $\times$ $b$-level (within) mixed ANOVA was conducted. Mauchly's test [was / was not] significant for the within-subjects factor, $W =$ [value], $p =$ [value][; GG correction applied, $\varepsilon =$ [value]]. The [between $\times$ within] interaction [was / was not] significant, $F(df, df) =$ [value], $p =$ [value], $\eta_G^2 =$ [value] [95% CI: LB, UB]. [Describe simple effects if significant.]"

Kruskal-Wallis: "A Kruskal-Wallis H test was conducted due to [non-normality / ordinal data]. The test revealed [a significant / no significant] difference across groups, $H(df) =$ [value], $p =$ [value], $\eta^2_H =$ [value]. [Dunn pairwise post-hoc results if significant.]"

Friedman Test: "A Friedman test was conducted. There was [a significant / no significant] difference across conditions, $\chi^2_r(df) =$ [value], $p =$ [value], $W =$ [value]."

Welch's One-Way ANOVA: "Due to significant heterogeneity of variance (Levene's $F(df_1, df_2) =$ [value], $p =$ [value]), Welch's one-way ANOVA was applied. Results indicated [a significant / no significant] effect of [IV] on [DV], $F_W(K-1, \nu_W) =$ [value], $p =$ [value], $\omega^2 =$ [value] [95% CI: LB, UB]. Games-Howell post-hoc tests were used."

Required Sample Size — One-Way ANOVA (80% Power, $\alpha = .05$)

| Cohen's $f$ | Label | $K = 3$ | $K = 4$ | $K = 5$ | $K = 6$ |
|---|---|---|---|---|---|
| 0.10 | Small | 322 | 274 | 240 | 215 |
| 0.15 | Small-medium | 144 | 123 | 107 | 96 |
| 0.25 | Medium | 52 | 45 | 39 | 35 |
| 0.35 | Medium-large | 27 | 23 | 21 | 19 |
| 0.40 | Large | 21 | 18 | 16 | 14 |
| 0.50 | Large | 14 | 12 | 11 | 10 |

All values are $n$ per group. Multiply by $K$ for the total $N$.

Cohen's Benchmarks — ANOVA Effect Sizes

| Label | $\eta^2$ / $\omega^2$ | $f$ | $\eta_p^2$ (approx.) |
|---|---|---|---|
| Small | 0.01 | 0.10 | 0.01 |
| Medium | 0.06 | 0.25 | 0.06 |
| Large | 0.14 | 0.40 | 0.14 |

Note: Cohen's benchmarks for $\eta^2$ apply approximately to $\omega^2$ and $\eta_p^2$. Always prioritise domain-specific benchmarks over these generic conventions.

Degrees of Freedom Reference

| Design | Source | df |
|---|---|---|
| One-way between | Between | $K-1$ |
| | Within | $N-K$ |
| | Total | $N-1$ |
| Factorial ($a\times b$) | A | $a-1$ |
| | B | $b-1$ |
| | A$\times$B | $(a-1)(b-1)$ |
| | Within | $ab(n-1)$ |
| One-way RM | Conditions | $K-1$ |
| | Subjects | $n-1$ |
| | Error | $(K-1)(n-1)$ |
| Mixed ($a$ between, $b$ within) | A (between) | $a-1$ |
| | S(A) (between error) | $a(n-1)$ |
| | B (within) | $b-1$ |
| | A$\times$B | $(a-1)(b-1)$ |
| | B$\times$S(A) (within error) | $a(n-1)(b-1)$ |

Assumption Checks Reference

| Assumption | Test | Action if Violated |
|---|---|---|
| Normality of residuals | Shapiro-Wilk, Q-Q plot | Kruskal-Wallis / Friedman; transform the data |
| Homogeneity of variance | Levene's, Brown-Forsythe | Welch's ANOVA + Games-Howell |
| Sphericity (RM designs) | Mauchly's test ($W$, $\varepsilon$) | GG correction ($\varepsilon \leq 0.75$), HF ($\varepsilon > 0.75$) |
| Homogeneity of regression slopes (ANCOVA) | Group$\times$Covariate interaction test | Use moderated regression instead |
| Independence | Design review | Mixed models / multilevel ANOVA |
| Outliers | Boxplots, standardised residuals ($\|z\| > 3$) | Investigate the cases; consider robust (trimmed-means) ANOVA |
| Interval scale | Measurement theory | Non-parametric alternatives |
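
The Brown-Forsythe variant of Levene's test from the table above is nothing more than a one-way ANOVA on absolute deviations from each group's median. A minimal sketch (function name and data are ours, for illustration):

```python
from statistics import mean, median

def brown_forsythe_f(groups):
    """Brown-Forsythe test of variance homogeneity: a one-way ANOVA on
    absolute deviations from each group's MEDIAN (robust to non-normality)."""
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    all_d = [d for g in devs for d in g]
    grand = mean(all_d)
    K, N = len(devs), len(all_d)
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in devs)
    ss_w = sum((d - mean(g)) ** 2 for g in devs for d in g)
    return (ss_b / (K - 1)) / (ss_w / (N - K))

# One tight group vs. one widely spread group (made-up data):
F_bf = brown_forsythe_f([[1, 2, 3], [10, 20, 30]])
```

A large $F$ here signals heterogeneous spread, which points toward Welch's ANOVA with Games-Howell post-hoc tests rather than the classical procedure.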

ANOVA Reporting Checklist

| Item | Required |
|---|---|
| $F$-statistic with both df | ✅ Always |
| Exact p-value (or $p < .001$) | ✅ Always |
| $\omega^2$ or $\omega_p^2$ with 95% CI | ✅ Always (preferred over $\eta^2$) |
| Which effect size was reported ($\eta^2$ vs. $\omega^2$, etc.) | ✅ Always |
| Group means and SDs for all groups/cells | ✅ Always |
| Sample sizes per group/cell | ✅ Always |
| Levene's test result (between-subjects) | ✅ For independent designs |
| Mauchly's test and $\varepsilon$ (within-subjects) | ✅ For RM and mixed designs |
| Which sphericity correction applied (GG or HF) | ✅ When Mauchly's test is significant |
| $\eta_G^2$ for repeated measures / mixed | ✅ Recommended |
| Post-hoc test name and FWER correction method | ✅ When the omnibus F is significant |
| Post-hoc pairwise differences with adjusted $p$ and $d_{jk}$ | ✅ When the omnibus F is significant |
| Interaction plot for factorial/mixed designs | ✅ When the interaction is significant |
| Simple effects for significant interactions | ✅ When the interaction is significant |
| SS type for unbalanced factorial designs | ✅ For unbalanced factorial designs |
| Power analysis or sensitivity analysis | ✅ For null results |
| Whether Welch's ANOVA was used | ✅ If variances are unequal |
| Domain-specific benchmark context | ✅ Recommended |

Conversion Formulas

| From | To | Formula |
|---|---|---|
| $\eta^2$ | $f$ | $f = \sqrt{\eta^2/(1-\eta^2)}$ |
| $\omega^2$ | $f$ | $f = \sqrt{\omega^2/(1-\omega^2)}$ |
| $f$ | $\eta^2$ | $\eta^2 = f^2/(1+f^2)$ |
| $F$, $df_B$, $df_W$ | $\eta^2$ | $\eta^2 = F\cdot df_B/(F\cdot df_B+df_W)$ |
| $F$, $df_B$, $df_W$ | $\omega^2$ (approx.) | $\omega^2 \approx (F-1)\cdot df_B/(F\cdot df_B+df_W+1)$ |
| $\eta^2$ (2 groups) | Cohen's $d$ | $d = 2\sqrt{\eta^2/(1-\eta^2)}$ |
| Cohen's $d$ (2 groups) | $\eta^2$ | $\eta^2 = d^2/(d^2+4)$ |
| $\eta^2$ (2 groups) | $r$ | $r = \sqrt{\eta^2}$ |
| Kendall's $W$ | $r$ (avg. pairwise) | $r = (nW-1)/(n-1)$ |
| $\chi^2_r$ | $\eta^2_F$ | $\eta^2_F = \chi^2_r/(n(K-1))$ |
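
The most common conversions, recovering effect sizes from a reported $F$-statistic, are one-liners in code. A small Python sketch (function names are ours, for illustration):

```python
import math

def eta_sq_from_f_stat(F, df_b, df_w):
    """Eta squared recovered from a reported F-statistic and its df."""
    return F * df_b / (F * df_b + df_w)

def omega_sq_from_f_stat(F, df_b, df_w):
    """Approximate omega squared from a reported F-statistic and its df."""
    return max((F - 1) * df_b / (F * df_b + df_w + 1), 0.0)

def cohens_f(eta_or_omega_sq):
    """Cohen's f from eta squared (or omega squared)."""
    return math.sqrt(eta_or_omega_sq / (1 - eta_or_omega_sq))

# F(2, 6) = 27 from a small worked example:
eta = eta_sq_from_f_stat(27, 2, 6)      # 0.9
omega = omega_sq_from_f_stat(27, 2, 6)  # 52/61, slightly below eta squared
```

These helpers are handy for meta-analytic work, where primary studies often report only $F$ and its degrees of freedom.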

This tutorial provides a comprehensive foundation for understanding, conducting, and reporting ANOVA and its alternatives within the DataStatPro application. For further reading, consult Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018) for applied coverage, Maxwell, Delaney & Kelley's "Designing Experiments and Analyzing Data" (3rd ed., 2018) for rigorous methodological depth, Wilcox's "Introduction to Robust Estimation and Hypothesis Testing" (4th ed., 2017) for robust alternatives, Olejnik & Algina's (2003) "Generalized Eta and Omega Squared Statistics" (Educational and Psychological Measurement) for effect size recommendations in repeated measures designs, and Lakens's "Calculating and Reporting Effect Sizes to Facilitate Cumulative Science" (Frontiers in Psychology, 2013) for practical effect size guidance. For Bayesian ANOVA, see Rouder et al. (2012) in the Journal of Mathematical Psychology. For feature requests or support, contact the DataStatPro team.