ANOVA Tests and Alternatives: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of analysis of variance all the way through one-way, factorial, repeated measures, and mixed ANOVA designs, their non-parametric alternatives, post-hoc testing, effect sizes, and practical usage within the DataStatPro application. Whether you are encountering ANOVA for the first time or deepening your understanding of variance decomposition and group comparison, this guide builds your knowledge systematically from the ground up.

Prerequisites and Background Concepts
What is ANOVA?
The Mathematics Behind ANOVA
Assumptions of ANOVA
Types of ANOVA
Using the ANOVA Calculator Component
One-Way Between-Subjects ANOVA
Factorial Between-Subjects ANOVA
One-Way Repeated Measures ANOVA
Mixed ANOVA
Post-Hoc Tests and Planned Contrasts
Non-Parametric Alternatives
Advanced Topics
Worked Examples
Common Mistakes and How to Avoid Them
Troubleshooting
Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into ANOVA, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 The Logic of Variance Decomposition

ANOVA is built upon one foundational insight: total variability in a dataset can be partitioned into components attributable to specific sources. For a one-way design:

$SS_{total} = SS_{between} + SS_{within}$

If the between-group variance is substantially larger than the within-group variance, it suggests that group membership explains a meaningful portion of the variability in scores — that is, the groups differ. This ratio of variances is the F-statistic.

1.2 The F-Distribution

The F-distribution arises from the ratio of two independent chi-squared distributions divided by their respective degrees of freedom:

$F = \frac{\chi^2_1 / \nu_1}{\chi^2_2 / \nu_2}$

In the ANOVA context, this becomes the ratio of two mean squares:

$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between}/df_{between}}{SS_{within}/df_{within}}$

Under $H_0$ (all group means equal), this ratio equals approximately 1. Values much greater than 1 provide evidence against $H_0$ .

The F-distribution is characterised by two parameters:

$df_1$ = numerator degrees of freedom (between-groups)
$df_2$ = denominator degrees of freedom (within-groups/error)

It is always non-negative and right-skewed.

1.3 Why Not Multiple t-Tests?

A natural question is: why not simply run multiple t-tests to compare groups? With $K$ groups, this would require $\binom{K}{2} = K(K-1)/2$ pairwise tests.

The familywise error rate (FWER) inflates:

$FWER = 1 - (1-\alpha)^m$

Where $m$ is the number of tests. For $K = 4$ groups: $m = 6$ pairwise tests; $FWER = 1 - (0.95)^6 = .265$ — far above the nominal $.05$ .

ANOVA maintains the Type I error at $\alpha$ for the omnibus test that all group means are simultaneously equal.

1.4 The Relationship Between ANOVA and Regression

ANOVA and regression are mathematically equivalent. Both are special cases of the General Linear Model (GLM):

$Y_i = \mathbf{X}_i \boldsymbol{\beta} + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$

In ANOVA, the predictors $\mathbf{X}$ are categorical group membership variables (dummy or effect coded). Understanding this equivalence is essential for interpreting interactions and for moving to ANCOVA and mixed models.

1.5 Main Effects and Interactions

In factorial designs (multiple independent variables):

A main effect is the effect of one independent variable (IV) averaged across all levels of the other IV(s).
An interaction effect occurs when the effect of one IV depends on the level of another IV.

⚠️ When a significant interaction is present, main effects must be interpreted with extreme caution — the "average" effect of a variable may be misleading when the effect differs substantially across levels of the other variable.

1.6 Fixed vs. Random Effects

Fixed effects: The levels of the IV are specifically chosen and are the only levels of interest (e.g., three specific drug dosages). Conclusions apply only to those levels.
Random effects: The levels of the IV are a random sample from a larger population of possible levels (e.g., 10 randomly selected schools). Conclusions generalise to the population of levels.
Mixed effects: Some factors are fixed, others are random. This is the basis of mixed models (also called hierarchical linear models).

Standard ANOVA assumes all factors are fixed. When factors are random, the denominator of the F-ratio changes.

1.7 Variance Explained: $\eta^2$ , $\omega^2$ , and $\varepsilon^2$

Effect sizes for ANOVA are variance-explained indices — they quantify what proportion of the total (or residual) variance is attributable to a given effect. These are reviewed in detail in the Mathematics section, but the key formulas are:

$\eta^2 = \frac{SS_{effect}}{SS_{total}}, \quad \omega^2 = \frac{SS_{effect}-(df_{effect})MS_{error}}{SS_{total}+MS_{error}}, \quad \varepsilon^2 = \frac{SS_{effect}-(df_{effect})MS_{error}}{SS_{total}}$

Both $\omega^2$ and $\varepsilon^2$ correct for the positive bias of $\eta^2$ in finite samples and are preferred for reporting.

2. What is ANOVA?

2.1 The Core Idea

Analysis of Variance (ANOVA) is a parametric inferential procedure for testing whether three or more population means differ simultaneously. Despite its name, ANOVA tests mean differences by comparing variances — specifically, by assessing whether the variability between groups is larger than expected given the variability within groups.

The general form of the F-statistic:

$F = \frac{\text{Variance between groups (signal)}}{\text{Variance within groups (noise)}} = \frac{MS_{between}}{MS_{within}}$

A large F indicates that the between-group differences are large relative to random sampling error — evidence that at least one group mean differs from the others.

2.2 What ANOVA Tests and Does Not Test

ANOVA tells you:

Whether at least one group mean differs from the others (omnibus test).
How much variance in the outcome is explained by group membership ( $\eta^2$ , $\omega^2$ ).

ANOVA does NOT tell you:

Which specific groups differ from each other (requires post-hoc tests or planned contrasts).
The direction or magnitude of specific pairwise differences.
Whether the omnibus difference is practically meaningful (requires effect sizes with CIs).

2.3 The ANOVA Family

Design	Independent Variables	Participants	ANOVA Type
One factor, different participants per group	1 (between)	Different	One-way between-subjects
One factor, same participants in all groups	1 (within)	Same	One-way repeated measures
Two+ factors, different participants per cell	2+ (between)	Different	Factorial between-subjects
Two factors: one between, one within	1 between, 1 within	Mixed	Mixed (split-plot) ANOVA
Two+ factors, same participants in all cells	2+ (within)	Same	Fully within-subjects factorial

2.4 ANOVA in Context

The ANOVA test is one member of a broader family of procedures for comparing group means:

Situation	Test
2 groups, independent, normal	t-test (Welch's recommended)
3+ groups, independent, normal, equal variances	One-way ANOVA
3+ groups, independent, normal, unequal variances	Welch's one-way ANOVA
3+ groups, independent, non-normal or ordinal	Kruskal-Wallis test
3+ conditions, same participants, normal	Repeated measures ANOVA
3+ conditions, same participants, non-normal	Friedman test
2+ factors (between), normal	Factorial ANOVA
1 between + 1 within factor	Mixed ANOVA
Controlling for a covariate	ANCOVA
Multiple dependent variables simultaneously	MANOVA

3. The Mathematics Behind ANOVA

3.1 One-Way ANOVA: Sum of Squares Decomposition

Consider $K$ groups with $n_j$ observations in group $j$ and total $N = \sum_{j=1}^K n_j$ observations. Let $\bar{x}_j$ be the mean of group $j$ and $\bar{x}_{..}$ be the grand mean.

Grand mean:

$\bar{x}_{..} = \frac{1}{N}\sum_{j=1}^K \sum_{i=1}^{n_j} x_{ij}$

Total sum of squares ( $SS_{total}$ ): Total variability in the data.

$SS_{total} = \sum_{j=1}^K \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{..})^2$

$df_{total} = N - 1$

Between-groups sum of squares ( $SS_{between}$ ): Variability due to group differences.

$SS_{between} = \sum_{j=1}^K n_j (\bar{x}_j - \bar{x}_{..})^2$

$df_{between} = K - 1$

Within-groups sum of squares ( $SS_{within}$ ): Variability within groups (error).

$SS_{within} = \sum_{j=1}^K \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$

$df_{within} = N - K$

Verification: $SS_{total} = SS_{between} + SS_{within}$

3.2 Mean Squares and the F-Statistic

Mean squares are sums of squares divided by their degrees of freedom — they are variance estimates:

$MS_{between} = \frac{SS_{between}}{K-1}$

$MS_{within} = \frac{SS_{within}}{N-K}$

The F-statistic:

$F = \frac{MS_{between}}{MS_{within}}$

Under $H_0: \mu_1 = \mu_2 = \cdots = \mu_K$ , the F-statistic follows an F-distribution with $(K-1, N-K)$ degrees of freedom. The p-value is:

$p = P(F_{K-1,\;N-K} \geq F_{obs})$

3.3 The Expected Mean Squares

Understanding why the F-ratio works requires examining the expected values of the mean squares under $H_0$ and $H_1$ :

$E[MS_{within}] = \sigma^2$ (always an unbiased estimate of $\sigma^2$ )

$E[MS_{between}] = \sigma^2 + \frac{\sum_{j=1}^K n_j(\mu_j - \mu)^2}{K-1}$

Under $H_0$ (all $\mu_j$ equal): $E[MS_{between}] = \sigma^2$ , so $E[F] \approx 1$ .

Under $H_1$ (some $\mu_j$ differ): $E[MS_{between}] > \sigma^2$ , so $E[F] > 1$ .

The non-centrality parameter of the F-distribution:

$\lambda = \frac{\sum_{j=1}^K n_j(\mu_j - \mu)^2}{\sigma^2}$

This links the population effect size to the expected F-statistic and is used for power analysis.

3.4 The ANOVA Source Table

The standard ANOVA output is presented in a source table:

Source	SS	df	MS	$F$	$p$
Between groups	$SS_B$	$K-1$	$MS_B = SS_B/(K-1)$	$MS_B/MS_W$	$P(F \geq F_{obs})$
Within groups (Error)	$SS_W$	$N-K$	$MS_W = SS_W/(N-K)$
Total	$SS_T$	$N-1$

3.5 Effect Sizes for One-Way ANOVA

Eta squared ( $\eta^2$ ) — biased, but widely reported:

$\eta^2 = \frac{SS_{between}}{SS_{total}}$

Omega squared ( $\omega^2$ ) — bias-corrected, preferred:

$\omega^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total} + MS_{within}}$

Epsilon squared ( $\varepsilon^2$ ) — alternative bias correction:

$\varepsilon^2 = \frac{SS_{between} - (K-1)MS_{within}}{SS_{total}}$

Cohen's $f$ — for power analysis:

$f = \sqrt{\frac{\eta^2}{1-\eta^2}} \quad \text{or} \quad f = \sqrt{\frac{\omega^2}{1-\omega^2}}$

Relationship between effect sizes: $\omega^2 \leq \varepsilon^2 \leq \eta^2$

Cohen's (1988) benchmarks:

Label	$\eta^2$ or $\omega^2$	$f$
Small	$0.01$	$0.10$
Medium	$0.06$	$0.25$
Large	$0.14$	$0.40$

3.6 Confidence Intervals for ANOVA Effect Sizes

CIs for $\eta^2$ and $\omega^2$ use the non-central F-distribution. The observed F-statistic follows a non-central F-distribution with non-centrality parameter $\lambda$ related to the population effect:

$\lambda = \frac{\eta^2 \cdot N}{1-\eta^2}$

The 95% CI bounds for $\lambda$ are found numerically (inverting the non-central F CDF), then converted to $\eta^2$ :

$\eta^2_L = \frac{\lambda_L}{\lambda_L + N}, \qquad \eta^2_U = \frac{\lambda_U}{\lambda_U + N}$

DataStatPro computes these exact CIs automatically using numerical iteration.

3.7 Factorial ANOVA: Partitioning Variance for Multiple Factors

For a two-factor (A $\times$ B) between-subjects ANOVA with $a$ levels of A, $b$ levels of B, and $n$ observations per cell:

$SS_{total} = SS_A + SS_B + SS_{A \times B} + SS_{within}$

$df_{total} = N - 1 = abn - 1$

Source	SS	df
A (Main effect)	$SS_A$	$a-1$
B (Main effect)	$SS_B$	$b-1$
A $\times$ B (Interaction)	$SS_{A\times B}$	$(a-1)(b-1)$
Within (Error)	$SS_{within}$	$ab(n-1)$
Total	$SS_{total}$	$abn-1$

Computing each SS:

Let $\bar{x}_{j.}$ = mean of level $j$ of factor A, $\bar{x}_{.k}$ = mean of level $k$ of factor B, $\bar{x}_{jk}$ = cell mean, $\bar{x}_{..}$ = grand mean.

$SS_A = bn\sum_{j=1}^a(\bar{x}_{j.} - \bar{x}_{..})^2$

$SS_B = an\sum_{k=1}^b(\bar{x}_{.k} - \bar{x}_{..})^2$

$SS_{A\times B} = n\sum_{j=1}^a\sum_{k=1}^b(\bar{x}_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2$

$SS_{within} = \sum_{j=1}^a\sum_{k=1}^b\sum_{i=1}^n(x_{ijk} - \bar{x}_{jk})^2$

F-ratios (fixed effects model, all denominators are $MS_{within}$ ):

$F_A = \frac{MS_A}{MS_{within}}, \quad F_B = \frac{MS_B}{MS_{within}}, \quad F_{A\times B} = \frac{MS_{A\times B}}{MS_{within}}$

3.8 Partial Eta Squared ( $\eta_p^2$ ) for Factorial Designs

In factorial ANOVA, partial eta squared isolates the effect of one factor after controlling for other effects:

$\eta_p^2 = \frac{SS_{effect}}{SS_{effect} + SS_{error}}$

⚠️ In factorial designs, the sum of all partial $\eta_p^2$ values can exceed 1.0. They are NOT proportions of total variance — only $\eta^2$ carries that interpretation. Always label the statistic precisely: write $\eta_p^2$ , not $\eta^2$ , for partial values.

Partial omega squared (preferred — bias-corrected):

$\omega_p^2 = \frac{SS_{effect} - df_{effect} \cdot MS_{error}}{SS_{total} + MS_{error}}$

3.9 Repeated Measures ANOVA: Within-Subjects Decomposition

For a one-way repeated measures design with $K$ conditions and $n$ participants:

$SS_{total} = SS_{between\text{-}subjects} + SS_{within\text{-}subjects}$

$SS_{within\text{-}subjects} = SS_{conditions} + SS_{error}$

The key feature: between-subjects variability is removed from the error term, which dramatically increases power when individual differences are large.

Source	SS	df
Between subjects	$SS_{bs}$	$n-1$
Conditions (Within)	$SS_{cond}$	$K-1$
Error (Residual)	$SS_{error}$	$(n-1)(K-1)$
Total	$SS_{total}$	$nK-1$

$F = \frac{MS_{conditions}}{MS_{error}} = \frac{SS_{cond}/(K-1)}{SS_{error}/((n-1)(K-1))}$

Generalised eta squared ( $\eta_G^2$ ; Olejnik & Algina, 2003) is recommended for repeated measures designs because it is comparable across between-subjects and within- subjects designs:

$\eta_G^2 = \frac{SS_{conditions}}{SS_{conditions} + SS_{between\text{-}subjects} + SS_{error}}$

3.10 Sphericity and the Mauchly Test

Repeated measures ANOVA requires the sphericity assumption: the variances of the differences between all pairs of conditions are equal. Formally, for all pairs $(j, k)$ :

$\text{Var}(x_j - x_k) = \text{constant}$

Mauchly's test evaluates this assumption:

$H_0$ : the sphericity assumption holds.
A significant result ( $p < .05$ ) indicates sphericity violation.

Epsilon ( $\varepsilon$ ) corrections adjust the degrees of freedom when sphericity is violated. Two commonly used corrections:

Greenhouse-Geisser (GG) correction:

$\varepsilon_{GG} = \frac{\left(\sum_j \hat{\sigma}_{jj}\right)^2}{(K-1)\left(\sum_j\sum_k \hat{\sigma}_{jk}^2\right)}$

$0 < \varepsilon_{GG} \leq 1$ ; $\varepsilon_{GG} = 1$ means sphericity holds exactly.

Huynh-Feldt (HF) correction (less conservative than GG, preferred when $\varepsilon > 0.75$ ):

$\varepsilon_{HF} = \frac{n(K-1)\varepsilon_{GG} - 2}{(K-1)(n - 1 - (K-1)\varepsilon_{GG})}$

Corrected degrees of freedom:

$df_{conditions}^* = \varepsilon \cdot (K-1), \qquad df_{error}^* = \varepsilon \cdot (n-1)(K-1)$

Decision rule for epsilon corrections:

$\varepsilon_{GG}$	Recommended Correction
$\approx 1.0$ (Mauchly $p > .05$ )	None (uncorrected)
$0.75 < \varepsilon_{GG} < 1.0$	Huynh-Feldt
$\varepsilon_{GG} \leq 0.75$	Greenhouse-Geisser
Severe violation	Multivariate approach (MANOVA)

3.11 Mixed ANOVA: Between + Within Factors

A mixed ANOVA (also split-plot ANOVA) includes both between-subjects and within- subjects factors. For a design with one between factor (A, $a$ levels) and one within factor (B, $b$ levels) and $n$ participants per group:

Source	SS	df	MS	F
A (Between)	$SS_A$	$a-1$	$MS_A$	$MS_A/MS_{S(A)}$
S(A) — Subjects within A	$SS_{S(A)}$	$a(n-1)$	$MS_{S(A)}$	—
B (Within)	$SS_B$	$b-1$	$MS_B$	$MS_B/MS_{B\times S(A)}$
A $\times$ B	$SS_{A\times B}$	$(a-1)(b-1)$	$MS_{A\times B}$	$MS_{A\times B}/MS_{B\times S(A)}$
B $\times$ S(A) — Error	$SS_{B\times S(A)}$	$a(n-1)(b-1)$	$MS_{B\times S(A)}$	—
Total	$SS_{total}$	$abn-1$

Note the two separate error terms:

Between-subjects effects (A) are tested against $MS_{S(A)}$ (between-subjects error).
Within-subjects effects (B, A $\times$ B) are tested against $MS_{B\times S(A)}$ (within- subjects error).

4. Assumptions of ANOVA

4.1 Normality of Residuals

ANOVA assumes that the residuals (differences between observed values and group means) are normally distributed within each population:

$\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$

How to check:

Shapiro-Wilk test on residuals (most powerful for $n < 50$ per group).
Q-Q plots of residuals: points should follow the diagonal.
Histograms of residuals: should be approximately bell-shaped.
Skewness ( $|z| < 2$ ) and kurtosis ( $|z| < 7$ ) of residuals.

Robustness: ANOVA is robust to mild normality violations, particularly when:

Group sizes are equal (balanced design).
$n \geq 15$ – $20$ per group (CLT applies).
Distributions are symmetric even if non-normal.

When violated: Use the Kruskal-Wallis test (independent groups) or the Friedman test (repeated measures) as non-parametric alternatives. Consider data transformations (log, square root, Box-Cox) for skewed distributions.

4.2 Homogeneity of Variance (Homoscedasticity)

Standard ANOVA assumes that all $K$ populations have equal variances:

$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2$

This is the homoscedasticity assumption and is required for $MS_{within}$ to serve as a valid pooled estimate of the common population variance $\sigma^2$ .

How to check:

Levene's test (preferred — robust to non-normality): $H_0$ : all group variances equal.
Brown-Forsythe test (more robust, uses median rather than mean).
Bartlett's test (powerful but sensitive to non-normality — avoid for non-normal data).
Variance ratio rule: if $s^2_{max}/s^2_{min} > 4$ , heterogeneity is concerning.

Robustness: ANOVA is relatively robust to heterogeneity when group sizes are equal. When group sizes are unequal AND variances are unequal, ANOVA can have severely inflated or deflated Type I error rates.

When violated: Use Welch's one-way ANOVA (with Games-Howell post-hoc tests), which does not assume equal variances and is recommended as the default for independent designs.

4.3 Independence of Observations

All observations must be independent of each other, both within and across groups. Dependence typically arises from:

Clustered or nested data (students in classrooms, patients in hospitals).
Repeated measurements on the same participant (use repeated measures ANOVA instead).
Time series data.
Family or matched data.

When violated: Use repeated measures ANOVA (for within-subjects data), mixed models (for nested or clustered data), or multilevel ANOVA (for hierarchical designs).

4.4 Sphericity (Repeated Measures Only)

As described in Section 3.10, repeated measures ANOVA additionally requires sphericity — that the variances of all pairwise difference scores are equal. This is a stronger assumption than homogeneity of variance.

When violated: Apply Greenhouse-Geisser or Huynh-Feldt corrections to the degrees of freedom, or use the multivariate approach (MANOVA on the repeated measures).

4.5 Interval Scale of Measurement

The dependent variable must be measured on at least an interval scale (equal-spaced intervals). Ordinal data (e.g., Likert scales) technically violate this assumption.

When violated: Use non-parametric alternatives (Kruskal-Wallis, Friedman) or analyse using ordinal regression.

4.6 Absence of Significant Outliers

ANOVA is based on means and is sensitive to extreme outliers, particularly in small samples. Outliers inflate $SS_{within}$ and $SS_{between}$ unpredictably.

How to check:

Boxplots per group: values beyond $1.5 \times IQR$ are mild outliers; $3 \times IQR$ are extreme.
Standardised residuals: $|z_i| > 3$ flags potential outliers.
Studentised residuals from the ANOVA model.

When outliers present: Investigate the cause. Report analyses with and without outliers. Consider trimmed mean ANOVA or Kruskal-Wallis as robust alternatives.

4.7 Assumption Summary Table

Assumption	One-Way	Factorial	Repeated Measures	Mixed	How to Check	Remedy
Normality	✅	✅	✅ (residuals)	✅	Shapiro-Wilk, Q-Q	Kruskal-Wallis / Friedman
Homogeneity of variance	✅	✅	—	✅ (between)	Levene's	Welch's ANOVA
Independence	✅	✅	✅ (between subjects)	✅	Design review	Mixed models
Sphericity	—	—	✅	✅ (within part)	Mauchly's test	GG/HF correction
Interval scale	✅	✅	✅	✅	Measurement theory	Non-parametric
No severe outliers	✅	✅	✅	✅	Boxplots, residuals	Trimmed means / robust

5. Types of ANOVA

5.1 Classification by Design

By Number of Independent Variables

IVs	Design Name	Example
1	One-way ANOVA	Effect of teaching method (3 levels) on test scores
2	Two-way (factorial) ANOVA	Effect of drug (3 levels) and sex (2 levels) on pain
3	Three-way ANOVA	Drug $\times$ dose $\times$ time on response
$k$	$k$ -way factorial ANOVA	Generalisation of the above

By Type of Factor

Factor Type	Description	Design
Between-subjects	Different participants per level	Standard ANOVA
Within-subjects	Same participants in all levels	Repeated measures ANOVA
Mixed	Combination of between and within	Mixed (split-plot) ANOVA

5.2 Choosing the Correct ANOVA Design

What is the number of independent variables?
├── 1 IV
│   └── Is the same participant in all conditions?
│       ├── NO (between-subjects)  → One-way between-subjects ANOVA
│       └── YES (within-subjects)  → One-way repeated measures ANOVA
└── 2+ IVs
    └── What type are the IVs?
        ├── All between-subjects   → Factorial between-subjects ANOVA
        ├── All within-subjects    → Fully within-subjects factorial ANOVA
        └── Mixed (some between, some within) → Mixed ANOVA

5.3 Type I, II, and III Sums of Squares

In unbalanced designs (unequal cell sizes), the partition of SS depends on the order in which effects are entered. Three conventions exist:

Type	Description	When to Use
Type I (Sequential)	SS for each effect controlling for effects entered earlier	When the order of entry is theoretically meaningful
Type II (Hierarchical)	SS for each effect controlling for all other effects at the same level	When there is no significant interaction
Type III (Marginal)	SS for each effect controlling for all other effects including interactions	When there is a significant interaction; most common default in SPSS

⚠️ For balanced designs (equal cell sizes), all three types give identical results. For unbalanced designs, Type III is the most commonly reported but requires full-rank parameterisation (effect coding or deviation coding, not dummy coding). Always specify which type was used when reporting factorial ANOVA results.

6. Using the ANOVA Calculator Component

The ANOVA Calculator component in DataStatPro provides a comprehensive tool for running, diagnosing, visualising, and reporting ANOVA designs and their alternatives.

Step-by-Step Guide

Step 1 — Select the ANOVA Design

Choose from the "ANOVA Type" dropdown:

One-Way Between-Subjects ANOVA: One IV, independent groups.
Factorial Between-Subjects ANOVA: Two or more IVs, independent groups.
One-Way Repeated Measures ANOVA: One IV, same participants in all conditions.
Mixed ANOVA: One or more between-subjects IVs and one or more within-subjects IVs.
Welch's One-Way ANOVA: Robust to heterogeneity of variance.
Kruskal-Wallis Test: Non-parametric one-way.
Friedman Test: Non-parametric repeated measures.

Step 2 — Input Method

Raw data: Upload or paste the dataset. DataStatPro performs all assumption checks automatically, computes effect sizes, and generates visualisations.
Summary statistics: Enter group means, SDs, and $n$ values. Full assumption checks are not available but all inferential statistics and effect sizes are computed.
ANOVA table values: Enter SS, df, and MS values from a published table to compute effect sizes, power, and CIs.

Step 3 — Specify the Design Structure

Number of groups/levels for each factor.
Factor names and level labels for clear output labelling.
Cell sizes (equal or unequal — DataStatPro auto-detects balance).
SS Type for factorial designs (Type I, II, or III — default: Type III).

Step 4 — Select Assumption Tests

DataStatPro automatically runs, with results displayed in a colour-coded panel:

✅ Shapiro-Wilk normality test on residuals (per group for small samples).
✅ Levene's test for homogeneity of variance (between-subjects designs).
✅ Mauchly's test for sphericity (repeated measures designs) with $\varepsilon_{GG}$ and $\varepsilon_{HF}$ automatically applied when sphericity is violated.
✅ Boxplots per group for outlier detection.

Step 5 — Select Post-Hoc Tests

When the omnibus F is significant, specify post-hoc tests:

Tukey HSD — balanced designs, equal variances (controls FWER).
Bonferroni — conservative; any design.
Holm-Bonferroni — less conservative sequential procedure.
Scheffé — most conservative; allows all possible contrasts.
Games-Howell — unequal variances or unequal $n$ (recommended with Welch's ANOVA).
Dunnett's test — comparing all groups to a single control group.
Custom planned contrasts — specify weights for specific a priori comparisons.

Step 6 — Select Effect Sizes

$\omega^2$ (preferred): Bias-corrected estimate for one-way ANOVA.
$\omega_p^2$ (preferred for factorial): Partial omega squared.
$\eta^2$ (common): Biased; provided for comparison.
$\eta_p^2$ (common for factorial): Partial eta squared.
$\eta_G^2$ (recommended for repeated measures): Generalised eta squared.
Cohen's $f$ : For power analysis.
95% CIs for all effect sizes via non-central F-distribution.

Step 7 — Select Display Options

✅ Full ANOVA source table with F-statistics and p-values.
✅ Descriptive statistics (mean, SD, SE, 95% CI) per group/cell.
✅ Effect size estimates with 95% CIs for each effect.
✅ Assumption test results panel.
✅ Post-hoc pairwise comparison table with adjusted p-values and effect sizes.
✅ Interaction plot (line plot of cell means) for factorial designs.
✅ Profile plots for repeated measures.
✅ Raincloud plots (half violin + boxplot + raw data) per group.
✅ Power analysis and required $n$ for each effect.
✅ APA 7th edition results paragraph (auto-generated).

Step 8 — Run the Analysis

Click "Run ANOVA". DataStatPro will:

Compute the full ANOVA source table.
Apply sphericity corrections automatically if Mauchly's test is significant.
Run all selected post-hoc tests and planned contrasts.
Compute effect sizes with exact CIs.
Generate all visualisations.
Output an APA-compliant results paragraph.

7. One-Way Between-Subjects ANOVA

7.1 Purpose and Design

The one-way between-subjects ANOVA tests whether the means of three or more independent groups differ significantly. It is the generalisation of the independent samples t-test (when $K = 2$ , $F = t^2$ ) to $K \geq 3$ groups.

Common applications:

Comparing exam scores across three teaching methods (Lecture, Flipped, Project-Based).
Evaluating the effect of four drug dosage levels on pain rating.
Assessing anxiety differences across five diagnostic categories.
Comparing productivity across three management styles.

7.2 Full Procedure

Step 1 — State hypotheses

$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$

$H_1: \text{At least one } \mu_j \text{ differs from the others}$

Step 2 — Compute grand mean and group means

$\bar{x}_{..} = \frac{\sum_{j=1}^K \sum_{i=1}^{n_j} x_{ij}}{N}, \qquad \bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j}x_{ij}$

Step 3 — Compute sums of squares

$SS_B = \sum_{j=1}^K n_j(\bar{x}_j - \bar{x}_{..})^2$

$SS_W = \sum_{j=1}^K \sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2$

$SS_T = SS_B + SS_W$

Step 4 — Compute degrees of freedom

$df_B = K-1, \quad df_W = N-K, \quad df_T = N-1$

Step 5 — Compute mean squares and F

$MS_B = SS_B/df_B, \quad MS_W = SS_W/df_W, \quad F = MS_B/MS_W$

Step 6 — Compute p-value and make a decision

$p = P(F_{K-1,\;N-K} \geq F_{obs})$

Reject $H_0$ if $p \leq \alpha$ .

Step 7 — Compute effect sizes

$\eta^2 = SS_B/SS_T$

$\omega^2 = (SS_B - (K-1)MS_W)/(SS_T + MS_W)$

$f = \sqrt{\omega^2/(1-\omega^2)}$

Step 8 — Conduct post-hoc tests or planned contrasts

If $H_0$ is rejected, identify which groups differ (Section 11).

7.3 Computing $\omega^2$ and $\eta^2$ from F

When only the F-statistic, df, and $N$ are reported:

$\eta^2 = \frac{F \cdot df_B}{F \cdot df_B + df_W}$

$\omega^2 = \frac{(F-1) \cdot df_B}{F \cdot df_B + df_W + 1}$ (approximate)

7.4 Interpreting the Omnibus F-Test

The omnibus F-test is a global test. A significant result tells you only that:

At least one pair of group means differs significantly.
This difference is unlikely to have arisen by sampling error alone.

It does not tell you which groups differ, by how much, or in what direction. Post-hoc tests (Section 11) are required to answer these questions.

💡 When groups were theoretically predicted to differ in specific ways before data collection, use planned contrasts rather than (or in addition to) the omnibus F-test. Planned contrasts are more powerful and more informative than post-hoc tests.

8. Factorial Between-Subjects ANOVA

8.1 Purpose and Design

Factorial ANOVA simultaneously examines the effects of two or more IVs and their interactions on a continuous DV. It is more efficient than running separate one-way ANOVAs because it:

Tests all main effects and interactions simultaneously.
Controls FWER across all tests.
Reveals interaction effects that separate one-way analyses cannot detect.
Requires fewer participants than running separate experiments for each factor.

8.2 The Concept of Interaction

An interaction exists when the effect of one IV differs depending on the level of another IV. Interactions are the most important and often the most theoretically interesting finding in factorial designs.

Types of interactions:

Type	Description	Pattern
Ordinal	Lines in interaction plot do not cross; one group always higher	Parallelism violated but ranking preserved
Disordinal (crossover)	Lines cross; one group higher at some levels, lower at others	Ranking reverses
Spreading	Effect of A increases (or decreases) with level of B	Lines fan out

Interpreting an interaction:

When a significant A $\times$ B interaction is found:

Do not interpret main effects in isolation — they are averages that may be misleading when the interaction is substantial.
Probe the interaction with simple effects analysis: test the effect of A separately at each level of B (or vice versa).
Plot the interaction with a line plot: this is essential for understanding the pattern.

8.3 Simple Effects Analysis

Simple effects decompose the interaction by examining the effect of one IV at each level of the other IV. For a 2 $\times$ 3 design:

Simple effect of A at $B_1$ : compare $\bar{x}_{A_1B_1}$ vs. $\bar{x}_{A_2B_1}$
Simple effect of A at $B_2$ : compare $\bar{x}_{A_1B_2}$ vs. $\bar{x}_{A_2B_2}$
Simple effect of A at $B_3$ : compare $\bar{x}_{A_1B_3}$ vs. $\bar{x}_{A_2B_3}$

Simple effects use $MS_{within}$ from the full factorial model as the error term (pooled error), which is more stable than separate-group estimates.

8.4 Effect Sizes in Factorial ANOVA

For factorial designs, report partial effect sizes for each effect:

Partial eta squared (common, biased):

$\eta_{p,A}^2 = \frac{SS_A}{SS_A + SS_{within}}$

$\eta_{p,B}^2 = \frac{SS_B}{SS_B + SS_{within}}$

$\eta_{p,A\times B}^2 = \frac{SS_{A\times B}}{SS_{A\times B} + SS_{within}}$

Partial omega squared (preferred, bias-corrected):

$\omega_{p,A}^2 = \frac{SS_A - df_A \cdot MS_{within}}{SS_{total} + MS_{within}}$

Generalised eta squared (recommended for between-subjects factorial designs, Olejnik & Algina, 2003):

$\eta_{G}^2 = \frac{SS_{effect}}{SS_{total}}$

For purely between-subjects designs, $\eta_G^2 = \eta^2$ for each effect.

9. One-Way Repeated Measures ANOVA

9.1 Purpose and Design

One-way repeated measures ANOVA tests whether means differ across $K$ conditions when the same participants are measured in all conditions. It is the generalisation of the paired t-test to $K \geq 3$ conditions.

Common applications:

Measuring depression at three time points (pre, post, follow-up).
Comparing cognitive performance across four task difficulty levels.
Evaluating preference ratings for five product variants.
Assessing physiological response across six stimulus intensities.

Advantages over one-way between-subjects ANOVA:

Greater statistical power: between-subjects variability is removed from the error term.
Fewer participants needed: each participant contributes $K$ observations.
Controls individual difference confounds: the same person is compared across conditions.

Disadvantages:

Carryover effects: experiencing one condition may affect performance in others (counterbalance with randomised condition order or include adequate wash-out periods).
Sphericity assumption: more complex than the equal-variance assumption.
Attrition: losing participants eliminates all their data from all conditions.

9.2 Full Procedure

Step 1 — Compute condition means and participant means

$\bar{x}_{.k}$ = mean of condition $k$ ; $\bar{x}_{i.}$ = mean of participant $i$ ; $\bar{x}_{..}$ = grand mean.

Step 2 — Compute sums of squares

$SS_{total} = \sum_{i=1}^n\sum_{k=1}^K (x_{ik} - \bar{x}_{..})^2$

$SS_{conditions} = n\sum_{k=1}^K(\bar{x}_{.k} - \bar{x}_{..})^2$

$SS_{subjects} = K\sum_{i=1}^n(\bar{x}_{i.} - \bar{x}_{..})^2$

$SS_{error} = SS_{total} - SS_{conditions} - SS_{subjects}$

Step 3 — Degrees of freedom

$df_{conditions} = K-1, \quad df_{subjects} = n-1, \quad df_{error} = (K-1)(n-1)$

Step 4 — Mean squares and F

$MS_{conditions} = SS_{conditions}/(K-1)$

$MS_{error} = SS_{error}/((K-1)(n-1))$

$F = MS_{conditions}/MS_{error}$

Step 5 — Sphericity check and correction

Run Mauchly's test. If violated, apply GG or HF correction:

$F_{corrected} = \frac{SS_{conditions}/(\varepsilon \cdot (K-1))}{SS_{error}/(\varepsilon \cdot (n-1)(K-1))} = \frac{MS_{conditions}}{MS_{error}}$

The F-statistic is unchanged; only the reference distribution (via corrected df) changes.

Step 6 — Effect size

Generalised eta squared ( $\eta_G^2$ ) — recommended for repeated measures:

$\eta_G^2 = \frac{SS_{conditions}}{SS_{conditions} + SS_{subjects} + SS_{error}}$

Partial eta squared ( $\eta_p^2$ ) — common but inflated:

$\eta_p^2 = \frac{SS_{conditions}}{SS_{conditions} + SS_{error}}$

Partial omega squared ( $\omega_p^2$ ) — bias-corrected:

$\omega_p^2 = \frac{SS_{conditions} - (K-1)MS_{error}}{SS_{total} + MS_{error}}$

💡 Use $\eta_G^2$ for repeated measures when comparing effect sizes across studies using different designs (between-subjects vs. within-subjects), as it is the most design-comparable measure. For purely within-design comparisons, $\omega_p^2$ is the least-biased choice.

10. Mixed ANOVA

10.1 Purpose and Design

Mixed ANOVA combines at least one between-subjects factor and at least one within- subjects factor. It is among the most commonly used designs in psychology, medicine, and education because most longitudinal experiments involve:

A between-subjects factor: Treatment group vs. control group.
A within-subjects factor: Time (pre, post, follow-up).

The mixed ANOVA tests:

Main effect of the between-subjects factor (e.g., treatment vs. control, collapsed across time).
Main effect of the within-subjects factor (e.g., change over time, collapsed across groups).
Interaction (e.g., does the pattern of change over time differ between treatment and control?). This interaction is typically the primary research question.

10.2 The Primary Interaction: Time $\times$ Group

In a treatment evaluation study, the Time $\times$ Group interaction answers: "Does the treatment group change differently over time compared to the control group?" This is typically the most important test in a mixed ANOVA:

If the interaction is significant: the time trajectories differ between groups — strong evidence of a treatment effect beyond any change in the control group.
If the interaction is non-significant: the pattern of change does not differ between groups — the treatment does not appear to differentially affect change over time.

10.3 Probing a Significant Interaction

When the Group $\times$ Time interaction is significant:

Option 1 — Simple effects of Time within each Group: Conduct one-way repeated measures ANOVA (or paired t-tests with correction) separately for each group. This answers: "Did each group change significantly over time?"

Option 2 — Simple effects of Group at each Time point: Conduct independent t-tests (or one-way ANOVA) separately at each time point with Bonferroni correction. This answers: "At which time points do the groups differ?"

10.4 Sphericity in Mixed ANOVA

The sphericity assumption applies to the within-subjects factor and any interaction involving the within-subjects factor. Mauchly's test and GG/HF corrections apply specifically to:

The main effect of the within-subjects factor (B).
The interaction A $\times$ B.

The between-subjects main effect (A) does not require sphericity but does require homogeneity of variance across groups (Levene's test).

10.5 Effect Sizes for Mixed ANOVA

For mixed ANOVA, generalised eta squared ( $\eta_G^2$ ) is strongly recommended for all effects because it accounts for the different variance structures of between-subjects and within-subjects components:

$\eta_G^2 = \frac{SS_{effect}}{SS_{effect} + SS_{subjects} + SS_{error(between)} + SS_{error(within)}}$

This allows direct comparison of effect sizes from mixed designs with purely between- subjects or purely within-subjects designs.

11. Post-Hoc Tests and Planned Contrasts

11.1 The Need for Post-Hoc Testing

A significant omnibus F-test tells you only that some group means differ. Post-hoc tests are pairwise comparisons conducted after a significant F-test to determine which specific groups differ, while controlling the familywise error rate.

The key trade-off: Controlling the FWER requires more conservative critical values, which reduces power for individual comparisons. Choosing a post-hoc test involves balancing Type I and Type II error control.

11.2 Overview of Post-Hoc Tests

Test	FWER Control	Assumes Equal Variances	Best For
Tukey HSD	✅ Exact for balanced	✅ Yes	Balanced designs, all pairwise
Tukey-Kramer	✅ Approximate	✅ Yes	Unbalanced designs, all pairwise
Bonferroni	✅ Conservative	❌ No	Any design, any set of comparisons
Holm-Bonferroni	✅ Less conservative	❌ No	Any design; preferred over Bonferroni
Scheffé	✅ Most conservative	✅ Yes	All possible contrasts (not just pairwise)
Games-Howell	✅ Approximate	❌ No	Unequal variances or unequal $n$
Dunnett	✅ Optimal	✅ Yes	All groups vs. one control group
Fisher LSD	❌ No control	✅ Yes	Exploratory only; requires significant $F$

11.3 Tukey's HSD — Full Procedure

Tukey's Honestly Significant Difference (HSD) is the most commonly used post-hoc test for balanced designs with equal group variances. It controls the FWER at exactly $\alpha$ for all pairwise comparisons.

Critical value: The studentised range statistic $q_{K,\;N-K,\;\alpha}$ , where $K$ is the number of groups.

Minimum significant difference (MSD):

$\text{HSD} = q_{K,\;N-K,\;\alpha} \times \sqrt{\frac{MS_{within}}{n}}$

For unequal group sizes (Tukey-Kramer method):

$\text{HSD}_{jk} = q_{K,\;N-K,\;\alpha} \times \sqrt{\frac{MS_{within}}{2}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}$

Declare groups $j$ and $k$ significantly different if $|\bar{x}_j - \bar{x}_k| > \text{HSD}_{jk}$ .

95% CI for the pairwise difference $\mu_j - \mu_k$ :

$(\bar{x}_j - \bar{x}_k) \pm q_{K,\;N-K,\;\alpha} \times \sqrt{\frac{MS_{within}}{2}\left(\frac{1}{n_j}+\frac{1}{n_k}\right)}$

11.4 Games-Howell Test — Unequal Variances

Games-Howell is the recommended post-hoc test when variances are unequal (Levene's significant) or group sizes differ substantially. It uses Welch-Satterthwaite df for each pairwise comparison:

Standard error for pair $(j, k)$ :

$SE_{jk} = \sqrt{\frac{s_j^2}{n_j} + \frac{s_k^2}{n_k}}$

Test statistic:

$q_{jk} = \frac{|\bar{x}_j - \bar{x}_k|}{SE_{jk}}$

Degrees of freedom (Welch-Satterthwaite):

$\nu_{jk} = \frac{(s_j^2/n_j + s_k^2/n_k)^2}{(s_j^2/n_j)^2/(n_j-1) + (s_k^2/n_k)^2/(n_k-1)}$

Significance assessed against the studentised range distribution $q_{K,\;\nu_{jk},\;\alpha}$ .

11.5 Planned Contrasts — A Priori Comparisons

Planned contrasts (a priori comparisons) are specific, theoretically motivated comparisons formulated before data collection. They are more powerful than post-hoc tests because:

They do not require a significant omnibus F-test.
They allow targeted tests of specific hypotheses.
Fewer comparisons means a less severe FWER correction (or none, for orthogonal contrasts).

Contrast coefficients: A contrast is a weighted sum of group means $\psi = \sum_j c_j\mu_j$ , where $\sum_j c_j = 0$ .

Examples for $K = 4$ groups (Control, Drug A, Drug B, Drug C):

Contrast	$c_1$	$c_2$	$c_3$	$c_4$	Comparison
Control vs. all treatments	$3$	$-1$	$-1$	$-1$	Control vs. treatments
Drug A vs. B and C	$0$	$2$	$-1$	$-1$	Drug A vs. B and C
Drug B vs. C	$0$	$0$	$1$	$-1$	Drug B vs. C

Orthogonal contrasts are statistically independent ( $\sum_j c_{1j}c_{2j}/n_j = 0$ ). A set of $K-1$ orthogonal contrasts fully partitions $SS_{between}$ and does not require FWER correction.

Contrast F-statistic:

$SS_\psi = \frac{\left(\sum_j c_j \bar{x}_j\right)^2}{\sum_j c_j^2/n_j}, \qquad F_\psi = \frac{SS_\psi}{MS_{within}}, \quad \nu = (1, N-K)$

11.6 Effect Sizes for Pairwise Comparisons

After identifying which pairs of groups differ, report effect sizes for each significant pairwise comparison:

Cohen's $d$ for the pairwise comparison $(j, k)$ :

$d_{jk} = \frac{\bar{x}_j - \bar{x}_k}{s_{pooled,jk}}$

Where $s_{pooled,jk}$ can be either the two-group pooled SD or $\sqrt{MS_{within}}$ from the full ANOVA model (recommended — more stable estimate).

Using $MS_{within}$ as the standardiser:

$d_{jk} = \frac{\bar{x}_j - \bar{x}_k}{\sqrt{MS_{within}}}$

Hedges' $g$ (bias-corrected):

$g_{jk} = d_{jk} \times \left(1 - \frac{3}{4(N-K)-1}\right)$

12. Non-Parametric Alternatives

12.1 When Non-Parametric Tests Are Appropriate

Non-parametric ANOVA alternatives are appropriate when:

Data are ordinal (ranked or Likert-type treated as ordinal).
Data are severely non-normally distributed and sample sizes are small.
There are extreme outliers that distort mean-based statistics.
The homogeneity of variance assumption is severely violated.

12.2 Kruskal-Wallis Test — Non-Parametric One-Way ANOVA

The Kruskal-Wallis H test is the non-parametric alternative to one-way between- subjects ANOVA. It tests whether the population distributions of $K \geq 3$ independent groups are identical (or equivalently, under the location-shift assumption, whether the groups have equal medians).

Procedure:

Step 1 — Rank all observations

Combine all $N$ observations across groups and assign ranks from 1 to $N$ . Assign average ranks for ties.

Step 2 — Compute rank sums per group

$W_j = \sum_{i=1}^{n_j} R_{ij}$ (sum of ranks for group $j$ )

Step 3 — Compute the H statistic

$H = \frac{12}{N(N+1)}\sum_{j=1}^K \frac{W_j^2}{n_j} - 3(N+1)$

Tie correction:

$H_c = \frac{H}{1 - \dfrac{\sum_m(t_m^3 - t_m)}{N^3-N}}$

Where $t_m$ is the number of observations in the $m$ -th tied group.

Step 4 — p-value

For $K = 3$ and $n_j \leq 5$ per group: use exact tables. For larger samples: $H \sim \chi^2_{K-1}$ approximately.

$p = P(\chi^2_{K-1} \geq H_c)$

Step 5 — Effect size: $\eta^2_H$

$\eta^2_H = \frac{H - K + 1}{N - K}$

Or, equivalently: $\eta^2_H = (H/\chi^2_{K-1,\;0.50})/((N-1)/(K-1))$

Cohen's benchmarks for $\eta^2_H$ (same as ANOVA $\eta^2$ ): small = .01, medium = .06, large = .14.

Step 6 — Post-hoc tests for Kruskal-Wallis

When $H$ is significant, pairwise comparisons use the Dunn test with Bonferroni or Holm correction:

$z_{jk} = \frac{\bar{R}_j - \bar{R}_k}{\sqrt{\dfrac{N(N+1)}{12}\left(\dfrac{1}{n_j}+\dfrac{1}{n_k}\right)}}$

Where $\bar{R}_j$ and $\bar{R}_k$ are the mean ranks for groups $j$ and $k$ .

Effect size for each pairwise comparison (rank-biserial correlation):

$r_{rb,jk} = \frac{2z_{jk}}{\sqrt{n_j + n_k}}$

12.3 Friedman Test — Non-Parametric Repeated Measures ANOVA

The Friedman test is the non-parametric alternative to one-way repeated measures ANOVA. It tests whether $K$ related conditions (measured on the same participants) have equal population distributions.

Procedure:

Step 1 — Rank within each participant

For each participant $i$ , rank their $K$ scores from 1 (lowest) to $K$ (highest). Assign average ranks for ties within a participant.

Step 2 — Compute column rank sums

$R_k = \sum_{i=1}^n r_{ik}$ (sum of ranks in condition $k$ across all $n$ participants)

Step 3 — Compute Friedman's $\chi^2_r$ statistic

$\chi^2_r = \frac{12}{nK(K+1)}\sum_{k=1}^K R_k^2 - 3n(K+1)$

Kendall's concordance correction (more accurate for small samples):

$W = \frac{\chi^2_r}{n(K-1)}$

$F_F = \frac{(n-1)W}{1-W}$ , compared to $F_{K-1,\;(n-1)(K-1)}$

Step 4 — p-value

$p = P(\chi^2_{K-1} \geq \chi^2_r)$ (large-sample approximation)

Step 5 — Effect size: Kendall's $W$

$W = \frac{\chi^2_r}{n(K-1)}$

$W$ ranges from 0 (no agreement across participants) to 1 (perfect agreement):

$W$	Interpretation
$0.00 - 0.10$	Very weak concordance
$0.10 - 0.30$	Weak concordance
$0.30 - 0.50$	Moderate concordance
$> 0.50$	Strong concordance

Or report $\eta^2_F$ :

$\eta^2_F = \frac{\chi^2_r}{n(K-1)} = W$

Step 6 — Post-hoc tests for Friedman

Pairwise comparisons using Wilcoxon signed-rank tests with Bonferroni or Holm correction, or the Conover test (more powerful):

$t_{jk} = z_{jk}\sqrt{\frac{N-1-\chi^2_r}{N(1-W)}}, \quad \nu = (n-1)(K-1)$

Effect size for each pairwise comparison (matched-pairs rank-biserial correlation):

$r_{rb,jk} = \frac{2z_{jk}}{\sqrt{n}}$

12.4 Welch's One-Way ANOVA — Robust to Heteroscedasticity

Welch's F-test (Welch, 1951) is a parametric ANOVA variant that does not assume homogeneity of variance. It is the recommended default for one-way between-subjects ANOVA when group variances may differ.

Weighted group means:

$w_j = \frac{n_j}{s_j^2}, \quad \tilde{x} = \frac{\sum_j w_j \bar{x}_j}{\sum_j w_j}$

Welch's F-statistic:

$F_W = \frac{\displaystyle\sum_{j=1}^K w_j(\bar{x}_j - \tilde{x})^2/(K-1)}{1 + \dfrac{2(K-2)}{K^2-1}\displaystyle\sum_{j=1}^K \dfrac{(1-w_j/\sum_j w_j)^2}{n_j-1}}$

Degrees of freedom (approximate):

$\nu_W = \frac{K^2-1}{3\displaystyle\sum_{j=1}^K \dfrac{(1-w_j/\sum_j w_j)^2}{n_j-1}}$

$p = P(F_{K-1,\;\nu_W} \geq F_W)$

Post-hoc: Use Games-Howell pairwise tests when Welch's ANOVA is significant.

💡 Just as Welch's t-test is the recommended default over Student's t-test for two groups, Welch's one-way ANOVA is increasingly recommended as the default over classical ANOVA for three or more independent groups. The loss of power when variances are truly equal is negligible.

13. Advanced Topics

13.1 ANCOVA — Analysis of Covariance

ANCOVA extends ANOVA by including one or more continuous covariates in the model. It serves two purposes:

Reduce error variance by partialling out variability explained by the covariate, thereby increasing power to detect group differences.
Adjust group means for pre-existing differences in the covariate (important for quasi-experimental designs).

The ANCOVA model:

$Y_{ij} = \mu + \alpha_j + \beta(X_{ij} - \bar{X}) + \varepsilon_{ij}$

Where $\alpha_j$ is the group effect, $\beta$ is the regression coefficient for covariate $X$ , and $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$ .

Additional assumptions for ANCOVA:

Homogeneity of regression slopes: The relationship between the covariate and DV is the same (parallel) across groups. Test with the Group $\times$ Covariate interaction term; if significant, standard ANCOVA is inappropriate.
Independence of covariate and treatment: In experimental designs, the covariate (e.g., pre-test) should not be affected by the treatment itself.
Linear relationship between covariate and DV.

Adjusted means (estimated marginal means):

$\bar{y}_{adj,j} = \bar{y}_j - \hat{\beta}(\bar{x}_j - \bar{x}_{..})$

These are the group means estimated at the grand mean of the covariate.

Effect size for ANCOVA:

$\omega_p^2 = \frac{SS_{group} - df_{group} \cdot MS_{error(ANCOVA)}}{SS_{total} + MS_{error(ANCOVA)}}$

13.2 Trend Analysis for Ordered Groups

When the levels of the IV represent an ordered quantitative variable (e.g., drug dose: 0, 10, 20, 40 mg), polynomial trend analysis (orthogonal polynomials) is more informative than post-hoc pairwise tests.

Linear trend: Tests whether the means increase or decrease monotonically.

$SS_{linear} = \frac{n\left(\sum_j c_j^{(1)}\bar{x}_j\right)^2}{\sum_j \left(c_j^{(1)}\right)^2}$

Quadratic trend: Tests whether the means follow a U-shape (accelerating or decelerating pattern).

$SS_{quadratic} = \frac{n\left(\sum_j c_j^{(2)}\bar{x}_j\right)^2}{\sum_j \left(c_j^{(2)}\right)^2}$

Standard orthogonal polynomial coefficients for $K = 3, 4, 5$ groups:

$K$	Linear ( $c^{(1)}$ )	Quadratic ( $c^{(2)}$ )	Cubic ( $c^{(3)}$ )
3	$-1, 0, 1$	$1, -2, 1$	—
4	$-3, -1, 1, 3$	$1, -1, -1, 1$	$-1, 3, -3, 1$
5	$-2, -1, 0, 1, 2$	$2, -1, -2, -1, 2$	$-1, 2, 0, -2, 1$

13.3 Power Analysis for ANOVA

A priori power analysis determines the required sample size before data collection. The primary input is Cohen's $f$ (not $f^2$ — that is for regression):

$f = \sqrt{\frac{\omega^2}{1-\omega^2}}$

For one-way ANOVA (equal group sizes), the non-centrality parameter:

$\lambda = f^2 \cdot N = f^2 \cdot Kn$

Required $n$ per group for power $1-\beta$ at two-sided level $\alpha$ :

Iteratively solve: Power $= P(F_{K-1,\;K(n-1)} \geq F_{crit} \mid \lambda = f^2 Kn) \geq 1-\beta$

No closed form exists — DataStatPro uses numerical methods for exact power calculations.

Approximate $n$ per group using the $\chi^2$ approximation:

$n \approx \frac{(z_{1-\alpha/K} + z_{1-\beta})^2}{f^2}$

Required $n$ per group for common scenarios (80% power, $\alpha = .05$ , one-way):

Cohen's $f$	Label	$K=3$	$K=4$	$K=5$	$K=6$
0.10	Small	322	274	240	215
0.25	Medium	52	45	39	35
0.40	Large	21	18	16	14
0.50	Large	14	12	11	10

For repeated measures ANOVA, power also depends on the within-subjects correlation $\rho$ (average correlation among repeated measures):

$f_{rm} = f \times \sqrt{\frac{K}{1+(K-1)\rho}} \times \sqrt{K}$

Higher $\rho$ → more power benefit from the repeated measures design.

13.4 Dealing with Violations: Transformation Strategies

When the normality or homoscedasticity assumptions are violated, data transformations can sometimes restore assumption validity before applying ANOVA:

Distribution Shape	Suggested Transformation	Formula
Right skew (positive)	Log	$Y' = \ln(Y)$ or $\log_{10}(Y)$
Moderate right skew	Square root	$Y' = \sqrt{Y}$
Severe right skew	Reciprocal	$Y' = 1/Y$
Proportion data	Arcsine	$Y' = \arcsin(\sqrt{Y})$
Count data	Square root	$Y' = \sqrt{Y + 0.5}$

⚠️ Transformed means cannot be back-transformed directly to obtain the mean of the original variable. Back-transforming estimates the median (for log), not the mean. Always report descriptive statistics in the original scale alongside transformed results.

13.5 Robust ANOVA: Trimmed Means

Trimmed mean ANOVA (Wilcox, 2017) replaces standard means with $\alpha$ -trimmed means, dramatically reducing sensitivity to outliers and non-normality while maintaining reasonable power.

Yuen's trimmed mean F-test for one-way ANOVA uses:

$\bar{x}_{t,j}$ = 20%-trimmed mean for group $j$

$W_j = (n_j - 2h_j) \times s_{w,j}^2 / h_j^2$

Where $s_{w,j}^2$ is the Winsorised variance for group $j$ and $h_j = n_j - 2\lfloor 0.2 n_j \rfloor$ .

The test statistic is compared to an F-distribution with adjusted degrees of freedom.

13.6 Bayesian ANOVA

Bayesian ANOVA (Rouder et al., 2012; implemented via BayesFactor R package) quantifies evidence for and against each effect using Bayes Factors. For each effect:

$BF_{10} = \frac{P(\text{data} \mid H_1: \text{effect present})}{P(\text{data} \mid H_0: \text{effect absent})}$

The prior on effect sizes under $H_1$ is typically a Cauchy distribution:

$\delta \sim \text{Cauchy}(0, r), \quad r = \sqrt{2}/2$ (default "medium" prior)

Bayesian ANOVA advantages:

Quantifies evidence for null effects (not just failure to reject $H_0$ ).
Allows continuous evidence monitoring without inflating Type I error.
Produces posterior distributions for effect sizes.
Avoids the all-or-nothing dichotomy of significance testing.

13.7 Reporting ANOVA According to APA 7th Edition

APA Publication Manual (7th ed.) requirements for ANOVA:

Report the F-statistic, both degrees of freedom, and exact p-value: $F(df_1, df_2) =$ [value], $p =$ [value]
Report effect size with 95% CI: $\omega^2 =$ [value] [95% CI: LB, UB] or $\eta_p^2 =$ [value] [95% CI: LB, UB]
Specify which effect size was used (state $\omega^2$ vs. $\eta^2$ explicitly).
Report the sphericity correction used (GG or HF) and $\varepsilon$ value.
Report group means and standard deviations.
Report post-hoc test results with adjusted p-values and effect sizes per comparison.
Specify whether equal variances were assumed and which SS Type was used (factorial).

14. Worked Examples

Example 1: One-Way Between-Subjects ANOVA — Effect of Therapy Type on Depression

A clinical researcher assigns $n = 30$ participants per group to one of three therapy conditions (CBT, Behavioural Activation, Waitlist Control). Post-treatment depression scores (PHQ-9; lower = less depression) are measured.

Group summary statistics:

Group	$n$	Mean PHQ-9	SD
CBT	30	$9.8$	$4.2$
Behavioural Activation (BA)	30	$11.4$	$4.6$
Waitlist Control (WL)	30	$16.3$	$5.1$

$N = 90$ , $K = 3$

Grand mean:

$\bar{x}_{..} = (9.8 + 11.4 + 16.3)/3 = 37.5/3 = 12.500$

Step 1 — Between-groups SS:

$SS_B = 30[(9.8-12.5)^2 + (11.4-12.5)^2 + (16.3-12.5)^2]$

$= 30[7.29 + 1.21 + 14.44] = 30 \times 22.94 = 688.2$

Step 2 — Within-groups SS:

$SS_W = (n_j-1)(s_j^2)$ summed across groups:

$= 29(4.2^2) + 29(4.6^2) + 29(5.1^2) = 29(17.64) + 29(21.16) + 29(26.01)$

$= 511.56 + 613.64 + 754.29 = 1879.49$

Step 3 — Total SS: $SS_T = 688.2 + 1879.49 = 2567.69$

Step 4 — ANOVA source table:

Source	SS	df	MS	$F$	$p$
Between	$688.20$	$2$	$344.10$	$16.52$	$< .001$
Within	$1879.49$	$87$	$21.60$
Total	$2567.69$	$89$

Step 5 — Levene's test: $F(2, 87) = 0.82$ , $p = .44$ — homogeneity of variance holds; standard ANOVA is appropriate.

Step 6 — Effect sizes:

$\eta^2 = 688.20/2567.69 = 0.268$

$\omega^2 = (688.20 - 2 \times 21.60)/(2567.69 + 21.60) = (688.20 - 43.20)/2589.29 = 645.00/2589.29 = 0.249$

$f = \sqrt{0.249/(1-0.249)} = \sqrt{0.249/0.751} = \sqrt{0.3315} = 0.576$

95% CI for $\omega^2$ (via non-central F, $F_{obs} = 16.52$ , $df_1 = 2$ , $df_2 = 87$ ):

Non-centrality $\hat{\lambda} = F \times df_1 = 16.52 \times 2 = 33.04$

95% CI for $\lambda$ : $[15.84, 54.26]$ (numerical)

$\omega^2_L = 15.84/(15.84+90) = 0.150, \quad \omega^2_U = 54.26/(54.26+90) = 0.376$

Step 7 — Post-hoc tests (Tukey HSD):

$MS_{within} = 21.60$ ; Tukey critical value $q_{3,87,0.05} = 3.37$

$\text{HSD} = 3.37 \times \sqrt{21.60/30} = 3.37 \times 0.849 = 2.860$

Comparison	Difference	HSD	Significant?	Cohen's $d$
CBT vs. BA	$1.600$	$2.860$	No, $p = .310$	$0.344$
CBT vs. WL	$6.500$	$2.860$	Yes, $p < .001$	$1.400$
BA vs. WL	$4.900$	$2.860$	Yes, $p < .001$	$1.055$

Cohen's $d$ for each pair using $\sqrt{MS_{within}} = \sqrt{21.60} = 4.648$ :

$d_{CBT-WL} = 6.5/4.648 = 1.399$ ; $d_{BA-WL} = 4.9/4.648 = 1.054$ ; $d_{CBT-BA} = 1.6/4.648 = 0.344$

Summary:

Statistic	Value
$F(2, 87)$	$16.52$
$p$	$< .001$
$\eta^2$	$0.268$ (Large)
$\omega^2$	$0.249$ [95% CI: 0.150, 0.376] (Large)
Cohen's $f$	$0.576$
CBT vs. Control	$d = 1.40$ (Large)
BA vs. Control	$d = 1.05$ (Large)
CBT vs. BA	$d = 0.34$ (Small; ns)

APA write-up: "A one-way between-subjects ANOVA revealed a significant effect of therapy type on post-treatment depression, $F(2, 87) = 16.52$ , $p < .001$ , $\omega^2 = 0.249$ [95% CI: 0.150, 0.376], indicating a large effect. Tukey HSD post-hoc tests showed that both CBT ( $M = 9.8$ , $SD = 4.2$ ) and Behavioural Activation ( $M = 11.4$ , $SD = 4.6$ ) produced significantly lower depression scores than the Waitlist Control ( $M = 16.3$ , $SD = 5.1$ ), $d_{CBT-WL} = 1.40$ and $d_{BA-WL} = 1.05$ respectively (both $p < .001$ ). CBT and BA did not differ significantly from each other, $d = 0.34$ , $p = .310$ ."

Example 2: Two-Way Factorial ANOVA — Drug $\times$ Exercise on Anxiety

A researcher uses a $2 \times 3$ between-subjects design: Drug (Drug A vs. Placebo) $\times$ Exercise (None, Moderate, High). $n = 10$ per cell; DV = anxiety score (lower = less anxious). Total $N = 60$ .

Cell means:

	No Exercise	Moderate	High	Row Mean
Drug A	$24.1$	$18.3$	$14.7$	$19.033$
Placebo	$27.4$	$23.8$	$22.1$	$24.433$
Col Mean	$25.75$	$21.05$	$18.40$	$21.733$

Grand mean: $\bar{x}_{..} = 21.733$

Step 1 — Compute SS (all cells balanced, $n = 10$ ):

$SS_{Drug} = 3 \times 10 \times [(19.033-21.733)^2 + (24.433-21.733)^2]$ $= 30 \times [7.290 + 7.290] = 30 \times 14.580 = 437.40$

$SS_{Exercise} = 2 \times 10 \times [(25.75-21.733)^2 + (21.05-21.733)^2 + (18.40-21.733)^2]$ $= 20 \times [16.136 + 0.467 + 11.109] = 20 \times 27.712 = 554.24$

Cell means for interaction SS:

$SS_{D\times E} = 10\sum_j\sum_k(\bar{x}_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2$

Deviations from additive model:

Cell	$\bar{x}_{jk}$	$\bar{x}_{j.}$	$\bar{x}_{.k}$	$\bar{x}_{..}$	Deviation
Drug A, None	24.1	19.033	25.750	21.733	$24.1 - 19.033 - 25.750 + 21.733 = 1.050$
Drug A, Mod	18.3	19.033	21.050	21.733	$18.3 - 19.033 - 21.050 + 21.733 = -0.050$
Drug A, High	14.7	19.033	18.400	21.733	$14.7 - 19.033 - 18.400 + 21.733 = -1.000$
Placebo, None	27.4	24.433	25.750	21.733	$27.4 - 24.433 - 25.750 + 21.733 = -1.050$
Placebo, Mod	23.8	24.433	21.050	21.733	$23.8 - 24.433 - 21.050 + 21.733 = 0.050$
Placebo, High	22.1	24.433	18.400	21.733	$22.1 - 24.433 - 18.400 + 21.733 = 1.000$

$SS_{D\times E} = 10[(1.050)^2 + (-0.050)^2 + (-1.000)^2 + (-1.050)^2 + (0.050)^2 + (1.000)^2]$

$= 10[1.1025 + 0.0025 + 1.000 + 1.1025 + 0.0025 + 1.000] = 10 \times 4.210 = 42.10$

Pooled within-cells $SS_{within}$ : Assume $s^2_{pooled} = 16.4$ (given), so:

$SS_{within} = (n-1) \times s^2_{pooled} \times \text{cells} = 9 \times 16.4 \times 6 = 885.60$

Step 2 — ANOVA source table:

Source	SS	df	MS	$F$	$p$
Drug (D)	$437.40$	$1$	$437.40$	$26.67$	$< .001$
Exercise (E)	$554.24$	$2$	$277.12$	$16.90$	$< .001$
D $\times$ E	$42.10$	$2$	$21.05$	$1.28$	$.285$
Within (Error)	$885.60$	$54$	$16.40$
Total	$1919.34$	$59$

Step 3 — Partial omega squared for each effect:

$\omega_{p,D}^2 = (437.40 - 1 \times 16.40)/(1919.34 + 16.40) = 421.00/1935.74 = 0.218$

$\omega_{p,E}^2 = (554.24 - 2 \times 16.40)/(1919.34 + 16.40) = 521.44/1935.74 = 0.269$

$\omega_{p,D\times E}^2 = (42.10 - 2 \times 16.40)/(1919.34 + 16.40) = 9.30/1935.74 = 0.005$

Step 4 — Interpretation:

The interaction is not significant ( $p = .285$ ) — the effect of Drug is consistent across all exercise levels. Interpret main effects:

Drug main effect: Drug A ( $M = 19.03$ ) produces significantly lower anxiety than Placebo ( $M = 24.43$ ), $\omega_p^2 = 0.218$ — large effect.
Exercise main effect: Higher exercise is associated with lower anxiety. Tukey HSD post-hoc tests would reveal which exercise levels differ.

APA write-up: "A $2 \times 3$ between-subjects ANOVA examined the effects of Drug (Drug A vs. Placebo) and Exercise level (None, Moderate, High) on anxiety scores. The interaction was not significant, $F(2, 54) = 1.28$ , $p = .285$ , $\omega_p^2 = 0.005$ . There were significant main effects of Drug, $F(1, 54) = 26.67$ , $p < .001$ , $\omega_p^2 = 0.218$ [95% CI: 0.102, 0.340], and Exercise, $F(2, 54) = 16.90$ , $p < .001$ , $\omega_p^2 = 0.269$ [95% CI: 0.136, 0.395]. Both effects were large."

Example 3: One-Way Repeated Measures ANOVA — Memory Scores Across Four Time Points

A cognitive psychologist measures word recall at four time points (immediate recall, 5-minute delay, 30-minute delay, 24-hour delay) in $n = 20$ participants.

Condition means and SDs:

Time Point	Mean Recall	SD
Immediate	$18.4$	$3.1$
5 minutes	$15.7$	$3.4$
30 minutes	$12.3$	$3.8$
24 hours	$9.1$	$4.2$

ANOVA results (given):

Source	SS	df	MS	$F$	$p$
Between subjects	$1842.60$	$19$	$96.98$
Time	$1104.80$	$3$	$368.27$	$42.14$	$< .001$
Error	$498.20$	$57$	$8.74$
Total	$3445.60$	$79$

Mauchly's test: $W = 0.81$ , $\chi^2(5) = 4.12$ , $p = .532$ — sphericity holds; no correction needed.

Effect sizes:

$\eta_p^2 = 1104.80/(1104.80 + 498.20) = 1104.80/1603.00 = 0.689$

$\eta_G^2 = 1104.80/(1104.80 + 1842.60 + 498.20) = 1104.80/3445.60 = 0.321$

$\omega_p^2 = (1104.80 - 3 \times 8.74)/(3445.60 + 8.74) = (1104.80 - 26.22)/3454.34 = 1078.58/3454.34 = 0.312$

Post-hoc tests (Bonferroni-corrected pairwise comparisons):

6 comparisons; adjusted $\alpha = .05/6 = .0083$

Using paired t-tests on each pair (or use RM ANOVA contrast framework):

Comparison	Mean Diff	$t(19)$	$p_{adj}$	$d_z$
Imm vs. 5 min	$2.7$	$4.23$	$.004$	$0.95$
Imm vs. 30 min	$6.1$	$7.81$	$< .001$	$1.75$
Imm vs. 24 hr	$9.3$	$10.42$	$< .001$	$2.33$
5 min vs. 30 min	$3.4$	$5.17$	$< .001$	$1.16$
5 min vs. 24 hr	$6.6$	$8.20$	$< .001$	$1.83$
30 min vs. 24 hr	$3.2$	$4.89$	$< .001$	$1.09$

All pairwise comparisons are significant — recall declines significantly at every delay interval.

APA write-up: "A one-way repeated measures ANOVA examined word recall across four time points. Mauchly's test indicated that the sphericity assumption was met, $W = 0.81$ , $p = .532$ . There was a significant effect of time, $F(3, 57) = 42.14$ , $p < .001$ , $\omega_p^2 = 0.312$ [95% CI: 0.198, 0.421], $\eta_G^2 = 0.321$ , indicating a large effect. Bonferroni-corrected pairwise comparisons revealed that recall declined significantly at each subsequent time point (all $p < .008$ ), with effect sizes ranging from $d_z = 0.95$ (immediate vs. 5-min) to $d_z = 2.33$ (immediate vs. 24-hr)."

Example 4: Kruskal-Wallis Test — Non-Parametric Comparison of Pain Ratings

A pain researcher compares pain ratings (0–10 VAS scale, ordinal) across three acupuncture protocols. Shapiro-Wilk tests indicate non-normality in all groups.

Data:

Protocol A ( $n_1=8$ )	Protocol B ( $n_2=7$ )	Protocol C ( $n_3=6$ )
$3, 5, 2, 6, 4, 5, 3, 4$	$7, 8, 6, 9, 7, 8, 7$	$5, 6, 4, 7, 5, 6$

$N = 21$

Step 1 — Combined ranks:

Sorted values: 2(1), 3(2.5), 3(2.5), 4(5), 4(5), 4(5), 5(8), 5(8), 5(8), 5(8), 6(11.5), 6(11.5), 6(11.5), 7(15), 7(15), 7(15), 8(18.5), 8(18.5), 9(21)

Wait — let me recount: Protocol A: {3,5,2,6,4,5,3,4}, B: {7,8,6,9,7,8,7}, C: {5,6,4,7,5,6}

Combined sorted: 2,3,3,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,9

Value	Count	Ranks	Avg Rank
2	1	1	1.0
3	2	2–3	2.5
4	3	4–6	5.0
5	4	7–10	8.5
6	4	11–14	12.5
7	4	15–18	16.5
8	2	19–20	19.5
9	1	21	21.0

Rank assignments:

Protocol A: 1.0, 8.5, 2.5, 12.5, 5.0, 8.5, 2.5, 5.0 → $W_A = 45.5$
Protocol B: 16.5, 19.5, 12.5, 21.0, 16.5, 19.5, 16.5 → $W_B = 122.0$
Protocol C: 8.5, 12.5, 5.0, 16.5, 8.5, 12.5 → $W_C = 63.5$

Check: $45.5 + 122.0 + 63.5 = 231 = 21 \times 22/2$ ✅

Step 2 — H statistic:

$H = \frac{12}{21 \times 22}\left(\frac{45.5^2}{8} + \frac{122.0^2}{7} + \frac{63.5^2}{6}\right) - 3(22)$

$= \frac{12}{462}\left(\frac{2070.25}{8} + \frac{14884}{7} + \frac{4032.25}{6}\right) - 66$

$= 0.02597\left(258.78 + 2126.29 + 672.04\right) - 66$

$= 0.02597 \times 3057.11 - 66 = 79.39 - 66 = 13.39$

Tie correction factor:

$\sum_m(t_m^3-t_m) = (1^3-1)+(2^3-2)+(3^3-3)+(4^3-4)+(4^3-4)+(4^3-4)+(2^3-2)+(1^3-1)$

$= 0+6+24+60+60+60+6+0 = 216$

$H_c = 13.39 / \left(1 - \frac{216}{21^3-21}\right) = 13.39 / \left(1 - \frac{216}{9240}\right) = 13.39 / 0.9766 = 13.71$

Step 3 — p-value:

$p = P(\chi^2_2 \geq 13.71) = .001$

Step 4 — Effect size:

$\eta^2_H = (13.71 - 3 + 1)/(21 - 3) = 11.71/18 = 0.651$

This is a very large effect — protocol membership explains approximately 65% of the rank variability in pain ratings.

Dunn post-hoc tests (Holm-corrected):

$\bar{R}_A = 45.5/8 = 5.69, \quad \bar{R}_B = 122.0/7 = 17.43, \quad \bar{R}_C = 63.5/6 = 10.58$

$SE_{AB} = \sqrt{21 \times 22/12 \times (1/8+1/7)} = \sqrt{38.5 \times 0.2679} = \sqrt{10.31} = 3.211$

$z_{AB} = (5.69-17.43)/3.211 = -11.74/3.211 = -3.657, \quad r_{rb} = 2(-3.657)/\sqrt{15} = -1.889$ (cap at $-1$ )

Use: $r_{rb} = (U_A - U_B)/(n_A n_B)$ — from rank-based approach:

Comparison	$z$	$p$ (Holm-adj)	$r_{rb}$
A vs. B	$-3.657$	$< .001$	$-0.895$
A vs. C	$-1.842$	$.065$	$-0.532$
B vs. C	$1.914$	$.056$	$0.497$

APA write-up: "Due to significant non-normality, a Kruskal-Wallis H test was conducted to compare pain ratings across three acupuncture protocols. The test revealed a significant difference, $H(2) = 13.71$ , $p = .001$ , $\eta^2_H = 0.651$ , indicating a very large effect of protocol. Holm-corrected Dunn post-hoc tests revealed that Protocol A (Mdn = 4.5) produced significantly lower pain ratings than Protocol B (Mdn = 7.0), $z = -3.66$ , $p < .001$ , $r_{rb} = -.895$ . Differences between A and C and between B and C did not survive correction ( $p = .065$ and $p = .056$ respectively)."

15. Common Mistakes and How to Avoid Them

Mistake 1: Interpreting the Omnibus F Without Post-Hoc Tests

Problem: Reporting a significant $F$ and concluding that all groups differ from each other, or that a specific pair of groups differs, without conducting post-hoc tests. The omnibus F tells you only that at least one difference exists.

Solution: Always follow a significant omnibus F with appropriate post-hoc tests or planned contrasts. Specify which test was used and apply the correct FWER correction. Report all pairwise comparisons with adjusted p-values and individual effect sizes.

Mistake 2: Reporting $\eta^2$ as if It Were $\omega^2$

Problem: Reporting $\eta^2$ (e.g., $\eta^2 = 0.23$ ) and labelling it simply as "effect size" or, worse, confusing it with the less-biased $\omega^2$ . $\eta^2$ is consistently biased upward and overestimates the population effect, sometimes substantially in small samples with few groups.

Solution: Always report $\omega^2$ (or $\omega_p^2$ for factorial designs) as the primary effect size, and label all effect sizes precisely. If $\eta^2$ is reported (e.g., for software compatibility), clearly note that it is biased and report $\omega^2$ alongside.

Mistake 3: Confusing $\eta^2$ and $\eta_p^2$ in Factorial Designs

Problem: In factorial ANOVA with two or more factors, $\eta_p^2$ values can sum to more than 1.0 across all effects. Reporting $\eta_p^2$ and describing it as "the proportion of total variance explained" is incorrect — it is the proportion of variance explained after removing the other effects.

Solution: Use $\eta^2$ for total-variance proportions and $\eta_p^2$ for partial proportions, always labelling them distinctly. Preferably, use $\omega_p^2$ or $\eta_G^2$ and state which was used.

Mistake 4: Ignoring Significant Interactions and Interpreting Main Effects Alone

Problem: When a significant A $\times$ B interaction is present, reporting and interpreting main effects as if the interaction did not exist. The main effect of A is the average effect across all levels of B — if the interaction is disordinal (crossover), this average is actively misleading.

Solution: Test for interactions before interpreting main effects. When an interaction is significant, probe it with simple effects analysis and interaction plots. Describe the pattern of the interaction rather than (or in addition to) the main effects.

Mistake 5: Using One-Way ANOVA When Repeated Measures ANOVA is Needed

Problem: Treating pre-post data from the same participants as independent groups and running a between-subjects one-way ANOVA. This inflates the error term with between- person variability, severely reduces power, and violates the independence assumption.

Solution: Identify whether data come from different participants (between-subjects) or the same participants (within-subjects). Use repeated measures ANOVA when each participant contributes more than one score. If in doubt, check whether the data file has one row per participant.

Mistake 6: Not Checking or Correcting for Sphericity Violations

Problem: Running repeated measures ANOVA in SPSS or R and not checking Mauchly's test, or checking it but ignoring a significant result and reporting uncorrected values.

Solution: Always report Mauchly's test result. When $p < .05$ , report the Greenhouse-Geisser (or Huynh-Feldt if $\varepsilon > 0.75$ ) corrected results. Report both $\varepsilon$ and the corrected df alongside the F-statistic.

Mistake 7: Applying Standard ANOVA When Variances Are Unequal

Problem: Using classical ANOVA with unequal group sizes and markedly different group variances ( $s^2_{max}/s^2_{min} > 4$ ). This produces inflated Type I error rates and untrustworthy p-values.

Solution: When Levene's test is significant (especially with unequal $n$ ), use Welch's one-way ANOVA with Games-Howell post-hoc tests. Report Levene's test result in the method section and justify the choice of test.

Mistake 8: Running Multiple Pairwise t-Tests After ANOVA Without Correction

Problem: After a significant F, running all pairwise t-tests without applying a multiple comparisons correction, effectively using $\alpha = .05$ per comparison and inflating FWER.

Solution: Use a proper post-hoc procedure (Tukey HSD, Games-Howell, Holm-Bonferroni) that controls the FWER. Fisher's LSD (uncorrected pairwise tests) is not appropriate as a standalone post-hoc procedure unless there are only $K = 3$ groups.

Mistake 9: Interpreting Non-Significant Interactions as Absence of Interaction

Problem: Concluding that "there is no interaction" based solely on $p > .05$ for the interaction term. A non-significant interaction test only indicates insufficient evidence for an interaction, not evidence of its absence. Underpowered studies routinely fail to detect real interactions.

Solution: Report the effect size for the interaction ( $\omega_p^2$ , $\eta_G^2$ ) and its 95% CI alongside the p-value. If the CI is wide, acknowledge low precision. Consider equivalence testing for the interaction if absence of interaction is the primary claim.

Mistake 10: Failing to Report Descriptive Statistics and Visualisations for Factorial Designs

Problem: In factorial and repeated measures ANOVA, reporting only the omnibus F- statistics without cell means, standard deviations, and interaction plots. Statistical significance alone is uninterpretable without the pattern of means.

Solution: Always report means and standard deviations (or standard errors) for every cell. For factorial designs, always include an interaction plot. For repeated measures, include a profile plot. Raincloud plots (half violin + box + scatter) are increasingly recommended for transparent reporting of individual data.

16. Troubleshooting

Problem	Likely Cause	Solution
$F < 1.0$	$MS_{within} > MS_{between}$ ; no treatment effect or large within-group variability	Report as non-significant; consider power; inspect within-group variability
$\omega^2$ or $\varepsilon^2$ is negative	True effect near zero; $MS_{between} < (K-1)MS_{within}$	Report as 0 (convention); increase sample size; note small effect
$\eta_p^2$ values sum to $> 1.0$	Expected in factorial ANOVA; $\eta_p^2$ is not a total-variance proportion	Switch to $\eta^2$ or $\eta_G^2$ for total-variance interpretation
Mauchly's test is significant	Sphericity violated (common in $K \geq 3$ repeated measures)	Apply GG correction (if $\varepsilon \leq 0.75$ ) or HF (if $\varepsilon > 0.75$ ); report $\varepsilon$
Levene's test is significant	Heterogeneous variances across groups	Use Welch's ANOVA with Games-Howell post-hoc
Interaction is significant but interaction plot looks parallel	Scaling issue on plot axes; small but real interaction	Rescale y-axis to start at true minimum; report $\omega_p^2$ for interaction
Post-hoc tests reveal no significant pairs despite significant $F$	Effect is driven by small differences across many pairs; no single large pair	Report omnibus and note no individual pair survives correction; reduce FWER burden with planned contrasts
Planned contrasts do not sum to zero	Contrast coding error	Re-specify: $\sum_j c_j = 0$ for all contrasts
ANOVA gives different result to multiple t-tests	ANOVA uses pooled error term; t-tests use only two groups	Trust ANOVA; the pooled error is more stable
Repeated measures ANOVA has very different $\eta_p^2$ vs. $\eta_G^2$	Large between-subjects variance inflating $\eta_G^2$ denominator	Report both; $\eta_G^2$ is preferred for cross-design comparison
Very large $F$ with very small $\omega^2$	Large $N$ ; even tiny mean differences are statistically significant	Report effect size — statistical significance does not imply practical significance
Cell size is 0 for some factorial cells	Empty cells in design	Empty cells break standard ANOVA; use regression approach or multilevel modelling
Significant ANCOVA result changes after adding covariate $\times$ group interaction	Homogeneity of regression slopes violated	Standard ANCOVA is inappropriate; use moderated regression instead
Kruskal-Wallis is significant but Dunn tests show no significant pairs	Conservative Bonferroni correction; effect spread across many pairs	Use Holm correction instead; report Dunn tests without correction if all planned
Friedman test statistic is 0	Identical rankings across all participants	Verify data; check for data entry errors or insufficient variability

17. Quick Reference Cheat Sheet

Core ANOVA Equations

Formula	Description
$SS_B = \sum_j n_j(\bar{x}_j - \bar{x}_{..})^2$	Between-groups SS (one-way)
$SS_W = \sum_j\sum_i(x_{ij}-\bar{x}_j)^2$	Within-groups SS (one-way)
$SS_T = SS_B + SS_W$	Total SS decomposition
$MS_B = SS_B/(K-1)$	Between-groups mean square
$MS_W = SS_W/(N-K)$	Within-groups mean square (error)
$F = MS_B/MS_W$	F-ratio (one-way ANOVA)
$p = P(F_{K-1,\;N-K} \geq F_{obs})$	One-way ANOVA p-value
$SS_A = bn\sum_j(\bar{x}_{j.}-\bar{x}_{..})^2$	Factor A SS (factorial, balanced)
$SS_{A\times B} = SS_{cells} - SS_A - SS_B$	Interaction SS
$F_A = MS_A/MS_{within}$	Factorial F for main effect A
$SS_{cond} = n\sum_k(\bar{x}_{.k}-\bar{x}_{..})^2$	Conditions SS (repeated measures)
$SS_{subj} = K\sum_i(\bar{x}_{i.}-\bar{x}_{..})^2$	Subjects SS (repeated measures)
$SS_{error} = SS_T - SS_{cond} - SS_{subj}$	Error SS (repeated measures)

Effect Size Formulas

Formula	Description
$\eta^2 = SS_{effect}/SS_{total}$	Eta squared (one-way; biased)
$\eta_p^2 = SS_{effect}/(SS_{effect}+SS_{error})$	Partial eta squared (factorial)
$\eta_G^2 = SS_{cond}/(SS_{cond}+SS_{subj}+SS_{error})$	Generalised eta squared (RM/mixed)
$\omega^2 = (SS_B-(K-1)MS_W)/(SS_T+MS_W)$	Omega squared (one-way; preferred)
$\omega_p^2 = (SS_{eff}-df_{eff}MS_{err})/(SS_T+MS_{err})$	Partial omega squared (factorial)
$\varepsilon^2 = (SS_B-(K-1)MS_W)/SS_T$	Epsilon squared (one-way)
$f = \sqrt{\eta^2/(1-\eta^2)}$	Cohen's $f$ (from $\eta^2$ )
$f = \sqrt{\omega^2/(1-\omega^2)}$	Cohen's $f$ (from $\omega^2$ ; preferred)
$\eta^2 = F\cdot df_B/(F\cdot df_B + df_W)$	$\eta^2$ from $F$ -statistic
$\omega^2 \approx (F-1)\cdot df_B/(F\cdot df_B+df_W+1)$	$\omega^2$ from $F$ -statistic (approx)
$d_{jk} = (\bar{x}_j-\bar{x}_k)/\sqrt{MS_{within}}$	Cohen's $d$ for post-hoc pairwise

Non-Parametric Formulas

Formula	Description
$H = \frac{12}{N(N+1)}\sum_j W_j^2/n_j - 3(N+1)$	Kruskal-Wallis $H$
$\eta^2_H = (H-K+1)/(N-K)$	Effect size for Kruskal-Wallis
$\chi^2_r = \frac{12}{nK(K+1)}\sum_k R_k^2 - 3n(K+1)$	Friedman $\chi^2_r$
$W = \chi^2_r/(n(K-1))$	Kendall's $W$ (Friedman effect size)
$z_{jk} = (\bar{R}_j-\bar{R}_k)/SE_{jk}$	Dunn's test statistic
$r_{rb,jk} = 2z_{jk}/\sqrt{n_j+n_k}$	Rank-biserial $r$ (pairwise)
$F_W = \text{weighted SS}_{between}/\text{correction}$	Welch's one-way ANOVA

Sphericity Corrections

Formula	Description
$\varepsilon_{GG} = (\sum_j\hat\sigma_{jj})^2/((K-1)\sum_j\sum_k\hat\sigma_{jk}^2)$	Greenhouse-Geisser epsilon
$\varepsilon_{HF} = \frac{n(K-1)\varepsilon_{GG}-2}{(K-1)(n-1-(K-1)\varepsilon_{GG})}$	Huynh-Feldt epsilon
$df^* = \varepsilon \cdot df_{uncorrected}$	Corrected degrees of freedom

Post-Hoc Test Selection Guide

Condition	Recommended Post-Hoc Test
Balanced design, equal variances	Tukey HSD
Unbalanced design, equal variances	Tukey-Kramer
Unequal variances or group sizes	Games-Howell
All groups vs. one control	Dunnett's test
All possible contrasts (not just pairwise)	Scheffé
Any design, conservative	Bonferroni
Any design, less conservative than Bonferroni	Holm-Bonferroni
Non-parametric (Kruskal-Wallis)	Dunn test with Holm correction
Non-parametric (Friedman)	Wilcoxon signed-rank + Holm, or Conover

APA 7th Edition Reporting Templates

One-Way Between-Subjects ANOVA: "A one-way between-subjects ANOVA revealed [a significant / no significant] effect of [IV] on [DV], $F(df_B, df_W) =$ [value], $p =$ [value], $\omega^2 =$ [value] [95% CI: LB, UB]. [Post-hoc results if significant.]"

Factorial Between-Subjects ANOVA: "A $a\times b$ between-subjects ANOVA was conducted. The [IV $_A \times$ IV $_B$ ] interaction [was / was not] significant, $F(df_{AxB}, df_W) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB]. [Describe interaction pattern or, if not significant, main effects:] There was a significant main effect of [IV $_A$ ], $F(df_A, df_W) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB], and [of IV $_B$ ], $F(df_B, df_W) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB]."

One-Way Repeated Measures ANOVA: "A one-way repeated measures ANOVA was conducted. Mauchly's test [indicated / did not indicate] a violation of sphericity, $W =$ [value], $p =$ [value][; consequently, Greenhouse-Geisser / Huynh-Feldt corrected values are reported, $\varepsilon =$ [value]]. There was a significant effect of [condition], $F(\varepsilon\cdot df_{cond},\; \varepsilon\cdot df_{error}) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB], $\eta_G^2 =$ [value]."

Mixed ANOVA: "A $a$ -level (between) $\times$ $b$ -level (within) mixed ANOVA was conducted. Mauchly's test [was / was not] significant for the within-subjects factor, $W =$ [value], $p =$ [value][; GG correction applied, $\varepsilon =$ [value]]. The [between $\times$ within] interaction [was / was not] significant, $F(df, df) =$ [value], $p =$ [value], $\eta_G^2 =$ [value] [95% CI: LB, UB]. [Describe simple effects if significant.]"

Kruskal-Wallis: "A Kruskal-Wallis H test was conducted due to [non-normality / ordinal data]. The test revealed [a significant / no significant] difference across groups, $H(df) =$ [value], $p =$ [value], $\eta^2_H =$ [value]. [Dunn pairwise post-hoc results if significant.]"

Friedman Test: "A Friedman test was conducted. There was [a significant / no significant] difference across conditions, $\chi^2_r(df) =$ [value], $p =$ [value], $W =$ [value]."

Welch's One-Way ANOVA: "Due to significant heterogeneity of variance (Levene's $F(df_1, df_2) =$ [value], $p =$ [value]), Welch's one-way ANOVA was applied. Results indicated [a significant / no significant] effect of [IV] on [DV], $F_W(K-1, \nu_W) =$ [value], $p =$ [value], $\omega^2 =$ [value] [95% CI: LB, UB]. Games-Howell post-hoc tests were used."

Required Sample Size — One-Way ANOVA (80% Power, $\alpha = .05$ )

Cohen's $f$	Label	$K = 3$	$K = 4$	$K = 5$	$K = 6$
0.10	Small	322	274	240	215
0.15	Small-Med	144	123	107	96
0.25	Medium	52	45	39	35
0.35	Med-Large	27	23	21	19
0.40	Large	21	18	16	14
0.50	Large	14	12	11	10

All values are $n$ per group. Multiply by $K$ for total $N$ .

Cohen's Benchmarks — ANOVA Effect Sizes

Label	$\eta^2$ / $\omega^2$	$f$	$\eta_p^2$ (approx)
Small	$0.01$	$0.10$	$0.01$
Medium	$0.06$	$0.25$	$0.06$
Large	$0.14$	$0.40$	$0.14$

Note: Cohen's benchmarks for $\eta^2$ apply approximately to $\omega^2$ and $\eta_p^2$ . Always prioritise domain-specific benchmarks over these generic conventions.

Degrees of Freedom Reference

Design	Source	df
One-way between	Between	$K-1$
	Within	$N-K$
	Total	$N-1$
Factorial ( $a\times b$ )	A	$a-1$
	B	$b-1$
	A $\times$ B	$(a-1)(b-1)$
	Within	$ab(n-1)$
One-way RM	Conditions	$K-1$
	Subjects	$n-1$
	Error	$(K-1)(n-1)$
Mixed ( $a$ between, $b$ within)	A (between)	$a-1$
	S(A) (between error)	$a(n-1)$
	B (within)	$b-1$
	A $\times$ B	$(a-1)(b-1)$
	B $\times$ S(A) (within error)	$a(n-1)(b-1)$

Assumption Checks Reference

Assumption	Test	Action if Violated
Normality of residuals	Shapiro-Wilk, Q-Q plot	Kruskal-Wallis / Friedman; transform data
Homogeneity of variance	Levene's, Brown-Forsythe	Welch's ANOVA + Games-Howell
Sphericity (RM designs)	Mauchly's test ( $W$ , $\varepsilon$ )	GG correction ( $\varepsilon \leq 0.75$ ), HF ( $\varepsilon > 0.75$ )
Homogeneity of regression slopes (ANCOVA)	Group $\times$ Covariate interaction test	Use moderated regression instead
Independence	Design review	Mixed models / multilevel ANOVA
Outliers	Boxplots, standardised residuals ($	z
Interval scale	Measurement theory	Non-parametric alternatives

ANOVA Reporting Checklist

Item	Required
$F$ -statistic with both df	✅ Always
Exact p-value (or $p < .001$ )	✅ Always
$\omega^2$ or $\omega_p^2$ with 95% CI	✅ Always (preferred over $\eta^2$ )
Which effect size was reported ( $\eta^2$ vs. $\omega^2$ etc.)	✅ Always
Group means and SDs for all groups/cells	✅ Always
Sample sizes per group/cell	✅ Always
Levene's test result (between-subjects)	✅ For independent designs
Mauchly's test and $\varepsilon$ (within-subjects)	✅ For RM and mixed designs
Which sphericity correction applied (GG or HF)	✅ When Mauchly's significant
$\eta_G^2$ for repeated measures / mixed	✅ Recommended
Post-hoc test name and FWER correction method	✅ When omnibus F significant
Post-hoc pairwise differences with adjusted $p$ and $d_{jk}$	✅ When omnibus F significant
Interaction plot for factorial/mixed designs	✅ When interaction significant
Simple effects for significant interactions	✅ When interaction significant
SS Type for unbalanced factorial designs	✅ For unbalanced factorial
Power analysis or sensitivity analysis	✅ For null results
Whether Welch's ANOVA was used	✅ If variances are unequal
Domain-specific benchmark context	✅ Recommended

Conversion Formulas

From	To	Formula
$\eta^2$	$f$	$f = \sqrt{\eta^2/(1-\eta^2)}$
$\omega^2$	$f$	$f = \sqrt{\omega^2/(1-\omega^2)}$
$f$	$\eta^2$	$\eta^2 = f^2/(1+f^2)$
$F$ , $df_B$ , $df_W$	$\eta^2$	$\eta^2 = F\cdot df_B/(F\cdot df_B+df_W)$
$F$ , $df_B$ , $df_W$	$\omega^2$ (approx)	$\omega^2 = (F-1)\cdot df_B/(F\cdot df_B+df_W+1)$
$\eta^2$ (2 groups)	Cohen's $d$	$d = 2\sqrt{\eta^2/(1-\eta^2)}$
Cohen's $d$ (2 groups)	$\eta^2$	$\eta^2 = d^2/(d^2+4)$
$\eta^2$	$r$ (2 groups)	$r = \sqrt{\eta^2}$
$W$ (Kendall's)	$r$ (avg pairwise)	$r = (nW-1)/(n-1)$
$\chi^2_r$	$\eta^2_F$	$\eta^2_F = \chi^2_r/(n(K-1))$

This tutorial provides a comprehensive foundation for understanding, conducting, and reporting ANOVA and its alternatives within the DataStatPro application. For further reading, consult Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018) for applied coverage, Maxwell, Delaney & Kelley's "Designing Experiments and Analyzing Data" (3rd ed., 2018) for rigorous methodological depth, Wilcox's "Introduction to Robust Estimation and Hypothesis Testing" (4th ed., 2017) for robust alternatives, Olejnik & Algina's (2003) "Generalized Eta and Omega Squared Statistics" (Educational and Psychological Measurement) for effect size recommendations in repeated measures designs, and Lakens's "Calculating and Reporting Effect Sizes to Facilitate Cumulative Science" (Frontiers in Psychology, 2013) for practical effect size guidance. For Bayesian ANOVA, see Rouder et al. (2012) in the Journal of Mathematical Psychology. For feature requests or support, contact the DataStatPro team.

ANOVA Tests and Alternatives

ANOVA Tests and Alternatives: Zero to Hero Tutorial

Table of Contents

1. Prerequisites and Background Concepts

1.1 The Logic of Variance Decomposition

1.2 The F-Distribution

1.3 Why Not Multiple t-Tests?

1.4 The Relationship Between ANOVA and Regression

1.5 Main Effects and Interactions

1.6 Fixed vs. Random Effects

1.7 Variance Explained: η2\eta^2η2, ω2\omega^2ω2, and ε2\varepsilon^2ε2

2. What is ANOVA?

2.1 The Core Idea

2.2 What ANOVA Tests and Does Not Test

2.3 The ANOVA Family

2.4 ANOVA in Context

3. The Mathematics Behind ANOVA

3.1 One-Way ANOVA: Sum of Squares Decomposition

3.2 Mean Squares and the F-Statistic

3.3 The Expected Mean Squares

3.4 The ANOVA Source Table

3.5 Effect Sizes for One-Way ANOVA

3.6 Confidence Intervals for ANOVA Effect Sizes

3.7 Factorial ANOVA: Partitioning Variance for Multiple Factors

3.8 Partial Eta Squared (ηp2\eta_p^2ηp2​) for Factorial Designs

3.9 Repeated Measures ANOVA: Within-Subjects Decomposition

3.10 Sphericity and the Mauchly Test

3.11 Mixed ANOVA: Between + Within Factors

4. Assumptions of ANOVA

4.1 Normality of Residuals

4.2 Homogeneity of Variance (Homoscedasticity)

4.3 Independence of Observations

4.4 Sphericity (Repeated Measures Only)

4.5 Interval Scale of Measurement

4.6 Absence of Significant Outliers

4.7 Assumption Summary Table

5. Types of ANOVA

5.1 Classification by Design

By Number of Independent Variables

By Type of Factor

5.2 Choosing the Correct ANOVA Design

5.3 Type I, II, and III Sums of Squares

6. Using the ANOVA Calculator Component

Step-by-Step Guide

7. One-Way Between-Subjects ANOVA

7.1 Purpose and Design

7.2 Full Procedure

Step 1 — State hypotheses

Step 2 — Compute grand mean and group means

Step 3 — Compute sums of squares

Step 4 — Compute degrees of freedom

Step 5 — Compute mean squares and F

Step 6 — Compute p-value and make a decision

Step 7 — Compute effect sizes

Step 8 — Conduct post-hoc tests or planned contrasts

7.3 Computing ω2\omega^2ω2 and η2\eta^2η2 from F

7.4 Interpreting the Omnibus F-Test

8. Factorial Between-Subjects ANOVA

8.1 Purpose and Design

8.2 The Concept of Interaction

8.3 Simple Effects Analysis

8.4 Effect Sizes in Factorial ANOVA

9. One-Way Repeated Measures ANOVA

9.1 Purpose and Design

9.2 Full Procedure

Step 1 — Compute condition means and participant means

Step 2 — Compute sums of squares

Step 3 — Degrees of freedom

Step 4 — Mean squares and F

Step 5 — Sphericity check and correction

Step 6 — Effect size

10. Mixed ANOVA

10.1 Purpose and Design

10.2 The Primary Interaction: Time ×\times× Group

10.3 Probing a Significant Interaction

10.4 Sphericity in Mixed ANOVA

10.5 Effect Sizes for Mixed ANOVA

11. Post-Hoc Tests and Planned Contrasts

11.1 The Need for Post-Hoc Testing

11.2 Overview of Post-Hoc Tests

1.7 Variance Explained: $\eta^2$ , $\omega^2$ , and $\varepsilon^2$

3.8 Partial Eta Squared ( $\eta_p^2$ ) for Factorial Designs

7.3 Computing $\omega^2$ and $\eta^2$ from F

10.2 The Primary Interaction: Time $\times$ Group

Step 5 — Effect size: $\eta^2_H$

Step 3 — Compute Friedman's $\chi^2_r$ statistic

Step 5 — Effect size: Kendall's $W$

Example 2: Two-Way Factorial ANOVA — Drug $\times$ Exercise on Anxiety

Mistake 2: Reporting $\eta^2$ as if It Were $\omega^2$

Mistake 3: Confusing $\eta^2$ and $\eta_p^2$ in Factorial Designs

Required Sample Size — One-Way ANOVA (80% Power, $\alpha = .05$ )