One-Way ANOVA: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of variance decomposition all the way through the mathematics, assumptions, effect sizes, post-hoc testing, non-parametric alternatives, interpretation, reporting, and practical usage of the One-Way ANOVA within the DataStatPro application. Whether you are encountering ANOVA for the first time or seeking a rigorous, unified understanding of between-groups inference, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is a One-Way ANOVA?
- The Mathematics Behind One-Way ANOVA
- Assumptions of One-Way ANOVA
- Variants of One-Way ANOVA
- Using the One-Way ANOVA Calculator Component
- Full Step-by-Step Procedure
- Effect Sizes for One-Way ANOVA
- Post-Hoc Tests and Planned Contrasts
- Confidence Intervals
- Power Analysis and Sample Size Planning
- Non-Parametric Alternative: Kruskal-Wallis Test
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into One-Way ANOVA, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.
1.1 The Logic of Comparing Groups
When we measure a continuous outcome across three or more independent groups, we ask: "Are the observed differences in group means larger than what we would expect from random sampling alone?" One-Way ANOVA answers this question by comparing two sources of variability:
- Between-groups variability: How much do the group means differ from each other? If groups have truly different population means, this variability should be large.
- Within-groups variability: How much do observations within each group vary around their own group mean? This reflects pure random (sampling) error.
If the between-groups variability is substantially larger than the within-groups variability, we conclude that the groups differ beyond what chance alone would produce.
1.2 Why Not Multiple t-Tests?
With k groups, one could run all m = k(k − 1)/2 pairwise t-tests. However, this inflates the familywise error rate (FWER):

FWER = 1 − (1 − α)^m

where m is the number of tests. For k = 4 groups (m = 6 tests) at α = .05:

FWER = 1 − (0.95)^6 ≈ .265

Over 26% chance of at least one false positive. The one-way ANOVA omnibus test maintains the FWER at exactly α for the simultaneous test that all group means are equal.
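The inflation can be verified numerically. A minimal sketch (the group counts are arbitrary illustrations):

```python
from math import comb

alpha = 0.05
for k in (3, 4, 5, 6):
    m = comb(k, 2)                    # number of pairwise tests
    fwer = 1 - (1 - alpha) ** m       # P(at least one false positive)
    print(f"k={k}: m={m} pairwise tests, FWER={fwer:.3f}")
```

For k = 4 this reproduces the roughly 26% familywise error rate quoted above.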
1.3 Variance and Its Decomposition
The variance of a dataset measures the average squared deviation from the mean:

s² = Σ (x_i − x̄)² / (n − 1)

The key insight behind ANOVA is that total variance can be partitioned into meaningful components. For a one-way design:

SS_Total = SS_Between + SS_Within
Each sum of squares (SS), when divided by its degrees of freedom, becomes a mean square (MS) — a variance estimate. The ratio of these variance estimates is the F-statistic.
1.4 The F-Distribution
The F-distribution arises from the ratio of two independent chi-squared variates, each divided by its degrees of freedom. In ANOVA:

F = MS_Between / MS_Within ~ F(k − 1, N − k) under H₀
Properties of the F-distribution:
- Always non-negative (a ratio of variances cannot be negative).
- Right-skewed; skewness decreases as the df increase.
- Characterised by two df parameters: numerator (df1 = k − 1) and denominator (df2 = N − k).
- Under H₀: E[F] = df2 / (df2 − 2) ≈ 1 (both MS estimate the same σ²).
- Under H₁: E[F] > 1 (the between-groups MS is inflated by true group differences).
1.5 The Expected Mean Squares
Understanding why the F-ratio works requires the expected values of the mean squares under H₀ and H₁:

E[MS_Within] = σ² (always)
E[MS_Between] = σ² + Σ n_j(μ_j − μ)² / (k − 1)

Under H₀ (all μ_j equal): the second term is zero, so E[MS_Between] = σ² and E[F] ≈ 1.
Under H₁: the second term is positive, so E[MS_Between] > σ², giving E[F] > 1.
The non-centrality parameter:

λ = Σ n_j(μ_j − μ)² / σ²

links the true population effect to the power of the test.
1.6 Statistical Significance vs. Practical Significance
Like the t-test, the F-test answers: "Is the result unlikely under H₀?" It does not answer: "How large is the effect?"
With very large N, even trivially small group differences produce significant F-values. A study with several thousand participants across five groups might yield a significant F (p < .05) with ω² < .01 — statistically significant but practically negligible.
Always report:
- The F-statistic, degrees of freedom, and p-value.
- ω² or ε² with 95% CI (practical effect size).
- Group means and SDs.
- Post-hoc comparisons with individual effect sizes.
1.7 The Relationship Between ANOVA and the t-Test
When k = 2, the one-way ANOVA F-statistic is exactly the square of the independent-samples t-statistic:

F = t²

The p-values are identical (both two-tailed). ANOVA generalises the independent-samples t-test to k ≥ 2 groups.
1.8 The Relationship Between ANOVA and Regression
ANOVA is a special case of the General Linear Model (GLM):

x_ij = μ + α_j + ε_ij, with ε_ij ~ N(0, σ²)

where α_j = μ_j − μ is the effect of group j and Σ α_j = 0 (sum-to-zero constraint). In the regression framework, group membership is represented by dummy or effect-coded predictors. This equivalence is important because:
- ANOVA results can always be replicated using regression.
- Adding covariates (ANCOVA) is natural in the regression framework.
- Unbalanced designs are handled more flexibly in regression.
2. What is a One-Way ANOVA?
2.1 The Core Idea
One-Way Analysis of Variance (ANOVA) is a parametric inferential procedure for testing whether the means of three or more independent groups are simultaneously equal. It is called "one-way" because there is exactly one independent variable (IV) with k levels, and "between-subjects" because different participants appear in each group.
The general omnibus null hypothesis:

H₀: μ₁ = μ₂ = ⋯ = μ_k
2.2 What One-Way ANOVA Tests and Does Not Test
One-Way ANOVA tells you:
- Whether the observed group mean differences are larger than expected by chance (omnibus test).
- How much of the total outcome variance is explained by group membership (η², ω²).
One-Way ANOVA does NOT tell you:
- Which specific groups differ from each other (requires post-hoc tests or planned contrasts).
- The direction or magnitude of specific pairwise differences.
- Whether the effect is practically meaningful (requires effect sizes with CIs).
2.3 Design Requirements
For one-way between-subjects ANOVA, the design must satisfy:
- One continuous DV (interval or ratio scale).
- One categorical IV with k ≥ 3 levels (groups).
- Different participants in each group (independent samples).
- Each participant contributes exactly one score to exactly one group.
2.4 One-Way ANOVA in Context
| Situation | Test |
|---|---|
| 2 groups, independent, normal | Independent t-test (Welch's) |
| ≥ 3 groups, independent, normal, equal variances | One-Way ANOVA |
| ≥ 3 groups, independent, normal, unequal variances | Welch's One-Way ANOVA |
| ≥ 3 groups, independent, non-normal or ordinal | Kruskal-Wallis test |
| ≥ 3 conditions, same participants, normal | One-Way Repeated Measures ANOVA |
| ≥ 3 conditions, same participants, non-normal | Friedman test |
| ≥ 3 groups + covariate | ANCOVA |
| ≥ 2 IVs, independent groups | Factorial between-subjects ANOVA |
2.5 Real-World Applications
| Field | Example Application | IV (Levels) | DV |
|---|---|---|---|
| Clinical Psychology | CBT vs. BA vs. Waitlist on depression | 3 therapy conditions | PHQ-9 |
| Education | Lecture vs. Flipped vs. Project-based on scores | 3 teaching methods | Exam % |
| Medicine | 4 drug dosages on blood pressure | 4 doses | Systolic BP |
| Marketing | 5 ad formats on purchase intent | 5 formats | Intent (0–100) |
| Neuroscience | 3 sleep conditions on cognitive performance | 3 conditions | Reaction time |
| Ecology | 4 habitats on species richness | 4 habitat types | Species count |
| HR/OB | 3 leadership styles on productivity | 3 styles | Units/hour |
| Nutrition | 5 diets on weight loss | 5 diet types | kg lost |
3. The Mathematics Behind One-Way ANOVA
3.1 Notation
| Symbol | Meaning |
|---|---|
| k | Number of groups |
| n_j | Sample size in group j |
| N = Σ n_j | Total sample size |
| x_ij | i-th observation in group j |
| x̄_j | Mean of group j |
| x̄ | Grand mean |
| s_j² | Variance of group j |
3.2 Sum of Squares Decomposition
Total Sum of Squares — total variability in the data:

SS_T = Σ_j Σ_i (x_ij − x̄)²

Between-Groups Sum of Squares — variability due to group membership:

SS_B = Σ_j n_j (x̄_j − x̄)²

Within-Groups Sum of Squares — variability within each group (pure error):

SS_W = Σ_j Σ_i (x_ij − x̄_j)²

Verification: SS_T = SS_B + SS_W; the decomposition is exact.
3.3 Mean Squares and the F-Ratio
Between-groups mean square:

MS_B = SS_B / (k − 1)

Within-groups mean square (pooled error variance):

MS_W = SS_W / (N − k)

Note: MS_W is the pooled estimate of the common population variance σ², assuming homogeneity of variance across groups.
The F-statistic:

F = MS_B / MS_W

Under H₀: F ~ F(k − 1, N − k)
p-value: p = P(F_{k−1, N−k} ≥ F_observed)
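The whole computation can be scripted in a few lines. The sketch below uses made-up data for three groups and cross-checks the hand-rolled F and p against `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (illustrative values only)
g1 = np.array([4.0, 5.0, 6.0, 5.5, 4.5])
g2 = np.array([6.0, 7.0, 8.0, 7.5, 6.5])
g3 = np.array([5.0, 5.5, 6.5, 6.0, 5.0])
groups = [g1, g2, g3]

k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

ss_b = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # between-groups SS
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)        # within-groups SS
ms_b, ms_w = ss_b / (k - 1), ss_w / (N - k)
F = ms_b / ms_w
p = stats.f.sf(F, k - 1, N - k)                                # right-tail p-value

F_ref, p_ref = stats.f_oneway(g1, g2, g3)                      # cross-check
```

The two routes agree to machine precision, which is a useful sanity check when implementing the decomposition by hand.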
3.4 The ANOVA Source Table
| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between groups | SS_B | k − 1 | SS_B / (k − 1) | MS_B / MS_W | P(F ≥ F_obs) |
| Within groups (Error) | SS_W | N − k | SS_W / (N − k) | | |
| Total | SS_T | N − 1 | | | |
3.5 Computing the Grand Mean and Group Means
Grand mean (weighted by group sizes):

x̄ = Σ n_j x̄_j / N

For balanced designs (equal n_j): x̄ = (1/k) Σ x̄_j
3.6 The Pooled Standard Deviation
The pooled within-groups standard deviation is used for computing effect sizes for pairwise comparisons:

s_p = √MS_W = √[ Σ (n_j − 1) s_j² / (N − k) ]
This is a weighted average of the group standard deviations, using degrees of freedom as weights.
3.7 Computing SS from Summary Statistics
When only group means, SDs, and sample sizes are available:

SS_W = Σ (n_j − 1) s_j²
SS_B = Σ n_j (x̄_j − x̄)², with x̄ = Σ n_j x̄_j / N
SS_T = SS_B + SS_W
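A sketch of this summary-statistics route (the n/M/SD values below are hypothetical):

```python
from scipy import stats

# Hypothetical summary statistics per group
ns    = [12, 12, 12]
means = [5.0, 7.0, 5.6]
sds   = [0.8, 0.8, 0.7]

k, N = len(ns), sum(ns)
grand = sum(n * m for n, m in zip(ns, means)) / N    # weighted grand mean

ss_b = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))  # between-groups SS
ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))        # within-groups SS
F = (ss_b / (k - 1)) / (ss_w / (N - k))
p = stats.f.sf(F, k - 1, N - k)
```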
3.8 Computing η² and ω² from F
When only the ANOVA table is reported (useful for computing effect sizes from published results):
Eta squared from F:

η² = F·df1 / (F·df1 + df2)

Omega squared from F (approximate):

ω² ≈ df1(F − 1) / (df1(F − 1) + N)

Exact omega squared from SS (preferred):

ω² = (SS_B − (k − 1)·MS_W) / (SS_T + MS_W)
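These conversions are easy to script; a sketch (the function names are our own, not DataStatPro's API):

```python
def eta_sq_from_f(F, df1, df2):
    # eta^2 = F*df1 / (F*df1 + df2)
    return (F * df1) / (F * df1 + df2)

def omega_sq_from_f(F, df1, df2):
    # approximate omega^2 from a published F and its df (N = df1 + df2 + 1)
    N = df1 + df2 + 1
    return (df1 * (F - 1)) / (df1 * (F - 1) + N)

def omega_sq_exact(ss_b, ss_w, k, N):
    # exact omega^2 from the sums of squares (preferred when SS are available)
    ms_w = ss_w / (N - k)
    return (ss_b - (k - 1) * ms_w) / (ss_b + ss_w + ms_w)
```

As expected, the omega-squared estimate is always smaller than the eta-squared estimate computed from the same F.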
3.9 The Non-Central F-Distribution and Exact CIs
Under H₁, the F-statistic follows a non-central F-distribution with non-centrality parameter λ:

F ~ F(k − 1, N − k; λ), with λ = Σ n_j(μ_j − μ)² / σ²

Exact 95% CI for η² (via non-central F):
Find λ_L and λ_U such that:

P(F_{k−1, N−k; λ_L} ≥ F_obs) = .025 and P(F_{k−1, N−k; λ_U} ≤ F_obs) = .025

Then convert to η²: η²_L = λ_L / (λ_L + N), η²_U = λ_U / (λ_U + N)
And then to ω² using the bias correction. DataStatPro performs this numerical computation automatically.
4. Assumptions of One-Way ANOVA
4.1 Normality of Residuals (Within-Group Normality)
One-Way ANOVA assumes that within each population, the observations are normally distributed. Equivalently, the residuals should be normally distributed.
How to check:
- Shapiro-Wilk test on residuals (most powerful for small-to-moderate n per group; H₀: residuals are normal). A significant result suggests departure.
- Shapiro-Wilk per group (preferred for small n_j): run separately for each group.
- Q-Q plot of all residuals: points should follow the diagonal reference line.
- Histograms per group: approximate bell shape.
- Skewness and kurtosis of residuals (values near 0 are consistent with normality).
Robustness: ANOVA is remarkably robust to mild non-normality, especially when:
- Group sizes are equal (balanced design).
- n ≥ 25–30 per group (the CLT applies to the group means).
- The departure is mild skewness rather than heavy tails with extreme outliers.
When violated: Use the Kruskal-Wallis test as a non-parametric alternative (Section 12). Consider transformations (log, square root, or Box-Cox) for right-skewed data. Report trimmed mean ANOVA for heavy-tailed distributions.
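The normality and variance checks above can be scripted; a sketch using simulated normal data (seed and group means are arbitrary). Note that `scipy.stats.levene` with `center="median"` is the Brown-Forsythe variant:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated data: three normal groups with different means (arbitrary values)
groups = [rng.normal(loc, 1.0, size=30) for loc in (0.0, 0.5, 1.0)]

# Residuals: each score minus its own group mean
residuals = np.concatenate([g - g.mean() for g in groups])

sw_stat, sw_p = stats.shapiro(residuals)                   # H0: residuals normal
bf_stat, bf_p = stats.levene(*groups, center="median")     # Brown-Forsythe variant
lev_stat, lev_p = stats.levene(*groups, center="mean")     # classic Levene
```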
4.2 Homogeneity of Variance (Homoscedasticity)
One-Way ANOVA assumes all population variances are equal:

σ₁² = σ₂² = ⋯ = σ_k²

This assumption is required for MS_W to serve as a valid pooled estimate of the common error variance σ².
How to check:
- Levene's test (preferred — robust to non-normality): tests H₀: σ₁² = ⋯ = σ_k². A significant result indicates heteroscedasticity.
- Brown-Forsythe test (more robust than Levene's for non-normal data, uses group medians rather than means).
- Bartlett's test (powerful but sensitive to non-normality — avoid for non-normal data).
- Variance ratio rule of thumb: If s²_max / s²_min > 4, heterogeneity is potentially problematic, especially with unequal n_j.
Robustness: ANOVA is relatively robust to heteroscedasticity when:
- Group sizes are equal (n_j all equal) — balance is a strong protective factor.
- The larger variance is paired with the larger group (the test becomes conservative; Type I error is slightly deflated and therefore manageable). When the larger variance is paired with the smaller group, the test becomes liberal and Type I error is inflated.
When n_j are unequal AND variances are unequal, the ANOVA Type I error rate can be severely inflated (or deflated, depending on the pairing).
When violated: Use Welch's one-way ANOVA (Section 5), which does not assume equal variances. Follow with Games-Howell pairwise comparisons.
4.3 Independence of Observations
All observations must be independent of each other, both within and across groups. This is a design assumption — it cannot be tested statistically from the data.
Common violations:
- Participants from the same family, classroom, or hospital ward.
- Multiple measurements from the same participant treated as independent.
- Time series data with autocorrelated errors.
- Social networks where participants influence each other's scores.
When violated: Use multilevel models (participants nested within clusters), repeated measures ANOVA (repeated observations within participants), or time-series methods.
4.4 Interval Scale of Measurement
The dependent variable must be measured on at least an interval scale — equal spacing between values. Difference scores must be meaningful.
When violated: Use the Kruskal-Wallis test (for ordinal data), or ordinal regression for ordered categorical outcomes.
4.5 Absence of Influential Outliers
Extreme outliers distort both and , producing unreliable F-statistics.
How to check:
- Boxplots per group: values beyond 1.5 × IQR from the nearest quartile are mild outliers; beyond 3 × IQR are extreme.
- Standardised residuals: |z| > 3 flags potential outliers.
- Studentised deleted residuals from the ANOVA model: values with |t| > 3 warrant inspection.
When outliers present: Investigate the cause (data entry error? legitimate extreme score?). Report analyses with and without outliers. Consider the Kruskal-Wallis test or trimmed mean ANOVA as robust alternatives.
4.6 Balanced vs. Unbalanced Designs
While equal group sizes (n_j all equal, a balanced design) are not formally required, they are strongly preferred because:
- ANOVA is more robust to normality and homoscedasticity violations with equal n_j.
- Statistical power is maximised for a given total N.
- Effect size estimation is less biased.
Unbalanced designs (unequal n_j) are common in observational research. They require careful attention to variance heterogeneity and post-hoc test selection.
4.7 Assumption Summary Table
| Assumption | Description | How to Check | Remedy if Violated |
|---|---|---|---|
| Normality | Residuals normally distributed within groups | Shapiro-Wilk, Q-Q plot | Kruskal-Wallis; transform |
| Homoscedasticity | Equal variances across groups | Levene's, Brown-Forsythe | Welch's ANOVA + Games-Howell |
| Independence | Observations independent within and across groups | Design review | Multilevel model |
| Interval scale | DV has equal-interval properties | Measurement theory | Kruskal-Wallis |
| No outliers | No extreme influential values | Boxplots, standardised residuals | Investigate; Kruskal-Wallis |
5. Variants of One-Way ANOVA
5.1 Standard One-Way ANOVA (Student's F-test)
The default one-way ANOVA assuming equal variances across groups. Uses the pooled MS_W as the error term. This is appropriate when Levene's test is non-significant AND group sizes are approximately equal.
5.2 Welch's One-Way ANOVA
Welch's F-test (Welch, 1951) is the recommended default for one-way between-subjects ANOVA. It does not assume homogeneity of variance. The statistic:

F_W = [ Σ w_j (x̄_j − x̄′)² / (k − 1) ] / [ 1 + (2(k − 2)/(k² − 1)) · Σ (1 − w_j/W)²/(n_j − 1) ]

where w_j = n_j / s_j², W = Σ w_j, and x̄′ = Σ w_j x̄_j / W (weighted grand mean).
Welch-Satterthwaite df:

df1 = k − 1, df2 = (k² − 1) / [ 3 Σ (1 − w_j/W)²/(n_j − 1) ]
Post-hoc: Use Games-Howell pairwise comparisons when Welch's ANOVA is significant.
💡 Just as Welch's t-test is the recommended default for two groups, Welch's one-way ANOVA is increasingly recommended as the default for independent groups. The power loss when variances are truly equal is negligible, while the Type I error protection when variances differ is substantial. DataStatPro reports both standard and Welch's ANOVA by default.
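Welch's statistic is straightforward to implement from the weighted-means definition. A sketch (hand-rolled, since SciPy's `f_oneway` implements only the classic pooled-variance F):

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's F-test for k independent groups (no equal-variance assumption)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                             # precision weights w_j = n_j / s_j^2
    W = w.sum()
    grand = (w * m).sum() / W             # weighted grand mean
    num = (w * (m - grand) ** 2).sum() / (k - 1)
    tmp = ((1 - w / W) ** 2 / (n - 1)).sum()
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    F = num / den
    df2 = (k ** 2 - 1) / (3 * tmp)        # Welch-Satterthwaite denominator df
    p = stats.f.sf(F, k - 1, df2)
    return F, k - 1, df2, p

F_w, df1, df2, p_w = welch_anova([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7])
```

With equal variances and balanced groups (as in the toy call above), Welch's F is close to, but slightly smaller than, the classic F.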
5.3 Trimmed Mean ANOVA (Robust)
Trimmed mean ANOVA (Wilcox, 2017) replaces standard means with trimmed means (e.g., 20% trimming from each tail). It is substantially more powerful than the Kruskal-Wallis test for symmetric heavy-tailed distributions while controlling Type I error under non-normality. Available in DataStatPro under "Robust ANOVA."
5.4 Brown-Forsythe F-test
An alternative to Welch's ANOVA that is more robust to certain distributional departures. Uses median-centred deviations for variance estimation. DataStatPro provides this as an additional output alongside Welch's F.
5.5 Choosing Between Variants
| Condition | Recommended Test |
|---|---|
| Normal data, equal variances, balanced | Standard ANOVA (or Welch's — nearly identical) |
| Normal data, unequal variances OR unequal n | Welch's ANOVA (recommended default) |
| Mildly non-normal, large n | Either standard or Welch's |
| Non-normal, small n | Kruskal-Wallis or Trimmed Mean ANOVA |
| Severely non-normal with outliers | Kruskal-Wallis |
| Ordinal DV | Kruskal-Wallis |
6. Using the One-Way ANOVA Calculator Component
The One-Way ANOVA Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, visualising, and reporting one-way ANOVA and its alternatives.
Step-by-Step Guide
Step 1 — Select "One-Way Between-Subjects ANOVA"
From the "Test Type" dropdown, choose:
- One-Way ANOVA (Standard): Equal variance assumption; Student's F.
- One-Way ANOVA (Welch's): Recommended default; no equal variance assumption.
- One-Way ANOVA (Auto): Runs both; uses Welch's when Levene's is significant.
- Kruskal-Wallis Test: Non-parametric alternative.
Step 2 — Input Method
Choose how to provide the data:
- Raw data (long format): Two columns — one for the DV values, one for group membership. DataStatPro computes all statistics, runs all assumption checks, and generates full output automatically.
- Raw data (wide format): One column per group. DataStatPro converts to long format.
- Summary statistics: Enter n, M, and SD for each group. Full assumption checks are not available; all inferential statistics and effect sizes are computed.
- ANOVA table values: Enter F, df1, df2, and N (from a published paper) to compute effect sizes, power, and CIs.
Step 3 — Specify Group Labels
Enter descriptive names for each group (e.g., "CBT," "BA," "Waitlist"). These labels appear in all output tables, plots, and the auto-generated APA paragraph.
Step 4 — Select Assumption Checks
DataStatPro automatically runs and displays:
- ✅ Shapiro-Wilk normality test on residuals (and per group for small n_j).
- ✅ Levene's test for homogeneity of variance.
- ✅ Brown-Forsythe test for homogeneity of variance (alongside Levene's).
- ✅ Boxplots per group for outlier detection.
- ✅ Q-Q plot of residuals for normality assessment.
- ✅ Variance ratio (s²_max / s²_min) with warning if > 4.
Step 5 — Select Post-Hoc Tests
When the omnibus F is significant, select post-hoc tests:
- Tukey HSD (default for balanced, equal variances).
- Tukey-Kramer (unbalanced, equal variances).
- Games-Howell (recommended when Levene's significant or Welch's used).
- Bonferroni (conservative; any design).
- Holm-Bonferroni (less conservative than Bonferroni).
- Dunnett's (all vs. one control group).
- Scheffé (all possible contrasts).
- Custom planned contrasts (specify contrast weights c_j).
Step 6 — Select Effect Sizes
- ✅ ω² (bias-corrected; primary).
- ✅ ε² (alternative bias correction).
- ✅ η² (biased; provided for comparison and journal requirements).
- ✅ Cohen's f (for power analysis).
- ✅ 95% CIs for ω² and η² via non-central F-distribution.
- ✅ Cohen's d with 95% CI for each post-hoc pairwise comparison.
Step 7 — Select Display Options
- ✅ Full ANOVA source table with SS, df, MS, F, p.
- ✅ Descriptive statistics table (n, M, SD, SE, 95% CI per group).
- ✅ Effect size table (η², ω², ε², f) with CIs.
- ✅ Post-hoc comparison table (all pairs: difference, SE, adjusted p, d, 95% CI for difference).
- ✅ Assumption test results panel (colour-coded: green/yellow/red).
- ✅ Raincloud plot per group (half violin + boxplot + raw data points).
- ✅ Means plot with 95% CIs and individual data points.
- ✅ Cohen's d diagram for each significant pairwise comparison.
- ✅ Power curve: power vs. n for the observed f.
- ✅ APA 7th edition-compliant results paragraph (auto-generated).
Step 8 — Run the Analysis
Click "Run One-Way ANOVA". DataStatPro will:
- Compute the full ANOVA source table.
- Run all assumption tests and display colour-coded warnings.
- Automatically switch to Welch's ANOVA if Levene's test is significant (when "Auto" selected).
- Compute all effect sizes with exact non-central F-based CIs.
- Run all selected post-hoc tests with adjusted p-values and individual Cohen's d values.
- Generate all visualisations.
- Auto-generate the APA-compliant results paragraph.
7. Full Step-by-Step Procedure
7.1 Complete Computational Procedure
This section walks through every computational step for one-way ANOVA, from raw data to a complete APA-style conclusion.
Given: k groups with observations x_ij for j = 1, …, k and i = 1, …, n_j. Total N = Σ n_j.
Step 1 — State the Hypotheses
H₀: μ₁ = μ₂ = ⋯ = μ_k
H₁: μ_j ≠ μ_j′ for at least one pair j ≠ j′
Choose the significance level α (default: α = .05).
Step 2 — Compute Descriptive Statistics per Group
For each group j, compute the sample size n_j, mean x̄_j, and SD s_j.
Step 3 — Compute the Grand Mean

x̄ = Σ n_j x̄_j / N
Step 4 — Check Assumptions
Normality: Run Shapiro-Wilk on the residuals e_ij = x_ij − x̄_j. If p < .05 and group sizes are small: consider Kruskal-Wallis.
Homoscedasticity: Run Levene's test. If p < .05: use Welch's ANOVA.
Outliers: Inspect boxplots and standardised residuals.
Step 5 — Compute Sums of Squares

SS_B = Σ n_j (x̄_j − x̄)², SS_W = Σ_j Σ_i (x_ij − x̄_j)², SS_T = SS_B + SS_W

Verification: SS_T can also be computed directly as Σ_j Σ_i (x_ij − x̄)² — both must agree.
Step 6 — Compute Degrees of Freedom

df_B = k − 1, df_W = N − k, df_T = N − 1

Step 7 — Compute Mean Squares

MS_B = SS_B / df_B, MS_W = SS_W / df_W

Step 8 — Compute the F-Statistic and p-value

F = MS_B / MS_W, p = P(F_{df_B, df_W} ≥ F_obs)

Reject H₀ if p ≤ α.
Step 9 — Compute Effect Sizes
Eta squared (biased):

η² = SS_B / SS_T

Omega squared (preferred, bias-corrected):

ω² = (SS_B − df_B · MS_W) / (SS_T + MS_W)

Epsilon squared (alternative correction):

ε² = (SS_B − df_B · MS_W) / SS_T

Cohen's f:

f = √[ η² / (1 − η²) ]
Step 10 — Compute 95% CI for ω²
Using the non-central F-distribution (computed numerically by DataStatPro).
Estimated non-centrality parameter:

λ̂ = df_B · F

Find λ_L, λ_U such that the observed F sits at the .975 and .025 quantiles of the corresponding non-central F-distribution (one equation per bound), then convert each bound to the variance-explained scale via η² = λ / (λ + N) (approximate) and apply the bias correction for ω² (approximate).
Step 11 — Conduct Post-Hoc Tests (if significant)
Select the appropriate post-hoc test (Section 9). Compute pairwise differences, standard errors, adjusted p-values, and an individual Cohen's d for each pair.
Step 12 — Interpret and Report
Combine all results into an APA-compliant report (Section 13.7).
8. Effect Sizes for One-Way ANOVA
8.1 Eta Squared (η²) — Common but Biased

η² = SS_B / SS_T

η² is the proportion of total sample variance explained by group membership. It is the most commonly reported ANOVA effect size and appears as default output in SPSS.
Critical limitation: η² is positively biased — it systematically overestimates the true population effect size, especially in small samples and when k is large relative to N. In small samples the bias can amount to several percentage points; with N in the hundreds it is negligible.
⚠️ Report η² only when explicitly required by a journal or for historical comparison. Always report ω² (or ε²) as the primary effect size and label η² as "biased" in your manuscript.
8.2 Omega Squared (ω²) — Preferred

ω² = (SS_B − (k − 1)·MS_W) / (SS_T + MS_W)

ω² is a bias-corrected estimate of the population proportion of variance explained by the IV. It is the recommended primary effect size for one-way ANOVA.
Properties:
- Can be slightly negative in small samples when the true effect is zero or near zero — because the correction overshoots. Report as 0 by convention when negative.
- Always ω² ≤ η² (the correction always reduces or maintains the estimate).
- Converges to η² as N → ∞.
- The population parameter estimated: ω²_pop = σ²_between / (σ²_between + σ²_error).
From the F-statistic and df (approximate):

ω² ≈ df1(F − 1) / (df1(F − 1) + N)
8.3 Epsilon Squared (ε²) — Alternative Correction

ε² = (SS_B − (k − 1)·MS_W) / SS_T

ε² uses the same numerator as ω² but divides by SS_T instead of SS_T + MS_W.
Properties:
- Always lies between ω² and η²: ω² ≤ ε² ≤ η².
- Slightly less bias-correction than ω².
- Computationally simpler than ω² (no addition of MS_W in the denominator).
- Increasingly reported alongside ω² in recent literature.
8.4 Cohen's f — For Power Analysis

f = √[ η² / (1 − η²) ] or f = √[ ω² / (1 − ω²) ]

Cohen's f is used as the effect size input for ANOVA power analysis. It represents the ratio of the between-groups SD to the within-groups SD.
From group means and σ (when population parameters are known):

f = σ_m / σ

where σ_m is the SD of the group means.
Benchmarks: Small = 0.10, Medium = 0.25, Large = 0.40 (Cohen, 1988).
8.5 Comparison of Effect Size Estimates
For a dataset with k = 4 groups and N = 60 (15 per group), η² typically exceeds ω² by almost 4 percentage points (computing exact ω² requires the full SS values) — substantial overestimation, and worth correcting.
8.6 Effect Sizes for Pairwise Comparisons
After the omnibus F-test, report individual effect sizes for each significant pairwise comparison using s_p = √MS_W as the standardiser:
Cohen's d (using the pooled within-groups SD):

d = (x̄_j − x̄_j′) / √MS_W

Hedges' g (bias-corrected):

g = d · [ 1 − 3 / (4·df_W − 1) ]

Using √MS_W from the full ANOVA model (rather than just the two-group pooled SD) is preferred because it is based on all groups and is therefore a more stable estimate of the common population SD.
95% CI for the pairwise mean difference:

(x̄_j − x̄_j′) ± t_{.975, N−k} · √[ MS_W (1/n_j + 1/n_j′) ]
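A sketch of these pairwise effect-size computations (the input values are hypothetical, and `pairwise_effect` is our own helper name):

```python
import math
from scipy import stats

def pairwise_effect(m1, m2, n1, n2, ms_w, df_w):
    """Cohen's d (pooled ANOVA SD), Hedges' g, and 95% CI for the mean difference."""
    sp = math.sqrt(ms_w)                      # pooled within-groups SD
    d = (m1 - m2) / sp
    g = d * (1 - 3 / (4 * df_w - 1))          # small-sample bias correction
    se = math.sqrt(ms_w * (1 / n1 + 1 / n2))
    t_crit = stats.t.ppf(0.975, df_w)
    diff = m1 - m2
    return d, g, (diff - t_crit * se, diff + t_crit * se)

# Hypothetical pair: means 7.0 vs 5.0, n = 12 each, MS_W = 0.59, df_W = 33
d, g, ci = pairwise_effect(7.0, 5.0, 12, 12, 0.59, 33)
```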
8.7 Omega Squared vs. Partial Omega Squared
In one-way ANOVA with a single IV:

ω² = ω²_partial and η² = η²_partial

(they are identical for one-factor designs)
The partial and non-partial versions diverge only in factorial (multi-factor) designs.
9. Post-Hoc Tests and Planned Contrasts
9.1 The Need for Post-Hoc Testing
A significant omnibus F-test establishes that at least one group mean differs from the others. Post-hoc tests (also called multiple comparison procedures) identify which specific pairs of groups differ, while controlling the FWER.
The key trade-off: Controlling the FWER requires more conservative critical values, which reduces power for individual comparisons. The choice of post-hoc test involves balancing Type I error control and statistical power.
9.2 Tukey's HSD — Standard Pairwise Comparisons
Tukey's Honestly Significant Difference (HSD) is the most widely used post-hoc test for balanced designs with equal variances. It controls the FWER at exactly α for all pairwise comparisons simultaneously.
Critical value: the studentised range distribution q_{α; k, N−k}.
Minimum Significant Difference:

HSD = q_{α; k, N−k} · √(MS_W / n) (balanced)

For unequal group sizes (Tukey-Kramer):

HSD_jj′ = q_{α; k, N−k} · √[ (MS_W/2)(1/n_j + 1/n_j′) ]

Declare groups j and j′ significantly different if |x̄_j − x̄_j′| ≥ HSD.
95% CI for pairwise difference μ_j − μ_j′:

(x̄_j − x̄_j′) ± q_{.05; k, N−k} · √[ (MS_W/2)(1/n_j + 1/n_j′) ]
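Tukey's HSD can be sketched with SciPy's `studentized_range` distribution (available in SciPy ≥ 1.7; the means and MS_W below are hypothetical):

```python
import math
from scipy.stats import studentized_range

# Hypothetical balanced design: k = 3 groups, n = 12 per group
means = {"A": 5.0, "B": 7.0, "C": 5.6}
k, n, ms_w = 3, 12, 0.59
df_w = k * n - k                                  # error df = N - k = 33

q_crit = studentized_range.ppf(0.95, k, df_w)     # studentised range critical value
hsd = q_crit * math.sqrt(ms_w / n)                # minimum significant difference

pairs = [("A", "B"), ("A", "C"), ("B", "C")]
sig = {pair: abs(means[pair[0]] - means[pair[1]]) >= hsd for pair in pairs}
```

Any pair whose absolute mean difference reaches the HSD threshold is flagged significant at the simultaneous 5% level.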
9.3 Games-Howell — Unequal Variances
When Levene's test is significant or Welch's ANOVA is used, Games-Howell is the recommended post-hoc procedure. It uses separate variance estimates per pair:

t_jj′ = (x̄_j − x̄_j′) / √( s_j²/n_j + s_j′²/n_j′ )

Compared against the studentised range distribution (critical value q/√2) with Welch-Satterthwaite df:

df_jj′ = ( s_j²/n_j + s_j′²/n_j′ )² / [ (s_j²/n_j)²/(n_j − 1) + (s_j′²/n_j′)²/(n_j′ − 1) ]
9.4 Bonferroni and Holm-Bonferroni Corrections
Bonferroni correction (simplest, most conservative):
Compare each pairwise p-value to α/m, where m = k(k − 1)/2.
Holm-Bonferroni sequential procedure (less conservative, same FWER control):
- Sort the p-values: p₍₁₎ ≤ p₍₂₎ ≤ ⋯ ≤ p₍m₎.
- Compare p₍ᵢ₎ to α / (m − i + 1).
- Reject H₀ for each p₍ᵢ₎ such that p₍ⱼ₎ ≤ α / (m − j + 1) for all j ≤ i; stop at the first failure.
Holm-Bonferroni is uniformly more powerful than Bonferroni and should be preferred in all cases.
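Holm's step-down rule takes only a few lines; a sketch:

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Step-down Holm procedure: reject (True) / retain (False) flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        if pvals[idx] <= alpha / (m - rank):           # p_(i) vs alpha / (m - i + 1)
            reject[idx] = True
        else:
            break  # first failure: retain this and all larger p-values
    return reject
```

For example, `holm_bonferroni([0.01, 0.04, 0.03, 0.005])` rejects the first and last hypotheses and retains the middle two.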
9.5 Dunnett's Test — All Groups vs. One Control
When comparing k − 1 experimental groups to a single control group (and not making comparisons among experimental groups), Dunnett's test provides optimal power while controlling FWER.

m = k − 1 comparisons (each experimental group vs. control)

Compared against Dunnett's distribution (not the studentised range) with parameters (k − 1, N − k).
9.6 Planned Contrasts — A Priori Comparisons
Planned contrasts are specific, theoretically motivated comparisons formulated before data collection. They are more powerful than post-hoc tests because:
- They do not require a significant omnibus F-test (though conducting the F-test first is still recommended).
- Fewer comparisons means less severe FWER correction.
- Orthogonal planned contrasts do not require any correction.
Contrast specification: A contrast is a linear combination ψ = Σ c_j μ_j with the constraint Σ c_j = 0.
Contrast SS and F:

SS_ψ = (Σ c_j x̄_j)² / (Σ c_j²/n_j), F_ψ = SS_ψ / MS_W, compared to F(1, N − k)

Orthogonality condition (two contrasts c and c′ are orthogonal if):

Σ c_j c_j′ / n_j = 0 (which reduces to Σ c_j c_j′ = 0 for equal n_j)

A set of k − 1 mutually orthogonal contrasts fully partitions SS_B:

SS_B = SS_ψ₁ + SS_ψ₂ + ⋯ + SS_ψ(k−1)
Example for k = 4 groups (Control, Drug A, Drug B, Drug C):

| Contrast | Comparison | c₁ | c₂ | c₃ | c₄ |
|---|---|---|---|---|---|
| ψ₁ | Control vs. all drugs | 3 | −1 | −1 | −1 |
| ψ₂ | Drug A vs. Drugs B and C | 0 | 2 | −1 | −1 |
| ψ₃ | Drug B vs. Drug C | 0 | 0 | 1 | −1 |
These three contrasts are mutually orthogonal (for equal n_j) and decompose SS_B into three orthogonal components — no FWER correction needed.
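The partition property is easy to verify numerically. A sketch with hypothetical group means in a balanced design:

```python
import numpy as np

# Hypothetical group means (Control, Drug A, Drug B, Drug C), balanced n = 20
means = np.array([10.0, 14.0, 15.0, 13.0])
n = 20

contrasts = np.array([
    [3, -1, -1, -1],   # Control vs. all drugs
    [0,  2, -1, -1],   # Drug A vs. Drugs B and C
    [0,  0,  1, -1],   # Drug B vs. Drug C
])

grand = means.mean()
ss_b = n * ((means - grand) ** 2).sum()                              # SS between
ss_psi = [n * (c @ means) ** 2 / (c ** 2).sum() for c in contrasts]  # per-contrast SS
```

The three per-contrast sums of squares add up exactly to SS_B, confirming the orthogonal decomposition.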
10. Confidence Intervals
10.1 95% CI for Each Group Mean
The 95% CI for the population mean μ_j:

x̄_j ± t_{.975, N−k} · √(MS_W / n_j)

Note: This CI uses MS_W from the full ANOVA model (not s_j²), producing a more stable estimate that borrows strength from all groups (valid under homoscedasticity).
10.2 95% CI for Pairwise Mean Differences
The 95% CI for μ_j − μ_j′:

(x̄_j − x̄_j′) ± t_{.975, N−k} · √[ MS_W (1/n_j + 1/n_j′) ]

Using Tukey-adjusted critical values (simultaneous 95% CIs for all pairs):

(x̄_j − x̄_j′) ± (q_{.05; k, N−k}/√2) · √[ MS_W (1/n_j + 1/n_j′) ]
10.3 95% CI for ω²
The exact CI uses the non-central F-distribution (DataStatPro computes this numerically). The CI communicates the precision of the effect size estimate and is required for complete reporting.
Approximate 95% CI width for ω² as a function of total N (illustrative values for a small-to-medium effect):

| N (total) | Approx. CI Width for ω² | Precision |
|---|---|---|
| 30 | 0.24 | Very low |
| 60 | 0.17 | Low |
| 90 | 0.14 | Moderate |
| 150 | 0.11 | Good |
| 300 | 0.08 | High |
| 600 | 0.05 | Very high |
⚠️ With only 30 participants in total, the 95% CI for ω² spans roughly 0.24 — about a quarter of the possible range, and essentially uninformative about the true effect magnitude. Always report the CI alongside the point estimate.
11. Power Analysis and Sample Size Planning
11.1 A Priori Power Analysis
A priori power analysis determines the required sample size before data collection to achieve desired power (1 − β) at significance level α for a hypothesised effect of size f.
Non-centrality parameter:

λ = N·f² = k·n·f² (balanced design)

Power computation (exact, using non-central F):

Power = P( F_{df1, df2; λ} ≥ F_crit )

where df1 = k − 1, df2 = N − k, and F_crit is the central-F critical value F_{1−α; df1, df2}.
No closed form exists — DataStatPro uses numerical integration of the non-central F-distribution.
Required n per group for 80% power (α = .05), by number of groups:

| f | ω² (approx.) | k = 3 | k = 4 | k = 5 | k = 6 |
|---|---|---|---|---|---|
| 0.10 | 0.010 | 322 | 274 | 240 | 215 |
| 0.15 | 0.022 | 144 | 123 | 107 | 96 |
| 0.20 | 0.038 | 82 | 70 | 61 | 55 |
| 0.25 | 0.059 | 52 | 45 | 39 | 35 |
| 0.30 | 0.082 | 37 | 32 | 28 | 25 |
| 0.40 | 0.138 | 21 | 18 | 16 | 14 |
| 0.50 | 0.200 | 14 | 12 | 11 | 10 |
| 0.60 | 0.265 | 10 | 9 | 8 | 7 |
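The exact power computation behind this table can be sketched with SciPy's non-central F-distribution (`scipy.stats.ncf`):

```python
from scipy import stats

def anova_power(f, k, n_per_group, alpha=0.05):
    """Exact power of the omnibus F-test via the non-central F-distribution."""
    N = k * n_per_group
    df1, df2 = k - 1, N - k
    lam = N * f ** 2                            # non-centrality parameter
    f_crit = stats.f.ppf(1 - alpha, df1, df2)   # central-F critical value
    return stats.ncf.sf(f_crit, df1, df2, lam)  # P(F >= F_crit | lambda)

# One cell of the table above: f = .25, k = 4, n = 45 per group -> power near .80
power = anova_power(0.25, 4, 45)
```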
11.2 Determining f from Prior Literature
When prior studies report η² or ω²:

f = √[ η² / (1 − η²) ] (or the analogous formula with ω²)

When prior studies report group means and a common SD estimate:

f = σ_m / σ, where σ_m is the SD of the group means

When only a t-statistic from a pilot study with two groups is available:

f ≈ d/2, with d ≈ t·√(1/n₁ + 1/n₂) (approximate, for a two-group pilot)
11.3 Sensitivity Analysis
The minimum detectable effect f_min for a given N, k, α, and target power is found by inverting the power computation: the smallest f whose non-centrality λ = N·f² reaches the target power.
For a typical moderate total N (e.g., around 40 per group with k = 3), f_min falls in the medium range (roughly f ≈ 0.25–0.30): such a study can reliably detect only medium-to-large effects. Smaller effects may exist but would be missed at 80% power.
⚠️ Report sensitivity analysis for null or inconclusive results. Do not use "observed power" (power computed from the observed effect size) — this is circular and provides no additional information beyond the p-value.
11.4 Planning for Specific Group Contrasts
When the primary research interest is in a specific planned contrast (rather than the omnibus F-test), power analysis should target that contrast:
For a contrast ψ = Σ c_j μ_j with Σ c_j = 0:

λ_ψ = ψ² / [ σ² Σ (c_j²/n_j) ]

Power for this contrast uses df1 = 1 and non-centrality λ_ψ. Power for planned contrasts is higher than for the omnibus F-test on the same data.
12. Non-Parametric Alternative: Kruskal-Wallis Test
12.1 When to Use the Kruskal-Wallis Test
The Kruskal-Wallis H test is the appropriate alternative to one-way ANOVA when:
- Data are ordinal (e.g., Likert items, ranked outcomes).
- Data are continuous but severely non-normally distributed with small n_j.
- There are extreme outliers that cannot be explained or removed.
- The homogeneity of variance assumption is severely violated and Welch's ANOVA is not adequate.
12.2 The Kruskal-Wallis Procedure
Step 1 — Rank all observations:
Combine all N observations and rank them from 1 (smallest) to N (largest). Assign average (mid)ranks to tied values.
Step 2 — Compute rank sums per group:
R_j = sum of ranks for group j
Step 3 — Compute the H statistic:

H = [ 12 / (N(N + 1)) ] · Σ (R_j² / n_j) − 3(N + 1)

Tie correction:

H_corrected = H / [ 1 − Σ (t_i³ − t_i) / (N³ − N) ]

where t_i = number of observations in the i-th tied group.
Step 4 — p-value:
For small samples with few groups: use exact tables. For n_j ≥ 5: H ~ χ²_{k−1} approximately.
Step 5 — Effect size (ε²):

ε² = H·(N + 1) / (N² − 1)

Or equivalently:

ε² = H / (N − 1)

Cohen's benchmarks for variance-explained measures apply: small = .01, medium = .06, large = .14.
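A sketch of the procedure with SciPy (the data values are hypothetical); `scipy.stats.kruskal` applies the tie correction automatically:

```python
from scipy import stats

# Hypothetical data for three groups
g1 = [2.1, 2.5, 3.0, 2.8, 2.2]
g2 = [3.5, 3.9, 4.1, 3.2, 3.8]
g3 = [2.9, 3.1, 2.7, 3.3, 3.0]

H, p = stats.kruskal(g1, g2, g3)          # tie-corrected H, chi-square p-value

k = 3
N = sum(len(g) for g in (g1, g2, g3))
eps_sq = H * (N + 1) / (N ** 2 - 1)       # epsilon-squared (= H / (N - 1))
```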
12.3 Post-Hoc Tests for Kruskal-Wallis
When H is significant, conduct pairwise comparisons using the Dunn test with Holm-Bonferroni correction:

z_jj′ = (R̄_j − R̄_j′) / √[ (N(N + 1)/12)(1/n_j + 1/n_j′) ]

where R̄_j and R̄_j′ are the mean ranks for groups j and j′.
Effect size for each pairwise comparison (rank-biserial correlation):

r = z / √(n_j + n_j′)
12.4 Asymptotic Relative Efficiency
For normal data, the Kruskal-Wallis test has an asymptotic relative efficiency of 3/π ≈ 0.955 relative to the one-way ANOVA — a negligible efficiency loss. For non-normal data (especially heavy-tailed distributions), the Kruskal-Wallis test can be substantially more powerful than the F-test.
13. Advanced Topics
13.1 ANOVA as a Linear Model
One-way ANOVA is a special case of linear regression with effect-coded (or dummy-coded) predictors. For k = 3 groups using effect coding:

Y_i = b₀ + b₁X₁ + b₂X₂ + ε_i

Where:
- X₁ = 1 if group 1, −1 if group 3, 0 otherwise.
- X₂ = 1 if group 2, −1 if group 3, 0 otherwise.
- b₀ = x̄ (grand mean under effect coding).
- b₁ = α₁ (effect of group 1 relative to the grand mean).
- b₂ = α₂ (effect of group 2 relative to the grand mean).
The F-statistic for the regression model equals the ANOVA F-statistic. This equivalence allows ANOVA to be computed using any regression software.
13.2 Trend Analysis for Ordered Groups
When the levels represent an ordered quantitative variable (e.g., dose: 0, 10, 20, 40 mg), polynomial trend analysis is more informative than omnibus F and pairwise comparisons. Orthogonal polynomial contrasts test:
- Linear trend: Do means increase (or decrease) monotonically?
- Quadratic trend: Do means follow a U-shape or inverted-U?
- Cubic trend: Is there an S-shaped pattern?
Orthogonal polynomial coefficients for equally spaced groups:
| Trend | ||||
|---|---|---|---|---|
| Linear | ||||
| Quadratic | ||||
| Cubic |
Each trend contrast has and specified value from tables.
The three trend SS sum to , fully partitioning the between-groups variance.
13.3 Dealing with Unequal Sample Sizes
In unbalanced designs, the grand mean is the weighted (not simple) average of group means. Several practical considerations:
- Power is maximised when group sizes are equal. Any deviation from balance reduces total power.
- Levene's test becomes critical: unequal combined with unequal variances is the most damaging violation.
- Post-hoc tests: Use Tukey-Kramer (not Tukey HSD) for unbalanced designs with equal variances; Games-Howell for unequal variances.
- Contrast tests: Divide contrast coefficients by group sizes: for orthogonality with unequal .
13.4 Bayesian One-Way ANOVA
Bayesian ANOVA (Rouder et al., 2012) computes Bayes Factors comparing models:
The prior on standardised group effects under uses a Cauchy distribution with scale (default "medium" effect prior). DataStatPro computes this via the BayesFactor method.
Advantages:
- Quantifies evidence for (group equality), not just failure to reject it.
- Valid for sequential (interim) analyses.
- Provides posterior distributions for group effects.
Reporting: "A Bayesian one-way ANOVA (Cauchy prior, ) provided [strong / moderate / anecdotal] evidence for [the group effect / the null hypothesis], [value]."
13.5 Equivalence Testing for ANOVA
To positively establish that group means are negligibly different (equivalence), extend the TOST framework to ANOVA:
Step 1: Specify equivalence bounds for all pairwise differences (e.g., corresponding to Cohen's ).
Step 2: For each pair , conduct two one-sided tests:
- :
- :
Step 3: Declare equivalence for pair when the 90% CI for falls entirely within .
Apply Bonferroni correction across all pairs.
13.6 Robust ANOVA: Trimmed Means
Yuen's trimmed mean F-test (one-way version) uses -trimmed means and Winsorised variances for each group:
(effective group size after trimming)
Where is the 20%-trimmed mean for group and is the Winsorised sum of squared deviations.
This test is substantially more powerful than Kruskal-Wallis for symmetric heavy-tailed distributions while maintaining nominal Type I error control.
13.7 Reporting One-Way ANOVA According to APA 7th Edition
Full minimum reporting set (APA 7th ed.):
- Statement of which test (standard ANOVA or Welch's) and why.
- Levene's test result.
- [value], [value].
- [value] [95% CI: LB, UB].
- Which effect size was computed ( not just "effect size").
- Group means and SDs for all groups.
- Post-hoc test results with adjusted p-values and per pair.
- 95% CI for each significant pairwise mean difference.
14. Worked Examples
Example 1: Therapy Type on Depression — Standard One-Way ANOVA
A clinical researcher randomly assigns participants to three therapy conditions: CBT (), Behavioural Activation (BA; ), or Waitlist Control (WL; ). Post-treatment PHQ-9 depression scores (0–27; lower = less depression) are the dependent variable.
Descriptive statistics:
| Group | ||||
|---|---|---|---|---|
| CBT | 30 | 9.80 | 4.20 | 0.767 |
| BA | 30 | 11.40 | 4.60 | 0.840 |
| WL | 30 | 16.30 | 5.10 | 0.931 |
Assumption checks:
Shapiro-Wilk (residuals): , — normality not violated.
Levene's test: , — homogeneity of variance holds.
→ Standard one-way ANOVA is appropriate.
Step 1 — Grand mean:
Step 2 — Sums of squares:
Step 3 — ANOVA source table:
| Source | SS | MS | |||
|---|---|---|---|---|---|
| Between | |||||
| Within | |||||
| Total |
Step 4 — Effect sizes:
95% CI for (non-central F, , , ):
95% CI for : (DataStatPro numerical)
;
Step 5 — Post-hoc tests (Tukey HSD, balanced):
(studentised range)
| Comparison | Difference | SE | Cohen's | 95% CI | ||
|---|---|---|---|---|---|---|
| CBT vs. BA | ||||||
| CBT vs. WL | ||||||
| BA vs. WL |
Where
Interpretation:
Both active therapies (CBT and BA) significantly reduced depression compared to Waitlist Control (). CBT and BA did not differ from each other ().
APA write-up: "A one-way between-subjects ANOVA examined the effect of therapy type on PHQ-9 depression scores. Levene's test indicated homogeneity of variance (, ). The ANOVA revealed a significant effect of therapy type, , , [95% CI: 0.137, 0.360], indicating a large effect. Tukey HSD post-hoc comparisons revealed that both CBT (, ) and Behavioural Activation (, ) produced significantly lower depression scores than the Waitlist Control (, ), [95% CI: 0.78, 2.01] and [95% CI: 0.46, 1.65] (both ). CBT and BA did not significantly differ, [95% CI: 0.27, 0.95], ."
Example 2: Welch's One-Way ANOVA — Reaction Time Across Sleep Conditions
A researcher compares simple reaction time (ms) across four sleep conditions: Normal (8h), Mild deprivation (6h), Moderate deprivation (4h), and Severe deprivation (2h), per group ().
Descriptive statistics:
| Group | (ms) | (ms) | |
|---|---|---|---|
| Normal (8h) | 20 | 241.3 | 18.4 |
| Mild (6h) | 20 | 268.7 | 24.1 |
| Moderate (4h) | 20 | 312.4 | 41.6 |
| Severe (2h) | 20 | 389.2 | 68.3 |
Assumption checks:
Shapiro-Wilk (residuals): , — mild non-normality (but per group; CLT provides some protection).
Levene's test: , — significant heteroscedasticity.
→ Welch's one-way ANOVA is required.
Welch's F computation:
: ; ; ;
Numerator: :
Numerator
Denominator correction (computed by DataStatPro):
;
Effect size:
Games-Howell post-hoc tests:
| Comparison | Diff (ms) | |||
|---|---|---|---|---|
| Normal vs. Mild | ||||
| Normal vs. Mod | ||||
| Normal vs. Severe | ||||
| Mild vs. Mod | ||||
| Mild vs. Severe | ||||
| Mod vs. Severe |
All six pairwise comparisons are statistically significant, with very large effect sizes. Reaction time increases substantially at each stage of sleep deprivation.
APA write-up: "Levene's test indicated significant heterogeneity of variance (, ); therefore, Welch's one-way ANOVA was applied. The test revealed a significant effect of sleep deprivation on reaction time, , , [95% CI: 0.581, 0.789], indicating a very large effect. Games-Howell post-hoc comparisons revealed that every level of sleep deprivation produced significantly longer reaction times than all others (all ), with effect sizes ranging from large () to very large ()."
Example 3: Kruskal-Wallis — Pain Ratings Across Acupuncture Protocols
A researcher compares pain relief (NRS 0–10; ordinal) across five acupuncture protocol variants. Non-normality and ties make one-way ANOVA inappropriate.
per group; ; .
Given: (tie-corrected Kruskal-Wallis H statistic).
p-value:
Effect size:
Large effect — acupuncture protocol explains approximately 26% of the rank variability.
Dunn post-hoc (Holm-corrected):
After Holm correction, Protocols 1 and 2 differ significantly from Protocols 4 and 5 ( ranging from 0.48 to 0.73). Protocols 1 vs. 2 and 4 vs. 5 do not differ significantly.
APA write-up: "Due to ordinal measurement and non-normal distributions, a Kruskal-Wallis test was conducted. There was a significant difference in pain ratings across the five acupuncture protocols, , , [95% CI: 0.091, 0.421], indicating a large effect. Dunn's pairwise comparisons with Holm correction revealed that Protocols 1 and 2 produced significantly lower pain ratings than Protocols 4 and 5 (all , = 0.48–0.73)."
Example 4: Non-Significant Result with Sensitivity Analysis
An educational researcher tests whether three homework formats (Written, Digital, No Homework) affect standardised test scores in students per group (; ).
Results: , , [95% CI: 0.000, 0.103].
Levene's test: , — variances equal.
Sensitivity analysis:
Corresponding
This study had 80% power to detect only medium-to-large effects (). The observed is a small effect well below this detection threshold.
APA write-up: "A one-way ANOVA revealed no significant effect of homework format on standardised test scores, , , [95% CI: 0.000, 0.103], indicating a very small and statistically non-significant effect. The study had 80% power to detect effects of (); effects smaller than this threshold remain undetected. The observed is below this detection threshold, indicating the study was underpowered for the observed effect size."
15. Common Mistakes and How to Avoid Them
Mistake 1: Reporting as the Only Effect Size and Calling It Unbiased
Problem: Reporting as the effect size and implying it represents the true population proportion of variance explained. overestimates the population effect, sometimes substantially in small samples with few groups.
Solution: Report (or ) as the primary effect size. If journals require (some do), report it alongside and label as a biased estimate. Always compute the 95% CI for using DataStatPro.
Mistake 2: Interpreting the Omnibus F Without Post-Hoc Tests
Problem: Reporting , and concluding "all groups differ significantly" or "Groups 1 and 4 differ based on their means" without conducting post-hoc tests. The omnibus F tells you only that at least one pair differs.
Solution: Always follow a significant omnibus F with appropriate post-hoc tests or planned contrasts. Report all pairwise comparisons with adjusted p-values and individual effect sizes .
Mistake 3: Using Standard ANOVA When Variances Are Unequal
Problem: Running standard ANOVA without checking Levene's test, or ignoring a significant Levene's result, when group sizes are unequal. This produces inflated or deflated Type I error rates and untrustworthy p-values.
Solution: Always run Levene's test before deciding which ANOVA variant to use. When Levene's is significant (especially with unequal ), use Welch's ANOVA with Games-Howell post-hoc tests. Recommend setting DataStatPro to "Auto" mode, which applies Welch's ANOVA automatically when Levene's is significant.
Mistake 4: Running Multiple t-Tests Instead of ANOVA
Problem: Comparing three groups by running three separate pairwise t-tests without correction, inflating FWER to approximately 14%.
Solution: Use one-way ANOVA (or Welch's) for the omnibus test, followed by appropriate post-hoc tests or pre-planned contrasts. If pairwise comparisons are the primary interest, use Tukey HSD or Holm-Bonferroni corrections.
Mistake 5: Not Checking or Reporting Assumption Tests
Problem: Running ANOVA without checking normality and homoscedasticity, and reporting only the F-statistic and p-value without mentioning assumption checks. Readers cannot evaluate the validity of the results.
Solution: Always run and report Levene's test and Shapiro-Wilk (or Q-Q plot inspection). Report these results in the method or results section, and justify the test choice (standard vs. Welch's) based on the assumption check results.
Mistake 6: Using Fisher's LSD Without the Omnibus F Restriction
Problem: Applying Fisher's Least Significant Difference post-hoc test directly as a multiple comparison correction without first confirming the omnibus F is significant. Fisher's LSD does not adequately control FWER when .
Solution: For , Fisher's LSD is acceptable after a significant omnibus F (the "protected LSD"). For , always use a proper FWER-controlling procedure (Tukey, Holm, Games-Howell). Never report Fisher's LSD without the omnibus F protection.
Mistake 7: Reporting Effect Sizes Without Confidence Intervals
Problem: Reporting without a CI. With moderate sample sizes, the CI for can be extremely wide, making the point estimate essentially uninformative about the true effect magnitude.
Solution: Always report the 95% CI for (available in DataStatPro via the non-central F-distribution). A point estimate without a CI gives a false sense of precision.
Mistake 8: Applying Post-Hoc Tests When the Omnibus F is Non-Significant
Problem: Running all pairwise post-hoc comparisons regardless of the omnibus F result, and selectively reporting those that happen to be significant. This is p-hacking and inflates the FWER.
Solution: When the omnibus F is non-significant, do not run post-hoc pairwise tests (except for pre-planned contrasts specified before data collection). Report the non-significant omnibus F alongside the effect size and sensitivity analysis, and acknowledge the study's power limitations.
Mistake 9: Confusing "Equal Sample Sizes" with "Equal Variances"
Problem: Assuming that because all groups have equal , the equal variances assumption is met. Equal sample sizes reduce the consequences of variance heterogeneity but do not eliminate it. Levene's test may still be significant with balanced designs.
Solution: Always run Levene's test regardless of balance. When Levene's is significant, use Welch's ANOVA even for balanced designs (the power loss is negligible).
Mistake 10: Neglecting to Report the Full Descriptive Statistics Table
Problem: Reporting only the ANOVA source table (, df, ) without group means, SDs, and . Without descriptive statistics, the F-statistic is uninterpretable — readers cannot evaluate the direction, magnitude, or pattern of group differences.
Solution: Always include a descriptive statistics table with , , , and (or 95% CI) for each group. Include a visualisation (raincloud plot or means plot with individual data) whenever possible.
16. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| ; group means very similar | Non-significant result; report ; inspect group means | |
| or is negative | True effect near zero; small sample; correction overshoots | Report as 0 (convention); note small effect; increase sample size |
| much larger than | Small with groups; large bias correction | This is expected; always report as primary |
| Levene's test significant but are equal | Unequal variances exist but balanced design is partially protective | Still use Welch's ANOVA; equal reduces but does not eliminate the problem |
| Post-hoc tests show no significant pairs despite significant | Effect is spread across many small pairwise differences | Report omnibus and acknowledge no single pair survives correction; consider planned contrasts |
| Shapiro-Wilk significant with large () | High power of normality test; minor deviations detected | With large , CLT protects the t-test; inspect Q-Q for severity; ANOVA likely valid |
| Games-Howell and Tukey HSD give contradictory results | Heterogeneous variances affecting the inference | Use Games-Howell when variances are unequal; report both and note discrepancy |
| Very large with very small | Very large ; trivially small differences are statistically significant | Report effect size prominently; statistical significance ≠ practical significance |
| Kruskal-Wallis significant but all Dunn pairwise tests non-significant | Effect is distributed; Holm correction too conservative | Report all pairwise and ; consider reporting without correction for planned pairs |
| 95% CI for includes 0 despite significant | Wide CI due to small or ; possible when is marginally significant | Report the wide CI; both values are correct; note limited precision |
| Welch's df is very small | Extreme variance heterogeneity with small | Check data for errors; if genuine, use permutation ANOVA |
| One group has | ANOVA cannot estimate from a single observation | Collect more data; exclude the singleton group; use a different design |
| ANOVA gives different result from equivalent regression | Coding scheme issue (dummy vs. effect coding affects only interpretation, not F) | Verify coding; F-statistics should match; intercept and slopes will differ by coding |
| Post-hoc p-values are all exactly 1.0 | Software error; all group means identical | Verify data; check for data entry errors |
17. Quick Reference Cheat Sheet
Core One-Way ANOVA Formulas
| Formula | Description |
|---|---|
| Grand mean (weighted) | |
| Between-groups SS | |
| Within-groups SS | |
| Total SS | |
| ; ; | Degrees of freedom |
| Between-groups mean square | |
| Within-groups mean square (error) | |
| F-ratio | |
| Pooled within-groups SD | |
| p-value |
Effect Size Formulas
| Formula | Description |
|---|---|
| Eta squared (biased) | |
| Omega squared (preferred) | |
| Epsilon squared (alternative) | |
| Cohen's (from ) | |
| from | |
| from (approx.) | |
| Cohen's for pairwise | |
| Hedges' for pairwise |
Welch's ANOVA Formulas
| Formula | Description |
|---|---|
| Weight for group | |
| Weighted grand mean | |
| Welch's F (see Section 5.2) | |
| Welch-Satterthwaite df |
Kruskal-Wallis Formulas
| Formula | Description |
|---|---|
| Rank sum for group | |
| Kruskal-Wallis | |
| Tie-corrected | |
| Kruskal-Wallis effect size | |
| Dunn's test statistic | |
| Rank-biserial (pairwise) |
ANOVA Source Table Template
| Source | SS | MS | |||
|---|---|---|---|---|---|
| Between groups | [value] | ||||
| Within groups (Error) | |||||
| Total |
One-Way ANOVA Reporting Checklist
| Item | Required |
|---|---|
| -statistic with both df | ✅ Always |
| Exact p-value (or ) | ✅ Always |
| with 95% CI (primary effect size) | ✅ Always |
| (labelled as biased) | ✅ When journals require it |
| Which effect size was reported | ✅ Always |
| Group means and SDs for all groups | ✅ Always |
| Sample sizes per group | ✅ Always |
| Levene's test result | ✅ Always for independent designs |
| Whether standard or Welch's ANOVA was used | ✅ Always |
| Shapiro-Wilk result (normality) | ✅ When |
| Post-hoc test name and correction method | ✅ When omnibus significant |
| All pairwise comparisons with adjusted and | ✅ When omnibus significant |
| Planned contrast weights and rationale | ✅ When planned contrasts used |
| alongside | ✅ Recommended |
| Cohen's for power analysis reference | ✅ When reporting power |
| 95% CI for (via non-central ) | ✅ Always |
| 95% CI for each pairwise mean difference | ✅ Recommended |
| Sensitivity analysis (min detectable effect) | ✅ For null results |
| Domain-specific benchmark context | ✅ Recommended |
| Raincloud or violin plot | ✅ Strongly recommended |
| Whether Games-Howell was used with Welch's | ✅ When variances unequal |
| Descriptive statistics table | ✅ Always |
APA 7th Edition Reporting Templates
Standard One-Way ANOVA (significant result):
"A one-way between-subjects ANOVA was conducted to examine the effect of [IV] on [DV]. Levene's test indicated [equal / unequal] variances ( [value], [value]). The ANOVA revealed a [significant / non-significant] effect of [IV], [value], [value], [value] [95% CI: LB, UB], indicating a [small / medium / large] effect. [Post-hoc test name] pairwise comparisons revealed that [group pair(s)] differed significantly (all [threshold after correction]). [Other pairs] did not differ significantly."
Welch's One-Way ANOVA:
"Due to significant heterogeneity of variance (Levene's [value], [value]), Welch's one-way ANOVA was applied. The test revealed a [significant / non-significant] effect of [IV] on [DV], [value], [value], [value] [95% CI: LB, UB]. Games-Howell post-hoc comparisons indicated that [describe pairwise results]."
Non-significant result with sensitivity analysis:
"A one-way ANOVA revealed no significant effect of [IV] on [DV], [value], [value], [value] [95% CI: LB, UB]. Given the sample sizes ( [value] per group), this study had power to detect effects of [value] ( [value]) at 80% power. Effects smaller than this threshold may exist but remain undetected."
Kruskal-Wallis (non-parametric):
"Due to [non-normality / ordinal measurement], a Kruskal-Wallis test was conducted. The test revealed a [significant / non-significant] difference across groups, [value], [value], [value]. Dunn's pairwise post-hoc comparisons with Holm correction indicated that [describe pairwise results]."
Conversion Formulas
| From | To | Formula |
|---|---|---|
| , , | ||
| , , | (approx.) | |
| (always ) | ||
| (2 groups) | Cohen's | |
| Cohen's (2 groups) | ||
| (Kruskal-Wallis) | ||
| Pairwise | ||
| (Hedges') | ||
| (Dunn) | (approx.) |
Required Sample Size per Group (80% Power, , Two-Sided)
| Cohen's | Label | |||||
|---|---|---|---|---|---|---|
| 0.10 | Small | 322 | 274 | 240 | 215 | 180 |
| 0.15 | Small-Med | 144 | 123 | 107 | 96 | 80 |
| 0.25 | Medium | 52 | 45 | 39 | 35 | 29 |
| 0.35 | Med-Large | 27 | 23 | 21 | 19 | 16 |
| 0.40 | Large | 21 | 18 | 16 | 14 | 12 |
| 0.50 | Large | 14 | 12 | 11 | 10 | 8 |
| 0.60 | Large | 10 | 9 | 8 | 7 | 6 |
| 0.80 | Very large | 6 | 6 | 5 | 5 | 4 |
All values are per group. Total = .
Cohen's Benchmarks — ANOVA Effect Sizes
| Label | ||||
|---|---|---|---|---|
| Small | ||||
| Medium | ||||
| Large |
Post-Hoc Test Selection Guide
| Condition | Recommended Post-Hoc Test | Controls FWER |
|---|---|---|
| Balanced, equal variances | Tukey HSD | ✅ Exactly |
| Unbalanced, equal variances | Tukey-Kramer | ✅ Approximately |
| Unequal variances OR unequal | Games-Howell | ✅ Approximately |
| All groups vs. one control | Dunnett's | ✅ Optimal |
| Any design, conservative | Bonferroni | ✅ Conservative |
| Any design, less conservative | Holm-Bonferroni | ✅ Sequential |
| All contrasts (not just pairwise) | Scheffé | ✅ Most conservative |
| Non-parametric (Kruskal-Wallis) | Dunn + Holm | ✅ Sequential |
Degrees of Freedom Reference
| Source | Notes | |
|---|---|---|
| Between groups | = number of groups | |
| Within groups (Error) | = total observations | |
| Total | ||
| Welch's numerator | Same as standard | |
| Welch's denominator | Always | |
| Planned contrast | Per orthogonal contrast |
Assumption Checks Reference
| Assumption | Test | Action if Violated |
|---|---|---|
| Normality of residuals | Shapiro-Wilk, Q-Q | Kruskal-Wallis; transform |
| Homogeneity of variance | Levene's, Brown-Forsythe | Welch's ANOVA + Games-Howell |
| Independence | Design review | Multilevel model |
| Outliers | Boxplots, $ | z_i |
| Interval scale | Measurement theory | Kruskal-Wallis |
This tutorial provides a comprehensive foundation for understanding, conducting, and reporting One-Way ANOVA and its alternatives within the DataStatPro application. For further reading, consult Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018) for applied coverage; Maxwell, Delaney & Kelley's "Designing Experiments and Analyzing Data" (3rd ed., 2018) for rigorous methodological depth; Wilcox's "Introduction to Robust Estimation and Hypothesis Testing" (4th ed., 2017) for robust alternatives including trimmed mean ANOVA; Lakens's "Calculating and Reporting Effect Sizes to Facilitate Cumulative Science" (Frontiers in Psychology, 2013) for the vs. discussion; Olejnik & Algina (2003) for generalised effect sizes; and Delacre, Lakens & Leys (2017) in the International Review of Social Psychology for the recommendation to default to Welch's ANOVA. For Bayesian ANOVA, see Rouder et al. (2012) in the Journal of Mathematical Psychology. For feature requests or support, contact the DataStatPro team.