One-Way ANOVA: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of variance decomposition all the way through the mathematics, assumptions, effect sizes, post-hoc testing, non-parametric alternatives, interpretation, reporting, and practical usage of the One-Way ANOVA within the DataStatPro application. Whether you are encountering ANOVA for the first time or seeking a rigorous, unified understanding of between-groups inference, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is a One-Way ANOVA?
- The Mathematics Behind One-Way ANOVA
- Assumptions of One-Way ANOVA
- Variants of One-Way ANOVA
- Using the One-Way ANOVA Calculator Component
- Full Step-by-Step Procedure
- Effect Sizes for One-Way ANOVA
- Post-Hoc Tests and Planned Contrasts
- Confidence Intervals
- Power Analysis and Sample Size Planning
- Non-Parametric Alternative: Kruskal-Wallis Test
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into One-Way ANOVA, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.
1.1 The Logic of Comparing Groups
When we measure a continuous outcome across three or more independent groups, we ask: "Are the observed differences in group means larger than what we would expect from random sampling alone?" One-Way ANOVA answers this question by comparing two sources of variability:
- Between-groups variability: How much do the group means differ from each other? If groups have truly different population means, this variability should be large.
- Within-groups variability: How much do observations within each group vary around their own group mean? This reflects pure random (sampling) error.
If the between-groups variability is substantially larger than the within-groups variability, we conclude that the groups differ beyond what chance alone would produce.
1.2 Why Not Multiple t-Tests?
With k groups, one could run all m = k(k − 1)/2 pairwise t-tests. However, this inflates the familywise error rate (FWER):

FWER = 1 − (1 − α)^m

where m is the number of tests. For k = 4 groups (m = 6 tests) at α = .05:

FWER = 1 − (0.95)^6 ≈ .265

Over 26% chance of at least one false positive. The one-way ANOVA omnibus test maintains the FWER at exactly α for the simultaneous test that all group means are equal.
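The inflation can be verified numerically. A minimal sketch (the group counts are arbitrary illustrations):

```python
from math import comb

alpha = 0.05
for k in (3, 4, 5, 6):
    m = comb(k, 2)                    # number of pairwise tests
    fwer = 1 - (1 - alpha) ** m       # P(at least one false positive)
    print(f"k={k}: m={m} pairwise tests, FWER={fwer:.3f}")
```

For k = 4 this reproduces the roughly 26% familywise error rate quoted above.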
1.3 Variance and Its Decomposition
The variance of a dataset measures the average squared deviation from the mean:

s² = Σ (x_i − x̄)² / (n − 1)

The key insight behind ANOVA is that total variance can be partitioned into meaningful components. For a one-way design:

SS_Total = SS_Between + SS_Within
Each sum of squares (SS), when divided by its degrees of freedom, becomes a mean square (MS) — a variance estimate. The ratio of these variance estimates is the F-statistic.
1.4 The F-Distribution
The F-distribution arises from the ratio of two independent chi-squared variates, each divided by its degrees of freedom. In ANOVA:

F = MS_Between / MS_Within ~ F(k − 1, N − k) under H₀
Properties of the F-distribution:
- Always non-negative (a ratio of variances cannot be negative).
- Right-skewed; skewness decreases as the df increase.
- Characterised by two df parameters: numerator (df1 = k − 1) and denominator (df2 = N − k).
- Under H₀: E[F] = df2 / (df2 − 2) ≈ 1 (both MS estimate the same σ²).
- Under H₁: E[F] > 1 (the between-groups MS is inflated by true group differences).
1.5 The Expected Mean Squares
Understanding why the F-ratio works requires the expected values of the mean squares under H₀ and H₁:

E[MS_Within] = σ² (always)
E[MS_Between] = σ² + Σ n_j(μ_j − μ)² / (k − 1)

Under H₀ (all μ_j equal): the second term is zero, so E[MS_Between] = σ² and E[F] ≈ 1.
Under H₁: the second term is positive, so E[MS_Between] > σ², giving E[F] > 1.
The non-centrality parameter:

λ = Σ n_j(μ_j − μ)² / σ²

links the true population effect to the power of the test.
1.6 Statistical Significance vs. Practical Significance
Like the t-test, the F-test answers: "Is the result unlikely under H₀?" It does not answer: "How large is the effect?"
With very large N, even trivially small group differences produce significant F-values. A study with several thousand participants across five groups might yield a significant F (p < .05) with ω² < .01 — statistically significant but practically negligible.
Always report:
- The F-statistic, degrees of freedom, and p-value.
- ω² or ε² with 95% CI (practical effect size).
- Group means and SDs.
- Post-hoc comparisons with individual effect sizes.
1.7 The Relationship Between ANOVA and the t-Test
When k = 2, the one-way ANOVA F-statistic is exactly the square of the independent-samples t-statistic:

F = t²

The p-values are identical (both two-tailed). ANOVA generalises the independent-samples t-test to k ≥ 2 groups.
1.8 The Relationship Between ANOVA and Regression
ANOVA is a special case of the General Linear Model (GLM):

x_ij = μ + α_j + ε_ij, with ε_ij ~ N(0, σ²)

where α_j = μ_j − μ is the effect of group j and Σ α_j = 0 (sum-to-zero constraint). In the regression framework, group membership is represented by dummy or effect-coded predictors. This equivalence is important because:
- ANOVA results can always be replicated using regression.
- Adding covariates (ANCOVA) is natural in the regression framework.
- Unbalanced designs are handled more flexibly in regression.
2. What is a One-Way ANOVA?
2.1 The Core Idea
One-Way Analysis of Variance (ANOVA) is a parametric inferential procedure for testing whether the means of three or more independent groups are simultaneously equal. It is called "one-way" because there is exactly one independent variable (IV) with k levels, and "between-subjects" because different participants appear in each group.
The general omnibus null hypothesis:

H₀: μ₁ = μ₂ = ⋯ = μ_k
2.2 What One-Way ANOVA Tests and Does Not Test
One-Way ANOVA tells you:
- Whether the observed group mean differences are larger than expected by chance (omnibus test).
- How much of the total outcome variance is explained by group membership (η², ω²).
One-Way ANOVA does NOT tell you:
- Which specific groups differ from each other (requires post-hoc tests or planned contrasts).
- The direction or magnitude of specific pairwise differences.
- Whether the effect is practically meaningful (requires effect sizes with CIs).
2.3 Design Requirements
For one-way between-subjects ANOVA, the design must satisfy:
- One continuous DV (interval or ratio scale).
- One categorical IV with k ≥ 3 levels (groups).
- Different participants in each group (independent samples).
- Each participant contributes exactly one score to exactly one group.
2.4 One-Way ANOVA in Context
| Situation | Test |
|---|---|
| 2 groups, independent, normal | Independent t-test (Welch's) |
| ≥ 3 groups, independent, normal, equal variances | One-Way ANOVA |
| ≥ 3 groups, independent, normal, unequal variances | Welch's One-Way ANOVA |
| ≥ 3 groups, independent, non-normal or ordinal | Kruskal-Wallis test |
| ≥ 3 conditions, same participants, normal | One-Way Repeated Measures ANOVA |
| ≥ 3 conditions, same participants, non-normal | Friedman test |
| ≥ 3 groups + covariate | ANCOVA |
| ≥ 2 IVs, independent groups | Factorial between-subjects ANOVA |
2.5 Real-World Applications
| Field | Example Application | IV (Levels) | DV |
|---|---|---|---|
| Clinical Psychology | CBT vs. BA vs. Waitlist on depression | 3 therapy conditions | PHQ-9 |
| Education | Lecture vs. Flipped vs. Project-based on scores | 3 teaching methods | Exam % |
| Medicine | 4 drug dosages on blood pressure | 4 doses | Systolic BP |
| Marketing | 5 ad formats on purchase intent | 5 formats | Intent (0–100) |
| Neuroscience | 3 sleep conditions on cognitive performance | 3 conditions | Reaction time |
| Ecology | 4 habitats on species richness | 4 habitat types | Species count |
| HR/OB | 3 leadership styles on productivity | 3 styles | Units/hour |
| Nutrition | 5 diets on weight loss | 5 diet types | kg lost |
3. The Mathematics Behind One-Way ANOVA
3.1 Notation
| Symbol | Meaning |
|---|---|
| k | Number of groups |
| n_j | Sample size in group j |
| N = Σ n_j | Total sample size |
| x_ij | i-th observation in group j |
| x̄_j | Mean of group j |
| x̄ | Grand mean |
| s_j² | Variance of group j |
3.2 Sum of Squares Decomposition
Total Sum of Squares — total variability in the data:

SS_T = Σ_j Σ_i (x_ij − x̄)²

Between-Groups Sum of Squares — variability due to group membership:

SS_B = Σ_j n_j (x̄_j − x̄)²

Within-Groups Sum of Squares — variability within each group (pure error):

SS_W = Σ_j Σ_i (x_ij − x̄_j)²

Verification: SS_T = SS_B + SS_W; the decomposition is exact.
3.3 Mean Squares and the F-Ratio
Between-groups mean square:

MS_B = SS_B / (k − 1)

Within-groups mean square (pooled error variance):

MS_W = SS_W / (N − k)

Note: MS_W is the pooled estimate of the common population variance σ², assuming homogeneity of variance across groups.
The F-statistic:

F = MS_B / MS_W

Under H₀: F ~ F(k − 1, N − k)
p-value: p = P(F_{k−1, N−k} ≥ F_observed)
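The whole computation can be scripted in a few lines. The sketch below uses made-up data for three groups and cross-checks the hand-rolled F and p against `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (illustrative values only)
g1 = np.array([4.0, 5.0, 6.0, 5.5, 4.5])
g2 = np.array([6.0, 7.0, 8.0, 7.5, 6.5])
g3 = np.array([5.0, 5.5, 6.5, 6.0, 5.0])
groups = [g1, g2, g3]

k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

ss_b = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # between-groups SS
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)        # within-groups SS
ms_b, ms_w = ss_b / (k - 1), ss_w / (N - k)
F = ms_b / ms_w
p = stats.f.sf(F, k - 1, N - k)                                # right-tail p-value

F_ref, p_ref = stats.f_oneway(g1, g2, g3)                      # cross-check
```

The two routes agree to machine precision, which is a useful sanity check when implementing the decomposition by hand.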
3.4 The ANOVA Source Table
| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between groups | SS_B | k − 1 | SS_B / (k − 1) | MS_B / MS_W | P(F ≥ F_obs) |
| Within groups (Error) | SS_W | N − k | SS_W / (N − k) | | |
| Total | SS_T | N − 1 | | | |
3.5 Computing the Grand Mean and Group Means
Grand mean (weighted by group sizes):

x̄ = Σ n_j x̄_j / N

For balanced designs (equal n_j): x̄ = (1/k) Σ x̄_j
3.6 The Pooled Standard Deviation
The pooled within-groups standard deviation is used for computing effect sizes for pairwise comparisons:

s_p = √MS_W = √[ Σ (n_j − 1) s_j² / (N − k) ]
This is a weighted average of the group standard deviations, using degrees of freedom as weights.
3.7 Computing SS from Summary Statistics
When only group means, SDs, and sample sizes are available:

SS_W = Σ (n_j − 1) s_j²
SS_B = Σ n_j (x̄_j − x̄)², with x̄ = Σ n_j x̄_j / N
SS_T = SS_B + SS_W
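A sketch of this summary-statistics route (the n/M/SD values below are hypothetical):

```python
from scipy import stats

# Hypothetical summary statistics per group
ns    = [12, 12, 12]
means = [5.0, 7.0, 5.6]
sds   = [0.8, 0.8, 0.7]

k, N = len(ns), sum(ns)
grand = sum(n * m for n, m in zip(ns, means)) / N    # weighted grand mean

ss_b = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))  # between-groups SS
ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))        # within-groups SS
F = (ss_b / (k - 1)) / (ss_w / (N - k))
p = stats.f.sf(F, k - 1, N - k)
```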
3.8 Computing η² and ω² from F
When only the ANOVA table is reported (useful for computing effect sizes from published results):
Eta squared from F:

η² = F·df1 / (F·df1 + df2)

Omega squared from F (approximate):

ω² ≈ df1(F − 1) / (df1(F − 1) + N)

Exact omega squared from SS (preferred):

ω² = (SS_B − (k − 1)·MS_W) / (SS_T + MS_W)
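These conversions are easy to script; a sketch (the function names are our own, not DataStatPro's API):

```python
def eta_sq_from_f(F, df1, df2):
    # eta^2 = F*df1 / (F*df1 + df2)
    return (F * df1) / (F * df1 + df2)

def omega_sq_from_f(F, df1, df2):
    # approximate omega^2 from a published F and its df (N = df1 + df2 + 1)
    N = df1 + df2 + 1
    return (df1 * (F - 1)) / (df1 * (F - 1) + N)

def omega_sq_exact(ss_b, ss_w, k, N):
    # exact omega^2 from the sums of squares (preferred when SS are available)
    ms_w = ss_w / (N - k)
    return (ss_b - (k - 1) * ms_w) / (ss_b + ss_w + ms_w)
```

As expected, the omega-squared estimate is always smaller than the eta-squared estimate computed from the same F.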
3.9 The Non-Central F-Distribution and Exact CIs
Under H₁, the F-statistic follows a non-central F-distribution with non-centrality parameter λ:

F ~ F(k − 1, N − k; λ), with λ = Σ n_j(μ_j − μ)² / σ²

Exact 95% CI for η² (via non-central F):
Find λ_L and λ_U such that:

P(F_{k−1, N−k; λ_L} ≥ F_obs) = .025 and P(F_{k−1, N−k; λ_U} ≤ F_obs) = .025

Then convert to η²: η²_L = λ_L / (λ_L + N), η²_U = λ_U / (λ_U + N)
And then to ω² using the bias correction. DataStatPro performs this numerical computation automatically.
4. Assumptions of One-Way ANOVA
4.1 Normality of Residuals (Within-Group Normality)
One-Way ANOVA assumes that within each population, the observations are normally distributed. Equivalently, the residuals should be normally distributed.
How to check:
- Shapiro-Wilk test on residuals (most powerful for small-to-moderate n per group; H₀: residuals are normal). A significant result suggests departure.
- Shapiro-Wilk per group (preferred for small n_j): run separately for each group.
- Q-Q plot of all residuals: points should follow the diagonal reference line.
- Histograms per group: approximate bell shape.
- Skewness and kurtosis of residuals (values near 0 are consistent with normality).
Robustness: ANOVA is remarkably robust to mild non-normality, especially when:
- Group sizes are equal (balanced design).
- n ≥ 25–30 per group (the CLT applies to the group means).
- The departure is mild skewness rather than heavy tails with extreme outliers.
When violated: Use the Kruskal-Wallis test as a non-parametric alternative (Section 12). Consider transformations (log, square root, or Box-Cox) for right-skewed data. Report trimmed mean ANOVA for heavy-tailed distributions.
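The normality and variance checks above can be scripted; a sketch using simulated normal data (seed and group means are arbitrary). Note that `scipy.stats.levene` with `center="median"` is the Brown-Forsythe variant:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated data: three normal groups with different means (arbitrary values)
groups = [rng.normal(loc, 1.0, size=30) for loc in (0.0, 0.5, 1.0)]

# Residuals: each score minus its own group mean
residuals = np.concatenate([g - g.mean() for g in groups])

sw_stat, sw_p = stats.shapiro(residuals)                   # H0: residuals normal
bf_stat, bf_p = stats.levene(*groups, center="median")     # Brown-Forsythe variant
lev_stat, lev_p = stats.levene(*groups, center="mean")     # classic Levene
```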
4.2 Homogeneity of Variance (Homoscedasticity)
One-Way ANOVA assumes all population variances are equal:

σ₁² = σ₂² = ⋯ = σ_k²

This assumption is required for MS_W to serve as a valid pooled estimate of the common error variance σ².
How to check:
- Levene's test (preferred — robust to non-normality): tests H₀: σ₁² = ⋯ = σ_k². A significant result indicates heteroscedasticity.
- Brown-Forsythe test (more robust than Levene's for non-normal data, uses group medians rather than means).
- Bartlett's test (powerful but sensitive to non-normality — avoid for non-normal data).
- Variance ratio rule of thumb: If s²_max / s²_min > 4, heterogeneity is potentially problematic, especially with unequal n_j.
Robustness: ANOVA is relatively robust to heteroscedasticity when:
- Group sizes are equal (n_j all equal) — balance is a strong protective factor.
- The larger variance is paired with the larger group (the test becomes conservative; Type I error is slightly deflated and therefore manageable). When the larger variance is paired with the smaller group, the test becomes liberal and Type I error is inflated.
When n_j are unequal AND variances are unequal, the ANOVA Type I error rate can be severely inflated (or deflated, depending on the pairing).
When violated: Use Welch's one-way ANOVA (Section 5), which does not assume equal variances. Follow with Games-Howell pairwise comparisons.
4.3 Independence of Observations
All observations must be independent of each other, both within and across groups. This is a design assumption — it cannot be tested statistically from the data.
Common violations:
- Participants from the same family, classroom, or hospital ward.
- Multiple measurements from the same participant treated as independent.
- Time series data with autocorrelated errors.
- Social networks where participants influence each other's scores.
When violated: Use multilevel models (participants nested within clusters), repeated measures ANOVA (repeated observations within participants), or time-series methods.
4.4 Interval Scale of Measurement
The dependent variable must be measured on at least an interval scale — equal spacing between values. Difference scores must be meaningful.
When violated: Use the Kruskal-Wallis test (for ordinal data), or ordinal regression for ordered categorical outcomes.
4.5 Absence of Influential Outliers
Extreme outliers distort both and , producing unreliable F-statistics.
How to check:
- Boxplots per group: values beyond 1.5 × IQR from the nearest quartile are mild outliers; beyond 3 × IQR are extreme.
- Standardised residuals: |z| > 3 flags potential outliers.
- Studentised deleted residuals from the ANOVA model: values with |t| > 3 warrant inspection.
When outliers present: Investigate the cause (data entry error? legitimate extreme score?). Report analyses with and without outliers. Consider the Kruskal-Wallis test or trimmed mean ANOVA as robust alternatives.
4.6 Balanced vs. Unbalanced Designs
While equal group sizes (n_j all equal, a balanced design) are not formally required, they are strongly preferred because:
- ANOVA is more robust to normality and homoscedasticity violations with equal n_j.
- Statistical power is maximised for a given total N.
- Effect size estimation is less biased.
Unbalanced designs (unequal n_j) are common in observational research. They require careful attention to variance heterogeneity and post-hoc test selection.
4.7 Assumption Summary Table
| Assumption | Description | How to Check | Remedy if Violated |
|---|---|---|---|
| Normality | Residuals normally distributed within groups | Shapiro-Wilk, Q-Q plot | Kruskal-Wallis; transform |
| Homoscedasticity | Equal variances across groups | Levene's, Brown-Forsythe | Welch's ANOVA + Games-Howell |
| Independence | Observations independent within and across groups | Design review | Multilevel model |
| Interval scale | DV has equal-interval properties | Measurement theory | Kruskal-Wallis |
| No outliers | No extreme influential values | Boxplots, standardised residuals | Investigate; Kruskal-Wallis |
5. Variants of One-Way ANOVA
5.1 Standard One-Way ANOVA (Student's F-test)
The default one-way ANOVA assuming equal variances across groups. Uses the pooled MS_W as the error term. This is appropriate when Levene's test is non-significant AND group sizes are approximately equal.
5.2 Welch's One-Way ANOVA
Welch's F-test (Welch, 1951) is the recommended default for one-way between-subjects ANOVA. It does not assume homogeneity of variance. The statistic:

F_W = [ Σ w_j (x̄_j − x̄′)² / (k − 1) ] / [ 1 + (2(k − 2)/(k² − 1)) · Σ (1 − w_j/W)²/(n_j − 1) ]

where w_j = n_j / s_j², W = Σ w_j, and x̄′ = Σ w_j x̄_j / W (weighted grand mean).
Welch-Satterthwaite df:

df1 = k − 1, df2 = (k² − 1) / [ 3 Σ (1 − w_j/W)²/(n_j − 1) ]
Post-hoc: Use Games-Howell pairwise comparisons when Welch's ANOVA is significant.
💡 Just as Welch's t-test is the recommended default for two groups, Welch's one-way ANOVA is increasingly recommended as the default for independent groups. The power loss when variances are truly equal is negligible, while the Type I error protection when variances differ is substantial. DataStatPro reports both standard and Welch's ANOVA by default.
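Welch's statistic is straightforward to implement from the weighted-means definition. A sketch (hand-rolled, since SciPy's `f_oneway` implements only the classic pooled-variance F):

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's F-test for k independent groups (no equal-variance assumption)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                             # precision weights w_j = n_j / s_j^2
    W = w.sum()
    grand = (w * m).sum() / W             # weighted grand mean
    num = (w * (m - grand) ** 2).sum() / (k - 1)
    tmp = ((1 - w / W) ** 2 / (n - 1)).sum()
    den = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    F = num / den
    df2 = (k ** 2 - 1) / (3 * tmp)        # Welch-Satterthwaite denominator df
    p = stats.f.sf(F, k - 1, df2)
    return F, k - 1, df2, p

F_w, df1, df2, p_w = welch_anova([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7])
```

With equal variances and balanced groups (as in the toy call above), Welch's F is close to, but slightly smaller than, the classic F.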
5.3 Trimmed Mean ANOVA (Robust)
Trimmed mean ANOVA (Wilcox, 2017) replaces standard means with trimmed means (e.g., 20% trimming from each tail). It is substantially more powerful than the Kruskal-Wallis test for symmetric heavy-tailed distributions while controlling Type I error under non-normality. Available in DataStatPro under "Robust ANOVA."
5.4 Brown-Forsythe F-test
An alternative to Welch's ANOVA that is more robust to certain distributional departures. Uses median-centred deviations for variance estimation. DataStatPro provides this as an additional output alongside Welch's F.
5.5 Choosing Between Variants
| Condition | Recommended Test |
|---|---|
| Normal data, equal variances, balanced | Standard ANOVA (or Welch's — nearly identical) |
| Normal data, unequal variances OR unequal n | Welch's ANOVA (recommended default) |
| Mildly non-normal, large n | Either standard or Welch's |
| Non-normal, small n | Kruskal-Wallis or Trimmed Mean ANOVA |
| Severely non-normal with outliers | Kruskal-Wallis |
| Ordinal DV | Kruskal-Wallis |
6. Using the One-Way ANOVA Calculator Component
The One-Way ANOVA Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, visualising, and reporting one-way ANOVA and its alternatives.
Step-by-Step Guide
Step 1 — Select "One-Way Between-Subjects ANOVA"
From the "Test Type" dropdown, choose:
- One-Way ANOVA (Standard): Equal variance assumption; Student's F.
- One-Way ANOVA (Welch's): Recommended default; no equal variance assumption.
- One-Way ANOVA (Auto): Runs both; uses Welch's when Levene's is significant.
- Kruskal-Wallis Test: Non-parametric alternative.
Step 2 — Input Method
Choose how to provide the data:
- Raw data (long format): Two columns — one for the DV values, one for group membership. DataStatPro computes all statistics, runs all assumption checks, and generates full output automatically.
- Raw data (wide format): One column per group. DataStatPro converts to long format.
- Summary statistics: Enter n, M, and SD for each group. Full assumption checks are not available; all inferential statistics and effect sizes are computed.
- ANOVA table values: Enter F, df1, df2, and N (from a published paper) to compute effect sizes, power, and CIs.
Step 3 — Specify Group Labels
Enter descriptive names for each group (e.g., "CBT," "BA," "Waitlist"). These labels appear in all output tables, plots, and the auto-generated APA paragraph.
Step 4 — Select Assumption Checks
DataStatPro automatically runs and displays:
- ✅ Shapiro-Wilk normality test on residuals (and per group for small n_j).
- ✅ Levene's test for homogeneity of variance.
- ✅ Brown-Forsythe test for homogeneity of variance (alongside Levene's).
- ✅ Boxplots per group for outlier detection.
- ✅ Q-Q plot of residuals for normality assessment.
- ✅ Variance ratio (s²_max / s²_min) with warning if > 4.
Step 5 — Select Post-Hoc Tests
When the omnibus F is significant, select post-hoc tests:
- Tukey HSD (default for balanced, equal variances).
- Tukey-Kramer (unbalanced, equal variances).
- Games-Howell (recommended when Levene's significant or Welch's used).
- Bonferroni (conservative; any design).
- Holm-Bonferroni (less conservative than Bonferroni).
- Dunnett's (all vs. one control group).
- Scheffé (all possible contrasts).
- Custom planned contrasts (specify contrast weights c_j).
Step 6 — Select Effect Sizes
- ✅ ω² (bias-corrected; primary).
- ✅ ε² (alternative bias correction).
- ✅ η² (biased; provided for comparison and journal requirements).
- ✅ Cohen's f (for power analysis).
- ✅ 95% CIs for ω² and η² via non-central F-distribution.
- ✅ Cohen's d with 95% CI for each post-hoc pairwise comparison.
Step 7 — Select Display Options
- ✅ Full ANOVA source table with SS, df, MS, F, p.
- ✅ Descriptive statistics table (n, M, SD, SE, 95% CI per group).
- ✅ Effect size table (η², ω², ε², f) with CIs.
- ✅ Post-hoc comparison table (all pairs: difference, SE, adjusted p, d, 95% CI for difference).
- ✅ Assumption test results panel (colour-coded: green/yellow/red).
- ✅ Raincloud plot per group (half violin + boxplot + raw data points).
- ✅ Means plot with 95% CIs and individual data points.
- ✅ Cohen's d diagram for each significant pairwise comparison.
- ✅ Power curve: power vs. n for the observed f.
- ✅ APA 7th edition-compliant results paragraph (auto-generated).
Step 8 — Run the Analysis
Click "Run One-Way ANOVA". DataStatPro will:
- Compute the full ANOVA source table.
- Run all assumption tests and display colour-coded warnings.
- Automatically switch to Welch's ANOVA if Levene's test is significant (when "Auto" selected).
- Compute all effect sizes with exact non-central F-based CIs.
- Run all selected post-hoc tests with adjusted p-values and individual Cohen's d values.
- Generate all visualisations.
- Auto-generate the APA-compliant results paragraph.
7. Full Step-by-Step Procedure
7.1 Complete Computational Procedure
This section walks through every computational step for one-way ANOVA, from raw data to a complete APA-style conclusion.
Given: k groups with observations x_ij for j = 1, …, k and i = 1, …, n_j. Total N = Σ n_j.
Step 1 — State the Hypotheses
H₀: μ₁ = μ₂ = ⋯ = μ_k
H₁: μ_j ≠ μ_j′ for at least one pair j ≠ j′
Choose the significance level α (default: α = .05).
Step 2 — Compute Descriptive Statistics per Group
For each group j, compute the sample size n_j, mean x̄_j, and SD s_j.
Step 3 — Compute the Grand Mean

x̄ = Σ n_j x̄_j / N
Step 4 — Check Assumptions
Normality: Run Shapiro-Wilk on the residuals e_ij = x_ij − x̄_j. If p < .05 and group sizes are small: consider Kruskal-Wallis.
Homoscedasticity: Run Levene's test. If p < .05: use Welch's ANOVA.
Outliers: Inspect boxplots and standardised residuals.
Step 5 — Compute Sums of Squares

SS_B = Σ n_j (x̄_j − x̄)², SS_W = Σ_j Σ_i (x_ij − x̄_j)², SS_T = SS_B + SS_W

Verification: SS_T can also be computed directly as Σ_j Σ_i (x_ij − x̄)² — both must agree.
Step 6 — Compute Degrees of Freedom

df_B = k − 1, df_W = N − k, df_T = N − 1

Step 7 — Compute Mean Squares

MS_B = SS_B / df_B, MS_W = SS_W / df_W

Step 8 — Compute the F-Statistic and p-value

F = MS_B / MS_W, p = P(F_{df_B, df_W} ≥ F_obs)

Reject H₀ if p ≤ α.
Step 9 — Compute Effect Sizes
Eta squared (biased):

η² = SS_B / SS_T

Omega squared (preferred, bias-corrected):

ω² = (SS_B − df_B · MS_W) / (SS_T + MS_W)

Epsilon squared (alternative correction):

ε² = (SS_B − df_B · MS_W) / SS_T

Cohen's f:

f = √[ η² / (1 − η²) ]
Step 10 — Compute 95% CI for ω²
Using the non-central F-distribution (computed numerically by DataStatPro).
Estimated non-centrality parameter:

λ̂ = df_B · F

Find λ_L, λ_U such that the observed F sits at the .975 and .025 quantiles of the corresponding non-central F-distribution (one equation per bound), then convert each bound to the variance-explained scale via η² = λ / (λ + N) (approximate) and apply the bias correction for ω² (approximate).
Step 11 — Conduct Post-Hoc Tests (if significant)
Select the appropriate post-hoc test (Section 9). Compute pairwise differences, standard errors, adjusted p-values, and an individual Cohen's d for each pair.
Step 12 — Interpret and Report
Combine all results into an APA-compliant report (Section 13.7).
8. Effect Sizes for One-Way ANOVA
8.1 Eta Squared (η²) — Common but Biased

η² = SS_B / SS_T

η² is the proportion of total sample variance explained by group membership. It is the most commonly reported ANOVA effect size and appears as default output in SPSS.
Critical limitation: η² is positively biased — it systematically overestimates the true population effect size, especially in small samples and when k is large relative to N. In small samples the bias can amount to several percentage points; with N in the hundreds it is negligible.
⚠️ Report η² only when explicitly required by a journal or for historical comparison. Always report ω² (or ε²) as the primary effect size and label η² as "biased" in your manuscript.
8.2 Omega Squared (ω²) — Preferred

ω² = (SS_B − (k − 1)·MS_W) / (SS_T + MS_W)

ω² is a bias-corrected estimate of the population proportion of variance explained by the IV. It is the recommended primary effect size for one-way ANOVA.
Properties:
- Can be slightly negative in small samples when the true effect is zero or near zero — because the correction overshoots. Report as 0 by convention when negative.
- Always ω² ≤ η² (the correction always reduces or maintains the estimate).
- Converges to η² as N → ∞.
- The population parameter estimated: ω²_pop = σ²_between / (σ²_between + σ²_error).
From the F-statistic and df (approximate):

ω² ≈ df1(F − 1) / (df1(F − 1) + N)
8.3 Epsilon Squared (ε²) — Alternative Correction

ε² = (SS_B − (k − 1)·MS_W) / SS_T

ε² uses the same numerator as ω² but divides by SS_T instead of SS_T + MS_W.
Properties:
- Always lies between ω² and η²: ω² ≤ ε² ≤ η².
- Slightly less bias-correction than ω².
- Computationally simpler than ω² (no addition of MS_W in the denominator).
- Increasingly reported alongside ω² in recent literature.
8.4 Cohen's f — For Power Analysis

f = √[ η² / (1 − η²) ] or f = √[ ω² / (1 − ω²) ]

Cohen's f is used as the effect size input for ANOVA power analysis. It represents the ratio of the between-groups SD to the within-groups SD.
From group means and σ (when population parameters are known):

f = σ_m / σ

where σ_m is the SD of the group means.
Benchmarks: Small = 0.10, Medium = 0.25, Large = 0.40 (Cohen, 1988).
8.5 Comparison of Effect Size Estimates
For a dataset with k = 4 groups and N = 60 (15 per group), η² typically exceeds ω² by almost 4 percentage points (computing exact ω² requires the full SS values) — substantial overestimation, and worth correcting.
8.6 Effect Sizes for Pairwise Comparisons
After the omnibus F-test, report individual effect sizes for each significant pairwise comparison using s_p = √MS_W as the standardiser:
Cohen's d (using the pooled within-groups SD):

d = (x̄_j − x̄_j′) / √MS_W

Hedges' g (bias-corrected):

g = d · [ 1 − 3 / (4·df_W − 1) ]

Using √MS_W from the full ANOVA model (rather than just the two-group pooled SD) is preferred because it is based on all groups and is therefore a more stable estimate of the common population SD.
95% CI for the pairwise mean difference:

(x̄_j − x̄_j′) ± t_{.975, N−k} · √[ MS_W (1/n_j + 1/n_j′) ]
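A sketch of these pairwise effect-size computations (the input values are hypothetical, and `pairwise_effect` is our own helper name):

```python
import math
from scipy import stats

def pairwise_effect(m1, m2, n1, n2, ms_w, df_w):
    """Cohen's d (pooled ANOVA SD), Hedges' g, and 95% CI for the mean difference."""
    sp = math.sqrt(ms_w)                      # pooled within-groups SD
    d = (m1 - m2) / sp
    g = d * (1 - 3 / (4 * df_w - 1))          # small-sample bias correction
    se = math.sqrt(ms_w * (1 / n1 + 1 / n2))
    t_crit = stats.t.ppf(0.975, df_w)
    diff = m1 - m2
    return d, g, (diff - t_crit * se, diff + t_crit * se)

# Hypothetical pair: means 7.0 vs 5.0, n = 12 each, MS_W = 0.59, df_W = 33
d, g, ci = pairwise_effect(7.0, 5.0, 12, 12, 0.59, 33)
```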
8.7 Omega Squared vs. Partial Omega Squared
In one-way ANOVA with a single IV:

ω² = ω²_partial and η² = η²_partial

(they are identical for one-factor designs)
The partial and non-partial versions diverge only in factorial (multi-factor) designs.
9. Post-Hoc Tests and Planned Contrasts
9.1 The Need for Post-Hoc Testing
A significant omnibus F-test establishes that at least one group mean differs from the others. Post-hoc tests (also called multiple comparison procedures) identify which specific pairs of groups differ, while controlling the FWER.
The key trade-off: Controlling the FWER requires more conservative critical values, which reduces power for individual comparisons. The choice of post-hoc test involves balancing Type I error control and statistical power.
9.2 Tukey's HSD — Standard Pairwise Comparisons
Tukey's Honestly Significant Difference (HSD) is the most widely used post-hoc test for balanced designs with equal variances. It controls the FWER at exactly α for all pairwise comparisons simultaneously.
Critical value: the studentised range distribution q_{α; k, N−k}.
Minimum Significant Difference:

HSD = q_{α; k, N−k} · √(MS_W / n) (balanced)

For unequal group sizes (Tukey-Kramer):

HSD_jj′ = q_{α; k, N−k} · √[ (MS_W/2)(1/n_j + 1/n_j′) ]

Declare groups j and j′ significantly different if |x̄_j − x̄_j′| ≥ HSD.
95% CI for pairwise difference μ_j − μ_j′:

(x̄_j − x̄_j′) ± q_{.05; k, N−k} · √[ (MS_W/2)(1/n_j + 1/n_j′) ]
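Tukey's HSD can be sketched with SciPy's `studentized_range` distribution (available in SciPy ≥ 1.7; the means and MS_W below are hypothetical):

```python
import math
from scipy.stats import studentized_range

# Hypothetical balanced design: k = 3 groups, n = 12 per group
means = {"A": 5.0, "B": 7.0, "C": 5.6}
k, n, ms_w = 3, 12, 0.59
df_w = k * n - k                                  # error df = N - k = 33

q_crit = studentized_range.ppf(0.95, k, df_w)     # studentised range critical value
hsd = q_crit * math.sqrt(ms_w / n)                # minimum significant difference

pairs = [("A", "B"), ("A", "C"), ("B", "C")]
sig = {pair: abs(means[pair[0]] - means[pair[1]]) >= hsd for pair in pairs}
```

Any pair whose absolute mean difference reaches the HSD threshold is flagged significant at the simultaneous 5% level.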
9.3 Games-Howell — Unequal Variances
When Levene's test is significant or Welch's ANOVA is used, Games-Howell is the recommended post-hoc procedure. It uses separate variance estimates per pair:

t_jj′ = (x̄_j − x̄_j′) / √( s_j²/n_j + s_j′²/n_j′ )

Compared against the studentised range distribution (critical value q/√2) with Welch-Satterthwaite df:

df_jj′ = ( s_j²/n_j + s_j′²/n_j′ )² / [ (s_j²/n_j)²/(n_j − 1) + (s_j′²/n_j′)²/(n_j′ − 1) ]
9.4 Bonferroni and Holm-Bonferroni Corrections
Bonferroni correction (simplest, most conservative):
Compare each pairwise p-value to α/m, where m = k(k − 1)/2.
Holm-Bonferroni sequential procedure (less conservative, same FWER control):
- Sort the p-values: p₍₁₎ ≤ p₍₂₎ ≤ ⋯ ≤ p₍m₎.
- Compare p₍ᵢ₎ to α / (m − i + 1).
- Reject H₀ for each p₍ᵢ₎ such that p₍ⱼ₎ ≤ α / (m − j + 1) for all j ≤ i; stop at the first failure.
Holm-Bonferroni is uniformly more powerful than Bonferroni and should be preferred in all cases.
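Holm's step-down rule takes only a few lines; a sketch:

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Step-down Holm procedure: reject (True) / retain (False) flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order):
        if pvals[idx] <= alpha / (m - rank):           # p_(i) vs alpha / (m - i + 1)
            reject[idx] = True
        else:
            break  # first failure: retain this and all larger p-values
    return reject
```

For example, `holm_bonferroni([0.01, 0.04, 0.03, 0.005])` rejects the first and last hypotheses and retains the middle two.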
9.5 Dunnett's Test — All Groups vs. One Control
When comparing k − 1 experimental groups to a single control group (and not making comparisons among experimental groups), Dunnett's test provides optimal power while controlling FWER.

m = k − 1 comparisons (each experimental group vs. control)

Compared against Dunnett's distribution (not the studentised range) with parameters (k − 1, N − k).
9.6 Planned Contrasts — A Priori Comparisons
Planned contrasts are specific, theoretically motivated comparisons formulated before data collection. They are more powerful than post-hoc tests because:
- They do not require a significant omnibus F-test (though conducting the F-test first is still recommended).
- Fewer comparisons means less severe FWER correction.
- Orthogonal planned contrasts do not require any correction.
Contrast specification: A contrast is a linear combination ψ = Σ c_j μ_j with the constraint Σ c_j = 0.
Contrast SS and F:

SS_ψ = (Σ c_j x̄_j)² / (Σ c_j²/n_j), F_ψ = SS_ψ / MS_W, compared to F(1, N − k)

Orthogonality condition (two contrasts c and c′ are orthogonal if):

Σ c_j c_j′ / n_j = 0 (which reduces to Σ c_j c_j′ = 0 for equal n_j)

A set of k − 1 mutually orthogonal contrasts fully partitions SS_B:

SS_B = SS_ψ₁ + SS_ψ₂ + ⋯ + SS_ψ(k−1)
Example for k = 4 groups (Control, Drug A, Drug B, Drug C):

| Contrast | Comparison | c₁ | c₂ | c₃ | c₄ |
|---|---|---|---|---|---|
| ψ₁ | Control vs. all drugs | 3 | −1 | −1 | −1 |
| ψ₂ | Drug A vs. Drugs B and C | 0 | 2 | −1 | −1 |
| ψ₃ | Drug B vs. Drug C | 0 | 0 | 1 | −1 |
These three contrasts are mutually orthogonal (for equal n_j) and decompose SS_B into three orthogonal components — no FWER correction needed.
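The partition property is easy to verify numerically. A sketch with hypothetical group means in a balanced design:

```python
import numpy as np

# Hypothetical group means (Control, Drug A, Drug B, Drug C), balanced n = 20
means = np.array([10.0, 14.0, 15.0, 13.0])
n = 20

contrasts = np.array([
    [3, -1, -1, -1],   # Control vs. all drugs
    [0,  2, -1, -1],   # Drug A vs. Drugs B and C
    [0,  0,  1, -1],   # Drug B vs. Drug C
])

grand = means.mean()
ss_b = n * ((means - grand) ** 2).sum()                              # SS between
ss_psi = [n * (c @ means) ** 2 / (c ** 2).sum() for c in contrasts]  # per-contrast SS
```

The three per-contrast sums of squares add up exactly to SS_B, confirming the orthogonal decomposition.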
10. Confidence Intervals
10.1 95% CI for Each Group Mean
The 95% CI for the population mean μ_j:

x̄_j ± t_{.975, N−k} · √(MS_W / n_j)

Note: This CI uses MS_W from the full ANOVA model (not s_j²), producing a more stable estimate that borrows strength from all groups (valid under homoscedasticity).
10.2 95% CI for Pairwise Mean Differences
The 95% CI for μ_j − μ_j′:

(x̄_j − x̄_j′) ± t_{.975, N−k} · √[ MS_W (1/n_j + 1/n_j′) ]

Using Tukey-adjusted critical values (simultaneous 95% CIs for all pairs):

(x̄_j − x̄_j′) ± (q_{.05; k, N−k}/√2) · √[ MS_W (1/n_j + 1/n_j′) ]
10.3 95% CI for ω²
The exact CI uses the non-central F-distribution (DataStatPro computes this numerically). The CI communicates the precision of the effect size estimate and is required for complete reporting.
Approximate 95% CI width for ω² as a function of total N (illustrative values for a small-to-medium effect):

| N (total) | Approx. CI Width for ω² | Precision |
|---|---|---|
| 30 | 0.24 | Very low |
| 60 | 0.17 | Low |
| 90 | 0.14 | Moderate |
| 150 | 0.11 | Good |
| 300 | 0.08 | High |
| 600 | 0.05 | Very high |
⚠️ With only 30 participants in total, the 95% CI for ω² spans roughly 0.24 — about a quarter of the possible range, and essentially uninformative about the true effect magnitude. Always report the CI alongside the point estimate.
11. Power Analysis and Sample Size Planning
11.1 A Priori Power Analysis
A priori power analysis determines the required sample size before data collection to achieve desired power (1 − β) at significance level α for a hypothesised effect of size f.
Non-centrality parameter:

λ = N·f² = k·n·f² (balanced design)

Power computation (exact, using non-central F):

Power = P( F_{df1, df2; λ} ≥ F_crit )

where df1 = k − 1, df2 = N − k, and F_crit is the central-F critical value F_{1−α; df1, df2}.
No closed form exists — DataStatPro uses numerical integration of the non-central F-distribution.
Required n per group for 80% power (α = .05), by number of groups:

| f | ω² (approx.) | k = 3 | k = 4 | k = 5 | k = 6 |
|---|---|---|---|---|---|
| 0.10 | 0.010 | 322 | 274 | 240 | 215 |
| 0.15 | 0.022 | 144 | 123 | 107 | 96 |
| 0.20 | 0.038 | 82 | 70 | 61 | 55 |
| 0.25 | 0.059 | 52 | 45 | 39 | 35 |
| 0.30 | 0.082 | 37 | 32 | 28 | 25 |
| 0.40 | 0.138 | 21 | 18 | 16 | 14 |
| 0.50 | 0.200 | 14 | 12 | 11 | 10 |
| 0.60 | 0.265 | 10 | 9 | 8 | 7 |
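The exact power computation behind this table can be sketched with SciPy's non-central F-distribution (`scipy.stats.ncf`):

```python
from scipy import stats

def anova_power(f, k, n_per_group, alpha=0.05):
    """Exact power of the omnibus F-test via the non-central F-distribution."""
    N = k * n_per_group
    df1, df2 = k - 1, N - k
    lam = N * f ** 2                            # non-centrality parameter
    f_crit = stats.f.ppf(1 - alpha, df1, df2)   # central-F critical value
    return stats.ncf.sf(f_crit, df1, df2, lam)  # P(F >= F_crit | lambda)

# One cell of the table above: f = .25, k = 4, n = 45 per group -> power near .80
power = anova_power(0.25, 4, 45)
```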
11.2 Determining f from Prior Literature
When prior studies report η² or ω²:

f = √[ η² / (1 − η²) ] (or the analogous formula with ω²)

When prior studies report group means and a common SD estimate:

f = σ_m / σ, where σ_m is the SD of the group means

When only a t-statistic from a pilot study with two groups is available:

f ≈ d/2, with d ≈ t·√(1/n₁ + 1/n₂) (approximate, for a two-group pilot)
11.3 Sensitivity Analysis
The minimum detectable effect f_min for a given N, k, α, and target power is found by inverting the power computation: the smallest f whose non-centrality λ = N·f² reaches the target power.
For a typical moderate total N (e.g., around 40 per group with k = 3), f_min falls in the medium range (roughly f ≈ 0.25–0.30): such a study can reliably detect only medium-to-large effects. Smaller effects may exist but would be missed at 80% power.
⚠️ Report sensitivity analysis for null or inconclusive results. Do not use "observed power" (power computed from the observed effect size) — this is circular and provides no additional information beyond the p-value.
11.4 Planning for Specific Group Contrasts
When the primary research interest is in a specific planned contrast (rather than the omnibus F-test), power analysis should target that contrast:
For a contrast ψ = Σ c_j μ_j with Σ c_j = 0:

λ_ψ = ψ² / [ σ² Σ (c_j²/n_j) ]

Power for this contrast uses df1 = 1 and non-centrality λ_ψ. Power for planned contrasts is higher than for the omnibus F-test on the same data.
12. Non-Parametric Alternative: Kruskal-Wallis Test
12.1 When to Use the Kruskal-Wallis Test
The Kruskal-Wallis H test is the appropriate alternative to one-way ANOVA when:
- Data are ordinal (e.g., Likert items, ranked outcomes).
- Data are continuous but severely non-normally distributed with small n_j.
- There are extreme outliers that cannot be explained or removed.
- The homogeneity of variance assumption is severely violated and Welch's ANOVA is not adequate.
12.2 The Kruskal-Wallis Procedure
Step 1 — Rank all observations:
Combine all N observations and rank them from 1 (smallest) to N (largest). Assign average (mid)ranks to tied values.
Step 2 — Compute rank sums per group:
R_j = sum of ranks for group j
Step 3 — Compute the H statistic:

H = [ 12 / (N(N + 1)) ] · Σ (R_j² / n_j) − 3(N + 1)

Tie correction:

H_corrected = H / [ 1 − Σ (t_i³ − t_i) / (N³ − N) ]

where t_i = number of observations in the i-th tied group.
Step 4 — p-value:
For small samples with few groups: use exact tables. For n_j ≥ 5: H ~ χ²_{k−1} approximately.
Step 5 — Effect size (ε²):

ε² = H·(N + 1) / (N² − 1)

Or equivalently:

ε² = H / (N − 1)

Cohen's benchmarks for variance-explained measures apply: small = .01, medium = .06, large = .14.
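A sketch of the procedure with SciPy (the data values are hypothetical); `scipy.stats.kruskal` applies the tie correction automatically:

```python
from scipy import stats

# Hypothetical data for three groups
g1 = [2.1, 2.5, 3.0, 2.8, 2.2]
g2 = [3.5, 3.9, 4.1, 3.2, 3.8]
g3 = [2.9, 3.1, 2.7, 3.3, 3.0]

H, p = stats.kruskal(g1, g2, g3)          # tie-corrected H, chi-square p-value

k = 3
N = sum(len(g) for g in (g1, g2, g3))
eps_sq = H * (N + 1) / (N ** 2 - 1)       # epsilon-squared (= H / (N - 1))
```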
12.3 Post-Hoc Tests for Kruskal-Wallis
When H is significant, conduct pairwise comparisons using the Dunn test with Holm-Bonferroni correction:

z_jj′ = (R̄_j − R̄_j′) / √[ (N(N + 1)/12)(1/n_j + 1/n_j′) ]

where R̄_j and R̄_j′ are the mean ranks for groups j and j′.
Effect size for each pairwise comparison (rank-biserial correlation):

r = z / √(n_j + n_j′)
12.4 Asymptotic Relative Efficiency
For normal data, the Kruskal-Wallis test has an asymptotic relative efficiency of 3/π ≈ 0.955 relative to the one-way ANOVA — a negligible efficiency loss. For non-normal data (especially heavy-tailed distributions), the Kruskal-Wallis test can be substantially more powerful than the F-test.
13. Advanced Topics
13.1 ANOVA as a Linear Model
One-way ANOVA is a special case of linear regression with effect-coded (or dummy-coded) predictors. For k = 3 groups using effect coding:

Y_i = b₀ + b₁X₁ + b₂X₂ + ε_i

Where:
- X₁ = 1 if group 1, −1 if group 3, 0 otherwise.
- X₂ = 1 if group 2, −1 if group 3, 0 otherwise.
- b₀ = x̄ (grand mean under effect coding).
- b₁ = α₁ (effect of group 1 relative to the grand mean).
- b₂ = α₂ (effect of group 2 relative to the grand mean).
The F-statistic for the regression model equals the ANOVA F-statistic. This equivalence allows ANOVA to be computed using any regression software.
13.2 Trend Analysis for Ordered Groups
When the levels represent an ordered quantitative variable (e.g., dose: 0, 10, 20, 40 mg), polynomial trend analysis is more informative than omnibus F and pairwise comparisons. Orthogonal polynomial contrasts test:
- Linear trend: Do means increase (or decrease) monotonically?
- Quadratic trend: Do means follow a U-shape or inverted-U?
- Cubic trend: Is there an S-shaped pattern?
Orthogonal polynomial coefficients for equally spaced groups:
| Trend | ||||
|---|---|---|---|---|
| Linear | ||||
| Quadratic | ||||
| Cubic |
Each trend contrast has and specified value from tables.
The three trend SS sum to , fully partitioning the between-groups variance.
13.3 Dealing with Unequal Sample Sizes
In unbalanced designs, the grand mean is the weighted (not simple) average of group means. Several practical considerations:
- Power is maximised when group sizes are equal. Any deviation from balance reduces total power.
- Levene's test becomes critical: unequal combined with unequal variances is the most damaging violation.
- Post-hoc tests: Use Tukey-Kramer (not Tukey HSD) for unbalanced designs with equal variances; Games-Howell for unequal variances.
- Contrast tests: Divide contrast coefficients by group sizes: for orthogonality with unequal .
13.4 Bayesian One-Way ANOVA
Bayesian ANOVA (Rouder et al., 2012) computes Bayes Factors comparing models:
The prior on standardised group effects under uses a Cauchy distribution with scale (default "medium" effect prior). DataStatPro computes this via the BayesFactor method.
Advantages:
- Quantifies evidence for (group equality), not just failure to reject it.
- Valid for sequential (interim) analyses.
- Provides posterior distributions for group effects.
Reporting: "A Bayesian one-way ANOVA (Cauchy prior, ) provided [strong / moderate / anecdotal] evidence for [the group effect / the null hypothesis], [value]."
13.5 Equivalence Testing for ANOVA
To positively establish that group means are negligibly different (equivalence), extend the TOST framework to ANOVA:
Step 1: Specify equivalence bounds for all pairwise differences (e.g., corresponding to Cohen's ).
Step 2: For each pair , conduct two one-sided tests:
- :
- :
Step 3: Declare equivalence for pair when the 90% CI for falls entirely within .
Apply Bonferroni correction across all pairs.
13.6 Robust ANOVA: Trimmed Means
Yuen's trimmed mean F-test (one-way version) uses -trimmed means and Winsorised variances for each group:
(effective group size after trimming)
Where is the 20%-trimmed mean for group and is the Winsorised sum of squared deviations.
This test is substantially more powerful than Kruskal-Wallis for symmetric heavy-tailed distributions while maintaining nominal Type I error control.
13.7 Reporting One-Way ANOVA According to APA 7th Edition
Full minimum reporting set (APA 7th ed.):
- Statement of which test (standard ANOVA or Welch's) and why.
- Levene's test result.
- [value], [value].
- [value] [95% CI: LB, UB].
- Which effect size was computed ( not just "effect size").
- Group means and SDs for all groups.
- Post-hoc test results with adjusted p-values and per pair.
- 95% CI for each significant pairwise mean difference.
14. Worked Examples
Example 1: Therapy Type on Depression — Standard One-Way ANOVA
A clinical researcher randomly assigns participants to three therapy conditions: CBT (), Behavioural Activation (BA; ), or Waitlist Control (WL; ). Post-treatment PHQ-9 depression scores (0–27; lower = less depression) are the dependent variable.
Descriptive statistics:
| Group | ||||
|---|---|---|---|---|
| CBT | 30 | 9.80 | 4.20 | 0.767 |
| BA | 30 | 11.40 | 4.60 | 0.840 |
| WL | 30 | 16.30 | 5.10 | 0.931 |
Assumption checks:
Shapiro-Wilk (residuals): , — normality not violated.
Levene's test: , — homogeneity of variance holds.
→ Standard one-way ANOVA is appropriate.
Step 1 — Grand mean:
Step 2 — Sums of squares:
Step 3 — ANOVA source table:
| Source | SS | MS | |||
|---|---|---|---|---|---|
| Between | |||||
| Within | |||||
| Total |
Step 4 — Effect sizes:
95% CI for (non-central F, , , ):
95% CI for : (DataStatPro numerical)
;
Step 5 — Post-hoc tests (Tukey HSD, balanced):
(studentised range)
| Comparison | Difference | SE | Cohen's | 95% CI | ||
|---|---|---|---|---|---|---|
| CBT vs. BA | ||||||
| CBT vs. WL | ||||||
| BA vs. WL |
Where
Interpretation:
Both active therapies (CBT and BA) significantly reduced depression compared to Waitlist Control (). CBT and BA did not differ from each other ().
APA write-up: "A one-way between-subjects ANOVA examined the effect of therapy type on PHQ-9 depression scores. Levene's test indicated homogeneity of variance (, ). The ANOVA revealed a significant effect of therapy type, , , [95% CI: 0.137, 0.360], indicating a large effect. Tukey HSD post-hoc comparisons revealed that both CBT (, ) and Behavioural Activation (, ) produced significantly lower depression scores than the Waitlist Control (, ), [95% CI: 0.78, 2.01] and [95% CI: 0.46, 1.65] (both ). CBT and BA did not significantly differ, [95% CI: 0.27, 0.95], ."
Example 2: Welch's One-Way ANOVA — Reaction Time Across Sleep Conditions
A researcher compares simple reaction time (ms) across four sleep conditions: Normal (8h), Mild deprivation (6h), Moderate deprivation (4h), and Severe deprivation (2h), per group ().
Descriptive statistics:
| Group | (ms) | (ms) | |
|---|---|---|---|
| Normal (8h) | 20 | 241.3 | 18.4 |
| Mild (6h) | 20 | 268.7 | 24.1 |
| Moderate (4h) | 20 | 312.4 | 41.6 |
| Severe (2h) | 20 | 389.2 | 68.3 |
Assumption checks:
Shapiro-Wilk (residuals): , — mild non-normality (but per group; CLT provides some protection).
Levene's test: , — significant heteroscedasticity.
→ Welch's one-way ANOVA is required.
Welch's F computation:
: ; ; ;
Numerator: :
Numerator
Denominator correction (computed by DataStatPro):
;
Effect size:
Games-Howell post-hoc tests:
| Comparison | Diff (ms) | |||
|---|---|---|---|---|
| Normal vs. Mild | ||||
| Normal vs. Mod | ||||
| Normal vs. Severe | ||||
| Mild vs. Mod | ||||
| Mild vs. Severe | ||||
| Mod vs. Severe |
All six pairwise comparisons are statistically significant, with very large effect sizes. Reaction time increases substantially at each stage of sleep deprivation.
APA write-up: "Levene's test indicated significant heterogeneity of variance (, ); therefore, Welch's one-way ANOVA was applied. The test revealed a significant effect of sleep deprivation on reaction time, , , [95% CI: 0.581, 0.789], indicating a very large effect. Games-Howell post-hoc comparisons revealed that every level of sleep deprivation produced significantly longer reaction times than all others (all ), with effect sizes ranging from large () to very large ()."
Example 3: Kruskal-Wallis — Pain Ratings Across Acupuncture Protocols
A researcher compares pain relief (NRS 0–10; ordinal) across five acupuncture protocol variants. Non-normality and ties make one-way ANOVA inappropriate.
per group; ; .
Given: (tie-corrected Kruskal-Wallis H statistic).
p-value:
Effect size:
Large effect — acupuncture protocol explains approximately 26% of the rank variability.
Dunn post-hoc (Holm-corrected):
After Holm correction, Protocols 1 and 2 differ significantly from Protocols 4 and 5 ( ranging from 0.48 to 0.73). Protocols 1 vs. 2 and 4 vs. 5 do not differ significantly.
APA write-up: "Due to ordinal measurement and non-normal distributions, a Kruskal-Wallis test was conducted. There was a significant difference in pain ratings across the five acupuncture protocols, , , [95% CI: 0.091, 0.421], indicating a large effect. Dunn's pairwise comparisons with Holm correction revealed that Protocols 1 and 2 produced significantly lower pain ratings than Protocols 4 and 5 (all , = 0.48–0.73)."
Example 4: Non-Significant Result with Sensitivity Analysis
An educational researcher tests whether three homework formats (Written, Digital, No Homework) affect standardised test scores in students per group (; ).
Results: , , [95% CI: 0.000, 0.103].
Levene's test: , — variances equal.
Sensitivity analysis:
Corresponding
This study had 80% power to detect only medium-to-large effects (). The observed is a small effect well below this detection threshold.
APA write-up: "A one-way ANOVA revealed no significant effect of homework format on standardised test scores, , , [95% CI: 0.000, 0.103], indicating a very small and statistically non-significant effect. The study had 80% power to detect effects of (); effects smaller than this threshold remain undetected. The observed is below this detection threshold, indicating the study was underpowered for the observed effect size."
15. Common Mistakes and How to Avoid Them
Mistake 1: Reporting as the Only Effect Size and Calling It Unbiased
Problem: Reporting as the effect size and implying it represents the true population proportion of variance explained. overestimates the population effect, sometimes substantially in small samples with few groups.
Solution: Report (or ) as the primary effect size. If journals require (some do), report it alongside and label as a biased estimate. Always compute the 95% CI for using DataStatPro.
Mistake 2: Interpreting the Omnibus F Without Post-Hoc Tests
Problem: Reporting , and concluding "all groups differ significantly" or "Groups 1 and 4 differ based on their means" without conducting post-hoc tests. The omnibus F tells you only that at least one pair differs.
Solution: Always follow a significant omnibus F with appropriate post-hoc tests or planned contrasts. Report all pairwise comparisons with adjusted p-values and individual effect sizes .
Mistake 3: Using Standard ANOVA When Variances Are Unequal
Problem: Running standard ANOVA without checking Levene's test, or ignoring a significant Levene's result, when group sizes are unequal. This produces inflated or deflated Type I error rates and untrustworthy p-values.
Solution: Always run Levene's test before deciding which ANOVA variant to use. When Levene's is significant (especially with unequal ), use Welch's ANOVA with Games-Howell post-hoc tests. Recommend setting DataStatPro to "Auto" mode, which applies Welch's ANOVA automatically when Levene's is significant.
Mistake 4: Running Multiple t-Tests Instead of ANOVA
Problem: Comparing three groups by running three separate pairwise t-tests without correction, inflating FWER to approximately 14%.
Solution: Use one-way ANOVA (or Welch's) for the omnibus test, followed by appropriate post-hoc tests or pre-planned contrasts. If pairwise comparisons are the primary interest, use Tukey HSD or Holm-Bonferroni corrections.
Mistake 5: Not Checking or Reporting Assumption Tests
Problem: Running ANOVA without checking normality and homoscedasticity, and reporting only the F-statistic and p-value without mentioning assumption checks. Readers cannot evaluate the validity of the results.
Solution: Always run and report Levene's test and Shapiro-Wilk (or Q-Q plot inspection). Report these results in the method or results section, and justify the test choice (standard vs. Welch's) based on the assumption check results.
Mistake 6: Using Fisher's LSD Without the Omnibus F Restriction
Problem: Applying Fisher's Least Significant Difference post-hoc test directly as a multiple comparison correction without first confirming the omnibus F is significant. Fisher's LSD does not adequately control FWER when .
Solution: For , Fisher's LSD is acceptable after a significant omnibus F (the "protected LSD"). For , always use a proper FWER-controlling procedure (Tukey, Holm, Games-Howell). Never report Fisher's LSD without the omnibus F protection.
Mistake 7: Reporting Effect Sizes Without Confidence Intervals
Problem: Reporting without a CI. With moderate sample sizes, the CI for can be extremely wide, making the point estimate essentially uninformative about the true effect magnitude.
Solution: Always report the 95% CI for (available in DataStatPro via the non-central F-distribution). A point estimate without a CI gives a false sense of precision.
Mistake 8: Applying Post-Hoc Tests When the Omnibus F is Non-Significant
Problem: Running all pairwise post-hoc comparisons regardless of the omnibus F result, and selectively reporting those that happen to be significant. This is p-hacking and inflates the FWER.
Solution: When the omnibus F is non-significant, do not run post-hoc pairwise tests (except for pre-planned contrasts specified before data collection). Report the non-significant omnibus F alongside the effect size and sensitivity analysis, and acknowledge the study's power limitations.
Mistake 9: Confusing "Equal Sample Sizes" with "Equal Variances"
Problem: Assuming that because all groups have equal , the equal variances assumption is met. Equal sample sizes reduce the consequences of variance heterogeneity but do not eliminate it. Levene's test may still be significant with balanced designs.
Solution: Always run Levene's test regardless of balance. When Levene's is significant, use Welch's ANOVA even for balanced designs (the power loss is negligible).
Mistake 10: Neglecting to Report the Full Descriptive Statistics Table
Problem: Reporting only the ANOVA source table (, df, ) without group means, SDs, and . Without descriptive statistics, the F-statistic is uninterpretable — readers cannot evaluate the direction, magnitude, or pattern of group differences.
Solution: Always include a descriptive statistics table with , , , and (or 95% CI) for each group. Include a visualisation (raincloud plot or means plot with individual data) whenever possible.
16. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| ; group means very similar | Non-significant result; report ; inspect group means | |
| or is negative | True effect near zero; small sample; correction overshoots | Report as 0 (convention); note small effect; increase sample size |
| much larger than | Small with groups; large bias correction | This is expected; always report as primary |
| Levene's test significant but are equal | Unequal variances exist but balanced design is partially protective | Still use Welch's ANOVA; equal reduces but does not eliminate the problem |
| Post-hoc tests show no significant pairs despite significant | Effect is spread across many small pairwise differences | Report omnibus and acknowledge no single pair survives correction; consider planned contrasts |
| Shapiro-Wilk significant with large () | High power of normality test; minor deviations detected | With large , CLT protects the t-test; inspect Q-Q for severity; ANOVA likely valid |
| Games-Howell and Tukey HSD give contradictory results | Heterogeneous variances affecting the inference | Use Games-Howell when variances are unequal; report both and note discrepancy |
| Very large with very small | Very large ; trivially small differences are statistically significant | Report effect size prominently; statistical significance ≠ practical significance |
| Kruskal-Wallis significant but all Dunn pairwise tests non-significant | Effect is distributed; Holm correction too conservative | Report all pairwise and ; consider reporting without correction for planned pairs |
| 95% CI for includes 0 despite significant | Wide CI due to small or ; possible when is marginally significant | Report the wide CI; both values are correct; note limited precision |
| Welch's df is very small | Extreme variance heterogeneity with small | Check data for errors; if genuine, use permutation ANOVA |
| One group has | ANOVA cannot estimate from a single observation | Collect more data; exclude the singleton group; use a different design |
| ANOVA gives different result from equivalent regression | Coding scheme issue (dummy vs. effect coding affects only interpretation, not F) | Verify coding; F-statistics should match; intercept and slopes will differ by coding |
| Post-hoc p-values are all exactly 1.0 | Software error; all group means identical | Verify data; check for data entry errors |
17. Quick Reference Cheat Sheet
Core One-Way ANOVA Formulas
| Formula | Description |
|---|---|
| Grand mean (weighted) | |
| Between-groups SS | |
| Within-groups SS | |
| Total SS | |
| ; ; | Degrees of freedom |
| Between-groups mean square | |
| Within-groups mean square (error) | |
| F-ratio | |
| Pooled within-groups SD | |
| p-value |
Effect Size Formulas
| Formula | Description |
|---|---|
| Eta squared (biased) | |
| Omega squared (preferred) | |
| Epsilon squared (alternative) | |
| Cohen's (from ) | |
| from | |
| from (approx.) | |
| Cohen's for pairwise | |
| Hedges' for pairwise |
Welch's ANOVA Formulas
| Formula | Description |
|---|---|
| Weight for group | |
| Weighted grand mean | |
| Welch's F (see Section 5.2) | |
| Welch-Satterthwaite df |
Kruskal-Wallis Formulas
| Formula | Description |
|---|---|
| Rank sum for group | |
| Kruskal-Wallis | |
| Tie-corrected | |
| Kruskal-Wallis effect size | |
| Dunn's test statistic | |
| Rank-biserial (pairwise) |
ANOVA Source Table Template
| Source | SS | MS | |||
|---|---|---|---|---|---|
| Between groups | [value] | ||||
| Within groups (Error) | |||||
| Total |
One-Way ANOVA Reporting Checklist
| Item | Required |
|---|---|
| -statistic with both df | ✅ Always |
| Exact p-value (or ) | ✅ Always |
| with 95% CI (primary effect size) | ✅ Always |
| (labelled as biased) | ✅ When journals require it |
| Which effect size was reported | ✅ Always |
| Group means and SDs for all groups | ✅ Always |
| Sample sizes per group | ✅ Always |
| Levene's test result | ✅ Always for independent designs |
| Whether standard or Welch's ANOVA was used | ✅ Always |
| Shapiro-Wilk result (normality) | ✅ When |
| Post-hoc test name and correction method | ✅ When omnibus significant |
| All pairwise comparisons with adjusted and | ✅ When omnibus significant |
| Planned contrast weights and rationale | ✅ When planned contrasts used |
| alongside | ✅ Recommended |
| Cohen's for power analysis reference | ✅ When reporting power |
| 95% CI for (via non-central ) | ✅ Always |
| 95% CI for each pairwise mean difference | ✅ Recommended |
| Sensitivity analysis (min detectable effect) | ✅ For null results |
| Domain-specific benchmark context | ✅ Recommended |
| Raincloud or violin plot | ✅ Strongly recommended |
| Whether Games-Howell was used with Welch's | ✅ When variances unequal |
| Descriptive statistics table | ✅ Always |
APA 7th Edition Reporting Templates
Standard One-Way ANOVA (significant result):
"A one-way between-subjects ANOVA was conducted to examine the effect of [IV] on [DV]. Levene's test indicated [equal / unequal] variances ( [value], [value]). The ANOVA revealed a [significant / non-significant] effect of [IV], [value], [value], [value] [95% CI: LB, UB], indicating a [small / medium / large] effect. [Post-hoc test name] pairwise comparisons revealed that [group pair(s)] differed significantly (all [threshold after correction]). [Other pairs] did not differ significantly."
Welch's One-Way ANOVA:
"Due to significant heterogeneity of variance (Levene's [value], [value]), Welch's one-way ANOVA was applied. The test revealed a [significant / non-significant] effect of [IV] on [DV], [value], [value], [value] [95% CI: LB, UB]. Games-Howell post-hoc comparisons indicated that [describe pairwise results]."
Non-significant result with sensitivity analysis:
"A one-way ANOVA revealed no significant effect of [IV] on [DV], [value], [value], [value] [95% CI: LB, UB]. Given the sample sizes ( [value] per group), this study had power to detect effects of [value] ( [value]) at 80% power. Effects smaller than this threshold may exist but remain undetected."
Kruskal-Wallis (non-parametric):
"Due to [non-normality / ordinal measurement], a Kruskal-Wallis test was conducted. The test revealed a [significant / non-significant] difference across groups, [value], [value], [value]. Dunn's pairwise post-hoc comparisons with Holm correction indicated that [describe pairwise results]."
Conversion Formulas
| From | To | Formula |
|---|---|---|
| , , | ||
| , , | (approx.) | |
| (always ) | ||
| (2 groups) | Cohen's | |
| Cohen's (2 groups) | ||
| (Kruskal-Wallis) | ||
| Pairwise | ||
| (Hedges') | ||
| (Dunn) | (approx.) |
Required Sample Size per Group (80% Power, , Two-Sided)
| Cohen's | Label | |||||
|---|---|---|---|---|---|---|
| 0.10 | Small | 322 | 274 | 240 | 215 | 180 |
| 0.15 | Small-Med | 144 | 123 | 107 | 96 | 80 |
| 0.25 | Medium | 52 | 45 | 39 | 35 | 29 |
| 0.35 | Med-Large | 27 | 23 | 21 | 19 | 16 |
| 0.40 | Large | 21 | 18 | 16 | 14 | 12 |
| 0.50 | Large | 14 | 12 | 11 | 10 | 8 |
| 0.60 | Large | 10 | 9 | 8 | 7 | 6 |
| 0.80 | Very large | 6 | 6 | 5 | 5 | 4 |
All values are per group. Total = .
Cohen's Benchmarks — ANOVA Effect Sizes
| Label | ||||
|---|---|---|---|---|
| Small | ||||
| Medium | ||||
| Large |
Post-Hoc Test Selection Guide
| Condition | Recommended Post-Hoc Test | Controls FWER |
|---|---|---|
| Balanced, equal variances | Tukey HSD | ✅ Exactly |
| Unbalanced, equal variances | Tukey-Kramer | ✅ Approximately |
| Unequal variances OR unequal | Games-Howell | ✅ Approximately |
| All groups vs. one control | Dunnett's | ✅ Optimal |
| Any design, conservative | Bonferroni | ✅ Conservative |
| Any design, less conservative | Holm-Bonferroni | ✅ Sequential |
| All contrasts (not just pairwise) | Scheffé | ✅ Most conservative |
| Non-parametric (Kruskal-Wallis) | Dunn + Holm | ✅ Sequential |
Degrees of Freedom Reference
| Source | Notes | |
|---|---|---|
| Between groups | = number of groups | |
| Within groups (Error) | = total observations | |
| Total | ||
| Welch's numerator | Same as standard | |
| Welch's denominator | Always | |
| Planned contrast | Per orthogonal contrast |
Assumption Checks Reference
| Assumption | Test | Action if Violated |
|---|---|---|
| Normality of residuals | Shapiro-Wilk, Q-Q | Kruskal-Wallis; transform |
| Homogeneity of variance | Levene's, Brown-Forsythe | Welch's ANOVA + Games-Howell |
| Independence | Design review | Multilevel model |
| Outliers | Boxplots, $ | z_i |
| Interval scale | Measurement theory | Kruskal-Wallis |
This tutorial provides a comprehensive foundation for understanding, conducting, and reporting One-Way ANOVA and its alternatives within the DataStatPro application. For further reading, consult Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018) for applied coverage; Maxwell, Delaney & Kelley's "Designing Experiments and Analyzing Data" (3rd ed., 2018) for rigorous methodological depth; Wilcox's "Introduction to Robust Estimation and Hypothesis Testing" (4th ed., 2017) for robust alternatives including trimmed mean ANOVA; Lakens's "Calculating and Reporting Effect Sizes to Facilitate Cumulative Science" (Frontiers in Psychology, 2013) for the vs. discussion; Olejnik & Algina (2003) for generalised effect sizes; and Delacre, Lakens & Leys (2017) in the International Review of Social Psychology for the recommendation to default to Welch's ANOVA. For Bayesian ANOVA, see Rouder et al. (2012) in the Journal of Mathematical Psychology. For feature requests or support, contact the DataStatPro team.