Repeated Measures ANOVA: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of within-subjects experimental designs all the way through advanced interpretation, reporting, assumption checking, and practical usage within the DataStatPro application. Whether you are encountering repeated measures ANOVA for the first time or deepening your understanding of analysing data where the same participants contribute observations across multiple conditions or time points, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is Repeated Measures ANOVA?
- The Mathematics Behind Repeated Measures ANOVA
- Assumptions of Repeated Measures ANOVA
- Variants of Repeated Measures ANOVA
- Using the Repeated Measures ANOVA Calculator Component
- Step-by-Step Procedure
- Interpreting the Output
- Effect Sizes for Repeated Measures ANOVA
- Confidence Intervals
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into repeated measures ANOVA, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.
1.1 Within-Subjects vs. Between-Subjects Designs
A fundamental distinction in experimental design concerns how participants contribute data across conditions:
- Between-subjects design: Different participants are assigned to different conditions. Each participant contributes one observation. Variability between participants is indistinguishable from variability between conditions, so it becomes part of the error term.
- Within-subjects design: The same participants are measured under all conditions (or at all time points). Each participant contributes multiple observations. Because we can model each participant's general tendency to score high or low, this individual variability is removed from the error term, substantially increasing statistical power.
Repeated measures ANOVA is the inferential framework for within-subjects designs with three or more conditions or time points.
1.2 The Logic of Variance Partitioning
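Analysis of Variance (ANOVA) tests hypotheses by partitioning the total variability in the data ($SS_{\text{total}}$) into meaningful components:
$$SS_{\text{total}} = SS_{\text{effect}} + SS_{\text{error}}$$
The key insight is that what constitutes "error" differs between designs:
- In a between-subjects one-way ANOVA: $SS_{\text{total}} = SS_{\text{between-groups}} + SS_{\text{within-groups}}$, where $SS_{\text{within-groups}}$ includes both random measurement error and individual differences between participants.
- In a within-subjects (repeated measures) ANOVA: $SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$, where $SS_{\text{between}}$ is the variability between participants and $SS_{\text{within}}$ is the variability within the same individuals, and:
$$SS_{\text{within}} = SS_{\text{cond}} + SS_{\text{error}}$$
By extracting $SS_{\text{between}}$ (the variability attributable to stable individual differences), the residual error term $SS_{\text{error}}$ is much smaller than in a between-subjects design, producing larger F-ratios and greater power.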
1.3 The F-Ratio
The F-ratio is the core test statistic of ANOVA:
$$F = \frac{MS_{\text{effect}}}{MS_{\text{error}}}$$
Under $H_0$ (all condition means are equal), $F \approx 1$. Under $H_1$ (at least one condition mean differs), $F > 1$. The larger the $F$, the stronger the evidence against $H_0$.
1.4 The F-Distribution
The F-distribution is parameterised by two degrees of freedom: $df_1$ (numerator, associated with the effect) and $df_2$ (denominator, associated with the error). It is:
- Right-skewed, defined only for non-negative values.
- Indexed by $df_1$ and $df_2$; approaches a normal distribution when both are very large.
- The p-value is always computed from the right tail: $p = P(F_{df_1, df_2} \geq F_{\text{obs}})$.
1.5 The Null and Alternative Hypotheses in ANOVA
For a within-subjects factor with $k$ levels (conditions or time points):
- $H_0$: All population condition means are equal: $\mu_1 = \mu_2 = \cdots = \mu_k$
- $H_1$: At least one population condition mean differs from at least one other: $\mu_j \neq \mu_{j'}$ for some $j \neq j'$
$H_1$ is omnibus: it does not specify which means differ or in what direction. A significant F-test must therefore be followed by post-hoc tests or planned contrasts to identify the specific pattern of differences.
1.6 The p-Value and Significance Level
As in all hypothesis tests, the p-value is the probability of observing an F-ratio as large as or larger than the one obtained, assuming $H_0$ is true. The significance level $\alpha$ (conventionally $\alpha = .05$) is the threshold below which we reject $H_0$.
⚠️ A significant omnibus F-test tells you only that the condition means are not all equal. It does not tell you which conditions differ, how large the differences are, or whether the differences are practically meaningful. Always follow up with effect sizes, confidence intervals, and post-hoc comparisons.
1.7 Carryover Effects and Counterbalancing
A unique concern in within-subjects designs is that participating in one condition may influence performance in a subsequent condition:
- Practice effects: Performance improves with experience across conditions.
- Fatigue effects: Performance deteriorates across conditions due to tiredness.
- Contrast effects: The subjective experience of one condition is influenced by the preceding condition.
Counterbalancing — systematically varying the order in which participants complete conditions — distributes these carryover effects evenly across conditions, preventing them from confounding the main effect of interest.
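One simple way to generate counterbalanced orders is a cyclic Latin square, in which every condition occupies every serial position exactly once. The sketch below is illustrative only: the function names and the round-robin assignment of participants are assumptions, and a cyclic square balances position but not which condition immediately precedes which.

```python
def latin_square_orders(conditions):
    """Return k orders; each condition appears once in every serial position."""
    k = len(conditions)
    return [[conditions[(start + pos) % k] for pos in range(k)] for start in range(k)]


def assign_orders(participant_ids, conditions):
    """Cycle through the Latin-square rows so orders are used equally often."""
    orders = latin_square_orders(conditions)
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}


orders = latin_square_orders(["A", "B", "C"])
schedule = assign_orders(range(1, 7), ["A", "B", "C"])  # six hypothetical participants
for pid, order in schedule.items():
    print(pid, order)
```

With a sample size that is a multiple of the number of conditions, each order is used by the same number of participants.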
1.8 Mauchly's Sphericity and Why It Matters
The repeated measures ANOVA relies on an assumption called sphericity: the variances of all pairwise difference scores between condition levels must be equal. This is analogous to the homogeneity of variance assumption in between-subjects ANOVA, but specific to within-subjects designs. Violating sphericity inflates the Type I error rate. Mauchly's test and epsilon ($\varepsilon$) corrections (Greenhouse-Geisser, Huynh-Feldt) are the standard diagnostic and remedial tools, covered fully in Section 4.
2. What is Repeated Measures ANOVA?
2.1 The Core Question
Repeated measures ANOVA (also called within-subjects ANOVA) is a parametric inferential test that determines whether the means of a continuous dependent variable differ significantly across three or more levels of a within-subjects factor — conditions, time points, or stimuli to which all participants are exposed.
Unlike the paired t-test (which compares two conditions), or between-subjects ANOVA (which compares independent groups), repeated measures ANOVA is the appropriate framework when:
- The same participants complete all conditions, OR
- Participants are measured at multiple time points (longitudinal panel data), OR
- The same participants are exposed to multiple stimuli or multiple tasks.
2.2 The General Logic
Repeated measures ANOVA exploits the within-person correlation across conditions. By modelling each participant's general response level (their row mean in the data matrix), the test removes stable individual differences from the error term:
$$SS_{\text{error}} = SS_{\text{total}} - SS_{\text{between}} - SS_{\text{cond}}$$
This reduction in error variance means that for the same true effect size, repeated measures ANOVA has substantially greater statistical power than a comparable between-subjects design — particularly when individual differences are large (i.e., when participants consistently differ from one another regardless of condition).
2.3 When to Use Repeated Measures ANOVA
| Condition | Requirement |
|---|---|
| Research design | Same participants measured under all conditions |
| Dependent variable | Continuous (interval or ratio scale) |
| Within-subjects factor | Categorical with $k \geq 3$ levels |
| Observations | Independence between participants (not within) |
| Distribution | Approximately normal within each condition (or large $n$) |
| Sphericity | Variances of all pairwise difference scores are equal (testable) |
2.4 Real-World Applications
| Field | Research Question | Within-Subjects Factor |
|---|---|---|
| Clinical Psychology | Does anxiety score change across pre-treatment, mid-treatment, and post-treatment? | Time (3 levels) |
| Cognitive Neuroscience | Does reaction time differ across congruent, neutral, and incongruent Stroop conditions? | Congruency (3 levels) |
| Education | Does reading fluency improve across four assessment waves in a school year? | Time (4 levels) |
| Pharmacology | Does blood pressure differ across three drug dosage levels in the same patients? | Dosage (3 levels) |
| Sport Science | Does VO₂ max differ across four stages of a progressive exercise protocol? | Exercise Stage (4 levels) |
| Nutrition | Does subjective hunger rating differ across morning, noon, afternoon, and evening? | Time of Day (4 levels) |
| Consumer Psychology | Do preference ratings differ across five product designs evaluated by each participant? | Product Design (5 levels) |
| Neuroimaging | Does BOLD signal differ across five experimental conditions within the same participants? | Condition (5 levels) |
2.5 Distinguishing from Related Tests
| Situation | Correct Test |
|---|---|
| One within-subjects factor, $k \geq 3$ levels | Repeated measures ANOVA |
| One within-subjects factor, $k = 2$ levels | Paired samples t-test |
| One between-subjects factor, $k$ groups | One-way between-subjects ANOVA |
| One within + one between factor | Mixed (split-plot) ANOVA |
| Two or more within-subjects factors | Factorial repeated measures ANOVA |
| Non-normal data, one within factor | Friedman test (non-parametric alternative) |
| Modelling trajectories over time with predictors | Linear mixed-effects model (LMM) |
| Binary or count outcome, repeated measures | Generalised linear mixed model (GLMM) |
3. The Mathematics Behind Repeated Measures ANOVA
3.1 Data Structure
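Consider $n$ participants each measured under $k$ conditions. The data form an $n \times k$ matrix of scores $Y_{ij}$, where $i = 1, \dots, n$ indexes participants and $j = 1, \dots, k$ indexes conditions:
| Participant | Condition 1 | Condition 2 | $\cdots$ | Condition $k$ | Person Mean |
|---|---|---|---|---|---|
| 1 | $Y_{11}$ | $Y_{12}$ | $\cdots$ | $Y_{1k}$ | $\bar{Y}_{1\cdot}$ |
| 2 | $Y_{21}$ | $Y_{22}$ | $\cdots$ | $Y_{2k}$ | $\bar{Y}_{2\cdot}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ | $\vdots$ |
| $n$ | $Y_{n1}$ | $Y_{n2}$ | $\cdots$ | $Y_{nk}$ | $\bar{Y}_{n\cdot}$ |
| Condition Mean | $\bar{Y}_{\cdot 1}$ | $\bar{Y}_{\cdot 2}$ | $\cdots$ | $\bar{Y}_{\cdot k}$ | $\bar{Y}_{\cdot\cdot}$ (Grand Mean) |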
3.2 Partitioning the Total Sum of Squares
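The total sum of squares across all observations is:
$$SS_{\text{total}} = \sum_{i=1}^{n}\sum_{j=1}^{k}\left(Y_{ij} - \bar{Y}_{\cdot\cdot}\right)^2$$
In a repeated measures design, this partitions as:
$$SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}}$$
The between-subjects component reflects how participants differ from each other (averaged across conditions):
$$SS_{\text{between}} = k\sum_{i=1}^{n}\left(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}\right)^2$$
The within-subjects component captures how each participant's scores vary across conditions. This is further partitioned into the condition effect and error:
$$SS_{\text{within}} = \sum_{i=1}^{n}\sum_{j=1}^{k}\left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2 = SS_{\text{cond}} + SS_{\text{error}}$$
Condition sum of squares (systematic variability between condition means):
$$SS_{\text{cond}} = n\sum_{j=1}^{k}\left(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}\right)^2$$
Error sum of squares (residual variability after removing both condition and participant effects):
$$SS_{\text{error}} = \sum_{i=1}^{n}\sum_{j=1}^{k}\left(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot}\right)^2$$
Or equivalently:
$$SS_{\text{error}} = SS_{\text{total}} - SS_{\text{between}} - SS_{\text{cond}}$$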
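The partition can be checked numerically. The following is a minimal sketch in plain Python, assuming a complete wide-format data matrix (one row per participant, one column per condition); the function name and the small data set are invented for illustration.

```python
def rm_sums_of_squares(data):
    """data: list of n rows (participants), each holding k condition scores."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    person_means = [sum(row) / k for row in data]
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_between = k * sum((m - grand) ** 2 for m in person_means)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    # Residual after removing both the participant and the condition effect
    ss_error = sum(
        (data[i][j] - person_means[i] - cond_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    return {"total": ss_total, "between": ss_between, "cond": ss_cond, "error": ss_error}


# Hypothetical scores: 4 participants x 3 conditions
scores = [
    [8, 6, 4],
    [9, 7, 6],
    [6, 5, 3],
    [7, 6, 5],
]
ss = rm_sums_of_squares(scores)
print(ss)
```

For these scores the three components (between-subjects, condition, error) add up to the total sum of squares of 30, which is a quick sanity check for any hand calculation.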
3.3 Degrees of Freedom
| Source | Degrees of Freedom |
|---|---|
| Between-Subjects | $n - 1$ |
| Condition | $k - 1$ |
| Error (Condition × Subjects) | $(n - 1)(k - 1)$ |
| Total | $nk - 1$ |
Verification: $(n - 1) + (k - 1) + (n - 1)(k - 1) = nk - 1$ ✓
3.4 Mean Squares and the F-Ratio
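Mean squares are obtained by dividing each sum of squares by its degrees of freedom:
$$MS_{\text{cond}} = \frac{SS_{\text{cond}}}{k - 1}, \qquad MS_{\text{error}} = \frac{SS_{\text{error}}}{(n - 1)(k - 1)}$$
The F-ratio for the within-subjects condition effect:
$$F = \frac{MS_{\text{cond}}}{MS_{\text{error}}}$$
Under $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$, this F-ratio follows an F-distribution with $df_{\text{cond}} = k - 1$ and $df_{\text{error}} = (n - 1)(k - 1)$ degrees of freedom.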
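As a small worked sketch of this step, the function below turns the two sums of squares into mean squares and the F-ratio. The input values are illustrative numbers for a hypothetical study with $n = 4$ and $k = 3$, and the function name is an assumption.

```python
def rm_f_ratio(ss_cond, ss_error, n, k):
    """Mean squares and F for a one-way repeated measures design."""
    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    ms_cond = ss_cond / df_cond
    ms_error = ss_error / df_error
    return {
        "df_cond": df_cond,
        "df_error": df_error,
        "MS_cond": ms_cond,
        "MS_error": ms_error,
        "F": ms_cond / ms_error,
    }


# Illustrative values: SS_cond = 18, SS_error = 4/3, n = 4 participants, k = 3 conditions
result = rm_f_ratio(ss_cond=18.0, ss_error=4.0 / 3.0, n=4, k=3)
print(result)
```

The p-value is the right-tail area of the F-distribution at this value; if SciPy is available, `scipy.stats.f.sf(result["F"], result["df_cond"], result["df_error"])` is one way to obtain it.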
3.5 The ANOVA Source Table
The complete one-way repeated measures ANOVA source table:
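| Source | $SS$ | $df$ | $MS$ | $F$ | $p$ |
|---|---|---|---|---|---|
| Between-Subjects | $SS_{\text{between}}$ | $n - 1$ | n/a | n/a | n/a |
| Condition (Within) | $SS_{\text{cond}}$ | $k - 1$ | $MS_{\text{cond}}$ | $MS_{\text{cond}} / MS_{\text{error}}$ | from $F_{k-1,\,(n-1)(k-1)}$ |
| Error | $SS_{\text{error}}$ | $(n - 1)(k - 1)$ | $MS_{\text{error}}$ | | |
| Total | $SS_{\text{total}}$ | $nk - 1$ | | | |
⚠️ The between-subjects row ($SS_{\text{between}}$, $df_{\text{between}}$) is typically not tested with an F-ratio: it represents stable individual differences that are partialled out, not an experimental factor of interest. Some software omits this row entirely.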
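3.6 Epsilon ($\varepsilon$) Corrections for Sphericity Violations
When the sphericity assumption is violated (see Section 4), the actual sampling distribution of $F$ has heavier tails than the nominal $F_{k-1,\,(n-1)(k-1)}$ distribution, producing inflated Type I error. Two corrections adjust the degrees of freedom to match the true distribution.
Greenhouse-Geisser (GG) epsilon:
$$\hat{\varepsilon}_{GG} = \frac{k^2\left(\bar{s}_{jj} - \bar{s}_{\cdot\cdot}\right)^2}{(k - 1)\left(\sum_{j}\sum_{j'} s_{jj'}^2 - 2k\sum_{j}\bar{s}_{j\cdot}^2 + k^2\bar{s}_{\cdot\cdot}^2\right)}$$
Where $s_{jj'}$ are elements of the covariance matrix of condition scores, $\bar{s}_{jj}$ is the mean of its diagonal (variance) elements, $\bar{s}_{j\cdot}$ are column means, and $\bar{s}_{\cdot\cdot}$ is the grand mean of the covariance matrix.
Practically, $\hat{\varepsilon}$ ranges from $1/(k - 1)$ (maximum violation of sphericity) to $1$ (perfect sphericity). GG is known to be conservative: it sometimes overcorrects, especially when the true $\varepsilon$ is close to $1$.
Huynh-Feldt (HF) epsilon:
$$\tilde{\varepsilon}_{HF} = \frac{n(k - 1)\hat{\varepsilon}_{GG} - 2}{(k - 1)\left[(n - 1) - (k - 1)\hat{\varepsilon}_{GG}\right]}$$
HF epsilon is less conservative than GG and is recommended when $\hat{\varepsilon}_{GG} > 0.75$. If $\tilde{\varepsilon}_{HF} > 1$, it is set to $1$.
Corrected degrees of freedom:
$$df_{\text{cond}}^{*} = \varepsilon\,(k - 1), \qquad df_{\text{error}}^{*} = \varepsilon\,(n - 1)(k - 1)$$
The F-statistic itself is unchanged; only the reference distribution is adjusted.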
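The corrections can be sketched in a few lines of plain Python. The code below assumes the covariance matrix of the condition scores is already available as a list of lists and uses the standard single-group Huynh-Feldt expression; the function names and the two example matrices are invented for illustration.

```python
def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix (list of lists)."""
    k = len(cov)
    mean_diag = sum(cov[j][j] for j in range(k)) / k
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(row_means) / k
    numerator = (k * (mean_diag - grand_mean)) ** 2
    denominator = (k - 1) * (
        sum(v * v for row in cov for v in row)
        - 2 * k * sum(m * m for m in row_means)
        + k ** 2 * grand_mean ** 2
    )
    return numerator / denominator


def hf_epsilon(eps_gg, n, k):
    """Huynh-Feldt epsilon for a single-group design, capped at 1."""
    raw = (n * (k - 1) * eps_gg - 2) / ((k - 1) * ((n - 1) - (k - 1) * eps_gg))
    return min(raw, 1.0)


def corrected_df(eps, n, k):
    """Epsilon-adjusted numerator and denominator degrees of freedom."""
    return eps * (k - 1), eps * (n - 1) * (k - 1)


# Hypothetical covariance matrices for k = 3 conditions
compound_symmetric = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
unequal = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]

eps_cs = gg_epsilon(compound_symmetric)   # sphericity holds exactly
eps_bad = gg_epsilon(unequal)             # clear violation
eps_hf = hf_epsilon(eps_bad, n=12, k=3)
df_cond, df_error = corrected_df(eps_bad, n=12, k=3)
print(eps_cs, eps_bad, eps_hf, df_cond, df_error)
```

A compound-symmetric matrix returns an epsilon of 1, while the second matrix, in which one pair of conditions is far more strongly correlated than the others, yields a value well below 1 and correspondingly smaller degrees of freedom.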
Decision rule for epsilon corrections:
| $\hat{\varepsilon}$ | Recommended Correction |
|---|---|
| $\hat{\varepsilon} = 1$ (no violation) | No correction needed |
| $\hat{\varepsilon} > 0.75$ | Huynh-Feldt correction |
| $\hat{\varepsilon} \leq 0.75$ | Greenhouse-Geisser correction |
| Any value (conservative approach) | Always use GG correction |
| Severe violation (very small $\hat{\varepsilon}$) | Consider MANOVA approach |
3.7 The Multivariate Approach (MANOVA)
An alternative to the univariate F-test with epsilon corrections is the fully multivariate approach, which makes no sphericity assumption whatsoever. The $k$ repeated conditions are recast as $k - 1$ contrast variables and tested with multivariate test statistics:
- Pillai's Trace: $V = \operatorname{tr}\left[\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}\right]$
- Wilks' Lambda: $\Lambda = \dfrac{\det(\mathbf{E})}{\det(\mathbf{H} + \mathbf{E})}$
- Hotelling-Lawley Trace: $T = \operatorname{tr}\left(\mathbf{H}\mathbf{E}^{-1}\right)$
- Roy's Largest Root: $\theta = \lambda_{\max}\left(\mathbf{H}\mathbf{E}^{-1}\right)$
Where $\mathbf{H}$ is the hypothesis matrix and $\mathbf{E}$ is the error matrix.
The multivariate approach is always valid regardless of sphericity, but requires $n > k$ and loses power relative to the corrected univariate test when sphericity holds approximately. For small $n$ relative to $k$, the univariate approach with epsilon corrections is preferred.
3.8 Effect Size — Eta-Squared ($\eta^2$)
The most straightforward effect size for repeated measures ANOVA is eta-squared:
$$\eta^2 = \frac{SS_{\text{cond}}}{SS_{\text{total}}}$$
$\eta^2$ is the proportion of total variance explained by the condition effect. However, in repeated measures designs, the between-subjects variance ($SS_{\text{between}}$) is irreducible and not of interest. This makes $\eta^2$ artificially small compared to a between-subjects design with the same true effect, so it is not directly comparable across designs.
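3.9 Effect Size — Partial Eta-Squared ($\eta_p^2$)
Partial eta-squared removes the between-subjects variance from the denominator:
$$\eta_p^2 = \frac{SS_{\text{cond}}}{SS_{\text{cond}} + SS_{\text{error}}}$$
$\eta_p^2$ represents the proportion of variance explained by the condition effect after removing individual differences. It is the standard effect size reported by most software (e.g., SPSS and SAS; note that R's ez package reports generalised eta-squared by default) and is comparable across between- and within-subjects designs.
Relationship to $F$:
$$\eta_p^2 = \frac{F \cdot df_{\text{cond}}}{F \cdot df_{\text{cond}} + df_{\text{error}}}$$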
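The two routes to partial eta-squared (from the sums of squares, or from $F$ and its degrees of freedom) give the same number, which the sketch below illustrates. The function names and input values are hypothetical; the sums of squares correspond to an invented study with $n = 4$ and $k = 3$.

```python
def eta_squared(ss_cond, ss_between, ss_error):
    """Proportion of total variability due to the condition effect."""
    return ss_cond / (ss_cond + ss_between + ss_error)


def partial_eta_squared(ss_cond, ss_error):
    """Proportion of variability due to condition after removing participants."""
    return ss_cond / (ss_cond + ss_error)


def partial_eta_squared_from_f(f_value, df_cond, df_error):
    """Recover partial eta-squared from a reported F and its degrees of freedom."""
    return f_value * df_cond / (f_value * df_cond + df_error)


# Illustrative values (n = 4, k = 3): SS_cond = 18, SS_between = 32/3, SS_error = 4/3
eta2 = eta_squared(18.0, 32.0 / 3.0, 4.0 / 3.0)
eta2_p = partial_eta_squared(18.0, 4.0 / 3.0)
eta2_p_from_f = partial_eta_squared_from_f(40.5, df_cond=2, df_error=6)
print(eta2, eta2_p, eta2_p_from_f)
```

The conversion from $F$ is convenient when a published report gives only the test statistic and its degrees of freedom.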
3.10 Effect Size — Generalised Eta-Squared ($\eta_G^2$)
Generalised eta-squared (Olejnik & Algina, 2003) is designed for comparability across studies with different designs. For a pure within-subjects design:
$$\eta_G^2 = \frac{SS_{\text{cond}}}{SS_{\text{cond}} + SS_{\text{between}} + SS_{\text{error}}}$$
$\eta_G^2$ is the recommended effect size for meta-analysis involving repeated measures designs because it is invariant to the number of conditions measured, unlike $\eta_p^2$.
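3.11 Effect Size — Omega-Squared ($\omega^2$) and Partial Omega-Squared ($\omega_p^2$)
Both $\eta^2$ and $\eta_p^2$ are positively biased (they overestimate the population effect, especially in small samples). Omega-squared applies a bias correction by subtracting the expected error contribution, $df_{\text{cond}} \cdot MS_{\text{error}}$, from $SS_{\text{cond}}$ in the numerator.
Partial omega-squared ($\omega_p^2$) applies the same correction to $\eta_p^2$ and is preferred for repeated measures.
For large samples, $\omega_p^2 \approx \eta_p^2$. For small samples, $\omega_p^2$ is the recommended effect size to report alongside $\eta_p^2$.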
3.12 Cohen's $f$ and Statistical Power
Cohen's $f$ is the standardised effect size for ANOVA, defined as:
$$f = \sqrt{\frac{\eta_p^2}{1 - \eta_p^2}}$$
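Required sample size for desired power at two-sided $\alpha$ (approximate):
$$n \approx \frac{\lambda\,(1 - \rho)}{k\,f^2}$$
Where $\lambda$ is the non-centrality parameter satisfying the power equation and $\rho$ is the average correlation among conditions.
Required $n$ per condition combination, one-way within-subjects design (for a fixed $k$, $\alpha$, and average intercorrelation $\rho$):
| Cohen's $f$ | Verbal Label | Power = 0.80 | Power = 0.90 | Power = 0.95 |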
|---|---|---|---|---|
| 0.10 | Small | 44 | 58 | 72 |
| 0.25 | Medium | 12 | 16 | 20 |
| 0.40 | Large | 7 | 9 | 11 |
| 0.50 | Large | 5 | 7 | 9 |
⚠️ Power for repeated measures ANOVA depends critically on the average correlation among conditions ($\rho$). Higher $\rho$ → greater power (more individual variability removed). Always specify $\rho$ in power analyses for within-subjects designs.
4. Assumptions of Repeated Measures ANOVA
4.1 Normality of Residuals (or Condition Scores)
The repeated measures ANOVA assumes that the residual scores (or equivalently, the scores within each condition) are approximately normally distributed in the population.
How to check:
| Method | Details |
|---|---|
| Shapiro-Wilk test | Applied to residuals or condition-level scores; most useful for small to moderate $n$ |
| Q-Q plots | One per condition; points should fall along the diagonal |
| Histograms | One per condition; should be approximately bell-shaped |
| Skewness and kurtosis | Absolute values within conventional limits suggest acceptable distributions |
Robustness: The F-test is moderately robust to non-normality when $n$ is large (via the Central Limit Theorem) and when the violation is symmetric. Severe skewness with small $n$ is the primary concern.
When violated:
- Use the Friedman test (non-parametric alternative) for small samples with non-normal data.
- Consider log or square-root transformation for right-skewed outcome variables.
- Use a linear mixed-effects model with robust standard errors.
4.2 Sphericity
Sphericity is the assumption that the variances of all pairwise difference scores between conditions are equal. For $k$ conditions, there are $k(k - 1)/2$ pairwise differences, each of which must have the same variance.
Formally, if $D_{i,jj'} = Y_{ij} - Y_{ij'}$ is the difference score between conditions $j$ and $j'$ for participant $i$, then sphericity requires:
$$\operatorname{Var}(D_{jj'}) = \sigma_D^2 \quad \text{for all } j \neq j'$$
Compound symmetry (equal variances and equal covariances across all conditions) is a sufficient but not necessary condition for sphericity. Sphericity is a weaker requirement than compound symmetry and is the actual assumption of the F-test.
Mauchly's Test of Sphericity:
$$W = \frac{\prod_{j=1}^{k-1}\lambda_j}{\left(\frac{1}{k-1}\sum_{j=1}^{k-1}\lambda_j\right)^{k-1}}$$
Where $\lambda_j$ are the eigenvalues of the transformed covariance matrix (using orthonormal contrasts). The test statistic:
$$\chi^2 = -\left[(n - 1) - \frac{2(k - 1)^2 + (k - 1) + 2}{6(k - 1)}\right]\ln W$$
With $df = \dfrac{k(k - 1)}{2} - 1$.
$H_0$: Sphericity holds ($W = 1$). $p < .05$ → reject sphericity; apply corrections.
⚠️ Mauchly's test is sensitive to non-normality and can give misleading results in small samples (underpowered) and large samples (overpowered, detecting trivial violations). Always report $\hat{\varepsilon}$ alongside Mauchly's test result. A $\hat{\varepsilon}$ well below $1$ indicates a practically meaningful violation regardless of the Mauchly p-value.
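Mauchly's $W$ can be computed without an eigenvalue routine, because the product of the eigenvalues is the determinant and their sum is the trace of the contrast-transformed covariance matrix. The sketch below is a plain-Python illustration under that identity; it uses normalised Helmert contrasts as the orthonormal set, and all function names and example matrices are assumptions.

```python
import math


def helmert_contrasts(k):
    """k-1 orthonormal contrast rows (each sums to zero and has unit length)."""
    rows = []
    for r in range(1, k):
        raw = [1.0] * r + [-float(r)] + [0.0] * (k - r - 1)
        norm = math.sqrt(r * (r + 1))
        rows.append([v / norm for v in raw])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    size, det = len(m), 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for j in range(c, size):
                m[r][j] -= factor * m[c][j]
    return det


def mauchly_test(cov, n):
    """Return (W, chi-square, df) for a k x k covariance matrix and n participants."""
    k = len(cov)
    p = k - 1
    c = helmert_contrasts(k)
    c_t = [list(col) for col in zip(*c)]
    transformed = matmul(matmul(c, cov), c_t)  # (k-1) x (k-1)
    trace = sum(transformed[i][i] for i in range(p))
    w = determinant(transformed) / (trace / p) ** p
    chi2 = -((n - 1) - (2 * p * p + p + 2) / (6 * p)) * math.log(w)
    df = k * (k - 1) // 2 - 1
    return w, chi2, df


# Hypothetical covariance matrices for k = 3 conditions, n = 12 participants
spherical = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
non_spherical = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
w_ok, chi2_ok, df_ok = mauchly_test(spherical, n=12)
w_bad, chi2_bad, df_bad = mauchly_test(non_spherical, n=12)
print(w_ok, chi2_ok, df_ok)
print(w_bad, chi2_bad, df_bad)
```

The p-value is the upper-tail chi-square probability at the returned statistic; with SciPy installed, `scipy.stats.chi2.sf(chi2_bad, df_bad)` is one way to obtain it.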
Epsilon values and their implications:
| $\hat{\varepsilon}$ | Interpretation | Action |
|---|---|---|
| $= 1$ | Perfect sphericity | No correction needed |
| Just below $1$ | Minimal violation | Huynh-Feldt or no correction |
| Above $0.75$ | Moderate violation | Huynh-Feldt correction |
| At or below $0.75$ | Substantial violation | Greenhouse-Geisser correction |
| Well below $0.75$ | Severe violation | GG correction or MANOVA |
| $= 1/(k - 1)$ (lower bound) | Maximum violation | MANOVA strongly recommended |
Note: The sphericity assumption is irrelevant when $k = 2$ (only two conditions form exactly one difference score, whose variance always equals itself). This is why the paired t-test needs no sphericity correction.
4.3 Independence of Observations Between Participants
While the design deliberately induces correlation within participants (across conditions), the observations between participants must be independent. Each participant's data must not influence another participant's data.
Common violations:
- Participants in the same lab session who can observe or influence each other.
- Family members or partners in the same study.
- Hierarchically nested data (e.g., students within classrooms all treated as independent) — use linear mixed-effects models instead.
4.4 Interval or Ratio Scale of Measurement
The dependent variable must be continuous and measured on at least an interval scale — equal numerical differences must represent equal psychological or physical differences across the entire scale range.
When violated: If the dependent variable is ordinal (e.g., ranks or Likert ratings treated as ordinal), use the Friedman test instead.
4.5 No Extreme Multivariate Outliers
Outliers in any condition can distort condition means and inflate $MS_{\text{error}}$, potentially masking real effects or creating spurious ones.
How to check:
- Boxplots for each condition.
- Standardised ($z$) scores within each condition.
- Mahalanobis distance across conditions: flags participants who are outliers in the multivariate sense (unusual profile across all conditions simultaneously).
When outliers present: Investigate the cause. Report analyses with and without outliers. Consider the Friedman test or trimmed-mean ANOVA as robust alternatives.
4.6 Assumption Summary
| Assumption | How to Check | Remedy if Violated |
|---|---|---|
| Normality | Shapiro-Wilk; Q-Q plots per condition | Friedman test; data transformation; LMM |
| Sphericity | Mauchly's test; inspect $\hat{\varepsilon}$ | GG or HF correction; MANOVA |
| Independence between participants | Study design review | Linear mixed-effects model |
| Interval scale | Measurement theory review | Friedman test |
| No extreme outliers | Boxplots; $z$-scores; Mahalanobis | Investigate; robust ANOVA; Friedman |
5. Variants of Repeated Measures ANOVA
5.1 One-Way Repeated Measures ANOVA
The standard form described throughout this tutorial: one within-subjects factor with $k \geq 3$ levels. Tests whether condition means differ significantly.
5.2 Factorial Repeated Measures ANOVA (Two or More Within Factors)
When each participant is measured across all combinations of two or more within-subjects factors, a factorial repeated measures ANOVA is used.
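For factors $A$ (with $a$ levels) and $B$ (with $b$ levels), the partition is:
$$SS_{\text{total}} = SS_{S} + SS_{A} + SS_{A \times S} + SS_{B} + SS_{B \times S} + SS_{A \times B} + SS_{A \times B \times S}$$
Where $S$ denotes subjects. Each main effect and interaction has its own error term (the corresponding subjects-by-factor interaction). This allows each effect to be tested against a different, tailored error term.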
5.3 Mixed ANOVA (Split-Plot Design)
The mixed ANOVA (also called split-plot ANOVA) combines:
- One or more between-subjects factors (different participants per group).
- One or more within-subjects factors (same participants across conditions).
Example: Comparing three treatment groups (between) measured at pre, mid, and post (within). The interaction between group and time (Group × Time) is typically the focal test — it assesses whether the trajectory of change over time differs across groups.
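Variance partitioning:
$$SS_{\text{total}} = \underbrace{SS_{\text{Group}} + SS_{S/\text{Group}}}_{\text{between-subjects}} + \underbrace{SS_{\text{Time}} + SS_{\text{Group} \times \text{Time}} + SS_{\text{Time} \times S/\text{Group}}}_{\text{within-subjects}}$$
Each between-subjects effect uses $MS_{S/\text{Group}}$ as its error term; each within-subjects effect uses $MS_{\text{Time} \times S/\text{Group}}$ as its error term.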
5.4 Friedman Test (Non-Parametric Alternative)
When the normality assumption is severely violated or the data are ordinal, the Friedman test is the non-parametric equivalent of one-way repeated measures ANOVA.
Procedure:
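- Rank each participant's scores across the $k$ conditions (1 = lowest, $k$ = highest).
- Compute the mean rank for each condition: $\bar{R}_j = \frac{1}{n}\sum_{i=1}^{n} R_{ij}$.
- Compute the Friedman statistic:
$$\chi^2_F = \frac{12n}{k(k + 1)}\sum_{j=1}^{k}\bar{R}_j^2 - 3n(k + 1)$$
Under $H_0$, $\chi^2_F \sim \chi^2_{k-1}$ for large $n$.
Effect size: Kendall's $W$ (coefficient of concordance):
$$W = \frac{\chi^2_F}{n(k - 1)}$$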
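$W$ ranges from 0 (no agreement in rankings across participants) to 1 (perfect agreement). Conversion to the average Spearman rank correlation between participants, $\bar{r}_s$: $\bar{r}_s = (nW - 1)/(n - 1)$ (for $n > 1$).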
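The procedure above can be sketched in plain Python as follows. Tied scores within a participant receive the average rank, but no tie correction is applied to the statistic itself, so this is a simplified illustration; the function names and the example scores are invented.

```python
def rank_row(row):
    """Ranks 1..k within one participant; tied scores share the average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = average_rank
        pos = end + 1
    return ranks


def friedman(data):
    """Return (mean ranks, Friedman chi-square, Kendall's W) without tie correction."""
    n, k = len(data), len(data[0])
    ranks = [rank_row(row) for row in data]
    mean_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * sum(m * m for m in mean_ranks) - 3 * n * (k + 1)
    kendall_w = chi2 / (n * (k - 1))
    return mean_ranks, chi2, kendall_w


# Hypothetical data: every participant orders the three conditions identically
scores = [
    [1, 2, 3],
    [2, 5, 9],
    [0, 1, 4],
    [3, 4, 8],
]
mean_ranks, chi2, kendall_w = friedman(scores)
print(mean_ranks, chi2, kendall_w)
```

Because all four participants rank the conditions in the same order, the example reaches the maximum possible concordance, which makes it a convenient check on the arithmetic.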
5.5 Trend Analysis (Polynomial Contrasts)
When the within-subjects factor is quantitative and equally spaced (e.g., time points at regular intervals, dosage levels at equal increments), trend analysis decomposes the condition effect into orthogonal polynomial components:
- Linear trend: Does the mean increase or decrease monotonically across levels?
- Quadratic trend: Is there a U-shaped or inverted-U-shaped pattern?
- Cubic trend: Is there an S-shaped or more complex pattern?
Each trend component has $df = 1$ and is tested separately against the error mean square. Trend analysis is more powerful and more informative than the omnibus F-test when a specific trajectory is hypothesised.
5.6 Linear Mixed-Effects Models (LMM)
Linear mixed-effects models (also called multilevel models or hierarchical linear models) subsume repeated measures ANOVA as a special case while offering several important generalisations:
- Handle missing data without excluding participants (ANOVA requires complete data or imputation).
- Model unequal time intervals between measurements.
- Allow time-varying covariates as predictors.
- Specify flexible covariance structures (not restricted to compound symmetry or sphericity).
- Accommodate both balanced and unbalanced designs.
For complex longitudinal designs, LMMs are generally preferred over repeated measures ANOVA. DataStatPro's repeated measures ANOVA module automatically suggests LMM when missing data are detected.
5.7 Bayesian Repeated Measures ANOVA
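The Bayesian approach computes Bayes Factors comparing models with and without the condition effect. Under default priors (Rouder et al., 2012):
$$BF_{10} = \frac{p(\text{data} \mid H_1)}{p(\text{data} \mid H_0)}$$
Interpreting $BF_{10}$:
| $BF_{10}$ | Evidence |
|---|---|
| $> 100$ | Extreme evidence for $H_1$ |
| $30$ to $100$ | Very strong |
| $10$ to $30$ | Strong |
| $3$ to $10$ | Moderate |
| $1$ to $3$ | Anecdotal |
| $< 1/3$ | Moderate evidence for $H_0$ (no effect) |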
6. Using the Repeated Measures ANOVA Calculator Component
The Repeated Measures ANOVA Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, and reporting within-subjects analyses.
Step-by-Step Guide
Step 1 — Select the Test
Navigate to Statistical Tests → ANOVA → Repeated Measures ANOVA.
Step 2 — Input Method
Choose how to provide data:
- Raw data (wide format): Each row is one participant; each column is one condition. DataStatPro automatically identifies the within-subjects structure.
- Raw data (long format): Three columns required: participant ID, condition label, and dependent variable value. DataStatPro reshapes to wide format internally.
- Summary statistics: Enter $n$, condition means ($\bar{Y}_{\cdot j}$), standard deviations ($s_j$), and the correlation matrix across conditions. DataStatPro reconstructs the ANOVA source table.
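To make the difference between the two raw-data layouts concrete, the sketch below reshapes long-format records into a wide matrix. It is a generic illustration, not DataStatPro's internal routine; the function name, the tuple layout, and the example records are assumptions.

```python
def long_to_wide(records, condition_order):
    """records: iterable of (participant_id, condition, value) tuples."""
    by_participant = {}
    for pid, condition, value in records:
        by_participant.setdefault(pid, {})[condition] = value

    ids, rows = [], []
    for pid, scores in by_participant.items():
        missing = [c for c in condition_order if c not in scores]
        if missing:
            # Repeated measures ANOVA needs complete data for every participant
            raise ValueError(f"participant {pid!r} is missing {missing}")
        ids.append(pid)
        rows.append([scores[c] for c in condition_order])
    return ids, rows


# Hypothetical long-format records in arbitrary order
records = [
    (1, "Pre", 10), (1, "Post", 7), (2, "Post", 5),
    (2, "Pre", 9), (1, "Mid", 8), (2, "Mid", 6),
]
ids, wide = long_to_wide(records, ["Pre", "Mid", "Post"])
print(ids)
print(wide)
```

Raising an error on incomplete participants mirrors the complete-data requirement of the classical analysis; a mixed-effects model would be the alternative when values are missing.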
Step 3 — Define the Within-Subjects Factor
- Specify the factor name (e.g., "Time", "Condition", "Dosage").
- Label each level (e.g., "Pre", "Mid", "Post").
- For factorial designs, define additional within-subjects factors and their levels.
- For mixed designs, specify the between-subjects grouping variable.
Step 4 — Select Post-Hoc Tests and Contrasts
- Post-hoc tests (exploratory): Bonferroni, Holm, Tukey's HSD (adapted for within-subjects), Šidák.
- Planned contrasts: Simple (each level vs. first), Helmert (each level vs. mean of preceding), polynomial (linear, quadratic, cubic trend), custom.
Step 5 — Set Significance Level and Confidence Level
Default: $\alpha = .05$, 95% CI. Results at two further $\alpha$ levels are displayed simultaneously.
Step 6 — Select Display Options
- ✅ Full ANOVA source table with $SS$, $df$, $MS$, $F$, exact $p$.
- ✅ Mauchly's test of sphericity and $\hat{\varepsilon}_{GG}$, $\tilde{\varepsilon}_{HF}$.
- ✅ Greenhouse-Geisser and Huynh-Feldt corrected results (automatically displayed when sphericity is violated).
- ✅ Multivariate test statistics (Pillai, Wilks, Hotelling, Roy) as an alternative.
- ✅ Partial eta-squared ($\eta_p^2$) and omega-squared ($\omega_p^2$) with 95% CI.
- ✅ Generalised eta-squared ($\eta_G^2$).
- ✅ Cohen's $f$ for power analysis.
- ✅ Condition means, standard deviations, and 95% CIs with error bar plots.
- ✅ Post-hoc comparison table (pairwise $t$-tests with corrections).
- ✅ Profile plot (means across conditions with individual participant trajectories).
- ✅ Interaction plot (for factorial and mixed designs).
- ✅ Residual Q-Q plots and normality test per condition.
- ✅ Mahalanobis distance outlier detection.
- ✅ Power analysis: current post-hoc power and required $n$ for 80%, 90%, 95% power.
- ✅ Bayesian ANOVA (Bayes Factor $BF_{10}$).
- ✅ APA 7th edition results paragraph (auto-generated).
Step 7 — Run the Analysis
Click "Run Repeated Measures ANOVA". DataStatPro will:
- Compute all $SS$, $df$, $MS$, $F$, and exact p-values.
- Conduct Mauchly's test; apply GG and HF corrections as appropriate.
- Compute multivariate test statistics.
- Compute $\eta_p^2$, $\eta_G^2$, $\omega_p^2$, and Cohen's $f$ with 95% CIs.
- Run selected post-hoc comparisons or planned contrasts.
- Conduct normality and outlier diagnostics.
- Estimate post-hoc power.
- Output an APA-compliant results paragraph.
7. Step-by-Step Procedure
7.1 Full Manual Procedure
Step 1 — State the Hypotheses
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$ (all condition population means are equal)
$H_1$: At least one $\mu_j$ differs from at least one other
Specify the within-subjects factor and the number of levels $k$ based on the research design, before examining the data.
Step 2 — Organise the Data Matrix
Arrange data in an $n \times k$ matrix with one row per participant and one column per condition. Verify that there are no missing values (or plan imputation/LMM approach).
Step 3 — Check Assumptions
- Inspect distributions within each condition (histograms, Q-Q plots, Shapiro-Wilk).
- Investigate multivariate outliers (Mahalanobis distance).
- Confirm independence of observations between participants (design review).
- Test sphericity (Mauchly's test and $\hat{\varepsilon}$) once the covariance matrix of condition scores has been computed (see Step 7).
Step 4 — Compute Grand Mean and Condition Means
$$\bar{Y}_{\cdot\cdot} = \frac{1}{nk}\sum_{i=1}^{n}\sum_{j=1}^{k} Y_{ij}, \qquad \bar{Y}_{\cdot j} = \frac{1}{n}\sum_{i=1}^{n} Y_{ij}, \qquad \bar{Y}_{i\cdot} = \frac{1}{k}\sum_{j=1}^{k} Y_{ij}$$
Step 5 — Compute Sums of Squares
$$SS_{\text{total}} = \sum_{i}\sum_{j}\left(Y_{ij} - \bar{Y}_{\cdot\cdot}\right)^2, \qquad SS_{\text{between}} = k\sum_{i}\left(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}\right)^2$$
$$SS_{\text{cond}} = n\sum_{j}\left(\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}\right)^2, \qquad SS_{\text{error}} = SS_{\text{total}} - SS_{\text{between}} - SS_{\text{cond}}$$
Step 6 — Compute Degrees of Freedom
$$df_{\text{between}} = n - 1, \qquad df_{\text{cond}} = k - 1, \qquad df_{\text{error}} = (n - 1)(k - 1)$$
Step 7 — Conduct Mauchly's Test of Sphericity
Compute the covariance matrix of condition scores and obtain $W$ and $\hat{\varepsilon}$. If Mauchly's $p < .05$ or $\hat{\varepsilon}$ is well below $1$, apply the appropriate correction.
Step 8 — Compute Mean Squares and the F-Ratio
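$$MS_{\text{cond}} = \frac{SS_{\text{cond}}}{df_{\text{cond}}}, \qquad MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}}, \qquad F = \frac{MS_{\text{cond}}}{MS_{\text{error}}}$$
Evaluate $F$ on $df_{\text{cond}}$ and $df_{\text{error}}$ (use corrected $df$ if applicable).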
Step 9 — Compute the p-Value
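$$p = P\left(F_{df_{\text{cond}},\, df_{\text{error}}} \geq F_{\text{obs}}\right)$$
Reject $H_0$ if $p \leq \alpha$.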
Step 10 — Compute Effect Sizes
$$\eta_p^2 = \frac{SS_{\text{cond}}}{SS_{\text{cond}} + SS_{\text{error}}}, \qquad \eta_G^2 = \frac{SS_{\text{cond}}}{SS_{\text{cond}} + SS_{\text{between}} + SS_{\text{error}}}$$
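Step 11 — Conduct Post-Hoc Comparisons (if $H_0$ rejected)
For each pair of conditions $(j, j')$, compute the pairwise paired t-test:
$$t_{jj'} = \frac{\bar{Y}_{\cdot j} - \bar{Y}_{\cdot j'}}{s_{D_{jj'}} / \sqrt{n}}, \qquad df = n - 1$$
Where $s_{D_{jj'}}$ is the standard deviation of the difference scores $D_{i,jj'} = Y_{ij} - Y_{ij'}$. Apply Bonferroni, Holm, or Tukey correction for the $k(k - 1)/2$ pairwise comparisons.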
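The two simplest corrections can be written directly from their definitions. The sketch below adjusts a list of raw pairwise p-values with the Bonferroni and Holm (step-down) procedures; the function names and the three example p-values are invented for illustration.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjustment; results are returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for the three pairs of a k = 3 design
raw_p = [0.01, 0.04, 0.03]
p_bonferroni = bonferroni_adjust(raw_p)
p_holm = holm_adjust(raw_p)
print(p_bonferroni)
print(p_holm)
```

Holm-adjusted values are never larger than the Bonferroni-adjusted ones, which is why Holm is usually preferred when both are available.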
Step 12 — Interpret and Report
Use the APA reporting template in Section 15. Always report $F$, both $df$ values (with corrections if applicable), $p$, $\eta_p^2$, $\omega_p^2$, and 95% CI for the effect size. Report Mauchly's test result and the epsilon correction applied.
8. Interpreting the Output
8.1 The F-Statistic
| $F$ | Meaning |
|---|---|
| $F \approx 1$ | Condition variance $\approx$ error variance; consistent with $H_0$ |
| $F \gg 1$ | Condition variance substantially exceeds error; evidence against $H_0$ |
| Large $F$ with large $n$ | Can be significant even for very small $\eta_p^2$ |
| Small $F$ with small $n$ | May be non-significant even for large $\eta_p^2$ (low power) |
| $F < 1$ (rare) | Observed means more similar than chance alone would predict |
8.2 The p-Value
| p-Value | Conventional Interpretation |
|---|---|
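| $p > .10$ | No evidence against $H_0$ (equal condition means) |
| $.05 < p \leq .10$ | Marginal evidence of condition differences (trend) |
| $.01 < p \leq .05$ | Significant condition effect at $\alpha = .05$ |
| $.001 < p \leq .01$ | Significant condition effect at $\alpha = .01$ |
| $p \leq .001$ | Significant condition effect at $\alpha = .001$ |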
⚠️ A significant F-test is omnibus: it indicates only that at least one pair of condition means differs. It does not identify which conditions differ or how large those differences are. Always follow a significant omnibus test with post-hoc comparisons or planned contrasts, and always report effect sizes.
8.3 Mauchly's Test and Epsilon
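| Mauchly's $p$ | $\hat{\varepsilon}$ | Recommended Action |
|---|---|---|
| $p > .05$ | Close to $1$ | Report uncorrected results; sphericity holds |
| $p > .05$ | Noticeably below $1$ | Report HF-corrected results as primary; note uncorrected |
| $p \leq .05$ | Above $0.75$ | Use HF correction |
| $p \leq .05$ | At or below $0.75$ | Use GG correction; consider reporting MANOVA |
| Any | Very small (near $1/(k - 1)$) | Report MANOVA as primary analysis |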
⚠️ Always report $\hat{\varepsilon}$ regardless of Mauchly's test outcome, since Mauchly's test may lack power in small samples. A reader can use $\hat{\varepsilon}$ to judge the severity of any sphericity violation.
8.4 The ANOVA Source Table
When reading the output table, focus on:
- $SS_{\text{cond}}$: Total systematic variability across conditions; larger values indicate greater spread of condition means.
- $SS_{\text{error}}$: Residual within-subjects variability after removing condition and participant effects; smaller values (more tightly correlated conditions) indicate a more sensitive design.
- $MS_{\text{cond}} / MS_{\text{error}}$ ratio: The F-ratio, the key test statistic.
- $df$ values: Verify these match $k - 1$ and $(n - 1)(k - 1)$ (uncorrected) or their epsilon-adjusted equivalents.
8.5 Partial Eta-Squared ($\eta_p^2$) — Magnitude Interpretation
Cohen's (1988) benchmarks for $\eta_p^2$ (and $\eta^2$):
| $\eta_p^2$ | Cohen's $f$ | Verbal Label |
|---|---|---|
| $.01$ | $0.10$ | Small |
| $.06$ | $0.25$ | Medium |
| $.14$ | $0.40$ | Large |
Extended benchmarks (Lakens, 2013, contextualised for within-subjects designs):
| Verbal Label | |
|---|---|
| Negligible | |
| Small | |
| Medium | |
| Large | |
| Very large |
⚠️ These benchmarks were developed for between-subjects designs. In within-subjects designs, $\eta_p^2$ values tend to be larger because individual differences are removed from the error term. Do not mechanically apply Cohen's benchmarks without considering the typical effect sizes in your specific research domain.
8.6 Post-Hoc Comparisons
After a significant omnibus F-test, post-hoc pairwise comparisons identify which specific pairs of conditions differ. For $k$ conditions there are $k(k - 1)/2$ pairs:
| $k$ | Number of Pairs |
|---|---|
| 3 | 3 |
| 4 | 6 |
| 5 | 10 |
| 6 | 15 |
For each comparison, report the mean difference $\bar{D}_{jj'}$, the $t$-statistic, the adjusted p-value, and Cohen's $d_z$ for the paired comparison:
$$d_z = \frac{\bar{D}_{jj'}}{s_{D_{jj'}}}$$
8.7 Profile Plots
A profile plot (line chart with conditions on the x-axis and mean outcome on the y-axis) is the primary visualisation for repeated measures ANOVA. Key features to examine:
- Parallel lines (in mixed designs): Suggests no interaction between between- and within-subjects factors.
- Crossing or diverging lines: Suggests a Group × Time interaction — the critical test in most mixed designs.
- Error bars: Should represent 95% within-subjects confidence intervals (not the standard between-subjects SEM), computed using the Cousineau-Morey correction, which removes between-subjects variance from the error.
9. Effect Sizes for Repeated Measures ANOVA
9.1 Partial Eta-Squared ($\eta_p^2$)
$$\eta_p^2 = \frac{SS_{\text{cond}}}{SS_{\text{cond}} + SS_{\text{error}}}$$
The proportion of within-subjects variance (after removing participant-level variance) explained by the condition effect. This is the standard effect size reported by SPSS, SAS, and most ANOVA software. Upwardly biased in small samples.
9.2 Generalised Eta-Squared ($\eta_G^2$)
$$\eta_G^2 = \frac{SS_{\text{cond}}}{SS_{\text{cond}} + SS_{\text{subjects}} + SS_{\text{error}}}$$
Designed for comparability across studies that differ in the number of conditions and design type (between vs. within). Recommended for meta-analyses. Note that $\eta_G^2 \le \eta_p^2$ always, since the denominator of $\eta_G^2$ is larger.
9.3 Partial Omega-Squared ($\omega_p^2$): Bias-Corrected
The unbiased (or less biased) estimate of the true population partial effect size. Preferred over $\eta_p^2$ for small samples. Note that $\omega_p^2$ can be negative for very small effects; negative values should be reported as $\omega_p^2 = 0$ (indicating no effect).
9.4 Cohen's $f$
$$f = \frac{\sigma_m}{\sigma}$$
Where $\sigma_m$ is the standard deviation of the true condition means around the grand mean, and $\sigma$ is the within-condition population standard deviation (error). Cohen's $f$ is used primarily in power analysis. Benchmarks: $f = 0.10$ (small), $f = 0.25$ (medium), $f = 0.40$ (large).
9.5 Pairwise Cohen's $d_z$ for Post-Hoc Comparisons
For each pairwise comparison after a significant omnibus test:
$$d_z = \frac{\bar{X}_j - \bar{X}_{j'}}{s_{D_{jj'}}}$$
Where $s_{D_{jj'}}$ is the standard deviation of the difference scores between conditions $j$ and $j'$. This is the paired Cohen's $d_z$ and is directly interpretable as the magnitude of the difference between two specific conditions.
9.6 Effect Size Summary Table
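| Effect Size | Formula | Range | Best Use |
|---|---|---|---|
| $\eta^2$ | $SS_{\text{cond}} / SS_{\text{total}}$ | $[0, 1]$ | Rarely recommended; underestimates effect |
| $\eta_p^2$ | $SS_{\text{cond}} / (SS_{\text{cond}} + SS_{\text{error}})$ | $[0, 1]$ | Standard reporting; comparisons within the same design |
| $\eta_G^2$ | $SS_{\text{cond}} / (SS_{\text{cond}} + SS_{\text{subjects}} + SS_{\text{error}})$ | $[0, 1]$ | Meta-analysis; cross-design comparisons |
| $\omega_p^2$ | Bias-corrected $\eta_p^2$ | $\le 1$ (can be negative) | Small samples; unbiased estimate |
| Cohen's $f$ | $\sigma_m / \sigma$ | $[0, \infty)$ | Power analysis |
| Pairwise $d_z$ | $\bar{D} / s_D$ | $(-\infty, \infty)$ | Pairwise post-hoc comparisons |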
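The sketch below shows how the variance-based effect sizes relate to the three sums of squares of a one-way within-subjects design. The function name and the input values are hypothetical, and the conversion $f = \sqrt{\eta_p^2 / (1 - \eta_p^2)}$ is the common textbook conversion, assumed here rather than taken from DataStatPro.

```python
from math import sqrt


def rm_effect_sizes(ss_cond, ss_subjects, ss_error):
    """Effect sizes for a one-way repeated measures design from its sums of squares."""
    eta2_p = ss_cond / (ss_cond + ss_error)                # participant variance excluded
    eta2_g = ss_cond / (ss_cond + ss_subjects + ss_error)  # participant variance included
    cohen_f = sqrt(eta2_p / (1 - eta2_p))                  # assumed conversion from eta2_p
    return {"eta2_p": eta2_p, "eta2_g": eta2_g, "f": cohen_f}


# Hypothetical sums of squares.
sizes = rm_effect_sizes(ss_cond=40.0, ss_subjects=100.0, ss_error=60.0)
print(sizes)
```

In a one-way within-subjects design the denominator of $\eta_G^2$ equals $SS_{\text{total}}$, which is why $\eta_G^2$ can never exceed $\eta_p^2$.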
10. Confidence Intervals
10.1 CI for Each Condition Mean
The standard 95% CI for condition $j$:
$$\bar{X}_j \pm t_{\alpha/2,\, n-1} \cdot \frac{s_j}{\sqrt{n}}$$
Where $s_j$ is the standard deviation within condition $j$. These between-subjects CIs are correct for estimating the true population mean of condition $j$ but are not appropriate for visual inference about within-subjects differences (they are too wide and do not reflect the advantage of the within-subjects design).
10.2 Within-Subjects CIs for Profile Plots (Cousineau-Morey)
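For visualising within-subjects mean differences, the Cousineau-Morey within-subjects CI removes between-participants variance before computing the error:
- Normalise each participant's scores by subtracting that participant's mean and adding back the grand mean: $Y_{ij} = X_{ij} - \bar{X}_{i\cdot} + \bar{X}_{\cdot\cdot}$
- Compute the standard error of the normalised scores in each condition: $SE_j = s_{Y_j} / \sqrt{n}$
- Apply the Morey correction factor $\sqrt{k/(k-1)}$: $SE_j^{\text{CM}} = SE_j \sqrt{k/(k-1)}$
- Construct the CI: $\bar{X}_j \pm t_{\alpha/2,\, n-1} \cdot SE_j^{\text{CM}}$
These within-subjects CIs are narrower than standard CIs and correctly represent the precision of within-subjects comparisons. When two such CIs barely overlap, the corresponding pairwise comparison is approximately significant at $\alpha = .05$.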
⚠️ Always label error bars in profile plots explicitly as "95% within-subjects confidence intervals (Morey correction)" to distinguish them from between-subjects CIs. Readers familiar with standard CIs will otherwise overestimate the uncertainty in pairwise differences.
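The Cousineau-Morey procedure can be sketched in a few lines. This is an illustration under stated assumptions: the function names and the data are hypothetical, and the critical $t$ value is supplied by the caller (here 3.182, the two-sided 95% value for 3 degrees of freedom from a $t$ table) because the Python standard library has no $t$ quantile function.

```python
from math import sqrt
from statistics import mean, stdev


def cousineau_morey_se(data):
    """Morey-corrected standard errors, one per condition, for an n x k score table."""
    n, k = len(data), len(data[0])
    grand_mean = mean(v for row in data for v in row)
    # Remove each participant's own mean, then add the grand mean back.
    normalised = [[v - mean(row) + grand_mean for v in row] for row in data]
    correction = sqrt(k / (k - 1))
    return [
        stdev([row[j] for row in normalised]) / sqrt(n) * correction
        for j in range(k)
    ]


def within_subject_cis(data, t_crit):
    """(lower, upper) within-subjects CI per condition; t_crit uses n - 1 df."""
    k = len(data[0])
    means = [mean(row[j] for row in data) for j in range(k)]
    return [(m - t_crit * se, m + t_crit * se)
            for m, se in zip(means, cousineau_morey_se(data))]


# Hypothetical scores: 4 participants (rows) x 3 conditions (columns).
data = [[10, 12, 15], [20, 23, 24], [30, 31, 36], [40, 43, 45]]
cis = within_subject_cis(data, t_crit=3.182)  # n - 1 = 3 df
print(cis)
```

In this hypothetical data set most of the spread comes from stable participant differences, so the corrected standard errors are far smaller than the ordinary $s_j/\sqrt{n}$ values.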
10.3 CI for Partial Eta-Squared
CIs for $\eta_p^2$ are derived from the non-central F-distribution. The non-centrality parameter is:
Find $\lambda_L$ and $\lambda_U$ such that:
Then:
An approximate 95% CI (adequate for ):
DataStatPro computes exact CIs using numerical inversion of the non-central F-distribution.
10.4 CI for Pairwise Mean Differences
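For each pairwise contrast $\bar{D}_{jj'} = \bar{X}_j - \bar{X}_{j'}$:
$$\bar{D}_{jj'} \pm t_{\text{crit}} \cdot \frac{s_{D_{jj'}}}{\sqrt{n}}$$
Where $t_{\text{crit}} = t_{\alpha/(2m),\, n-1}$ (Bonferroni correction) or the appropriate adjusted critical value, and $m$ is the number of pairwise comparisons.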
11. Advanced Topics
11.1 Interaction Contrasts in Factorial Within-Subjects Designs
In a two-way factorial repeated measures design ($A \times B$), a significant $A \times B$ interaction indicates that the effect of factor $A$ depends on the level of factor $B$ (or vice versa). Interaction contrasts decompose this interaction into focused comparisons:
For a $2 \times 2$ sub-table of a larger interaction, the interaction contrast compares the simple effect of $A$ at level $B_1$ to the simple effect of $A$ at level $B_2$:
$$\psi = (\mu_{A_1 B_1} - \mu_{A_2 B_1}) - (\mu_{A_1 B_2} - \mu_{A_2 B_2})$$
Each interaction contrast has $df = 1$ and can be tested against a single-df error term derived from the interaction. The $SS$ of all orthogonal interaction contrasts sum to the interaction $SS$.
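One common way to test a single interaction contrast in a fully within-subjects design is to compute a contrast score per participant and test its mean against zero with a one-sample $t$-test. The sketch below takes that route, which is an assumption on my part and may differ from the error term DataStatPro uses. The column order (A1B1, A2B1, A1B2, A2B2), the function name, and the data are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_contrast(data, weights=(1, -1, -1, 1)):
    """Test a 2 x 2 interaction contrast using per-participant contrast scores.

    Each row holds one participant's scores in the order A1B1, A2B1, A1B2, A2B2,
    so the default weights give (A1B1 - A2B1) - (A1B2 - A2B2).
    """
    scores = [sum(w * v for w, v in zip(weights, row)) for row in data]
    n = len(scores)
    psi_hat = mean(scores)
    se = stdev(scores) / sqrt(n)  # contrast-specific error term
    return {"psi": psi_hat, "t": psi_hat / se, "df": n - 1}


# Hypothetical cell scores for 3 participants.
cells = [(10, 8, 9, 9), (12, 9, 10, 10), (11, 8, 10, 9)]
result = interaction_contrast(cells)
print(result)
```

The squared $t$ value is the single-df $F$ for the contrast, with $n - 1$ error degrees of freedom.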
11.2 Handling Missing Data
Standard repeated measures ANOVA requires complete data — every participant must have an observation in every condition. When data are missing:
| Missing Data Mechanism | Recommended Approach |
|---|---|
| Missing Completely at Random (MCAR) | Complete-case analysis acceptable; note reduced $n$ |
| Missing at Random (MAR) | Multiple imputation (MI); linear mixed-effects model |
| Missing Not at Random (MNAR) | Pattern-mixture or selection models; sensitivity analysis |
DataStatPro's LMM module handles MAR missing data using Full Information Maximum Likelihood (FIML), which includes all available data without requiring complete cases.
11.3 Counterbalancing and Order Effects
When the order of conditions may introduce carryover effects, counterbalancing assigns different condition orders to different participants. Complete counterbalancing (all $k!$ orders represented) is only feasible for small $k$. For larger $k$, Latin square designs provide a systematic partial counterbalancing.
In a Latin square counterbalancing, each condition appears exactly once in each ordinal position, ensuring that order effects are distributed evenly across conditions and do not confound the condition means.
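A minimal sketch of one way to build such a square: a cyclic Latin square, in which each row is the previous row shifted by one position. The function name and condition labels are hypothetical. A cyclic square balances ordinal position only; it does not balance which condition immediately precedes which, so it is not a substitute for a carryover-balanced design.

```python
def cyclic_latin_square(conditions):
    """Return k condition orders in which each condition occupies each position once."""
    k = len(conditions)
    return [
        [conditions[(row + position) % k] for position in range(k)]
        for row in range(k)
    ]


# Hypothetical condition labels.
orders = cyclic_latin_square(["A", "B", "C", "D"])
for participant_group, order in enumerate(orders, start=1):
    print(participant_group, order)
```

Participants would then be assigned to the $k$ orders in equal numbers.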
To formally test for order effects in DataStatPro, add "Order Position" as a covariate in a mixed ANOVA after counterbalancing.
11.4 Multivariate vs. Univariate Approach: When to Choose
| Criterion | Favour Univariate (with correction) | Favour Multivariate (MANOVA) |
|---|---|---|
| Sphericity | Holds ($\varepsilon \approx 1$) | Violated ($\varepsilon$ well below 1) |
| Sample size | Small $n$ | Large $n$ |
| Number of conditions | $k$ is large | $k$ is small relative to $n$ |
| Focus | Omnibus test of condition effect | Specific multivariate structure |
| Power | Higher when sphericity holds | Higher when sphericity is violated and $n$ is large |
11.5 Effect Size Comparability: $\eta_G^2$ for Cross-Study Comparisons
A critical limitation of $\eta_p^2$ is that its value depends on how many conditions are included in the design. Adding a new condition level increases $SS_{\text{cond}}$ (the numerator) and may change $SS_{\text{error}}$ (the denominator), making $\eta_p^2$ values from different studies with different $k$ incomparable.
Generalised eta-squared ($\eta_G^2$) solves this problem by including the stable between-subjects variance ($SS_{\text{subjects}}$) in the denominator. This quantity is relatively constant across studies and provides a common denominator for effect size comparisons regardless of the number of conditions.
Recommendation: Always report both $\eta_p^2$ (for direct F-ratio interpretation and local comparison) and $\eta_G^2$ (for meta-analytic and cross-study comparisons).
11.6 Multiple Comparisons in Repeated Measures Designs
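When conducting $m$ pairwise post-hoc comparisons, each at level $\alpha$, the familywise error rate inflates (assuming independent tests):
$$\alpha_{FW} = 1 - (1 - \alpha)^m$$
For $k = 4$ conditions ($m = 6$ pairs) at $\alpha = .05$: $\alpha_{FW} = 1 - (0.95)^6 \approx .26$.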
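To see how quickly the familywise error rate grows with the number of conditions, the short sketch below tabulates it for uncorrected pairwise tests. It assumes independent tests at a per-comparison level of .05, and the function names are hypothetical.

```python
def n_pairs(k):
    """Number of pairwise comparisons among k conditions."""
    return k * (k - 1) // 2


def familywise_error(alpha, m):
    """Familywise error rate for m independent tests, each run at level alpha."""
    return 1 - (1 - alpha) ** m


for k in range(3, 7):
    m = n_pairs(k)
    print(f"k={k}  pairs={m}  FWER={familywise_error(0.05, m):.3f}")
```

Pairwise tests on the same participants are correlated, so the independence formula is an approximation rather than an exact rate.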
Correction strategies specific to repeated measures:
| Method | Description | Properties |
|---|---|---|
| Bonferroni | $\alpha' = \alpha / m$ | Simple; overly conservative with many comparisons |
| Holm | Sequential Bonferroni | Less conservative; strongly controls FWER |
| Šidák | $\alpha' = 1 - (1 - \alpha)^{1/m}$ | Slightly less conservative than Bonferroni |
| Tukey's HSD | Uses the studentised range distribution | Optimal for all pairwise comparisons |
| Benjamini-Hochberg | Controls FDR rather than FWER | Appropriate for exploratory studies |
For planned contrasts (hypotheses specified before data collection), no correction is required if the contrasts are orthogonal and pre-registered.
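Three of the corrections in the table can be expressed as adjusted p-values, which are then compared against the nominal $\alpha$. The sketch below is a plain-Python illustration with hypothetical function names and p-values; Tukey's HSD is omitted because it needs the studentised range distribution, which the standard library does not provide.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment (sequential Bonferroni)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls FDR, not FWER)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
    adjusted, running_min = [0.0] * m, 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical unadjusted p-values for 3 pairs
print(bonferroni(raw), holm(raw), benjamini_hochberg(raw))
```

Holm's adjusted values never exceed the Bonferroni ones, which is why it is described as less conservative while still controlling the familywise error rate.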
11.7 Power Considerations: The Role of Within-Subjects Correlation
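The primary driver of power in repeated measures ANOVA is the average correlation among conditions ($\bar{\rho}$). With a common condition variance $\sigma^2$, the error mean square is:
$$MS_{\text{error}} = \sigma^2 (1 - \bar{\rho})$$
As $\bar{\rho}$ increases toward 1 (perfect consistency of individual ordering across conditions), $MS_{\text{error}}$ approaches 0 and power approaches 1 for any non-zero effect.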
Implication for design: Repeated measures designs are most efficient when individual differences are large and consistent. For traits that are highly variable across individuals but stable within individuals (e.g., cognitive ability, personality), the power advantage of within-subjects designs is enormous.
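The effect of the correlation on the error term can be checked numerically. The sketch below uses the fact that the expected error mean square equals the average condition variance minus the average covariance between conditions, and evaluates it for compound-symmetric covariance matrices. The function names and the variance of 100 are hypothetical.

```python
from statistics import mean


def compound_symmetric(sigma2, rho, k):
    """k x k covariance matrix with equal variances and equal correlations."""
    return [[sigma2 if a == b else rho * sigma2 for b in range(k)] for a in range(k)]


def expected_ms_error(cov):
    """Expected error mean square: mean variance minus mean covariance."""
    k = len(cov)
    mean_variance = mean(cov[j][j] for j in range(k))
    mean_covariance = mean(cov[a][b] for a in range(k) for b in range(k) if a != b)
    return mean_variance - mean_covariance


for rho in (0.0, 0.5, 0.8, 0.95):
    print(rho, expected_ms_error(compound_symmetric(100.0, rho, k=3)))
```

With $\bar{\rho} = 0$ the error term equals the full condition variance; raising the correlation to .8 leaves only a fifth of it.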
Sample size estimation in DataStatPro requires specifying:
- The expected effect size $f$ (or equivalently $\eta_p^2$).
- The number of conditions $k$.
- The expected average correlation $\bar{\rho}$ among conditions.
- The significance level $\alpha$.
- The desired power $1 - \beta$.
11.8 Bayesian Repeated Measures ANOVA
The Bayesian approach provides evidence quantification rather than a binary decision. DataStatPro implements Rouder et al.'s (2012) Bayes Factor for within-subjects designs, comparing:
- $\mathcal{M}_1$: A model including the condition effect.
- $\mathcal{M}_0$: A model including only individual differences (no condition effect).
The Bayes Factor $BF_{10}$ directly quantifies how much more probable the data are under the condition-effect model than under the null. Unlike a non-significant frequentist p-value, a $BF_{10}$ well below 1 (conventionally $BF_{10} < 1/3$) constitutes positive evidence for the null model, making the Bayesian approach especially valuable for studies aiming to support a null result (e.g., demonstrating that a new training protocol has no effect on performance).
12. Worked Examples
Example 1: Pain Ratings Across Three Treatment Phases
A clinical researcher measures pain ratings (0–100 VAS scale) in chronic pain patients at three time points: Pre-treatment (Baseline), Mid-treatment (Week 6), and Post-treatment (Week 12). Do pain ratings change significantly over time?
Data Matrix ($n = 12$, $k = 3$):
| Participant | Baseline () | Week 6 () | Week 12 () | Person Mean () |
|---|---|---|---|---|
| 1 | 72 | 58 | 45 | 58.33 |
| 2 | 65 | 50 | 38 | 51.00 |
| 3 | 80 | 62 | 50 | 64.00 |
| 4 | 55 | 44 | 32 | 43.67 |
| 5 | 78 | 60 | 48 | 62.00 |
| 6 | 60 | 48 | 35 | 47.67 |
| 7 | 70 | 54 | 42 | 55.33 |
| 8 | 82 | 65 | 52 | 66.33 |
| 9 | 58 | 46 | 33 | 45.67 |
| 10 | 75 | 57 | 45 | 59.00 |
| 11 | 68 | 52 | 40 | 53.33 |
| 12 | 63 | 49 | 37 | 49.67 |
| Condition Mean | 68.83 | 53.75 | 41.42 | 54.67 |
Step 1 — Hypotheses:
$H_0$: $\mu_{\text{Baseline}} = \mu_{\text{Week 6}} = \mu_{\text{Week 12}}$
$H_1$: At least one time point mean differs.
Step 2 — Sum of Squares:
⚠️ Note: In this clean example, is computed directly from residuals as .
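The sums of squares for this example can be recomputed directly from the data matrix. The sketch below is a plain-Python illustration of the partition (total = condition + subjects + error) rather than DataStatPro's own routine; the function name `rm_anova` is hypothetical and the data are copied from the table in this example.

```python
from statistics import mean

# Pain ratings from the data matrix: (Baseline, Week 6, Week 12) per participant.
pain = [
    (72, 58, 45), (65, 50, 38), (80, 62, 50), (55, 44, 32),
    (78, 60, 48), (60, 48, 35), (70, 54, 42), (82, 65, 52),
    (58, 46, 33), (75, 57, 45), (68, 52, 40), (63, 49, 37),
]


def rm_anova(data):
    """One-way repeated measures ANOVA partition for an n x k score table."""
    n, k = len(data), len(data[0])
    grand_mean = mean(v for row in data for v in row)
    cond_means = [mean(row[j] for row in data) for j in range(k)]
    subj_means = [mean(row) for row in data]
    ss_total = sum((v - grand_mean) ** 2 for row in data for v in row)
    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subjects  # residual after both effects
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return {
        "ss_cond": ss_cond, "ss_subjects": ss_subjects,
        "ss_error": ss_error, "ss_total": ss_total,
        "df_cond": df_cond, "df_error": df_error, "F": f_ratio,
    }


table = rm_anova(pain)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The p-value is not computed here because it requires the F-distribution, which is outside the Python standard library.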
Step 3 — Degrees of Freedom:
$df_{\text{time}} = k - 1 = 2$, $df_{\text{subjects}} = n - 1 = 11$, $df_{\text{error}} = (k - 1)(n - 1) = 22$, $df_{\text{total}} = nk - 1 = 35$
Step 4 — Mauchly's Test:
, ,
,
Sphericity is not violated (, ); report uncorrected results.
Step 5 — Mean Squares and F-Ratio:
Step 6 — p-Value:
Step 7 — Effect Sizes:
(very large)
Step 8 — Post-Hoc Comparisons (Bonferroni, pairs, ):
| Comparison | Mean Diff | (Bonferroni) | |||
|---|---|---|---|---|---|
| Baseline vs. Week 6 | |||||
| Baseline vs. Week 12 | |||||
| Week 6 vs. Week 12 |
All pairwise comparisons are significant; pain ratings decreased significantly at every time point.
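The paired statistics behind each comparison can be recomputed from the data matrix. The sketch below returns the mean difference, the SD of the differences, the paired $t$ statistic, and $d_z$ for every pair, together with the Bonferroni per-test threshold. The helper name `paired_summary` is hypothetical, and p-values are left out because the standard library has no $t$ distribution.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

# Columns of the data matrix, in participant order 1 to 12.
pain = {
    "Baseline": [72, 65, 80, 55, 78, 60, 70, 82, 58, 75, 68, 63],
    "Week 6": [58, 50, 62, 44, 60, 48, 54, 65, 46, 57, 52, 49],
    "Week 12": [45, 38, 50, 32, 48, 35, 42, 52, 33, 45, 40, 37],
}


def paired_summary(x, y):
    """Mean difference, SD of differences, paired t, df, and d_z for two conditions."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    m, s = mean(diffs), stdev(diffs)
    return {"mean_diff": m, "sd_diff": s, "t": m / (s / sqrt(n)), "df": n - 1, "d_z": m / s}


posthoc = {
    (a, b): paired_summary(pain[a], pain[b])
    for a, b in combinations(pain, 2)
}
alpha_per_test = 0.05 / len(posthoc)  # Bonferroni threshold for 3 comparisons

for pair, stats in posthoc.items():
    print(pair, {name: round(value, 3) for name, value in stats.items()})
print("per-test alpha:", round(alpha_per_test, 4))
```

Each $t$ equals $d_z \sqrt{n}$, so with $n = 12$ the two statistics carry the same information on different scales.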
Summary Table:
| Source | ||||||
|---|---|---|---|---|---|---|
| Between-Subjects | — | — | — | — | ||
| Time | ||||||
| Error | ||||||
| Total |
APA write-up: "A one-way repeated measures ANOVA revealed a significant effect of time on pain ratings, , , , [95% CI: .86, .96]. Mauchly's test indicated that the sphericity assumption was not violated, , , , . Bonferroni-corrected pairwise comparisons revealed that pain ratings decreased significantly from baseline (, ) to week 6 (, ; , , ), from baseline to week 12 (, ; , , ), and from week 6 to week 12 (, , )."
Example 2: Stroop Interference Across Three Congruency Conditions
A cognitive psychologist measures reaction time (ms) in participants across three Stroop conditions: Congruent, Neutral, and Incongruent.
Summary Statistics:
| Condition | ||
|---|---|---|
| Congruent | ms | ms |
| Neutral | ms | ms |
| Incongruent | ms | ms |
| Grand Mean | 540 ms |
Correlation matrix (estimated from pilot data):
| Congruent | Neutral | Incongruent | |
|---|---|---|---|
| Congruent | 1.00 | 0.72 | 0.65 |
| Neutral | 0.72 | 1.00 | 0.78 |
| Incongruent | 0.65 | 0.78 | 1.00 |
Step 1 — Compute SS from Summary Statistics:
Computing from the covariance matrix (using and ):
Each pairwise difference variance:
Using the direct formula:
Note: Exact computation requires the full data matrix; summary statistics yield an approximation. DataStatPro uses the full data matrix for precise results.
Step 2 — Degrees of Freedom:
Step 3 — F-Ratio:
(approximate; exact from full data)
Step 4 — Mauchly's Test:
From the correlation matrix: , Mauchly . Sphericity holds; no correction required.
Step 5 — Effect Size:
(medium-to-large effect; )
Post-Hoc Contrasts (a priori hypothesis: Congruent < Neutral < Incongruent):
| Comparison | Mean Diff (ms) | Bonferroni | |
|---|---|---|---|
| Incongruent vs. Congruent | |||
| Incongruent vs. Neutral | |||
| Neutral vs. Congruent |
APA write-up: "A one-way repeated measures ANOVA indicated a significant effect of Stroop congruency on reaction time, , , [95% CI: .03, .38], . Mauchly's test confirmed that sphericity was not violated, , , . Post-hoc pairwise comparisons (Bonferroni corrected) revealed that incongruent trials ( ms, ms) were significantly slower than both neutral ( ms; ms, , ) and congruent trials ( ms; ms, , ). Neutral trials were also significantly slower than congruent trials ( ms, , )."
Example 3: Mixed ANOVA — Rehabilitation Programme Across Two Groups and Three Time Points
A physiotherapist compares two rehabilitation protocols (Standard vs. Enhanced) on functional mobility scores (higher = better) in patients ( per group) across three time points: Pre, Post-4wk, and Post-8wk.
Summary Statistics:
| Group | Pre | Post-4wk | Post-8wk |
|---|---|---|---|
| Standard () | () | () | () |
| Enhanced () | () | () | () |
Step 1 — Hypotheses:
- : Standard and Enhanced groups have equal mean mobility scores (averaged across time).
- : Mean mobility scores are equal across Pre, Post-4wk, and Post-8wk (averaged across groups).
- : The time trajectory does not differ between groups (primary hypothesis).
Step 2 — ANOVA Source Table (condensed):
| Source | ||||
|---|---|---|---|---|
| Group (Between) | ||||
| Error (Between: Subjects within Groups) | — | |||
| Time (Within) | ||||
| Group × Time (Interaction) | ||||
| Error (Within) | — |
Mauchly's test: , , . Sphericity holds; uncorrected results reported.
Step 3 — Interpreting the Interaction (Primary Test):
The Group × Time interaction is significant, , , . This indicates that the trajectory of improvement over time differs between the Standard and Enhanced groups. Profile plot inspection reveals that both groups improve over time, but the Enhanced group improves at a faster rate — the gap between groups widens from Pre to Post-8wk.
Step 4 — Simple Effects (Follow-Up):
To unpack the interaction, compute the time effect separately within each group:
| Group | for Time | ||
|---|---|---|---|
| Standard | |||
| Enhanced |
Both groups show significant improvement over time. The difference in rate of improvement (interaction) is characterised by the Group × Time contrast.
APA write-up: "A 2 (Group: Standard vs. Enhanced) × 3 (Time: Pre, Post-4wk, Post-8wk) mixed ANOVA was conducted with Group as the between-subjects factor and Time as the within-subjects factor. Mauchly's test confirmed sphericity was not violated, , , , . The Group × Time interaction was significant, , , [95% CI: .11, .47], indicating that the two rehabilitation groups differed in their rate of improvement over time. Simple effects analyses confirmed that both groups improved significantly across time points (Standard: , ; Enhanced: , ), with the Enhanced group demonstrating a steeper improvement trajectory, reaching a mean mobility score of () at Post-8wk compared to () for the Standard group. Main effects of Time, , , , and Group, , , , were both significant but are qualified by the interaction."
13. Common Mistakes and How to Avoid Them
Mistake 1: Ignoring the Sphericity Assumption
Problem: Running repeated measures ANOVA without testing or correcting for sphericity. When sphericity is violated, the nominal F-distribution is incorrect and the Type I error rate is inflated — sometimes substantially (e.g., actual of when nominal for and severe violation).
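Solution: Always report Mauchly's test result together with $\hat{\varepsilon}_{GG}$ and $\hat{\varepsilon}_{HF}$. Apply the HF correction when $\hat{\varepsilon}_{GG} > 0.75$ and the GG correction when $\hat{\varepsilon}_{GG} \le 0.75$. Consider the multivariate approach for severe violations with adequate $n$.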
Mistake 2: Treating the Omnibus F-Test as the Final Answer
Problem: Reporting a significant omnibus F and concluding "the conditions differ" without specifying which conditions differ or by how much. The omnibus test provides no information about the pattern of differences.
Solution: Always follow a significant omnibus F with post-hoc pairwise comparisons (for exploratory research) or planned contrasts (for hypothesis-driven research). Report mean differences, confidence intervals, and pairwise effect sizes ($d_z$) for each comparison of interest.
Mistake 3: Using Between-Subjects CIs in Profile Plots
Problem: Plotting standard between-subjects error bars (which include individual difference variance) on a profile plot for repeated measures data. These CIs are too wide and misleadingly suggest that adjacent condition means are not significantly different even when the within-subjects F-test is highly significant.
Solution: Use within-subjects confidence intervals (Cousineau-Morey correction) for all profile plots involving repeated measures. Always label error bars explicitly in the figure caption and state the correction used.
Mistake 4: Conflating the Time Effect with the Group × Time Interaction in Mixed ANOVA
Problem: In mixed ANOVA, focusing on the significant main effect of Time and concluding that the treatment works, while ignoring that the Time effect averages across groups (including the control group). The relevant test for treatment efficacy is the Group × Time interaction, not the main effect of Time.
Solution: For intervention studies with a control group, the primary hypothesis test is always the Group × Time interaction. The main effects of Time and Group are typically secondary or incidental.
Mistake 5: Reporting Only $\eta_p^2$ Without Context
Problem: Reporting $\eta_p^2$ without noting that this is a within-subjects design, or without reporting $\eta_G^2$. Partial eta-squared in repeated measures designs is typically much larger than in between-subjects designs for equivalent true effect sizes because $SS_{\text{subjects}}$ is excluded from the denominator. This makes $\eta_p^2$ values non-comparable across designs.
Solution: Always report both $\eta_p^2$ and $\eta_G^2$. Note whether the effect size is from a within- or between-subjects design. Report $\omega_p^2$ as the bias-corrected complement to $\eta_p^2$, especially for small $n$.
Mistake 6: Applying Repeated Measures ANOVA to Dependent Groups
Problem: Analysing data where different participants are matched or paired (not the same participants) using the same software procedure as for within-subjects designs. While mathematically the analysis is valid (matched pairs can be treated as if the same participant were measured twice), the interpretation of the between-subjects term changes and must be communicated clearly.
Solution: Clearly state in the methods section whether the design uses the same participants (within-subjects) or matched participants. The statistical procedure is the same, but the language of "the same participants across conditions" vs. "matched pairs" differs and affects interpretation of the between-subjects variance term.
Mistake 7: Conducting Multiple Repeated Measures ANOVAs Without Correction
Problem: Running separate repeated measures ANOVAs on multiple dependent variables (e.g., one for each of 10 outcome scales), each at $\alpha = .05$, without any multiple comparison correction. This inflates the experiment-wise Type I error rate.
Solution: Use MANOVA to jointly test all dependent variables simultaneously if they are theoretically related (and if $n$ is adequate). If separate ANOVAs are necessary, apply Bonferroni or Holm correction to the omnibus p-values. Report all tested ANOVAs, not just significant ones.
Mistake 8: Forgetting Counterbalancing for Order-Sensitive Conditions
Problem: Presenting all participants with conditions in the same fixed order (e.g., always Condition 1, then Condition 2, then Condition 3). Any practice, fatigue, or contrast effects will be confounded with the condition effect, potentially inflating or deflating specific condition means.
Solution: Counterbalance condition order across participants. Use a complete counterbalancing scheme (all orders) for small $k$ or a Latin square for larger $k$. Include order position as a covariate in the analysis to check for residual order effects.
14. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| $F < 1$ | Condition means more similar than chance; very noisy data; possible outliers | Check data; verify conditions manipulate the intended construct; inspect outliers |
| Mauchly's $W = 1$ | Only $k = 2$ conditions (sphericity trivially satisfied) or perfectly equal difference variances | Report sphericity as trivially satisfied for $k = 2$; verify data for $k \ge 3$ |
| $\hat{\varepsilon}$ is very small | Severe sphericity violation; markedly unequal difference variances | Report GG-corrected results; also report MANOVA; consider LMM |
| GG and HF corrected p-values differ substantially | Moderate sphericity violation near the boundary | Report both corrections; use HF as primary if $\hat{\varepsilon}_{GG} > 0.75$ |
| MANOVA test is significant but univariate F is not | Multivariate structure not captured by univariate mean differences; multivariate test sensitive to profile shape, not just level | Report both; describe the multivariate pattern using discriminant function analysis |
| $\eta_p^2$ is very close to 1 | Very large effect; possible design confound; or $MS_{\text{error}}$ is small relative to the effect | Verify no confounds (order effects, demand characteristics); replicate with independent sample |
| $\omega_p^2$ is negative | Very small true effect; $F < 1$; small $n$ | Report as $\omega_p^2 = 0$; do not report negative values as meaningful |
| Post-hoc tests all non-significant despite significant omnibus | Bonferroni correction too conservative with many comparisons | Use Holm or Tukey correction; consider planned contrasts if hypotheses existed a priori |
| Sphericity test unavailable | Only $k = 2$ conditions (sphericity not testable) or all difference variances are zero | For $k = 2$, sphericity is trivially met; report paired t-test results |
| Missing data in one or more cells | Participant missing a condition | Exclude participant from analysis (report reduced $n$) or switch to LMM for all-inclusive analysis |
| Very wide CIs for $\eta_p^2$ | Small $n$ | Increase $n$; report CIs faithfully, since they convey genuine uncertainty; plan an adequately powered replication |
| Profile plot lines cross (mixed ANOVA) | Likely significant Group × Time interaction | Test the interaction formally; if significant, report and interpret simple effects |
| Friedman test disagrees with repeated measures ANOVA | Non-normality causing ANOVA to be unreliable | Use Friedman test results if normality is severely violated; report both and note discrepancy |
15. Quick Reference Cheat Sheet
Core Equations
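| Formula | Description |
|---|---|
| $SS_{\text{cond}} = n \sum_{j} (\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot})^2$ | Condition sum of squares |
| $SS_{\text{subjects}} = k \sum_{i} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$ | Between-subjects sum of squares |
| $SS_{\text{error}} = SS_{\text{total}} - SS_{\text{cond}} - SS_{\text{subjects}}$ | Error sum of squares |
| $df_{\text{cond}} = k - 1$ | Condition degrees of freedom |
| $df_{\text{error}} = (k - 1)(n - 1)$ | Error degrees of freedom |
| $F = MS_{\text{cond}} / MS_{\text{error}}$ | F-ratio |
| $p = P(F_{df_{\text{cond}},\, df_{\text{error}}} \ge F_{\text{obs}})$ | Right-tail p-value |
| $\eta_p^2 = SS_{\text{cond}} / (SS_{\text{cond}} + SS_{\text{error}})$ | Partial eta-squared |
| $\eta_G^2 = SS_{\text{cond}} / (SS_{\text{cond}} + SS_{\text{subjects}} + SS_{\text{error}})$ | Generalised eta-squared |
| $\omega_p^2$ | Partial omega-squared (bias-corrected) |
| $f = \sqrt{\eta_p^2 / (1 - \eta_p^2)}$ | Cohen's $f$ for power analysis |
| $d_z = \bar{D} / s_D$ | Pairwise Cohen's $d_z$ for post-hoc comparison |
| $\hat{\varepsilon}(k - 1)$, $\hat{\varepsilon}(k - 1)(n - 1)$ | Epsilon-corrected degrees of freedom |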
Epsilon Correction Decision Guide
| Correction | |
|---|---|
| None required | |
| Huynh-Feldt | |
| Greenhouse-Geisser | |
| Severe () | GG + report MANOVA |
Effect Size Benchmarks
| $\eta_p^2$ | Cohen's $f$ | Verbal Label |
|---|---|---|
| $0.01$ | $0.10$ | Small |
| $0.06$ | $0.25$ | Medium |
| $0.14$ | $0.40$ | Large |
Required Sample Size (One-Way, , , )
| Cohen's | Power = 0.80 | Power = 0.90 |
|---|---|---|
| 0.10 (small) | 52 | 70 |
| 0.25 (medium) | 12 | 16 |
| 0.40 (large) | 7 | 9 |
| 0.50 | 5 | 7 |
Assumes , two-tailed; average inter-condition correlation.
Decision Guide
| Condition | Recommended Test |
|---|---|
| Same participants, $k \ge 3$ conditions, normal | Repeated measures ANOVA |
| Same participants, $k = 2$ conditions | Paired samples t-test |
| Same participants, $k \ge 3$, non-normal or ordinal | Friedman test |
| Severe sphericity violation, adequate $n$ | MANOVA approach |
| Missing data, unequal intervals, or covariates | Linear mixed-effects model (LMM) |
| One within + one between factor | Mixed (split-plot) ANOVA |
| Two or more within factors | Factorial repeated measures ANOVA |
| Quantitative, equally-spaced within factor | Polynomial trend analysis |
| Establishing null condition effect | Bayesian RM ANOVA ($BF_{01}$) |
Post-Hoc Correction Comparison
| Method | Controls | Best Used When |
|---|---|---|
| Bonferroni | FWER (strict) | Few comparisons |
| Holm | FWER (sequential) | Moderate $m$; less conservative than Bonferroni |
| Tukey's HSD | FWER | All pairwise comparisons; equal $n$ |
| Šidák | FWER | Independent comparisons; slightly less conservative than Bonferroni |
| Benjamini-Hochberg | FDR | Exploratory; large $m$ |
| None (planned orthogonal) | Per-comparison | Strictly pre-registered, orthogonal contrasts |
APA 7th Edition Reporting Templates
One-Way Repeated Measures ANOVA (sphericity met): "A one-way repeated measures ANOVA revealed a significant effect of [Factor] on [Outcome], $F(df_1, df_2) =$ [value], $p =$ [value], $\eta_p^2 =$ [value] [95% CI: LB, UB], $\eta_G^2 =$ [value]. Mauchly's test indicated that sphericity was not violated, $W =$ [value], $\chi^2(df) =$ [value], $p =$ [value], $\hat{\varepsilon}_{GG} =$ [value]."
One-Way Repeated Measures ANOVA (sphericity violated; GG correction): "A one-way repeated measures ANOVA with Greenhouse-Geisser correction revealed a significant effect of [Factor] on [Outcome], $F(df_1, df_2) =$ [value], $p =$ [value], $\eta_p^2 =$ [value] [95% CI: LB, UB], $\eta_G^2 =$ [value]. Mauchly's test indicated that sphericity was violated, $W =$ [value], $\chi^2(df) =$ [value], $p =$ [value], $\hat{\varepsilon}_{GG} =$ [value]."
Mixed ANOVA (Group × Time interaction): "A mixed ANOVA with [Between-Factor] as the between-subjects factor and [Within-Factor] as the within-subjects factor revealed a significant [Between × Within] interaction, $F(df_1, df_2) =$ [value], $p =$ [value], $\eta_p^2 =$ [value] [95% CI: LB, UB]. Simple effects analyses indicated that..."
With post-hoc comparisons: "Post-hoc pairwise comparisons (Bonferroni corrected) indicated that [Condition A] ($M =$ [value], $SD =$ [value]) was significantly [higher/lower] than [Condition B] ($M =$ [value], $SD =$ [value]), $t(df) =$ [value], $p_{\text{adj}} =$ [value], $d_z =$ [value] [95% CI: LB, UB]."
With Friedman test (non-parametric): "A Friedman test indicated a significant effect of [Factor] on [Outcome], $\chi^2_F(df) =$ [value], $p =$ [value], $W =$ [value]. Post-hoc Wilcoxon signed-rank tests (Bonferroni corrected) indicated that..."
Reporting Checklist
| Item | Required |
|---|---|
| $F$-statistic (uncorrected or corrected) | ✅ Always |
| Both degrees of freedom ($df_1$, $df_2$) | ✅ Always |
| Exact p-value | ✅ Always |
| Mauchly's $W$, $\chi^2$, $df$, $p$ | ✅ Always (when $k \ge 3$) |
| $\hat{\varepsilon}_{GG}$ and $\hat{\varepsilon}_{HF}$ | ✅ Always (when $k \ge 3$) |
| Statement of which correction was applied | ✅ When sphericity violated |
| Condition means and standard deviations | ✅ Always |
| 95% CIs for condition means (within-subjects) | ✅ Always |
| $\eta_p^2$ with 95% CI | ✅ Always |
| $\omega_p^2$ (bias-corrected) | ✅ When $n$ is small |
| $\eta_G^2$ | ✅ For meta-analytic or cross-study comparisons |
| Cohen's $f$ | ✅ For power analysis reporting |
| Post-hoc comparisons (means, $t$, $p$, $d_z$) | ✅ When omnibus $F$ is significant |
| Planned contrasts (if pre-registered) | ✅ When applicable |
| Normality check (Shapiro-Wilk, Q-Q plots) | ✅ When $n$ is small |
| Outlier check (Mahalanobis $D^2$) | ✅ Always |
| Profile plot with within-subjects error bars | ✅ Always |
| Power analysis (post-hoc or a priori) | ✅ For non-significant results; underpowered studies |
| Bayes Factor $BF_{10}$ | Recommended for null results |
| Counterbalancing statement | ✅ When within-subjects design with potential carryover |
| Sample size $n$ per condition | ✅ Always |
This tutorial provides a comprehensive foundation for understanding, conducting, and reporting repeated measures ANOVA within the DataStatPro application. For further reading, consult Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018), Maxwell, Delaney & Kelley's "Designing Experiments and Analyzing Data" (3rd ed., 2018), Lakens's "Calculating and Reporting Effect Sizes to Facilitate Cumulative Science" (Frontiers in Psychology, 2013), Rouder et al.'s "Default Bayes Factors for ANOVA Designs" (Journal of Mathematical Psychology, 2012), and Olejnik & Algina's "Generalized Eta and Omega Squared Statistics" (Psychological Methods, 2003). For feature requests or support, contact the DataStatPro team.