ANCOVA: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of covariance adjustment all the way through the mathematics, assumptions, effect sizes, post-hoc testing, non-parametric alternatives, interpretation, reporting, and practical usage of the Analysis of Covariance (ANCOVA) within the DataStatPro application. Whether you are encountering ANCOVA for the first time or seeking a rigorous, unified understanding of covariate-adjusted between-groups inference, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is ANCOVA?
- The Mathematics Behind ANCOVA
- Assumptions of ANCOVA
- Variants of ANCOVA
- Using the ANCOVA Calculator Component
- Full Step-by-Step Procedure
- Effect Sizes for ANCOVA
- Post-Hoc Tests and Planned Contrasts
- Confidence Intervals
- Power Analysis and Sample Size Planning
- Non-Parametric Alternative: Quade and Ranked ANCOVA
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into ANCOVA, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.
1.1 From One-Way ANOVA to ANCOVA
One-Way ANOVA tests whether group means on a dependent variable (DV) differ beyond what chance alone would produce. However, ANOVA assumes that groups are equivalent on all variables except the independent variable (IV) — an assumption satisfied by random assignment in experiments, but rarely in observational research.
ANCOVA extends ANOVA by:
- Statistically controlling for one or more continuous covariates (CVs) that are correlated with the DV.
- Removing covariate-related variance from the error term, increasing statistical power.
- Adjusting group means to what they would be if all groups had identical covariate scores — producing adjusted means (also called estimated marginal means).
1.2 What is a Covariate?
A covariate (also called a concomitant variable) is a continuous variable that:
- Is correlated with the DV.
- Is not manipulated by the researcher (it is measured, not assigned).
- Is measured before the treatment or is logically prior to the treatment effect.
Examples:
- Pre-test score as a covariate when the DV is a post-test score.
- Age as a covariate when the DV is a cognitive outcome.
- Baseline anxiety when the DV is post-treatment anxiety.
- IQ when the DV is academic achievement.
The covariate should be chosen on theoretical grounds, not selected post-hoc because it "improves" the results. Including irrelevant covariates reduces power by consuming degrees of freedom.
1.3 The Two Goals of ANCOVA
ANCOVA serves two distinct but related purposes, and understanding which goal applies to your study is critical for correct interpretation:
Goal 1 — Increase Statistical Power (Experimental Designs)
In randomised experiments, groups are equal on average at baseline (including on the covariate). Including a covariate reduces the error mean square ($MS_{\text{error}}$) by removing variance explained by the covariate from the residual. This shrinks the denominator of the F-ratio, increasing power to detect treatment effects.
Goal 2 — Statistical Control (Quasi-Experimental and Observational Designs)
In non-randomised designs, groups may differ on the covariate at baseline. ANCOVA adjusts group means to a common covariate value, providing a partial statistical control for pre-existing differences. However, this control is imperfect and cannot fully substitute for randomisation.
⚠️ These two goals have different interpretational requirements. For Goal 1 (randomised experiments), ANCOVA assumptions are easily met and interpretation is straightforward. For Goal 2 (observational designs), the assumption of covariate independence from group membership is violated by design, requiring careful interpretational caveats about residual confounding.
1.4 The Regression Foundation of ANCOVA
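ANCOVA is a special case of the General Linear Model (GLM):
$$Y_{ij} = \mu + \tau_j + \beta_w (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$$
Where:
- $Y_{ij}$ is the DV score for participant $i$ in group $j$.
- $\mu$ is the grand mean of the DV.
- $\tau_j$ is the effect of group $j$ ($j = 1, \dots, k$).
- $X_{ij}$ is the covariate score for participant $i$ in group $j$.
- $\bar{X}_{..}$ is the grand mean of the covariate.
- $\beta_w$ is the common within-group regression coefficient (slope) of $Y$ on $X$.
- $\varepsilon_{ij}$ is the residual error.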
This is equivalent to a multiple regression model with group dummy codes and the covariate as predictors. The F-test for the group effect in ANCOVA tests whether groups differ after partialling out the covariate.
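The regression equivalence can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using a small made-up data set (three groups of four participants; the numbers are purely illustrative). It fits an ordinary least-squares model with an intercept, two group dummy codes, and the grand-mean-centred covariate, then compares the covariate coefficient with the pooled within-group slope computed directly from group-centred scores.

```python
import numpy as np

# Illustrative toy data (hypothetical): covariate x and DV y for three groups.
x = np.array([1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6], dtype=float)
y = np.array([2, 4, 5, 7, 4, 5, 7, 8, 7, 8, 10, 11], dtype=float)
group = np.repeat([0, 1, 2], 4)

# Design matrix: intercept, dummy codes for groups 1 and 2, centred covariate.
design = np.column_stack([
    np.ones_like(x),
    (group == 1).astype(float),
    (group == 2).astype(float),
    x - x.mean(),
])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
slope_from_regression = coef[3]

# Pooled within-group slope from scores centred on their own group means.
xc = x - np.array([x[group == g].mean() for g in group])
yc = y - np.array([y[group == g].mean() for g in group])
slope_pooled = (xc @ yc) / (xc @ xc)

# Residual SS of the regression is the covariate-adjusted error SS.
residual_ss = float(np.sum((y - design @ coef) ** 2))
print(slope_from_regression, slope_pooled, residual_ss)
```

Both routes give the same slope, which is why ANCOVA software can be built on a general regression engine.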
1.5 Variance Partitioning in ANCOVA
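ANCOVA partitions the total sum of squares differently from ANOVA. ANOVA splits the raw total, whereas ANCOVA splits the total that remains after the covariate has been removed:
$$\text{ANOVA: } SS_{T(Y)} = SS_{B(Y)} + SS_{w(Y)} \qquad \text{ANCOVA: } SS_{T(\text{adj})} = SS_{B(\text{adj})} + SS_{w(\text{adj})}$$
The covariate "absorbs" variance from the error term. The adjusted within-groups SS ($SS_{w(\text{adj})}$) is smaller than the unadjusted $SS_{w(Y)}$, leading to a smaller $MS_{\text{error}}$ and greater power, provided the covariate is genuinely correlated with the DV.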
1.6 Adjusted Means
The core output of ANCOVA is the adjusted group mean: the estimated group mean after removing the linear effect of the covariate:
$$\bar{Y}_{j(\text{adj})} = \bar{Y}_j - b_w (\bar{X}_j - \bar{X}_{..})$$
Where:
- $\bar{Y}_j$ is the observed (unadjusted) mean of group $j$.
- $b_w$ is the pooled within-group regression coefficient.
- $\bar{X}_j$ is the mean covariate score in group $j$.
- $\bar{X}_{..}$ is the grand mean of the covariate.
The adjusted mean represents what the group mean would have been if all groups had the same average covariate score ($\bar{X}_{..}$). These are also called Estimated Marginal Means (EMMs) and are the primary means for interpretation and post-hoc testing in ANCOVA.
1.7 The Homogeneity of Regression Slopes Assumption
Unlike ANOVA, ANCOVA carries a critical additional assumption: the within-group regression slope of $Y$ on $X$ must be the same in all groups. If the slope varies across groups, the covariate adjustment is non-uniform, the ANCOVA F-test is invalid, and an interaction model (covariate × group) is more appropriate. This is the most commonly violated and overlooked ANCOVA assumption.
1.8 ANCOVA vs. Gain Score Analysis
A common alternative to ANCOVA for pre-post designs is to compute gain scores (post − pre) and run a one-way ANOVA. The choice between ANCOVA and gain score analysis depends on the reliability and variability of the pre-test:
- ANCOVA is preferred when the pre-test is highly reliable and when groups have different pre-test means (common in quasi-experimental designs).
- Gain score ANOVA is preferred when pre-test scores are unreliable (Lord's paradox arises under certain conditions).
- Both approaches are valid for fully randomised experiments; ANCOVA generally has more power.
2. What is ANCOVA?
2.1 The Core Idea
Analysis of Covariance (ANCOVA) is a parametric inferential procedure that combines one-way ANOVA with linear regression. It tests whether the adjusted means of independent groups are simultaneously equal, after statistically controlling for one or more continuous covariates.
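The ANCOVA omnibus null hypothesis:
$$H_0: \mu_{1(\text{adj})} = \mu_{2(\text{adj})} = \cdots = \mu_{k(\text{adj})}$$
The adjusted means are population means evaluated at the grand mean of the covariate:
$$\mu_{j(\text{adj})} = \mu_j - \beta_w (\mu_{X_j} - \mu_X)$$
where $\mu_{X_j}$ is the population covariate mean of group $j$ and $\mu_X$ is the grand population mean of the covariate.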
2.2 What ANCOVA Tests and Does Not Test
ANCOVA tells you:
- Whether adjusted group mean differences are larger than expected by chance, after accounting for the covariate (omnibus test).
- How much of the covariate-adjusted outcome variance is explained by group membership (partial $\eta^2$, partial $\omega^2$).
- The direction and statistical significance of the covariate's relationship with the DV.
- What group means would look like if groups were equated on the covariate.
ANCOVA does NOT tell you:
- Which specific adjusted group means differ (requires post-hoc tests on adjusted means).
- Whether the treatment caused the DV change in non-randomised designs (residual confounding may remain).
- The effect size for individual pairwise adjusted mean differences (requires Cohen's $d$).
- Whether the homogeneity of regression slopes assumption holds (must be tested separately).
2.3 Design Requirements
For a one-way between-subjects ANCOVA, the design must satisfy:
- One continuous DV (interval or ratio scale).
- One categorical IV with $k \geq 2$ levels (groups).
- One or more continuous covariates (interval or ratio scale), measured prior to or independently of the treatment.
- Different participants in each group (independent samples).
- Each participant contributes exactly one score to exactly one group.
2.4 ANCOVA in Context
| Situation | Test |
|---|---|
| $k \geq 2$ groups, no covariates, normal, equal variances | One-Way ANOVA |
| $k \geq 2$ groups, no covariates, normal, unequal variances | Welch's One-Way ANOVA |
| $k \geq 2$ groups, one or more continuous covariates, normal | ANCOVA |
| $k \geq 2$ groups, covariate, unequal slopes across groups | ANCOVA with interaction (Johnson-Neyman) |
| $k \geq 2$ groups, covariate, non-normal or ordinal DV | Quade test / Ranked ANCOVA |
| $\geq 2$ IVs, one or more covariates | Factorial ANCOVA |
| $k \geq 2$ conditions, repeated measures, covariate | ANCOVA with repeated measures |
| Continuous IV, continuous DV, continuous moderator | Moderated regression |
| $\geq 2$ DVs, one or more covariates | MANCOVA |
2.5 Real-World Applications
| Field | Example Application | IV (Levels) | Covariate | DV |
|---|---|---|---|---|
| Clinical Psychology | CBT vs. BA vs. Waitlist | 3 therapy conditions | Pre-treatment PHQ-9 | Post-treatment PHQ-9 |
| Education | 3 teaching methods | 3 conditions | Pre-test score | Post-test score |
| Medicine | 3 drug doses vs. placebo | 4 groups | Baseline BP | Post-treatment BP |
| Neuroscience | 3 sleep conditions | 3 groups | Age | Reaction time |
| HR/OB | 3 leadership styles | 3 groups | Job experience | Productivity |
| Nutrition | 4 diet types | 4 groups | Baseline weight | Weight loss |
| Marketing | 5 ad formats | 5 groups | Prior brand attitude | Purchase intent |
| Epidemiology | 3 intervention programmes | 3 groups | SES score | Health outcome |
3. The Mathematics Behind ANCOVA
3.1 Notation
| Symbol | Meaning |
|---|---|
| $k$ | Number of groups |
| $n_j$ | Sample size in group $j$ |
| $N$ | Total sample size |
| $c$ | Number of covariates |
| $X_{ij}$ | Covariate score for participant $i$ in group $j$ (single covariate) |
| $Y_{ij}$ | DV score for participant $i$ in group $j$ |
| $\bar{X}_j$ | Mean covariate score in group $j$ |
| $\bar{X}_{..}$ | Grand mean of the covariate |
| $\bar{Y}_j$ | Observed (unadjusted) mean of DV in group $j$ |
| $\bar{Y}_{j(\text{adj})}$ | Adjusted mean of DV in group $j$ |
| $b_w$ | Pooled within-group regression slope of $Y$ on $X$ |
| $b_j$ | Within-group slope in group $j$ (for homogeneity test) |
| $MS_{w(\text{adj})}$ | Adjusted within-groups mean square (error after covariate removal) |
3.2 The Within-Group Regression Coefficient
ANCOVA uses the pooled within-group regression coefficient $b_w$, computed from the pooled within-group sums of cross-products:
Within-group sum of squares for the covariate:
$$SS_{w(X)} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$
Within-group sum of cross-products (covariate × DV):
$$SP_w = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j)$$
Pooled within-group regression coefficient:
$$b_w = \frac{SP_w}{SS_{w(X)}}$$
This slope represents the average linear relationship between the covariate and DV within groups, pooled across all groups. It is equivalent to the slope obtained from a regression of $Y$ on $X$ with all between-group variance removed.
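The pooled slope can be computed directly from raw scores. The sketch below is a plain-Python illustration, not DataStatPro's implementation; the function name `pooled_within_slope` and the toy data are hypothetical.

```python
def pooled_within_slope(groups):
    """groups: list of (x_values, y_values) pairs, one pair per group."""
    ss_wx = 0.0  # within-group sum of squares for the covariate
    sp_w = 0.0   # within-group sum of cross-products (covariate x DV)
    for xs, ys in groups:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        ss_wx += sum((x - x_bar) ** 2 for x in xs)
        sp_w += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_wx, sp_w, sp_w / ss_wx

# Hypothetical toy data: three groups of four participants.
toy = [
    ([1, 2, 3, 4], [2, 4, 5, 7]),
    ([2, 3, 4, 5], [4, 5, 7, 8]),
    ([3, 4, 5, 6], [7, 8, 10, 11]),
]
ss_wx, sp_w, b_w = pooled_within_slope(toy)
print(ss_wx, sp_w, round(b_w, 4))
```

Because each score is centred on its own group mean, differences between group means never enter the slope.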
3.3 Adjusted Group Means
The adjusted group mean for group $j$:
$$\bar{Y}_{j(\text{adj})} = \bar{Y}_j - b_w (\bar{X}_j - \bar{X}_{..})$$
Interpretation: The adjusted mean is the estimated group mean when the group's covariate mean equals the grand mean of the covariate. It answers: "What would Group $j$'s mean DV score be if they had, on average, the same covariate score as the entire sample?"
Adjusted grand mean:
$$\bar{Y}_{..(\text{adj})} = \frac{1}{N} \sum_{j=1}^{k} n_j \bar{Y}_{j(\text{adj})} = \bar{Y}_{..}$$
The adjusted grand mean equals the unadjusted grand mean (covariate adjustment preserves the overall mean).
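As a small numerical illustration of the adjustment, the sketch below adjusts three hypothetical group means and confirms that the sample-size-weighted mean of the adjusted means equals the unadjusted grand mean. The function name and all input values are assumptions made up for the example.

```python
def adjusted_means(y_means, x_means, n, b_w):
    """Shift each group's DV mean to the grand covariate mean."""
    n_total = sum(n)
    x_grand = sum(nj * xj for nj, xj in zip(n, x_means)) / n_total
    return [yj - b_w * (xj - x_grand) for yj, xj in zip(y_means, x_means)]

# Hypothetical summary statistics for three groups of four participants.
y_means = [4.5, 6.0, 9.0]
x_means = [2.5, 3.5, 4.5]
n = [4, 4, 4]
b_w = 22 / 15  # pooled within-group slope assumed for this example

adj = adjusted_means(y_means, x_means, n, b_w)
weighted_adj_grand = sum(nj * a for nj, a in zip(n, adj)) / sum(n)
unadjusted_grand = sum(nj * y for nj, y in zip(n, y_means)) / sum(n)
print([round(a, 3) for a in adj], weighted_adj_grand, unadjusted_grand)
```

The group with a below-average covariate mean is adjusted upward and the group with an above-average covariate mean is adjusted downward, which narrows the gap between them when the slope is positive.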
3.4 Sum of Squares Decomposition in ANCOVA
ANCOVA involves computing adjusted sums of squares by removing the linear effect of the covariate from both the total and within-group SS.
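Total sum of squares for DV (unadjusted):
$$SS_{T(Y)} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{..})^2$$
Within-group sum of squares for DV (unadjusted):
$$SS_{w(Y)} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2$$
Total sum of squares for covariate:
$$SS_{T(X)} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{..})^2$$
Total sum of cross-products:
$$SP_T = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_{..})(Y_{ij} - \bar{Y}_{..})$$
Adjusted within-groups SS (error after covariate removal):
$$SS_{w(\text{adj})} = SS_{w(Y)} - \frac{SP_w^2}{SS_{w(X)}}$$
The second term is the reduction in error SS due to the covariate. It equals $b_w^2 \, SS_{w(X)}$, the SS explained by the within-group regression of $Y$ on $X$.
Adjusted total SS:
$$SS_{T(\text{adj})} = SS_{T(Y)} - \frac{SP_T^2}{SS_{T(X)}}$$
Adjusted between-groups SS:
$$SS_{B(\text{adj})} = SS_{T(\text{adj})} - SS_{w(\text{adj})}$$
Verification:
$$SS_{T(\text{adj})} = SS_{B(\text{adj})} + SS_{w(\text{adj})}$$
Note: In ANCOVA, $SS_{B(\text{adj})} + SS_{w(\text{adj})} \neq SS_{T(Y)}$ in general, because $SS_{T(Y)}$ is the raw unadjusted total; it is the adjusted versions that sum correctly.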
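The decomposition can be traced with a few lines of arithmetic. This sketch takes the six unadjusted sums of squares and cross-products as inputs; the function name `adjusted_ss` and the numeric inputs are hypothetical values chosen for illustration.

```python
def adjusted_ss(ss_wy, ss_wx, sp_w, ss_ty, ss_tx, sp_t):
    """Return the covariate-adjusted within, total, and between SS."""
    ss_w_adj = ss_wy - sp_w ** 2 / ss_wx   # error SS after covariate removal
    ss_t_adj = ss_ty - sp_t ** 2 / ss_tx   # total SS after covariate removal
    ss_b_adj = ss_t_adj - ss_w_adj         # between-groups SS by subtraction
    return {"within_adj": ss_w_adj, "total_adj": ss_t_adj, "between_adj": ss_b_adj}

# Hypothetical inputs (three groups of four participants, one covariate).
parts = adjusted_ss(ss_wy=33, ss_wx=15, sp_w=22, ss_ty=75, ss_tx=23, sp_t=40)
print({name: round(value, 4) for name, value in parts.items()})

# The adjusted pieces sum to the adjusted total, not to the raw total of 75.
print(round(parts["between_adj"] + parts["within_adj"], 4))
```

In this example the error SS drops from 33 to about 0.73 because the covariate is strongly related to the DV within groups.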
3.5 Degrees of Freedom
| Source | $df$ |
|---|---|
| Between groups (adjusted) | $k - 1$ |
| Covariate | $c$ (number of covariates) |
| Within groups / Error (adjusted) | $N - k - c$ |
| Total (adjusted) | $N - 1 - c$ |
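The key difference from ANOVA: the error $df$ is reduced by $c$ (one per covariate), because each covariate costs one degree of freedom to estimate its slope. This is why including irrelevant covariates (low correlation with DV) reduces power despite removing some variance.
Power gain from covariate requires the adjusted error mean square to be smaller than the unadjusted one:
$$\frac{SS_{w(Y)} (1 - R_w^2)}{N - k - c} < \frac{SS_{w(Y)}}{N - k}$$
where $R_w^2$ is the proportion of within-group DV variance explained by the covariates ($r_w^2$ for a single covariate).
Simplified: the covariate must explain more variance than the $df$ it consumes. For a single covariate ($c = 1$), any non-zero $r_w$ provides power gain when $N$ is large enough.
Break-even correlation for a single covariate:
$$|r_w| = \frac{1}{\sqrt{N - k}}$$
For example, with $N = 60$ and $k = 3$: $|r_w| = 1/\sqrt{57} \approx 0.13$. Any larger $|r_w|$ within groups yields a smaller error mean square, and hence typically higher power, from ANCOVA than from one-way ANOVA.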
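The break-even point can be explored numerically. The sketch below assumes a single covariate and compares the ANCOVA error mean square, $SS_{w(Y)}(1 - r_w^2)/(N - k - 1)$, with the ANOVA error mean square, $SS_{w(Y)}/(N - k)$. The function names and the sample sizes are illustrative assumptions, and the comparison ignores the small change in the critical F value caused by the lost degree of freedom.

```python
import math

def break_even_r(n_total, k):
    """Within-group |r| at which ANCOVA and ANOVA error mean squares are equal."""
    return 1.0 / math.sqrt(n_total - k)

def error_ms_ratio(r_w, n_total, k):
    """ANCOVA error MS divided by ANOVA error MS (single covariate)."""
    return (1 - r_w ** 2) * (n_total - k) / (n_total - k - 1)

# Hypothetical design: 60 participants in 3 groups.
r_star = break_even_r(60, 3)
print(round(r_star, 4))
for r in (0.0, r_star, 0.3, 0.5):
    # Ratios below 1 mean the covariate shrinks the error term.
    print(round(r, 3), round(error_ms_ratio(r, 60, 3), 4))
```

A covariate with zero correlation gives a ratio slightly above 1, which is the cost of the consumed degree of freedom.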
3.6 Mean Squares and the F-Ratio
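Adjusted between-groups mean square:
$$MS_{B(\text{adj})} = \frac{SS_{B(\text{adj})}}{k - 1}$$
Adjusted within-groups mean square (adjusted error variance):
$$MS_{w(\text{adj})} = \frac{SS_{w(\text{adj})}}{N - k - c}$$
The ANCOVA F-statistic:
$$F = \frac{MS_{B(\text{adj})}}{MS_{w(\text{adj})}}$$
Under $H_0$:
$$F \sim F(k - 1,\ N - k - c)$$
p-value:
$$p = P\left(F_{k-1,\ N-k-c} \geq F_{\text{obs}}\right)$$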
3.7 The F-Test for the Covariate
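ANCOVA also produces an F-test for the covariate itself:
$$F_{\text{cov}} = \frac{MS_{\text{cov}}}{MS_{w(\text{adj})}}$$
Where, for a single covariate:
$$MS_{\text{cov}} = \frac{SS_{\text{cov}}}{1} = \frac{SP_w^2}{SS_{w(X)}}, \qquad df = (1,\ N - k - 1)$$
This tests whether the pooled within-group regression slope $b_w$ is significantly different from zero. A non-significant covariate F-test suggests the covariate is not linearly related to the DV within groups, and including it may reduce power.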
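Both F-ratios share the same adjusted error term, which the following sketch makes explicit. The function name `ancova_f_ratios` and the input sums of squares are hypothetical; p-values are left out because they require an F-distribution routine (for example `scipy.stats.f.sf`, if SciPy is available).

```python
def ancova_f_ratios(ss_b_adj, ss_w_adj, ss_cov, k, n_total, c=1):
    """Return (F for groups, F for covariate, degrees of freedom)."""
    df_b, df_cov, df_w = k - 1, c, n_total - k - c
    ms_w_adj = ss_w_adj / df_w                # shared denominator
    f_group = (ss_b_adj / df_b) / ms_w_adj
    f_cov = (ss_cov / df_cov) / ms_w_adj
    return f_group, f_cov, (df_b, df_cov, df_w)

# Hypothetical adjusted sums of squares for 3 groups, 12 participants.
f_group, f_cov, dfs = ancova_f_ratios(
    ss_b_adj=1622 / 345, ss_w_adj=11 / 15, ss_cov=484 / 15, k=3, n_total=12
)
print(round(f_group, 3), round(f_cov, 3), dfs)
```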
3.8 The ANCOVA Source Table
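| Source | SS | $df$ | MS | $F$ | $p$ |
|---|---|---|---|---|---|
| Covariate ($X$) | $SS_{\text{cov}}$ | $c$ | $MS_{\text{cov}}$ | $F_{\text{cov}}$ | $p_{\text{cov}}$ |
| Between groups (adjusted) | $SS_{B(\text{adj})}$ | $k - 1$ | $MS_{B(\text{adj})}$ | $F$ | $p$ |
| Error (adjusted) | $SS_{w(\text{adj})}$ | $N - k - c$ | $MS_{w(\text{adj})}$ | | |
| Total (adjusted) | $SS_{T(\text{adj})}$ | $N - 1 - c$ | | | |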
3.9 Computing the Pooled Within-Group Correlation
The pooled within-group correlation between the covariate and DV:
$$r_w = \frac{SP_w}{\sqrt{SS_{w(X)} \, SS_{w(Y)}}}$$
This correlation quantifies the linear relationship between covariate and DV within groups, pooled across groups. It determines how much variance the covariate removes from the error term.
Percentage of within-group variance explained by covariate:
$$r_w^2 \times 100\%$$
Percentage reduction in error SS:
$$\frac{SS_{w(Y)} - SS_{w(\text{adj})}}{SS_{w(Y)}} \times 100\% = r_w^2 \times 100\%$$
3.10 Adjusted Standard Error for Adjusted Means
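The standard error of an adjusted group mean:
$$SE(\bar{Y}_{j(\text{adj})}) = \sqrt{MS_{w(\text{adj})} \left[ \frac{1}{n_j} + \frac{(\bar{X}_j - \bar{X}_{..})^2}{SS_{w(X)}} \right]}$$
The second term inside the square root captures additional uncertainty from the fact that the covariate mean of group $j$ may deviate from the grand mean. When $\bar{X}_j = \bar{X}_{..}$, this term vanishes and $SE(\bar{Y}_{j(\text{adj})}) = \sqrt{MS_{w(\text{adj})} / n_j}$.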
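A short sketch shows how the covariate-distance term inflates the standard error. The function name `se_adjusted_mean` and the input values are hypothetical, chosen so that one group sits exactly at the grand covariate mean and another sits one unit below it.

```python
import math

def se_adjusted_mean(ms_w_adj, n_j, x_bar_j, x_grand, ss_wx):
    """Standard error of one adjusted group mean (single covariate)."""
    distance_term = (x_bar_j - x_grand) ** 2 / ss_wx
    return math.sqrt(ms_w_adj * (1 / n_j + distance_term))

ms_w_adj = 11 / 120  # hypothetical adjusted error mean square
ss_wx = 15           # hypothetical within-group SS of the covariate

se_at_grand_mean = se_adjusted_mean(ms_w_adj, 4, 3.5, 3.5, ss_wx)
se_off_grand_mean = se_adjusted_mean(ms_w_adj, 4, 2.5, 3.5, ss_wx)
print(round(se_at_grand_mean, 4), round(se_off_grand_mean, 4))
```

The group whose covariate mean differs from the grand mean has the larger standard error, even though both groups have the same size.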
3.11 Multiple Covariates
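With $c$ covariates, the ANCOVA model becomes:
$$Y_{ij} = \mu + \tau_j + \sum_{m=1}^{c} \beta_m (X_{ijm} - \bar{X}_{..m}) + \varepsilon_{ij}$$
The adjusted SS are computed using matrix algebra (the GLM framework):
$$\mathbf{b}_w = \mathbf{S}_{w(XX)}^{-1} \, \mathbf{s}_{w(XY)}, \qquad SS_{w(\text{adj})} = SS_{w(Y)} - (N - k) \, \mathbf{s}_{w(XY)}^{\top} \mathbf{S}_{w(XX)}^{-1} \mathbf{s}_{w(XY)}$$
Where $\mathbf{b}_w$ is the vector of pooled within-group regression coefficients, $\mathbf{S}_{w(XX)}$ is the pooled within-group covariance matrix of covariates, and $\mathbf{s}_{w(XY)}$ is the pooled within-group covariance vector between covariates and DV (both computed with divisor $N - k$).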
DataStatPro handles multiple covariates automatically using matrix algebra in its GLM engine.
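The following is a minimal NumPy sketch of the multi-covariate computation, not DataStatPro's GLM engine. It works with pooled within-group sums of squares and cross-products rather than covariances; the slopes are identical because the common divisor cancels. The function name and the toy data are hypothetical, and the data are constructed so that the two within-group slopes are exactly 2 and 3 with no residual error.

```python
import numpy as np

def pooled_within_slopes(groups):
    """groups: list of (X, y); X has shape (n_j, c), y has shape (n_j,)."""
    c = groups[0][0].shape[1]
    w_xx = np.zeros((c, c))  # pooled within-group SSCP matrix of covariates
    w_xy = np.zeros(c)       # pooled within-group cross-products with the DV
    ss_wy = 0.0              # pooled within-group SS of the DV
    for X, y in groups:
        Xc = X - X.mean(axis=0)  # centre each covariate on its group mean
        yc = y - y.mean()
        w_xx += Xc.T @ Xc
        w_xy += Xc.T @ yc
        ss_wy += float(yc @ yc)
    b_w = np.linalg.solve(w_xx, w_xy)
    ss_w_adj = ss_wy - float(w_xy @ b_w)
    return b_w, ss_w_adj

g1 = (np.array([[1, 1], [2, -1], [3, -1], [4, 1]], dtype=float),
      np.array([10, 6, 8, 16], dtype=float))
g2 = (np.array([[2, 0], [3, 2], [4, 2], [5, 0]], dtype=float),
      np.array([14, 22, 24, 20], dtype=float))

b_w, ss_w_adj = pooled_within_slopes([g1, g2])
print(b_w, round(ss_w_adj, 6))
```

Using `np.linalg.solve` instead of forming the explicit inverse is a numerical-stability choice; strongly correlated covariates make the matrix ill-conditioned either way.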
3.12 Computing Effect Sizes from the ANCOVA Table
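Partial eta squared (from F):
$$\eta_p^2 = \frac{F \cdot df_B}{F \cdot df_B + df_w} = \frac{SS_{B(\text{adj})}}{SS_{B(\text{adj})} + SS_{w(\text{adj})}}$$
where $df_B = k - 1$ and $df_w = N - k - c$.
Partial omega squared (bias-corrected, preferred):
$$\omega_p^2 = \frac{df_B (F - 1)}{df_B (F - 1) + N}$$
Exact partial omega squared (from SS):
$$\omega_p^2 = \frac{SS_{B(\text{adj})} - df_B \cdot MS_{w(\text{adj})}}{SS_{B(\text{adj})} + (N - df_B) \cdot MS_{w(\text{adj})}}$$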
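The F-based and SS-based formulas are algebraically equivalent, which the sketch below checks on hypothetical values. The function names and inputs are assumptions for illustration only.

```python
def partial_eta_sq(f, df_b, df_w):
    return f * df_b / (f * df_b + df_w)

def partial_omega_sq(f, df_b, n_total):
    return df_b * (f - 1) / (df_b * (f - 1) + n_total)

def partial_omega_sq_from_ss(ss_b_adj, ms_w_adj, df_b, n_total):
    return (ss_b_adj - df_b * ms_w_adj) / (ss_b_adj + (n_total - df_b) * ms_w_adj)

# Hypothetical adjusted sums of squares: 3 groups, 12 participants, 1 covariate.
ss_b_adj, ss_w_adj = 1622 / 345, 11 / 15
df_b, df_w, n_total = 2, 8, 12
ms_w_adj = ss_w_adj / df_w
f = (ss_b_adj / df_b) / ms_w_adj

eta_p = partial_eta_sq(f, df_b, df_w)
omega_f = partial_omega_sq(f, df_b, n_total)
omega_ss = partial_omega_sq_from_ss(ss_b_adj, ms_w_adj, df_b, n_total)
print(round(eta_p, 4), round(omega_f, 4), round(omega_ss, 4))
```

Partial omega squared comes out smaller than partial eta squared, reflecting its correction for positive bias in small samples.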
4. Assumptions of ANCOVA
ANCOVA carries all the assumptions of one-way ANOVA plus three additional covariate-specific assumptions. Violating the covariate assumptions is more consequential than violating standard ANOVA assumptions.
4.1 Normality of Residuals
The adjusted residuals must be normally distributed within each group.
How to check:
- Shapiro-Wilk test on adjusted residuals ($H_0$: residuals are normal).
- Q-Q plot of adjusted residuals: points should follow the diagonal.
- Histograms per group of adjusted residuals.
- Skewness and kurtosis of adjusted residuals: values close to zero (using excess kurtosis) are consistent with normality.
Robustness: ANCOVA is robust to mild non-normality, especially with balanced designs and reasonably large $n_j$ per group. The robustness applies to the F-test for the group effect; tests on the covariate coefficient are also robust with moderate $N$.
When violated: Use the Quade test (non-parametric ANCOVA) or ranked ANCOVA as described in Section 12. Consider log or square root transformations for right-skewed DVs.
4.2 Homogeneity of Variance (Homoscedasticity)
The adjusted within-group variances must be equal across all groups:
$$\sigma_{1(\text{adj})}^2 = \sigma_{2(\text{adj})}^2 = \cdots = \sigma_{k(\text{adj})}^2$$
This applies to the residuals after removing the covariate effect.
How to check:
- Levene's test on the adjusted residuals (most commonly used).
- Brown-Forsythe test on adjusted residuals (more robust to non-normality).
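- Variance ratio rule: If the ratio of the largest to the smallest adjusted within-group variance ($s_{\max}^2 / s_{\min}^2$) is large, heterogeneity is potentially problematic.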
When violated: Use a heteroscedastic ANCOVA (Welch-type adjustment) or non-parametric alternatives. Report the violation and the robust alternative results alongside standard ANCOVA results.
4.3 Independence of Observations
All observations must be independent within and across groups. This is a design assumption that cannot be tested from data.
Common violations:
- Clustered data (students nested in classrooms).
- Repeated measurements from the same participant.
- Social network dependencies among participants.
When violated: Use multilevel ANCOVA (covariate in mixed-effects model) or repeated measures ANCOVA.
4.4 Interval Scale of Measurement
Both the DV and the covariate(s) must be measured on at least an interval scale. The covariate must be continuous or at minimum have many ordered categories.
When violated for DV: Use the Quade test or ranked ANCOVA.
When violated for covariate: If the covariate is binary (0/1), include it as a categorical factor rather than a continuous covariate (use two-way ANOVA or ANCOVA with the binary variable as a blocking factor).
4.5 Homogeneity of Regression Slopes ⭐ CRITICAL
The most important ANCOVA-specific assumption: the within-group regression slope of $Y$ on $X$ must be the same in all groups:
$$\beta_1 = \beta_2 = \cdots = \beta_k = \beta_w$$
Why this matters: ANCOVA uses a single pooled slope to adjust all group means. If the true slopes differ across groups, the single adjustment is incorrect for at least some groups — the adjusted means are meaningless.
Conceptually: The homogeneity of regression slopes assumption requires that the covariate-DV relationship is parallel across groups. Heterogeneous slopes indicate a covariate × group interaction — the effect of the covariate on the DV differs depending on group membership.
How to check:
- Test the covariate × group interaction in a separate model:
  $$Y_{ij} = \mu + \tau_j + \beta_w (X_{ij} - \bar{X}_{..}) + \gamma_j (X_{ij} - \bar{X}_{..}) + \varepsilon_{ij}$$
  where $\gamma_j$ represents the deviation of group $j$'s slope from the common slope.
- The interaction $F$-test ($H_0$: all $\gamma_j = 0$) tests homogeneity of slopes.
- Rule: If the interaction is significant at the chosen $\alpha$, the homogeneity of regression slopes assumption is violated and standard ANCOVA should not be used.
- Visual check: Plot the regression lines of $Y$ on $X$ separately for each group. Lines should be approximately parallel.
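For a single covariate, the interaction F-test can be computed by comparing a model with one pooled slope against a model with a separate slope per group. The sketch below is a plain-Python illustration under that assumption; the function name and toy data are hypothetical, and the p-value is omitted because it needs an F-distribution routine.

```python
def slope_homogeneity_f(groups):
    """groups: list of (x_values, y_values). Returns (F, df1, df2)."""
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    ss_wx = sp_w = ss_wy = explained_separate = 0.0
    for xs, ys in groups:
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        ss_x = sum((x - x_bar) ** 2 for x in xs)
        sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        ss_wy += sum((y - y_bar) ** 2 for y in ys)
        ss_wx += ss_x
        sp_w += sp
        explained_separate += sp ** 2 / ss_x  # SS explained by this group's own slope
    # Extra SS gained by letting slopes differ, beyond the single pooled slope.
    ss_slopes = explained_separate - sp_w ** 2 / ss_wx
    ss_resid = ss_wy - explained_separate     # error SS of the separate-slopes model
    df1, df2 = k - 1, n_total - 2 * k
    return (ss_slopes / df1) / (ss_resid / df2), df1, df2

toy = [
    ([1, 2, 3, 4], [2, 4, 5, 7]),    # own slope 1.6
    ([2, 3, 4, 5], [4, 5, 7, 8]),    # own slope 1.4
    ([3, 4, 5, 6], [7, 8, 10, 11]),  # own slope 1.4
]
f_slopes, df1, df2 = slope_homogeneity_f(toy)
print(round(f_slopes, 4), df1, df2)
```

An F below 1, as in this toy example, gives no evidence against parallel slopes.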
When violated:
- Do not use standard ANCOVA.
- Use Johnson-Neyman analysis (see Section 13.3) to identify regions of the covariate where groups differ significantly.
- Use moderated regression with the group × covariate interaction term.
- Report the interaction as a finding in its own right.
4.6 Independence of Covariate and Treatment (Group Membership)
For ANCOVA to provide valid adjusted means, the covariate must be independent of (i.e., not caused by) the treatment. Specifically:
- In randomised experiments: The covariate must be measured before random assignment. A pre-test score measured before treatment satisfies this assumption. Post-randomisation covariates (measured after treatment begins) may be influenced by the treatment, making adjustment inappropriate.
- In observational studies: Groups systematically differ on the covariate by design (e.g., younger vs. older participants). ANCOVA adjusts for this, but the adjustment is conditional on the model being correctly specified.
How to check:
- Verify that the covariate was measured before treatment.
- Check whether groups differ significantly on the covariate: run a one-way ANOVA with the covariate as the DV and the IV as the grouping factor.
- In randomised experiments: this test should be non-significant (covariate balance is expected from randomisation). A significant result may indicate randomisation failure.
- In observational studies: this test will typically be significant (groups differ on the covariate by design). ANCOVA provides statistical control, not a substitute for randomisation.
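The covariate balance check described above is an ordinary one-way ANOVA with the covariate in the role of the DV. A minimal sketch, with a hypothetical function name and toy covariate scores:

```python
def one_way_f(samples):
    """One-way ANOVA F-ratio for a list of per-group score lists."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_total
    ss_between = sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in samples)
    ss_within = sum(sum((v - sum(s) / len(s)) ** 2 for v in s) for s in samples)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical covariate scores in three groups.
covariate_by_group = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
f_balance = one_way_f(covariate_by_group)
print(round(f_balance, 3))  # compare against F(k - 1, N - k)
```

The resulting F is referred to an F distribution with $k - 1$ and $N - k$ degrees of freedom, with no covariate degree of freedom removed because this is a plain ANOVA.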
When violated (covariate influenced by treatment):
- Removing the variance explained by a post-treatment covariate may remove part of the treatment effect itself — over-controlling bias.
- Use causal diagram analysis (DAG) to determine appropriate covariate adjustment.
4.7 Linearity of Covariate-DV Relationship
ANCOVA assumes the relationship between the covariate and DV is linear within each group. Non-linear relationships are not fully removed by the linear adjustment, leaving residual covariate variance in the error term.
How to check:
- Scatterplots of $Y$ vs. $X$ within each group: relationship should appear linear.
- Residual plots: Plot adjusted residuals vs. covariate scores. A U-shaped or inverted-U pattern indicates non-linearity.
- Polynomial test: Add $X^2$ to the ANCOVA model and test whether it contributes significantly. If $X^2$ is significant, the linear assumption may be violated.
When violated:
- Add a quadratic term to the ANCOVA model (polynomial ANCOVA).
- Use spline regression for flexible non-linear adjustment.
- Apply a transformation to the covariate (e.g., log for right-skewed covariates).
4.8 Reliability of the Covariate
ANCOVA assumes the covariate is measured without error. In practice, all psychological and behavioural measures contain measurement error. Measurement error in the covariate causes incomplete adjustment — residual confounding remains even after ANCOVA.
Consequences of covariate unreliability:
- Under-adjustment: The adjusted means do not fully reflect what group means would be at a common true covariate score.
- In randomised experiments: small downward bias in power (measurement error in the covariate leaves some adjustable variance in the error term).
- In observational studies: systematic bias in adjusted means. Because the estimated slope is attenuated, the adjustment is too small: with a positive covariate-DV slope, the adjusted mean of a group scoring higher on the covariate remains too high and that of a lower-scoring group remains too low. This can produce misleading conclusions about treatment effects.
Remedy:
- Use reliability-corrected ANCOVA (Porter & Raudenbush, 1987).
- Report the reliability (Cronbach's $\alpha$, test-retest $r$) of the covariate.
- Acknowledge unreliability as a limitation when reliability is low.
4.9 Absence of Influential Outliers
Outliers on the covariate or DV can distort $b_w$ and the adjusted means substantially.
How to check:
- Cook's distance for each observation in the ANCOVA regression model: unusually large values flag influential observations.
- Leverage values ($h_{ii}$): High leverage points have extreme covariate scores that pull the regression slope.
- Studentised deleted residuals: large absolute values flag potential outliers on the DV after covariate adjustment.
- Scatterplots of $Y$ vs. $X$ with group labels to visually identify extreme points.
4.10 Assumption Summary Table
| Assumption | Description | How to Check | Remedy if Violated |
|---|---|---|---|
| Normality | Adjusted residuals $\sim \mathcal{N}(0, \sigma^2)$ | Shapiro-Wilk, Q-Q plot | Quade test; transform DV |
| Homoscedasticity | Equal adjusted within-group variances | Levene's (on residuals) | Heteroscedastic ANCOVA |
| Independence | Observations independent within and across groups | Design review | Multilevel ANCOVA |
| Interval scale (DV & CV) | Both DV and covariate have equal-interval properties | Measurement theory | Ranked ANCOVA; Quade test |
| Homogeneity of regression slopes | Same $\beta_w$ in all groups | Interaction F-test; parallel scatterplots | Johnson-Neyman; moderated regression |
| Independence of covariate and treatment | Covariate not caused by treatment | Covariate balance test; timing of measurement | Use pre-treatment covariates only |
| Linearity | Linear $Y$-on-$X$ relationship within groups | Scatterplots; residual plots; polynomial test | Add $X^2$; transform covariate |
| Covariate reliability | Covariate measured without substantial error | Report reliability coefficient | Reliability-corrected ANCOVA |
| No outliers | No extreme influential observations | Cook's $D$, leverage, studentised residuals | Investigate; report sensitivity analysis |
5. Variants of ANCOVA
5.1 Standard One-Way ANCOVA
The default ANCOVA with a single continuous covariate, assuming homogeneity of regression slopes, normality of adjusted residuals, and homoscedasticity. Uses the pooled within-group regression slope for adjustment. This is appropriate when all assumptions are met.
5.2 ANCOVA with Multiple Covariates
When two or more covariates are available and each contributes unique variance to the DV, including all of them in ANCOVA maximises power. The mathematical extension uses multiple regression within the GLM framework (Section 3.11).
Guidelines for multiple covariates:
- Include only theoretically motivated covariates chosen before data collection.
- Avoid including covariates that are highly correlated with each other (multicollinearity degrades slope estimation).
- Each additional covariate costs one error $df$; the power benefit must exceed this cost (Section 3.5).
- With several covariates and small $N$, ANCOVA can become over-parameterised.
5.3 Welch-Type Heteroscedastic ANCOVA
Analogous to Welch's one-way ANOVA, this variant relaxes the assumption of equal adjusted within-group variances. It uses group-specific variance estimates in the F-test denominator, with Welch-Satterthwaite degrees of freedom correction.
DataStatPro implements heteroscedastic ANCOVA using the HC3 heteroscedasticity-consistent variance estimator for the group effect test.
Use when: Levene's test on adjusted residuals is significant (especially with unequal $n_j$).
5.4 ANCOVA with Categorical Covariate (Blocking Factor)
When the "covariate" is categorical (e.g., site, school, gender), it functions as a blocking factor rather than a continuous covariate. This is handled as a two-way ANOVA (main effect of group + main effect of block) rather than ANCOVA.
DataStatPro handles this automatically: selecting a categorical variable as a covariate prompts the user to reclassify it as a blocking factor in a factorial ANOVA design.
5.5 ANCOVA for Pre-Post Designs (Pre-Test as Covariate)
The most common ANCOVA application in clinical and educational research:
- DV = post-test score
- Covariate = pre-test score (same measure, administered before treatment)
- IV = treatment group
This design removes pre-existing individual differences (captured by the pre-test) from the error term, isolating treatment effects on change from baseline while controlling for regression to the mean.
Advantages over gain score analysis:
- More power when the pre-post correlation is moderate to high.
- Corrects for regression to the mean more effectively than gain scores.
- Adjusted means are interpretable as "what the post-test would be if groups had equal pre-test scores."
5.6 Johnson-Neyman ANCOVA (Heterogeneous Slopes)
When the homogeneity of regression slopes assumption is violated (significant group × covariate interaction), Johnson-Neyman analysis identifies the region of the covariate where the group difference is statistically significant and the region where it is not.
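The Johnson-Neyman boundary point(s) are the covariate values at which the adjusted group difference transitions from significant to non-significant. Between groups $j$ and $j'$, with group-specific intercepts $a_j$, $a_{j'}$ and slopes $b_j$, $b_{j'}$, the estimated difference at covariate value $X_0$ is $\hat{\Delta}(X_0) = (a_j - a_{j'}) + (b_j - b_{j'}) X_0$, and the boundary points are the solutions of the quadratic:
$$\hat{\Delta}(X_0)^2 = t_{\text{crit}}^2 \, \widehat{\text{Var}}\left[\hat{\Delta}(X_0)\right]$$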
DataStatPro computes Johnson-Neyman regions numerically and displays them as a floodlight plot (significance region shaded along the covariate axis).
5.7 Choosing Between Variants
| Condition | Recommended Test |
|---|---|
| Normal, equal slopes, equal variances | Standard ANCOVA |
| Normal, equal slopes, unequal variances | Heteroscedastic ANCOVA (HC3) |
| Non-normal, small | Quade test or ranked ANCOVA |
| Unequal slopes across groups | Johnson-Neyman analysis / moderated regression |
| Multiple theoretically-motivated covariates | ANCOVA with multiple covariates |
| Pre-post design, same DV measured twice | ANCOVA (pre-test as covariate) |
| Categorical covariate | Two-way ANOVA (blocking factor) |
6. Using the ANCOVA Calculator Component
The ANCOVA Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, visualising, and reporting ANCOVA and its alternatives.
Step-by-Step Guide
Step 1 — Select "ANCOVA"
From the "Test Type" dropdown, choose:
- ANCOVA (Standard): Equal regression slopes and equal variances assumed.
- ANCOVA (Heteroscedastic): HC3 variance-corrected; no equal variance assumption.
- ANCOVA (Auto): Runs standard ANCOVA, then switches to heteroscedastic if Levene's test on adjusted residuals is significant.
- Quade Test: Non-parametric ANCOVA alternative.
Step 2 — Input Method
Choose how to provide the data:
- Raw data (long format): Three or more columns — DV values, group membership, and covariate(s). DataStatPro computes all statistics, runs all assumption checks, and generates full output automatically.
- Raw data (wide format): One column per group for DV, plus covariate column(s). DataStatPro converts to long format.
- Summary statistics: Enter the group size, the DV mean and standard deviation, the covariate mean and standard deviation, and the within-group covariance for each group. Full assumption checks (especially homogeneity of slopes) are limited; inferential statistics and effect sizes are computed.
Step 3 — Specify Variables
- Dependent Variable (DV): Select the continuous outcome column.
- Independent Variable (IV / Group): Select the categorical grouping column.
- Covariate(s): Select one or more continuous covariate columns. For multiple covariates, select all relevant columns; DataStatPro enters them simultaneously in the GLM.
- Group Labels: Enter descriptive names for each group level.
Step 4 — Select Assumption Checks
DataStatPro automatically runs and displays:
- ✅ Homogeneity of regression slopes test (group × covariate interaction F-test) — most critical ANCOVA assumption.
- ✅ Shapiro-Wilk test on adjusted residuals.
- ✅ Levene's test on adjusted residuals for homoscedasticity.
- ✅ Brown-Forsythe test on adjusted residuals.
- ✅ Q-Q plot of adjusted residuals.
- ✅ Scatterplot of DV vs. covariate with separate regression lines per group (visual check for parallel slopes).
- ✅ Linearity check: Residual vs. covariate plot; optional polynomial test.
- ✅ Cook's distance and leverage plot for influential observations.
- ✅ Covariate balance test (one-way ANOVA with covariate as DV).
- ✅ Variance ratio ($s_{\max}^2 / s_{\min}^2$) with a warning if the ratio is large.
Step 5 — Select Post-Hoc Tests
When the omnibus F is significant, select post-hoc tests on adjusted means:
- Tukey HSD on adjusted means (balanced designs, equal variances; default).
- Tukey-Kramer on adjusted means (unbalanced, equal variances).
- Games-Howell on adjusted means (unequal variances; use with heteroscedastic ANCOVA).
- Bonferroni on adjusted means (conservative; any design).
- Holm-Bonferroni on adjusted means (less conservative than Bonferroni).
- Dunnett's on adjusted means (all vs. one control group).
- Scheffé on adjusted means (all possible contrasts).
- Custom planned contrasts on adjusted means (specify one contrast weight per group).
Step 6 — Select Effect Sizes
- ✅ Partial $\omega^2$ (bias-corrected; primary for group effect).
- ✅ Partial $\epsilon^2$ (alternative bias correction).
- ✅ Partial $\eta^2$ (biased; provided for comparison and journal requirements).
- ✅ Cohen's $f$ (for power analysis).
- ✅ Partial $\eta^2$ for covariate (variance explained by covariate, partialling group).
- ✅ 95% CIs for partial $\eta^2$ and partial $\omega^2$ via non-central F-distribution.
- ✅ Cohen's $d$ with 95% CI for each post-hoc pairwise comparison on adjusted means.
Step 7 — Select Display Options
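- ✅ Full ANCOVA source table with $SS$, df, $MS$, $F$, $p$ (covariate row + group row + error row).
- ✅ Unadjusted and adjusted means table ($n_j$, $\bar{Y}_j$, $\bar{Y}_{j(\text{adj})}$, $SE$, 95% CI per group).
- ✅ Covariate statistics ($b_w$, $SE$, $t$, $p$, $r_w$).
- ✅ Effect size table (partial $\omega^2$, partial $\epsilon^2$, partial $\eta^2$, Cohen's $f$) with CIs.
- ✅ Post-hoc comparison table (adjusted mean differences, $SE$, adjusted $p$, Cohen's $d$, 95% CI).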
- ✅ Assumption test results panel (colour-coded: green/yellow/red).
- ✅ Scatterplot with per-group regression lines (parallel slopes check).
- ✅ Adjusted means plot with 95% CIs (EMM plot).
- ✅ Q-Q plot of adjusted residuals.
- ✅ Cook's distance plot.
- ✅ Unadjusted vs. adjusted means comparison plot.
- ✅ Johnson-Neyman floodlight plot (if slopes are heterogeneous).
- ✅ Power curve: power vs. $N$ for observed Cohen's $f$.
- ✅ APA 7th edition-compliant results paragraph (auto-generated).
Step 8 — Run the Analysis
Click "Run ANCOVA". DataStatPro will:
- Test homogeneity of regression slopes; warn if violated and offer Johnson-Neyman analysis as an alternative.
- Compute the full ANCOVA source table (adjusted SS, adjusted MS, F-ratios, p-values).
- Run all assumption tests and display colour-coded warnings.
- Compute all adjusted group means ($\bar{y}_{j(adj)}$) and their standard errors.
- Compute all effect sizes with exact non-central F-based CIs.
- Run all selected post-hoc tests on adjusted means with adjusted p-values and individual Cohen's $d$.
- Generate all visualisations.
- Auto-generate the APA-compliant results paragraph.
7. Full Step-by-Step Procedure
7.1 Complete Computational Procedure
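This section walks through every computational step for ANCOVA, from raw data to a complete APA-style conclusion. A single covariate ($C = 1$, where $C$ is the number of covariates) is assumed.
Given: $K$ groups, DV $y_{ij}$, covariate $x_{ij}$, $i = 1, \dots, n_j$, $j = 1, \dots, K$. Total $N = \sum_{j=1}^{K} n_j$.
Step 1 — State the Hypotheses
$H_0$: $\mu_{1(adj)} = \mu_{2(adj)} = \dots = \mu_{K(adj)}$ (all adjusted population means are equal).
$H_1$: At least one adjusted population mean differs from the others.
Choose $\alpha$ (default: $\alpha = .05$).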
Step 2 — Compute Descriptive Statistics per Group
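For each group $j$, compute $n_j$, the mean, and the SD for both $y$ and $x$:
$$\bar{y}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} y_{ij}, \qquad \bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij}$$
Grand means:
$$\bar{y}_{..} = \frac{1}{N}\sum_{j=1}^{K}\sum_{i=1}^{n_j} y_{ij}, \qquad \bar{x}_{..} = \frac{1}{N}\sum_{j=1}^{K}\sum_{i=1}^{n_j} x_{ij}$$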
Step 3 — Check Assumption: Homogeneity of Regression Slopes
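Fit the full interaction model:
$$y_{ij} = \mu + \alpha_j + \beta\,(x_{ij} - \bar{x}_{..}) + \gamma_j\,(x_{ij} - \bar{x}_{..}) + \varepsilon_{ij}$$
Test $H_0$: all $\gamma_j = 0$ using the interaction F-test.
If $p < .05$: stop standard ANCOVA; use Johnson-Neyman or moderated regression instead. Report this finding.
If $p \geq .05$: proceed with standard ANCOVA.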
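The slopes test can be sketched as a comparison of two residual sums of squares: one from a model with a single pooled slope and one from a model that lets every group have its own slope. This is a minimal illustration, not DataStatPro's implementation; the function name `slopes_homogeneity_f` and the dict-of-pairs input layout are assumptions made for the example. The p-value would come from the upper tail of an F distribution with the returned degrees of freedom.

```python
from statistics import fmean


def slopes_homogeneity_f(groups):
    """Interaction (group x covariate) F-test for one covariate.

    groups: dict mapping a group label to a list of (x, y) pairs
            (hypothetical input layout chosen for this sketch).
    Returns (F, df1, df2).
    """
    K = len(groups)
    N = sum(len(obs) for obs in groups.values())
    ss_wx = ss_wy = sp_w = 0.0
    ss_res_separate = 0.0  # residual SS when each group has its own slope

    for obs in groups.values():
        mx = fmean(x for x, _ in obs)
        my = fmean(y for _, y in obs)
        ssx = sum((x - mx) ** 2 for x, _ in obs)
        ssy = sum((y - my) ** 2 for _, y in obs)
        sp = sum((x - mx) * (y - my) for x, y in obs)
        ss_wx += ssx
        ss_wy += ssy
        sp_w += sp
        ss_res_separate += ssy - sp ** 2 / ssx

    # Residual SS when all groups share the pooled within-group slope
    ss_res_common = ss_wy - sp_w ** 2 / ss_wx

    df1, df2 = K - 1, N - 2 * K
    F = ((ss_res_common - ss_res_separate) / df1) / (ss_res_separate / df2)
    return F, df1, df2
```

A large F (small p) indicates that a single pooled slope does not describe all groups, which is the situation in which standard ANCOVA should be abandoned.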
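Step 4 — Compute Within-Group Sums of Squares and Cross-Products
$$SS_{W(x)} = \sum_{j}\sum_{i}(x_{ij} - \bar{x}_j)^2, \qquad SS_{W(y)} = \sum_{j}\sum_{i}(y_{ij} - \bar{y}_j)^2, \qquad SP_W = \sum_{j}\sum_{i}(x_{ij} - \bar{x}_j)(y_{ij} - \bar{y}_j)$$
Step 5 — Compute Total Sums of Squares and Cross-Products
$$SS_{T(x)} = \sum_{j}\sum_{i}(x_{ij} - \bar{x}_{..})^2, \qquad SS_{T(y)} = \sum_{j}\sum_{i}(y_{ij} - \bar{y}_{..})^2, \qquad SP_T = \sum_{j}\sum_{i}(x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})$$
Step 6 — Compute Between-Group Sums of Squares and Cross-Products
$$SS_{B(x)} = SS_{T(x)} - SS_{W(x)}, \qquad SS_{B(y)} = SS_{T(y)} - SS_{W(y)}, \qquad SP_B = SP_T - SP_W$$
Step 7 — Compute the Pooled Within-Group Regression Coefficient
$$b_w = \frac{SP_W}{SS_{W(x)}}$$
Step 8 — Compute Adjusted Sums of Squares
Adjusted within-groups SS (error):
$$SS_{adj(W)} = SS_{W(y)} - \frac{SP_W^2}{SS_{W(x)}}$$
Adjusted total SS:
$$SS_{adj(T)} = SS_{T(y)} - \frac{SP_T^2}{SS_{T(x)}}$$
Adjusted between-groups SS:
$$SS_{adj(B)} = SS_{adj(T)} - SS_{adj(W)}$$
Covariate SS:
$$SS_{cov} = \frac{SP_W^2}{SS_{W(x)}} = b_w \cdot SP_W$$
Step 9 — Compute Degrees of Freedom
$df_{between} = K - 1$
$df_{cov} = 1$ (for single covariate; $C$ in general, with $C$ covariates)
$df_{error} = N - K - 1$ (for single covariate; $N - K - C$ in general)
$df_{total(adj)} = N - 2$ (for single covariate; $N - 1 - C$ in general)
Step 10 — Compute Mean Squares and F-Ratios
$$MS_{adj(B)} = \frac{SS_{adj(B)}}{K - 1}, \qquad MS_{adj(W)} = \frac{SS_{adj(W)}}{N - K - 1}, \qquad MS_{cov} = \frac{SS_{cov}}{1}$$
$F_{group} = MS_{adj(B)} / MS_{adj(W)}$ with $df = (K - 1,\ N - K - 1)$
$F_{cov} = MS_{cov} / MS_{adj(W)}$ with $df = (1,\ N - K - 1)$
Step 11 — Compute Adjusted Group Means
$$\bar{y}_{j(adj)} = \bar{y}_j - b_w(\bar{x}_j - \bar{x}_{..})$$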
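Steps 4 to 11 can be traced in a few lines of plain Python. The sketch below assumes one covariate and a hypothetical input layout (a dict of `(x, y)` pairs per group); the function name `ancova_one_covariate` and the returned keys are illustrative only. Step 6 is skipped because the adjusted between-groups SS is obtained by subtraction in Step 8.

```python
from statistics import fmean


def ancova_one_covariate(groups):
    """Steps 4-11 of one-way ANCOVA with a single covariate.

    groups: dict mapping a group label to a list of (x, y) pairs,
            x = covariate, y = DV (layout assumed for this sketch).
    """
    xs = [x for obs in groups.values() for x, _ in obs]
    ys = [y for obs in groups.values() for _, y in obs]
    N, K = len(xs), len(groups)
    grand_x, grand_y = fmean(xs), fmean(ys)

    # Step 4: pooled within-group SS and cross-products
    ss_wx = ss_wy = sp_w = 0.0
    group_means = {}
    for label, obs in groups.items():
        mx = fmean(x for x, _ in obs)
        my = fmean(y for _, y in obs)
        group_means[label] = (mx, my)
        ss_wx += sum((x - mx) ** 2 for x, _ in obs)
        ss_wy += sum((y - my) ** 2 for _, y in obs)
        sp_w += sum((x - mx) * (y - my) for x, y in obs)

    # Step 5: total SS and cross-products (deviations from grand means)
    ss_tx = sum((x - grand_x) ** 2 for x in xs)
    ss_ty = sum((y - grand_y) ** 2 for y in ys)
    sp_t = sum((x - grand_x) * (y - grand_y) for x, y in zip(xs, ys))

    # Step 7: pooled within-group regression coefficient
    b_w = sp_w / ss_wx

    # Step 8: adjusted sums of squares
    ss_adj_w = ss_wy - sp_w ** 2 / ss_wx
    ss_adj_t = ss_ty - sp_t ** 2 / ss_tx
    ss_adj_b = ss_adj_t - ss_adj_w
    ss_cov = sp_w ** 2 / ss_wx

    # Steps 9-10: degrees of freedom, mean squares, F-ratios
    df_b, df_e = K - 1, N - K - 1
    ms_e = ss_adj_w / df_e
    f_group = (ss_adj_b / df_b) / ms_e
    f_cov = ss_cov / ms_e

    # Step 11: adjusted group means
    adjusted = {
        label: my - b_w * (mx - grand_x)
        for label, (mx, my) in group_means.items()
    }
    return {
        "b_w": b_w, "SS_adj_W": ss_adj_w, "SS_adj_B": ss_adj_b,
        "SS_cov": ss_cov, "df": (df_b, df_e), "MS_error": ms_e,
        "F_group": f_group, "F_cov": f_cov, "adjusted_means": adjusted,
    }
```

A useful sanity check on any implementation: the sample-size-weighted average of the adjusted means equals the grand mean of the DV, because the covariate deviations $\bar{x}_j - \bar{x}_{..}$ sum to zero when weighted by $n_j$.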
Step 12 — Check Remaining Assumptions
- Run Shapiro-Wilk on adjusted residuals.
- Run Levene's test on adjusted residuals.
- Run linearity check (residual vs. covariate plot).
- Inspect Cook's distances for influential observations.
- Run covariate balance test (ANOVA with covariate as DV).
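Step 13 — Compute Effect Sizes
In the formulas below, $SS_{adj(B)}$ and $SS_{adj(W)}$ are the adjusted between-groups and within-groups SS from Step 8, and $MS_{adj(W)} = SS_{adj(W)} / (N - K - 1)$.
Partial eta squared:
$$\eta_p^2 = \frac{SS_{adj(B)}}{SS_{adj(B)} + SS_{adj(W)}}$$
Partial omega squared (bias-corrected, preferred):
$$\omega_p^2 = \frac{SS_{adj(B)} - (K - 1)\,MS_{adj(W)}}{SS_{adj(B)} + SS_{adj(W)} + MS_{adj(W)}}$$
Partial epsilon squared:
$$\varepsilon_p^2 = \frac{SS_{adj(B)} - (K - 1)\,MS_{adj(W)}}{SS_{adj(B)} + SS_{adj(W)}}$$
Cohen's $f$:
$$f = \sqrt{\frac{\eta_p^2}{1 - \eta_p^2}}$$
Step 14 — Compute 95% CI for $\omega_p^2$
Using the non-central F-distribution with $df_1 = K - 1$ and $df_2 = N - K - 1$. DataStatPro performs this computation numerically (see Section 10.4 for details).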
Step 15 — Conduct Post-Hoc Tests on Adjusted Means (if F significant)
Select the appropriate post-hoc test (Section 9). Compute pairwise differences of adjusted means, standard errors, adjusted p-values, and individual Cohen's $d$ for each pair.
Step 16 — Interpret and Report
Combine all results into an APA-compliant report (Section 13.9).
8. Effect Sizes for ANCOVA
8.1 Partial vs. Total Effect Sizes in ANCOVA
In ANCOVA, effect sizes are partial — they express the proportion of variance explained by group membership after removing the variance explained by the covariate. Partial effect sizes are appropriate because the covariate is not the substantive effect of interest; it is only a control variable.
Important distinction:
- Total (non-partial) $\eta^2$: The covariate-adjusted group sum of squares as a proportion of the total variance in $y$. It coincides with the ordinary ANOVA $\eta^2$ (computed as if the covariate were not in the model) only when the groups have equal covariate means.
- Partial $\eta^2$: Proportion of DV variance not explained by the covariate that is explained by group membership. Always $\geq$ total $\eta^2$, because the covariate's share of variance is removed from the denominator.
- Always report partial effect sizes for ANCOVA and label them explicitly as partial.
8.2 Partial Eta Squared ($\eta_p^2$) — Common but Biased
$\eta_p^2$ is the proportion of adjusted DV variance accounted for by group membership. It is the most commonly reported ANCOVA effect size (default in SPSS and most software).
$$\eta_p^2 = \frac{SS_{adj(B)}}{SS_{adj(B)} + SS_{adj(W)}}$$
Critical limitation: $\eta_p^2$ is positively biased, overestimating the true population partial effect, particularly in small samples with many groups.
From F (direct computation):
$$\eta_p^2 = \frac{F \cdot (K - 1)}{F \cdot (K - 1) + (N - K - 1)}$$
⚠️ Report $\eta_p^2$ only when explicitly required by a journal or for historical comparison. Always report $\omega_p^2$ (or $\varepsilon_p^2$) as the primary effect size and label $\eta_p^2$ as "biased" in your manuscript.
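8.3 Partial Omega Squared ($\omega_p^2$) — Preferred
$\omega_p^2$ is the bias-corrected estimate of the population partial proportion of variance explained. It is the recommended primary effect size for ANCOVA.
$$\omega_p^2 = \frac{SS_{adj(B)} - (K - 1)\,MS_{adj(W)}}{SS_{adj(B)} + SS_{adj(W)} + MS_{adj(W)}}$$
Properties:
- Can be slightly negative in small samples when the true effect is zero — report as 0 by convention.
- Always $\omega_p^2 \leq \eta_p^2$.
- Converges to $\eta_p^2$ as $N \to \infty$.
- Accounts for the reduction in $df_{error}$ due to the covariate.
From F (approximate):
$$\omega_p^2 \approx \frac{(K - 1)(F - 1)}{(K - 1)(F - 1) + N}$$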
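8.4 Partial Epsilon Squared ($\varepsilon_p^2$) — Alternative Correction
$$\varepsilon_p^2 = \frac{SS_{adj(B)} - (K - 1)\,MS_{adj(W)}}{SS_{adj(B)} + SS_{adj(W)}}$$
Properties:
- Always: $\omega_p^2 \leq \varepsilon_p^2 \leq \eta_p^2$.
- Slightly less bias-correction than $\omega_p^2$.
- Computationally simpler (no $MS_{adj(W)}$ in denominator).
8.5 Cohen's $f$ — For Power Analysis
$$f = \sqrt{\frac{\eta_p^2}{1 - \eta_p^2}}$$
or
$$f = \sqrt{\frac{\omega_p^2}{1 - \omega_p^2}}$$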
Cohen's is used as the effect size input for ANCOVA power analysis. It represents the ratio of between-groups adjusted SD to within-groups adjusted SD.
Benchmarks: Small = 0.10, Medium = 0.25, Large = 0.40 (Cohen, 1988).
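The four group-effect measures in Sections 8.2 to 8.5 all derive from the same two adjusted sums of squares, so they can be computed together. The sketch below assumes a single covariate (error df $= N - K - 1$) and uses the sum-of-squares forms of $\omega_p^2$ and $\varepsilon_p^2$ with $MS_{adj(W)}$ in the correction term; the function name and argument names are illustrative, not part of DataStatPro.

```python
from math import sqrt


def ancova_effect_sizes(ss_between_adj, ss_within_adj, K, N):
    """Partial effect sizes for the group effect (one covariate assumed)."""
    df_b, df_e = K - 1, N - K - 1
    ms_e = ss_within_adj / df_e

    eta_p2 = ss_between_adj / (ss_between_adj + ss_within_adj)
    corrected = ss_between_adj - df_b * ms_e  # bias-corrected numerator
    omega_p2 = corrected / (ss_between_adj + ss_within_adj + ms_e)
    epsilon_p2 = corrected / (ss_between_adj + ss_within_adj)
    cohens_f = sqrt(eta_p2 / (1.0 - eta_p2))

    # Negative omega/epsilon values are conventionally reported as 0;
    # they are returned unclipped here so the raw estimate stays visible.
    return {
        "eta_p2": eta_p2,
        "omega_p2": omega_p2,
        "epsilon_p2": epsilon_p2,
        "f": cohens_f,
    }
```

For example, $\eta_p^2 = 0.20$ corresponds to $f = \sqrt{0.20 / 0.80} = 0.50$, which sits above Cohen's "large" benchmark of 0.40.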
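8.6 Cohen's $d$ for Pairwise Comparisons
For each pairwise comparison of adjusted means, report Cohen's $d$:
$$d_{jk} = \frac{\bar{y}_{j(adj)} - \bar{y}_{k(adj)}}{\sqrt{MS_{adj(W)}}}$$
Using $\sqrt{MS_{adj(W)}}$ as the standardiser is preferred because it is the ANCOVA-based estimate of the common within-group SD after covariate adjustment.
95% CI for the pairwise adjusted mean difference:
$$(\bar{y}_{j(adj)} - \bar{y}_{k(adj)}) \pm t_{\alpha/2,\,N-K-1} \cdot SE_{jk}, \qquad SE_{jk} = \sqrt{MS_{adj(W)}\left[\frac{1}{n_j} + \frac{1}{n_k} + \frac{(\bar{x}_j - \bar{x}_k)^2}{SS_{W(x)}}\right]}$$
Note: For a given error variance, this CI is wider than the corresponding ANOVA CI because the term $(\bar{x}_j - \bar{x}_k)^2 / SS_{W(x)}$ adds the uncertainty of adjusting for covariate mean differences between the two groups.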
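The pairwise quantities can be computed directly from summary statistics. The following sketch takes the critical $t$ value as an input (for example from a table with $N - K - 1$ df) so that it needs only the standard library; the function name and argument names are assumptions made for illustration.

```python
from math import sqrt


def pairwise_adjusted(mean_adj_j, mean_adj_k, n_j, n_k,
                      xbar_j, xbar_k, ms_error, ss_wx, t_crit):
    """Difference, SE, Cohen's d and CI for two adjusted means.

    ms_error: adjusted within-groups mean square (ANCOVA error term)
    ss_wx:    pooled within-group SS of the covariate
    t_crit:   two-sided critical t with N - K - 1 df (supplied by caller)
    """
    diff = mean_adj_j - mean_adj_k
    # The last term carries the extra uncertainty from covariate imbalance
    se = sqrt(ms_error * (1 / n_j + 1 / n_k + (xbar_j - xbar_k) ** 2 / ss_wx))
    d = diff / sqrt(ms_error)
    return {
        "diff": diff,
        "se": se,
        "d": d,
        "ci": (diff - t_crit * se, diff + t_crit * se),
    }
```

When the two groups have the same covariate mean, the last term inside the square root is zero and the standard error reduces to the familiar two-sample form based on $MS_{adj(W)}$.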
8.7 Variance Explained by the Covariate
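The partial $\eta^2$ for the covariate quantifies how much of the adjusted DV variance the covariate explains:
$$\eta^2_{p(cov)} = \frac{SS_{cov}}{SS_{cov} + SS_{adj(W)}}$$
This is the squared partial correlation between the covariate and DV, partialling the group effect.
Proportion of within-group variance explained (squared pooled within-group correlation):
$$r_w^2 = \frac{SP_W^2}{SS_{W(x)} \cdot SS_{W(y)}}$$
With a single covariate the two quantities coincide, because $SS_{cov} + SS_{adj(W)} = SS_{W(y)}$.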
8.8 Comparison of ANOVA vs. ANCOVA Effect Sizes
ANCOVA effect sizes ($\eta_p^2$, $\omega_p^2$) are generally larger than ANOVA effect sizes ($\eta^2$, $\omega^2$) on the same data, because the denominator (error variance) is reduced by covariate adjustment. This larger value is appropriate: it reflects the genuine increase in precision from including the covariate. However, it is not valid to compare effect sizes across ANOVA and ANCOVA analyses without acknowledging this difference.
Effect size gain from the covariate (approximate, assuming groups do not differ on the covariate):
$$f_{ANCOVA} \approx \frac{f_{ANOVA}}{\sqrt{1 - r_w^2}}$$
For $r_w = 0.70$ ($r_w^2 = 0.49$): ratio $\approx 1.40$, so the ANCOVA effect size $f$ is approximately 40% larger than the ANOVA $f$, which translates into higher power at the same $N$.
9. Post-Hoc Tests and Planned Contrasts
9.1 Post-Hoc Tests in ANCOVA Are Applied to Adjusted Means
The critical distinction between ANOVA and ANCOVA post-hoc testing is that in ANCOVA, all pairwise comparisons are made on the adjusted means ($\bar{y}_{j(adj)}$), not the observed means. Using observed means for post-hoc tests after a significant ANCOVA F-test is incorrect and invalidates the comparison.
Standard errors for pairwise adjusted mean differences include an additional term for covariate mean differences between groups (Section 8.6).
9.2 Tukey HSD on Adjusted Means
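Tukey's HSD for ANCOVA balanced designs ($n_j = n$ for all groups):
$$\text{HSD}_{jk} = q_{\alpha;\,K,\,N-K-1}\sqrt{\frac{MS_{adj(W)}}{2}\left[\frac{2}{n} + \frac{(\bar{x}_j - \bar{x}_k)^2}{SS_{W(x)}}\right]}$$
The studentised range distribution uses $df_{error} = N - K - 1$ (ANCOVA error df) rather than $N - K$ (ANOVA error df).
For unequal sample sizes (Tukey-Kramer extension):
$$\text{HSD}_{jk} = q_{\alpha;\,K,\,N-K-1}\sqrt{\frac{MS_{adj(W)}}{2}\left[\frac{1}{n_j} + \frac{1}{n_k} + \frac{(\bar{x}_j - \bar{x}_k)^2}{SS_{W(x)}}\right]}$$
Declare groups $j$ and $k$ different if $|\bar{y}_{j(adj)} - \bar{y}_{k(adj)}| > \text{HSD}_{jk}$.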
9.3 Games-Howell on Adjusted Means
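When Levene's test on adjusted residuals is significant, use Games-Howell with group-specific variance estimates ($s_j^2$ = variance of the adjusted residuals in group $j$):
$$t_{jk} = \frac{\bar{y}_{j(adj)} - \bar{y}_{k(adj)}}{\sqrt{\dfrac{s_j^2}{n_j} + \dfrac{s_k^2}{n_k}}}$$
with Welch-Satterthwaite df:
$$df_{jk} = \frac{\left(\dfrac{s_j^2}{n_j} + \dfrac{s_k^2}{n_k}\right)^2}{\dfrac{(s_j^2/n_j)^2}{n_j - 1} + \dfrac{(s_k^2/n_k)^2}{n_k - 1}}$$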
9.4 Bonferroni and Holm-Bonferroni on Adjusted Means
Bonferroni: Compare each pairwise adjusted-mean p-value to $\alpha / m$ where $m = K(K - 1)/2$.
Holm-Bonferroni (preferred): Sequential procedure applied to sorted p-values from adjusted mean comparisons. Uniformly more powerful than Bonferroni.
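The sequential Holm procedure is easiest to see as a computation of adjusted p-values: the smallest raw p-value is multiplied by $m$, the next by $m - 1$, and so on, with a running maximum that keeps the adjusted values monotone. The helper below is a generic sketch (the name `holm_adjust` is illustrative); the raw p-values would come from the pairwise tests on adjusted means.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiplier shrinks from m down to 1 as p-values grow
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def bonferroni_adjust(p_values):
    """Plain Bonferroni adjustment, for comparison."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]
```

With raw p-values of .01, .04, and .03, Holm gives .03, .06, and .06, whereas Bonferroni gives .03, .12, and .09: Holm is never larger than Bonferroni for any comparison.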
9.5 Dunnett's Test on Adjusted Means
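When the primary interest is comparing experimental groups to a single control group, Dunnett's test on adjusted means has more power than all-pairs procedures because it controls the error rate over only $K - 1$ comparisons:
$$t_j = \frac{\bar{y}_{j(adj)} - \bar{y}_{ctrl(adj)}}{SE_{j,ctrl}}$$
where $SE_{j,ctrl}$ is the standard error of the adjusted mean difference (Section 8.6). Compared against Dunnett's critical values with $df = N - K - 1$.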
9.6 Planned Contrasts on Adjusted Means
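For pre-planned comparisons, the contrast on adjusted means is:
$$\hat{\psi} = \sum_{j=1}^{K} c_j\,\bar{y}_{j(adj)}$$
with $\sum_j c_j = 0$.
Contrast SS and F:
$$SS_{\psi} = \frac{\hat{\psi}^2}{\sum_j \dfrac{c_j^2}{n_j} + \dfrac{\left(\sum_j c_j \bar{x}_j\right)^2}{SS_{W(x)}}}, \qquad F_{\psi} = \frac{SS_{\psi}}{MS_{adj(W)}}$$
with $df = (1,\ N - K - 1)$
For orthogonal contrasts on adjusted means, the orthogonality condition is:
$$\sum_j \frac{c_{1j}\,c_{2j}}{n_j} + \frac{\left(\sum_j c_{1j}\bar{x}_j\right)\left(\sum_j c_{2j}\bar{x}_j\right)}{SS_{W(x)}} = 0$$
(Slightly different from ANOVA orthogonality due to covariate terms.)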
DataStatPro computes these automatically when contrast weights are entered.
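As a rough illustration of what happens when contrast weights are entered, the sketch below evaluates one planned contrast on adjusted means, including the covariate term in the standard error. It is not DataStatPro's code; the function name and argument layout (parallel lists ordered by group) are assumptions.

```python
from math import sqrt


def planned_contrast(weights, adj_means, ns, xbars, ms_error, ss_wx):
    """Contrast estimate, SE and F (df = 1, N - K - 1) on adjusted means.

    weights, adj_means, ns, xbars: lists in the same group order.
    """
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    psi = sum(c * m for c, m in zip(weights, adj_means))
    # Variance multiplier: usual ANOVA term plus the covariate term
    mult = sum(c * c / n for c, n in zip(weights, ns))
    mult += sum(c * xb for c, xb in zip(weights, xbars)) ** 2 / ss_wx
    se = sqrt(ms_error * mult)
    ss_psi = psi ** 2 / mult
    return {"psi": psi, "se": se, "SS": ss_psi, "F": ss_psi / ms_error}
```

With weights $(1, -1)$ on two groups, the contrast F equals the square of the pairwise $t$ statistic for those groups, which is a convenient check on the covariate term.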
10. Confidence Intervals
10.1 95% CI for Each Adjusted Group Mean
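The 95% CI for the adjusted population mean $\mu_{j(adj)}$:
$$\bar{y}_{j(adj)} \pm t_{\alpha/2,\,N-K-1}\sqrt{MS_{adj(W)}\left[\frac{1}{n_j} + \frac{(\bar{x}_j - \bar{x}_{..})^2}{SS_{W(x)}}\right]}$$
Note: When $\bar{x}_j = \bar{x}_{..}$ (group has the same covariate mean as the grand mean), the standard error takes the same form as in ANOVA, $\sqrt{MS_{adj(W)}/n_j}$. When a group's covariate mean departs from the grand mean, the CI is wider, reflecting additional uncertainty in the adjustment.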
10.2 95% CI for Pairwise Adjusted Mean Differences
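The 95% CI for $\mu_{j(adj)} - \mu_{k(adj)}$:
$$(\bar{y}_{j(adj)} - \bar{y}_{k(adj)}) \pm t_{\alpha/2,\,N-K-1}\sqrt{MS_{adj(W)}\left[\frac{1}{n_j} + \frac{1}{n_k} + \frac{(\bar{x}_j - \bar{x}_k)^2}{SS_{W(x)}}\right]}$$
Using Tukey-adjusted critical values (simultaneous CIs):
Replace $t_{\alpha/2,\,N-K-1}$ with $q_{\alpha;\,K,\,N-K-1} / \sqrt{2}$.
10.3 95% CI for the Regression Coefficient
The CI for the pooled within-group slope:
$$b_w \pm t_{\alpha/2,\,N-K-1}\sqrt{\frac{MS_{adj(W)}}{SS_{W(x)}}}$$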
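10.4 95% CI for $\omega_p^2$
Using the non-central F-distribution (computed numerically by DataStatPro).
The non-centrality parameter:
$$\lambda = f^2 \cdot N = \frac{\eta_p^2}{1 - \eta_p^2} \cdot N$$
Find $\lambda_L$, $\lambda_U$ such that:
$$P(F \leq F_{obs} \mid df_1, df_2, \lambda_L) = .975, \qquad P(F \leq F_{obs} \mid df_1, df_2, \lambda_U) = .025$$
Convert to $\eta_p^2$: $\eta^2_{p,L} = \lambda_L / (\lambda_L + N)$; $\eta^2_{p,U} = \lambda_U / (\lambda_U + N)$.
Then apply bias correction to obtain $\omega_p^2$ bounds.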
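The search for the two non-centrality bounds can be sketched with SciPy's non-central F distribution and a root finder. This is an illustration of the inversion idea only, not DataStatPro's numerical routine: the function names are hypothetical, the conversion uses $\lambda / (\lambda + N)$ as an assumed convention, and the final bias-correction step to $\omega_p^2$ is left out.

```python
from scipy.optimize import brentq
from scipy.stats import f as f_dist, ncf


def _solve_lambda(F_obs, df1, df2, target):
    """lambda >= 0 with P(F <= F_obs | lambda) = target, or 0.0 if none."""
    if f_dist.cdf(F_obs, df1, df2) <= target:
        return 0.0  # even the central F leaves too little mass below F_obs

    def gap(lam):
        return ncf.cdf(F_obs, df1, df2, lam) - target

    hi = max(10.0, 2.0 * F_obs * df1)
    while gap(hi) > 0:  # the CDF decreases as lambda grows
        hi *= 2.0
    return brentq(gap, 1e-10, hi)


def eta_p2_ci(F_obs, df1, df2, N, conf=0.95):
    """CI for partial eta squared by inverting the non-central F CDF."""
    alpha = 1.0 - conf
    lam_lo = _solve_lambda(F_obs, df1, df2, 1.0 - alpha / 2.0)
    lam_hi = _solve_lambda(F_obs, df1, df2, alpha / 2.0)
    return lam_lo / (lam_lo + N), lam_hi / (lam_hi + N)
```

When the observed F is small, no positive $\lambda_L$ exists and the lower bound is set to 0, which is why CIs for non-significant effects typically start at zero.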
10.5 CI Width and the Covariate
For a given error variance, the CI for an adjusted mean is wider than the CI for an observed mean in ANOVA unless the group's covariate mean equals the grand covariate mean. The additional variance from the covariate term:
$$MS_{adj(W)} \cdot \frac{(\bar{x}_j - \bar{x}_{..})^2}{SS_{W(x)}}$$
This additional uncertainty is small in randomised experiments, where random assignment ensures $\bar{x}_j \approx \bar{x}_{..}$ on average. In observational studies, large covariate mean differences between groups produce substantially wider CIs for adjusted means.
11. Power Analysis and Sample Size Planning
11.1 Power Advantage of ANCOVA over ANOVA
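The power advantage of ANCOVA over one-way ANOVA depends on the within-group correlation $r_w$ between the covariate and DV, which determines how much error variance the covariate removes:
$$\sigma^2_{error(ANCOVA)} \approx \sigma^2_{error(ANOVA)}\,(1 - r_w^2)$$
Effective sample size multiplier:
$$\frac{1}{1 - r_w^2}$$
ANCOVA with $N$ participants has equivalent power to ANOVA with $N / (1 - r_w^2)$ participants. For $r_w = 0.60$: $1 / (1 - 0.36) \approx 1.56$, so ANCOVA with 100 participants has the same power as ANOVA with about 156 participants.
However, ANCOVA costs one $df_{error}$ per covariate, slightly reducing this gain. The net effect on power: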
For any , ANCOVA is more powerful.
11.2 A Priori Power Analysis for ANCOVA
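Non-centrality parameter for ANCOVA:
$$\lambda = f^2 \cdot N$$
Where $f^2 = \eta_p^2 / (1 - \eta_p^2)$ is computed from the partial effect size.
Power computation (exact, using non-central F):
$$\text{Power} = P\left(F'_{df_1,\,df_2,\,\lambda} > F_{crit}\right), \qquad F_{crit} = F_{1-\alpha;\,df_1,\,df_2}$$
Where $df_1 = K - 1$ and $df_2 = N - K - 1$.
Note: The critical F uses $df_2 = N - K - 1$ (ANCOVA df), not $N - K$ (ANOVA df).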
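The power calculation above maps directly onto SciPy's central and non-central F distributions. The sketch below assumes equal group sizes and takes Cohen's $f$ for the covariate-adjusted group effect as input; both function names are hypothetical, and the simple upward search for $n$ is only one of several ways to solve for sample size. Results from this sketch need not reproduce any particular published table, since tables differ in how they define the input effect size.

```python
from scipy.stats import f as f_dist, ncf


def ancova_power(f_effect, n_per_group, K, n_cov=1, alpha=0.05):
    """Power of the ANCOVA group F-test (equal n per group assumed)."""
    N = K * n_per_group
    df1, df2 = K - 1, N - K - n_cov
    lam = f_effect ** 2 * N                     # non-centrality parameter
    f_crit = f_dist.ppf(1.0 - alpha, df1, df2)  # critical value, ANCOVA df
    return float(ncf.sf(f_crit, df1, df2, lam))


def required_n_per_group(f_effect, K, n_cov=1, alpha=0.05, power=0.80,
                         n_max=100_000):
    """Smallest per-group n whose power reaches the target."""
    n = 2
    while K * n - K - n_cov < 1:  # need at least one error df
        n += 1
    while n <= n_max:
        if ancova_power(f_effect, n, K, n_cov, alpha) >= power:
            return n
        n += 1
    raise ValueError("target power not reached within n_max")
```

Calling `required_n_per_group(0.25, K=3)` returns the smallest equal group size at which a medium adjusted effect reaches 80% power under these assumptions.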
11.3 Required Sample Size
For ANCOVA power analysis, specify:
- Expected $\eta_p^2$ (or Cohen's $f$) for the group effect on adjusted means.
- Expected within-group correlation $r_w$ between covariate and DV.
- Number of groups $K$.
- Number of covariates $C$.
- Desired power ($1 - \beta$; typically 0.80 or 0.90).
- Significance level $\alpha$ (typically .05).
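Required $n$ per group for 80% power ($\alpha = .05$, one covariate):
| $f$ | $\eta_p^2$ | $r_w$ | $K = 3$ | $K = 4$ | $K = 5$ |
|---|---|---|---|---|---|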
| 0.10 | 0.010 | 0.30 | 315 | 268 | 236 |
| 0.10 | 0.010 | 0.60 | 211 | 180 | 159 |
| 0.25 | 0.059 | 0.30 | 50 | 43 | 38 |
| 0.25 | 0.059 | 0.60 | 34 | 29 | 26 |
| 0.40 | 0.138 | 0.30 | 20 | 17 | 15 |
| 0.40 | 0.138 | 0.60 | 13 | 12 | 10 |
| 0.50 | 0.200 | 0.50 | 11 | 10 | 9 |
| 0.60 | 0.265 | 0.50 | 8 | 7 | 6 |
All values are $n$ per group. Total $N = K \times n$. Higher $r_w$ requires fewer participants because the covariate removes more error variance.
11.4 Determining Effect Size Inputs
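From prior ANOVA literature (convert ANOVA $\eta^2$ to ANCOVA $\eta_p^2$, assuming groups do not differ on the covariate):
$$\eta_p^2 \approx \frac{\eta^2}{\eta^2 + (1 - \eta^2)(1 - r_w^2)}$$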
If a prior ANOVA found and you plan to add a covariate with expected :
(approximate)
Because the covariate removes within-group variance from the denominator, the partial effect size is amplified.
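The amplification can be made concrete with a small helper. It rests on the assumption that the covariate shrinks only the within-group (error) share of variance by the factor $1 - r_w^2$ and leaves the between-groups share untouched, which holds approximately when groups are balanced on the covariate. The function name is illustrative.

```python
def anova_eta2_to_ancova_partial(eta2, r_w):
    """Approximate ANCOVA partial eta squared from an ANOVA eta squared.

    Assumes the covariate removes a fraction r_w**2 of the within-group
    variance and that groups do not differ on the covariate.
    """
    if not 0.0 <= eta2 < 1.0:
        raise ValueError("eta2 must be in [0, 1)")
    if not -1.0 < r_w < 1.0:
        raise ValueError("r_w must be in (-1, 1)")
    error_share = (1.0 - eta2) * (1.0 - r_w ** 2)  # remaining error variance
    return eta2 / (eta2 + error_share)
```

For instance, an ANOVA $\eta^2$ of 0.10 combined with $r_w = 0.50$ gives $0.10 / (0.10 + 0.90 \times 0.75) \approx 0.129$.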
From pilot data: Run ANCOVA on the pilot sample and use the observed $\omega_p^2$ (with appropriate shrinkage, as pilot estimates are noisy).
From theory: Specify minimum practically meaningful differences between adjusted means and estimate $f$.
11.5 Sensitivity Analysis for ANCOVA
The minimum detectable partial for a given , , , and 80% power:
(single covariate; approximate)
For , , :
Corresponding .
⚠️ Report sensitivity analysis for null ANCOVA results. The additional covariate df reduces minimum detectable effects compared to ANOVA — ANCOVA needs to be powered for the partial effect size, which may be larger than the corresponding ANOVA effect.
12. Non-Parametric Alternative: Quade and Ranked ANCOVA
12.1 When to Use Non-Parametric ANCOVA Alternatives
The Quade test and ranked ANCOVA are appropriate when:
- The DV is ordinal or severely non-normally distributed.
- There are extreme outliers on the DV that cannot be removed.
- The normality assumption for ANCOVA is severely violated with small $n$.
- Homoscedasticity of adjusted residuals is seriously violated.
12.2 The Quade Test
The Quade test (Quade, 1967) is a non-parametric extension of ANCOVA for ranked data with a single covariate.
Procedure:
Step 1 — Rank the covariate: Rank all covariate scores from 1 to $N$, ignoring group membership. Assign average ranks for ties.
Step 2 — Rank the DV: Rank all DV scores from 1 to $N$ in the same way, again ignoring group membership.
Step 3 — Compute residuals: Centre both sets of ranks at $(N + 1)/2$, regress the DV ranks on the covariate ranks across the whole sample, and save each participant's residual $e_{ij}$, that is, the deviation of their DV rank from the rank expected under $H_0$ given their covariate rank.
Step 4 — Compute the Quade F-statistic by running a one-way ANOVA on the residuals with group as the factor:
$$F_Q = \frac{\sum_j n_j\,\bar{e}_j^2 \,/\, (K - 1)}{\sum_j \sum_i (e_{ij} - \bar{e}_j)^2 \,/\, (N - K)}$$
Where:
$\bar{e}_j$ is the mean residual in group $j$ (the grand mean of the residuals is zero by construction).
Step 5 — p-value:
$F_Q \sim F(K - 1,\ N - K)$ approximately for large $N$.
Effect size:
(analogous to for Kruskal-Wallis)
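Quade's (1967) rank analysis of covariance is usually described as three operations: rank the DV and the covariate over the whole sample, residualise the centred DV ranks on the centred covariate ranks, and run a one-way ANOVA on those residuals. The sketch below follows that description for a single covariate using only the standard library; the function names and the three parallel input lists are assumptions for illustration.

```python
def average_ranks(values):
    """Ranks from 1..N with ties given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def quade_f(x, y, group):
    """Quade rank ANCOVA F statistic with df = (K - 1, N - K)."""
    N = len(y)
    mid = (N + 1) / 2
    cx = [r - mid for r in average_ranks(x)]  # centred covariate ranks
    cy = [r - mid for r in average_ranks(y)]  # centred DV ranks

    # Regress DV ranks on covariate ranks, ignoring group membership
    slope = sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)
    resid = [b - slope * a for a, b in zip(cx, cy)]

    # One-way ANOVA on the residuals (their grand mean is zero)
    labels = sorted(set(group))
    K = len(labels)
    ss_between = ss_within = 0.0
    for lab in labels:
        vals = [e for e, g in zip(resid, group) if g == lab]
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * mean ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    F = (ss_between / (K - 1)) / (ss_within / (N - K))
    return F, K - 1, N - K
```

The statistic is referred to an F distribution with the returned degrees of freedom, so no special tables are needed.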
12.3 Ranked ANCOVA (General Approach)
A more flexible non-parametric alternative is to:
- Rank both the DV and the covariate (Conover & Iman, 1982).
- Run standard ANCOVA on the ranks using the ranked covariate as the covariate and the ranked DV as the dependent variable.
- The resulting F-test is approximately distribution-free for large samples.
This approach is simpler to implement, available in DataStatPro, and handles multiple covariates naturally.
Advantages over Quade test:
- Handles multiple covariates.
- Fits everything in a single ANCOVA model, with no separate residualisation step.
- Compatible with all standard ANCOVA post-hoc procedures applied to ranked data.
Limitations:
- Less powerful than Quade test for small samples.
- Effect size interpretation is on the rank scale.
12.4 Post-Hoc Tests for Quade and Ranked ANCOVA
After a significant Quade test, use Dunn-type pairwise comparisons on the rank residuals with Holm-Bonferroni correction. DataStatPro computes these automatically.
After a significant ranked ANCOVA, apply standard ANCOVA post-hoc tests to the ranked adjusted means with appropriate corrections.
12.5 Efficiency of Non-Parametric ANCOVA
For normal data, ranked ANCOVA has ARE $\approx 0.955$ (the classical $3/\pi$ value for rank-based tests) relative to standard ANCOVA, a negligible loss. For non-normal data (especially heavy-tailed or skewed distributions), ranked ANCOVA can be substantially more powerful.
13. Advanced Topics
13.1 ANCOVA as a Special Case of the General Linear Model
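ANCOVA is a special case of the GLM:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
Where $\mathbf{X}$ is the design matrix containing group indicator columns (effect-coded or dummy-coded) and the covariate column(s). In matrix form:
$$SS_{adj(B)} = \mathbf{y}^\top \mathbf{M}\,\mathbf{y} - \mathbf{y}^\top \mathbf{M}_{\mathbf{X}}\,\mathbf{y}$$
Where $\mathbf{M}$ is the residual maker matrix after partialling the covariate (built from the intercept and covariate columns only) and $\mathbf{M}_{\mathbf{X}}$ is the residual maker for the full design matrix, so that $\mathbf{y}^\top \mathbf{M}_{\mathbf{X}}\,\mathbf{y} = SS_{adj(W)}$.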
This GLM framework naturally handles:
- Unbalanced designs (unequal ).
- Multiple covariates.
- Factorial ANCOVA (multiple IVs with covariates).
- MANCOVA (multiple DVs with covariates).
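The GLM view can be checked numerically: the group F-test is a comparison of residual sums of squares between a reduced model (intercept and covariate) and the full model (intercept, group dummies, covariate). The NumPy sketch below assumes one covariate and dummy coding with the first group as reference; the function names are hypothetical and this is not DataStatPro's GLM engine.

```python
import numpy as np


def _rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def ancova_f_glm(x, y, group):
    """Group F-test via nested-model comparison (one covariate)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)
    N, K = len(y), len(labels)

    ones = np.ones(N)
    # Dummy columns for every group except the first (reference) group
    dummies = np.column_stack([(group == lab).astype(float) for lab in labels[1:]])
    X_reduced = np.column_stack([ones, x])        # covariate only
    X_full = np.column_stack([ones, dummies, x])  # covariate + group

    rss_reduced = _rss(X_reduced, y)  # equals the adjusted total SS
    rss_full = _rss(X_full, y)        # equals the adjusted within-groups SS
    df1, df2 = K - 1, N - K - 1
    F = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return F, df1, df2
```

Because the comparison is expressed purely through design matrices, the same pattern extends to unequal group sizes and to additional covariate columns (with the error df reduced accordingly).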
13.2 Type I, II, and III Sums of Squares in ANCOVA
In a one-way ANCOVA without an interaction term, Type II and Type III SS for the group effect are identical. Type I SS can differ from them whenever group membership and the covariate are correlated, which can happen even with equal $n_j$:
- Type I SS (sequential): Order-dependent. If the covariate is entered first, the group SS is adjusted for the covariate; if group is entered first, the group SS is unadjusted and the covariate SS absorbs the overlap. Not recommended for ANCOVA.
- Type II SS: Tests the group effect after the covariate, ignoring the group × covariate interaction (assumes no interaction). Recommended when no interaction is present.
- Type III SS (partial): Tests the group effect after the covariate and all other terms in the model. Recommended as default for ANCOVA because it matches the interpretation of "adjusted group means." DataStatPro uses Type III SS by default.
⚠️ Always report which type of SS was used. SPSS and DataStatPro use Type III by default; in R, anova() reports Type I (sequential) SS and car::Anova() defaults to Type II unless Type III is requested. Using Type I SS with the group term entered before the covariate yields a group test that is not adjusted for the covariate.
13.3 Johnson-Neyman Analysis for Heterogeneous Slopes
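When the homogeneity of regression slopes assumption is violated, Johnson-Neyman (J-N) analysis identifies the specific values of the covariate at which the group difference in $y$ transitions between statistically significant and non-significant.
For $K = 2$ groups with slopes $b_1$ and $b_2$ (and intercepts $a_1$ and $a_2$):
The adjusted mean difference as a function of $x$ is:
$$\Delta(x) = (a_1 - a_2) + (b_1 - b_2)\,x$$
The J-N boundary is where $|t(x)| = t_{crit}$:
$$\frac{[\Delta(x)]^2}{\sigma^2_{a} + 2x\,\sigma_{ab} + x^2\,\sigma^2_{b}} = t_{crit}^2$$
(Where $\sigma^2_{a}$, $\sigma_{ab}$, $\sigma^2_{b}$ are variance-covariance terms of the estimated intercept difference and slope difference from the heterogeneous ANCOVA model.)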
DataStatPro computes J-N regions numerically and displays:
- Floodlight plot: The covariate range is displayed on the x-axis; regions where the group difference is significant are shaded.
- Region of significance: Exact J-N boundary values with 95% confidence bands.
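For two groups, the boundary condition $[\Delta(x)]^2 = t_{crit}^2 \cdot \mathrm{Var}[\Delta(x)]$ is a quadratic in $x$, so the J-N boundaries can be found in closed form. The sketch below takes the intercept difference, slope difference, and their variance-covariance terms as given inputs (they would come from the fitted heterogeneous-slopes model); the function and argument names are hypothetical.

```python
from math import sqrt


def johnson_neyman_bounds(d0, d1, var_d0, var_d1, cov_d01, t_crit):
    """Covariate values where the group difference is exactly at t_crit.

    d0, d1:   intercept difference and slope difference between two groups
    var_*:    their sampling variances; cov_d01: their covariance
    Solves (d0 + d1*x)^2 = t_crit^2 * (var_d0 + 2*x*cov_d01 + x^2*var_d1).
    Returns a sorted list with 0, 1 or 2 real boundary values.
    """
    t2 = t_crit ** 2
    a = d1 ** 2 - t2 * var_d1
    b = 2.0 * (d0 * d1 - t2 * cov_d01)
    c = d0 ** 2 - t2 * var_d0
    if a == 0:
        return [] if b == 0 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []  # no real boundary: no transition in significance
    root = sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])
```

The returned values still need to be compared with the observed covariate range: a boundary that falls outside the data describes an extrapolation rather than an interpretable region of significance.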
13.4 ANCOVA with Multiple Covariates: Stepwise vs. Simultaneous Entry
When multiple covariates are available:
Simultaneous entry (recommended): All theoretically-motivated covariates are entered together in a single ANCOVA model. This is appropriate when all covariates are chosen based on theory before data collection.
Hierarchical entry: Covariates are entered in theoretically-motivated blocks (e.g., demographic covariates first, then psychological covariates). Tests whether each block explains incremental variance after prior blocks.
Stepwise entry (not recommended): Automated variable selection based on statistical criteria (e.g., forward, backward, stepwise). This inflates Type I error, produces unstable models, and overfits the sample. DataStatPro does not support stepwise ANCOVA but supports hierarchical block entry.
13.5 Lord's Paradox
Lord's Paradox (Lord, 1967) is a famous conceptual puzzle arising in pre-post ANCOVA designs. It demonstrates that two different but seemingly valid analyses can produce contradictory conclusions:
- Analysis 1 (gain scores): No significant difference in change scores between groups.
- Analysis 2 (ANCOVA with pre-test as covariate): Significant group difference after adjusting for pre-test.
Resolution: The two analyses answer different questions:
- Gain score analysis: Do groups change by the same absolute amount?
- ANCOVA: Would groups differ on the post-test if they had the same pre-test score?
In randomised experiments, where pre-test means are equal in expectation, both analyses estimate the same treatment effect, with ANCOVA usually the more precise of the two. In observational studies with pre-existing group differences, the two analyses address fundamentally different causal questions. Use directed acyclic graphs (DAGs) to determine which question is scientifically appropriate.
13.6 ANCOVA in Randomised vs. Observational Studies
| Feature | Randomised Experiment | Observational Study |
|---|---|---|
| Purpose of covariate | Increase power | Statistical control for confounding |
| Covariate-group independence | Holds in expectation under randomisation | Typically violated |
| Interpretation of adjusted means | Interpretable causal effect | Conditional association; residual confounding possible |
| Measurement error consequences | Minor power reduction | Systematic bias (under-adjustment) |
| Homogeneity of slopes importance | Standard check | Critical; more likely to be violated |
| Validity of causal inference | Strong (covariate adds precision) | Weak without strong assumptions |
13.7 ANCOVA for Factorial Designs (Factorial ANCOVA)
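When there are two or more IVs and one or more covariates, Factorial ANCOVA (Two-Way ANCOVA in the case of exactly two IVs) extends the model:
$$y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + b\,(x_{ijk} - \bar{x}_{...}) + \varepsilon_{ijk}$$
Where $\alpha_j$ = effect of IV $A$ level $j$, $\beta_k$ = effect of IV $B$ level $k$, $(\alpha\beta)_{jk}$ = interaction, and $b$ = covariate slope.
The homogeneity of regression slopes assumption requires the covariate slope to be homogeneous across all $A \times B$ cells, not just across levels of one IV.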
DataStatPro handles factorial ANCOVA within its GLM engine; see the Factorial ANOVA tutorial for the base factorial framework.
13.8 Bayesian ANCOVA
Bayesian ANCOVA computes Bayes Factors comparing models that do and do not include the group effect, while including the covariate in both models:
$$BF_{10} = \frac{p(\text{data} \mid \mathcal{M}_1:\ \text{covariate} + \text{group})}{p(\text{data} \mid \mathcal{M}_0:\ \text{covariate only})}$$
This tests whether there is evidence for group differences beyond what the covariate explains. DataStatPro implements Bayesian ANCOVA using the BayesFactor method (Rouder et al., 2012) with Cauchy priors on standardised group effects ($r = \sqrt{2}/2$).
Reporting: "A Bayesian ANCOVA (Cauchy prior, $r = \sqrt{2}/2$) with [covariate name] as covariate provided [strong/moderate/anecdotal] evidence for [the group effect / the null hypothesis] after covariate adjustment, $BF_{10} =$ [value]."
13.9 Reporting ANCOVA According to APA 7th Edition
Full minimum reporting set (APA 7th ed.):
- Statement of which test (standard ANCOVA or heteroscedastic), the covariate(s), and the rationale for including each covariate.
- Homogeneity of regression slopes test result.
- Levene's test result on adjusted residuals.
- Covariate $F$-test, $b_w$ coefficient, and partial $\eta^2$ for the covariate.
- Group $F(df_1, df_2) =$ [value], $p =$ [value].
- $\omega_p^2 =$ [value] [95% CI: LB, UB].
- Which effect size was computed ($\omega_p^2$, $\varepsilon_p^2$, or $\eta_p^2$; not just "partial effect size").
- Both unadjusted AND adjusted group means, SDs, and SEs for all groups.
- Post-hoc test results on adjusted means with adjusted p-values and Cohen's $d$ per pair.
- 95% CI for each significant pairwise adjusted mean difference.
14. Worked Examples
Example 1: Pre-Post ANCOVA — Therapy Type on Depression
A clinical researcher randomly assigns $N = 90$ participants to three therapy conditions: CBT ($n = 30$), Behavioural Activation (BA; $n = 30$), and Waitlist Control (WL; $n = 30$). Pre-treatment PHQ-9 scores are recorded as the covariate; post-treatment PHQ-9 scores are the DV.
Descriptive statistics:
| Group | $n_j$ | $\bar{x}_j$ (pre) | $SD_{x}$ | $\bar{y}_j$ (post) | $SD_{y}$ |
|---|---|---|---|---|---|
| CBT | 30 | 18.20 | 3.80 | 9.80 | 4.20 |
| BA | 30 | 17.90 | 4.10 | 11.40 | 4.60 |
| WL | 30 | 18.40 | 3.90 | 16.30 | 5.10 |
Grand means: $\bar{x}_{..} = 18.17$; $\bar{y}_{..} = 12.50$ (equal $n_j$, so each grand mean is the average of the three group means).
Homogeneity of regression slopes test:
Testing group × covariate interaction: , — slopes are homogeneous. Proceed with standard ANCOVA.
Within-group SS and cross-products:
Pooled within-group regression coefficient:
Adjusted within-groups SS:
Total SS and cross-products:
(computed from raw data)
Adjusted total SS:
Adjusted between-groups SS:
Covariate SS:
Degrees of freedom:
$df_{between} = K - 1 = 2$; $df_{cov} = 1$; $df_{error} = N - K - 1 = 86$
Mean squares and F-ratios:
,
,
ANCOVA source table:
| Source | SS | MS | |||
|---|---|---|---|---|---|
| Covariate (Pre-PHQ-9) | |||||
| Group (Therapy) | |||||
| Error | |||||
| Total (adjusted) |
Adjusted group means:
Standard errors of adjusted means:
CBT:
BA:
WL:
(Adjustments are minimal here because pre-test means are nearly equal — expected in a randomised design.)
Effect sizes:
95% CI for (non-central F, , , ):
95% CI for : (DataStatPro numerical)
;
Comparison with one-way ANOVA (without covariate):
From the One-Way ANOVA tutorial, the same data gave , .
ANCOVA gives , — the pre-test covariate ( pooled) substantially increased power and effect size.
Post-hoc tests (Tukey HSD on adjusted means):
For CBT vs. BA: ,
| Comparison | Adj. Diff | 95% CI | ||||
|---|---|---|---|---|---|---|
| CBT vs. BA | ||||||
| CBT vs. WL | ||||||
| BA vs. WL |
Where
Assumption checks:
Shapiro-Wilk (adjusted residuals): , — normality not violated.
Levene's test (adjusted residuals): , — homoscedasticity holds.
Covariate balance: , — groups do not differ on pre-PHQ-9 (expected from randomisation). ✅
APA write-up: "A one-way ANCOVA was conducted with therapy type (CBT, BA, Waitlist Control) as the independent variable, post-treatment PHQ-9 as the dependent variable, and pre-treatment PHQ-9 as the covariate. The homogeneity of regression slopes assumption was met (, ), and Levene's test indicated equal variances across groups on the adjusted residuals (, ). The covariate (pre-treatment PHQ-9) was significantly related to the outcome after controlling for therapy type, , , . After controlling for pre-treatment depression, there was a significant effect of therapy type on post-treatment PHQ-9, , , [95% CI: 0.261, 0.482], indicating a large effect. Adjusted post-treatment means were: CBT (, ), BA (, ), and Waitlist Control (, ). Tukey HSD post-hoc comparisons on adjusted means revealed that both CBT and BA produced significantly lower adjusted post-treatment scores than the Waitlist Control (both ), with large effects ( [95% CI: 1.19, 2.44]; [95% CI: 0.67, 1.90]). CBT and BA did not differ significantly, [95% CI: 0.09, 1.15], ."
Example 2: Observational ANCOVA — Teaching Methods with SES Covariate
An educational researcher compares three teaching methods (Lecture, Flipped, Project-based) on standardised test scores in a non-randomised study. Socioeconomic Status (SES) index (0–100) is included as a covariate because it is known to correlate with academic outcomes and groups were self-selected into schools with different SES distributions.
$n = 35$ per group; $K = 3$; $N = 105$.
Descriptive statistics:
| Group | $n_j$ | $\bar{x}_j$ (SES) | $SD_{x}$ | $\bar{y}_j$ (score) | $SD_{y}$ |
|---|---|---|---|---|---|
| Lecture | 35 | 42.80 | 12.40 | 64.20 | 9.80 |
| Flipped | 35 | 58.60 | 11.90 | 72.40 | 10.20 |
| Project | 35 | 71.30 | 10.80 | 78.10 | 11.40 |
Covariate balance test:
ANOVA with SES as DV: , — groups differ significantly on SES. This is expected in an observational design. ANCOVA will adjust for these pre-existing SES differences.
Homogeneity of regression slopes: , — slopes are homogeneous. Standard ANCOVA is appropriate.
Pooled within-group sums:
Unadjusted between-groups SS: ;
Total adjusted SS (computed from full data):
;
,
Compare with unadjusted ANOVA:
; ; ,
Note: Here the unadjusted F is slightly larger because the observed group differences on the DV are partly inflated by SES differences. ANCOVA removes the SES contribution, giving a more accurate (though still significant) test.
Adjusted group means:
Effect sizes:
Post-hoc tests (Tukey HSD on adjusted means):
With adjusted means of 71.15, 71.91, and 71.64 — these are very close.
| Comparison | Adj. Diff | 95% CI | ||
|---|---|---|---|---|
| Lecture vs. Flipped | ||||
| Lecture vs. Project | ||||
| Flipped vs. Project |
After controlling for SES, none of the pairwise adjusted mean differences are statistically significant. The apparent differences in observed means were largely due to pre-existing SES differences between the self-selected school groups.
APA write-up: "A one-way ANCOVA was conducted to examine the effect of teaching method (Lecture, Flipped, Project-based) on standardised test scores, with SES index as the covariate to control for pre-existing socioeconomic differences between schools. As expected in this non-randomised design, groups differed significantly on the covariate (, ). The homogeneity of regression slopes assumption was satisfied (, ). The covariate was significantly related to test scores, , , . After controlling for SES, there was a statistically significant effect of teaching method, , , [95% CI: 0.022, 0.196], indicating a medium effect. However, Tukey HSD post-hoc comparisons on the SES-adjusted means revealed no significant pairwise differences between any teaching method pair (all , range: 0.03–0.09). The adjusted means were nearly identical across methods (Lecture: ; Flipped: ; Project: ). These findings indicate that the observed differences in unadjusted test scores were largely attributable to SES differences between schools rather than to teaching method differences per se."
Example 3: Violated Homogeneity of Slopes — Johnson-Neyman Analysis
A researcher examines the effect of three exercise programmes (Aerobic, Resistance, Combined) on depression scores, using baseline fitness level as a covariate.
per group; .
Homogeneity of regression slopes test:
, — slopes are significantly heterogeneous.
Standard ANCOVA is not appropriate. Regression lines are not parallel across groups.
DataStatPro automatically switches to Johnson-Neyman analysis.
The scatterplot shows:
- Aerobic programme: strong positive slope () — less fit individuals benefit more.
- Resistance programme: weak slope () — similar benefit regardless of fitness.
- Combined programme: moderate slope ().
Johnson-Neyman regions:
- Fitness scores $< 42.3$: Combined and Aerobic programmes are significantly better than Resistance.
- Fitness scores $42.3$–$68.7$: No significant programme differences.
- Fitness scores $> 68.7$: Resistance programme significantly outperforms Aerobic for highly fit individuals.
APA write-up: "Preliminary testing indicated that the homogeneity of regression slopes assumption was violated (, ), indicating a significant exercise programme × fitness interaction. Consequently, standard ANCOVA was not conducted. Johnson-Neyman analysis was performed to identify regions of the fitness covariate where programme differences were statistically significant. Results indicated that for individuals with fitness scores below 42.3, the Combined and Aerobic programmes produced significantly lower depression scores than Resistance training. For fitness scores above 68.7, the Resistance programme was significantly more effective than the Aerobic programme. No significant programme differences emerged for fitness scores in the range 42.3–68.7 (representing approximately 54% of the sample). These findings suggest that optimal exercise programme selection depends on baseline fitness level."
Example 4: Non-Significant Result with Sensitivity Analysis
A nutritionist tests whether three diets (Mediterranean, Low-carb, Standard) produce different weight loss over 12 weeks, controlling for baseline BMI. per group (; ; ).
Results: , , [95% CI: 0.000, 0.115].
Homogeneity of slopes: , ✅
Levene's (adjusted residuals): , ✅
Covariate : , — baseline BMI is significantly related to weight loss ✅
Sensitivity analysis:
The study had 80% power to detect only. The observed is well below this threshold; the study was underpowered for small effects.
APA write-up: "A one-way ANCOVA with diet type as the IV, 12-week weight loss as the DV, and baseline BMI as the covariate revealed no significant effect of diet type after controlling for BMI, , , [95% CI: 0.000, 0.115]. Baseline BMI was a significant predictor of weight loss, , . All ANCOVA assumptions were satisfied. This study had 80% power to detect effects of (); the observed effect () falls below this detection threshold, indicating the study was underpowered to detect small diet effects. A larger sample ( per group for 80% power at ) would be required to draw conclusions about small-to-medium diet differences after BMI adjustment."
15. Common Mistakes and How to Avoid Them
Mistake 1: Not Testing the Homogeneity of Regression Slopes
Problem: Running ANCOVA without testing whether the within-group regression slopes are equal across groups. If slopes differ substantially, the ANCOVA F-test is invalid because a single pooled slope cannot adequately represent the covariate-DV relationship across all groups.
Solution: Always run the interaction F-test (group × covariate) before standard ANCOVA. Report the interaction test result. If it is significant ($p < .05$), do not use standard ANCOVA; use Johnson-Neyman analysis or moderated regression instead. DataStatPro flags this automatically with a red warning when the interaction is significant.
Mistake 2: Interpreting Adjusted Means Without Reporting Unadjusted Means
Problem: Reporting only adjusted (covariate-controlled) means without reporting observed (unadjusted) means. Readers need both to understand how the covariate changed the picture.
Solution: Always report a table containing both unadjusted ($\bar{y}_j$, $SD_j$) and adjusted means ($\bar{y}_{j(adj)}$, $SE_j$) for each group. Note the direction and magnitude of adjustment. Large adjustments signal that groups differed substantially on the covariate.
Mistake 3: Selecting Covariates Based on the Data (Post-Hoc Covariate Selection)
Problem: Including a covariate because it improves the significance of the group effect, or because it reduces an unwanted significant result to non-significance. This is p-hacking and produces results that do not replicate.
Solution: Specify covariates on theoretical grounds before data collection. Pre-register the covariate choice. If exploratory analyses suggest additional covariates, report them as exploratory and cross-validate in an independent sample.
Mistake 4: Using ANCOVA in Non-Randomised Designs as a Full Substitute for Randomisation
Problem: Claiming that ANCOVA "removes all confounding" in observational studies, yielding valid causal comparisons. ANCOVA controls only for the measured covariate and only if that covariate is measured without error and the linearity/slopes assumptions are met. Unmeasured confounders and measurement error in the covariate leave residual confounding.
Solution: Explicitly acknowledge the limitations of ANCOVA as a statistical control in non-randomised designs. Use causal language cautiously ("after adjusting for..." rather than "controlling for confounding from..."). Report reliability of the covariate. Conduct sensitivity analyses (e.g., E-value analysis for unmeasured confounding).
Mistake 5: Running Post-Hoc Tests on Unadjusted Means After ANCOVA
Problem: After a significant ANCOVA omnibus F, comparing groups using observed (unadjusted) means in pairwise t-tests or ANOVA post-hoc tests. This ignores the covariate adjustment and produces incorrect comparisons.
Solution: Always run post-hoc tests on adjusted means using the ANCOVA error term ($MS_{error}^{adj}$, $df_{error}$). DataStatPro automatically applies post-hoc tests to adjusted means when run after ANCOVA.
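For a single covariate, the standard error of a difference between two adjusted means uses the adjusted error mean square plus a term for how far apart the two groups sit on the covariate. The sketch below shows one unprotected comparison under that textbook formula; all inputs are hypothetical, and a multiplicity correction (Tukey, Holm, etc.) would still be applied on top.

```python
import math


def adjusted_pairwise_t(adj_i, adj_j, n_i, n_j, xbar_i, xbar_j,
                        ss_x_within, ms_error_adj):
    """t statistic for the difference between two adjusted means (one covariate).

    ss_x_within  : pooled within-group sum of squares of the covariate
    ms_error_adj : adjusted error mean square from the ANCOVA
    """
    se = math.sqrt(
        ms_error_adj
        * (1 / n_i + 1 / n_j + (xbar_i - xbar_j) ** 2 / ss_x_within)
    )
    return (adj_i - adj_j) / se, se


# Hypothetical inputs taken from an ANCOVA summary
t_stat, se_diff = adjusted_pairwise_t(
    adj_i=6.3, adj_j=4.7, n_i=4, n_j=4,
    xbar_i=3.5, xbar_j=2.5, ss_x_within=10.0, ms_error_adj=0.08,
)
print(f"SE = {se_diff:.4f}, t = {t_stat:.3f}")
```

The t statistic is referred to a t distribution with the ANCOVA error degrees of freedom, not the degrees of freedom of an ordinary two-sample t-test.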
Mistake 6: Including a Covariate That is Caused by the Treatment
Problem: Using a post-treatment variable (measured after the treatment began) as an ANCOVA covariate. If the treatment affected the covariate, adjusting for it removes part of the treatment effect — over-controlling bias that can reverse or eliminate genuine treatment effects.
Solution: Only include covariates measured before treatment began (or measured concurrently but logically independent of the treatment). Use causal diagrams (DAGs) to identify appropriate adjustment sets.
Mistake 7: Reporting Partial Effect Sizes Without Labelling Them as Partial
Problem: Reporting $\eta_p^2$ without labelling it as "partial" or distinguishing it from total $\eta^2$. Partial $\eta^2$ is never smaller than total $\eta^2$ in ANCOVA, and is usually larger, because the covariate variance is removed from the denominator. Readers familiar only with ANOVA will overestimate the effect.
Solution: Always label partial effect sizes explicitly: "$\eta_p^2 =$ [value] (partial)" and note that this is the proportion of covariate-adjusted DV variance explained by group membership. Report total $\eta^2$ when comparing across ANOVA and ANCOVA analyses.
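A quick arithmetic sketch makes the gap concrete. The sums of squares below are hypothetical: the covariate is assumed to account for half of the total DV variability, which is why the two indices differ by a factor of two.

```python
def eta_squared_pair(ss_group_adj, ss_error_adj, ss_total):
    """Return (partial eta squared, total eta squared) for the group effect."""
    partial = ss_group_adj / (ss_group_adj + ss_error_adj)  # covariate SS excluded
    total = ss_group_adj / ss_total                          # covariate SS included
    return partial, total


# Hypothetical values: total SS = 200, of which 100 is attributable to the covariate
partial, total = eta_squared_pair(ss_group_adj=30.0, ss_error_adj=70.0, ss_total=200.0)
print(f"partial eta^2 = {partial:.2f}, total eta^2 = {total:.2f}")
```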
Mistake 8: Failing to Report the Covariate's Effect
Problem: Reporting only the group F-test from ANCOVA and ignoring the covariate F-test and slope. The covariate result is important for validating the ANCOVA approach (a non-significant covariate suggests it should not have been included) and for quantifying the covariate's relationship with the DV.
Solution: Always report: the covariate $F$-test, degrees of freedom, p-value, the regression coefficient $b_w$ with its 95% CI, and the partial $\eta^2$ for the covariate. A non-significant covariate F-test ($p > .05$) is a warning sign that the covariate may not be worth including.
Mistake 9: Applying ANCOVA When Groups Differ Substantially on the Covariate (Extrapolation Risk)
Problem: In observational studies where groups differ substantially on the covariate, the adjusted means are estimated at the grand mean of the covariate — a value that may lie outside the observed range of one or more groups. This constitutes extrapolation beyond the data, producing adjusted means that are model-dependent and potentially meaningless.
Solution: Check whether the grand mean covariate value falls within the observed range of each group's covariate scores. If not, consider restricting the analysis to participants within the common support region (matching or trimming). Report the covariate range for each group and acknowledge extrapolation concerns when they arise.
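The range check described above needs only minimums, maximums, and the grand mean. A minimal sketch, with hypothetical group names and covariate scores:

```python
def common_support_report(covariate_by_group):
    """Check whether the grand covariate mean lies inside each group's observed range."""
    all_values = [v for values in covariate_by_group.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    report = {}
    for name, values in covariate_by_group.items():
        lo, hi = min(values), max(values)
        report[name] = (lo, hi, lo <= grand_mean <= hi)

    # Common support runs from the largest group minimum to the smallest group maximum;
    # if the lower bound exceeds the upper bound, the groups do not overlap at all.
    overlap = (max(r[0] for r in report.values()),
               min(r[1] for r in report.values()))
    return grand_mean, report, overlap


# Hypothetical covariate scores for two non-overlapping groups
data = {"clinic": [10, 12, 14, 16], "community": [20, 22, 24, 26]}
grand_mean, report, overlap = common_support_report(data)
print(f"grand mean = {grand_mean}")
for name, (lo, hi, inside) in report.items():
    print(f"{name}: range [{lo}, {hi}], grand mean inside range: {inside}")
print(f"common support bounds: {overlap}")
```

In this example the grand mean of 18 lies outside both groups' ranges, so both adjusted means would be extrapolations.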
Mistake 10: Not Reporting Both Unadjusted and Adjusted Analyses for Observational Studies
Problem: In observational research, reporting only the ANCOVA results without showing the unadjusted ANOVA results makes it impossible for readers to assess how much the covariate changed the conclusions.
Solution: Report both unadjusted (ANOVA) and adjusted (ANCOVA) results in parallel, including both observed and adjusted group means. Explicitly describe the direction and magnitude of covariate adjustment and discuss what it implies about the group difference.
16. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| Homogeneity of slopes test is significant | True group × covariate interaction; different relationships in different groups | Use Johnson-Neyman analysis; report interaction as a finding; do not use standard ANCOVA |
| Adjusted means are outside the observed range | Large covariate mean differences between groups; extrapolation | Check common support; restrict to overlap region; acknowledge extrapolation |
| Non-significant group F in ANCOVA but significant in ANOVA | Covariate removes variance that mediated or confounded group differences | Report both analyses; discuss whether adjustment is appropriate (randomised vs. observational) |
| Significant group F in ANCOVA but not in ANOVA | Covariate reduces error variance substantially; ANCOVA is more powerful | Expected outcome; ANCOVA result is preferred when covariate assumptions are met |
| Adjusted means barely differ from unadjusted means | Groups have similar covariate means (especially in RCTs); little adjustment needed | This is fine and expected in randomised designs; ANCOVA still increases power via error reduction |
| $\omega_p^2$ is negative | True partial effect near zero; small sample; bias correction overshoots | Report as 0 by convention; note small or negligible effect; increase $N$ |
| Covariate F is non-significant | Covariate not linearly related to DV within groups | Consider whether covariate was correctly measured; adding it consumes a $df$ without power gain, so consider removing it |
| Very large $b_w$ with wide CI | Small $SS_{W(X)}$ (little within-group covariate variation) | Check covariate distribution; if scores within each group vary little on the covariate, the slope is poorly estimated |
| Cook's distance flags many influential points | Extreme covariate or DV values; potentially meaningful outliers | Investigate each flagged observation; report analyses with and without influential points |
| Levene's test significant on adjusted residuals | Heteroscedastic groups after covariate adjustment | Use heteroscedastic ANCOVA (HC3); use Games-Howell for post-hoc tests |
| Groups overlap perfectly on adjusted means despite significant F | Very small pairwise differences with high power; omnibus driven by overall pattern | Report all pairwise comparisons; some effects may be very small but statistically significant with large $N$ |
| Ranked ANCOVA and standard ANCOVA give contradictory results | Non-normality or outliers distorting parametric results; ranked analysis more robust | Report both; prefer ranked ANCOVA when normality is violated; investigate outliers |
| Post-hoc tests all non-significant despite a significant omnibus $F$ | Effect distributed across many similar pairwise differences; no single pair drives it | Inspect all adjusted means; the omnibus test can be sensitive to a pattern of small consistent differences across many pairs |
| Multiple covariates produce collinearity warnings | Covariates are highly intercorrelated; redundant information | Remove the least theoretically important correlated covariate; or use dimension reduction (PCA) as a single composite covariate |
| Adjusted $SE$ much larger than unadjusted $SE$ | Large covariate mean differences between groups; additional adjustment uncertainty | This is expected in unbalanced observational designs; report both SEs; acknowledge wide CIs |
| ANCOVA and gain score analysis reach different conclusions | Lord's Paradox; pre-test means differ between groups | Use causal diagram to determine which analysis addresses the research question; report both with explicit interpretation of each |
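The last row (Lord's Paradox) is easy to reproduce with a toy data set. In the sketch below, both groups have a mean gain of exactly zero, yet ANCOVA on the post-test with the pre-test as covariate yields a one-point adjusted difference. The data are constructed purely for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def pooled_within_slope(groups):
    """Pooled within-group slope of post-test on pre-test."""
    sp = ss_x = 0.0
    for pre, post in groups:
        m_pre, m_post = mean(pre), mean(post)
        sp += sum((a - m_pre) * (b - m_post) for a, b in zip(pre, post))
        ss_x += sum((a - m_pre) ** 2 for a in pre)
    return sp / ss_x


# Hypothetical (pre, post) scores; groups differ at pre-test and neither changes on average
group_a = ([1, 2, 3], [1.5, 2.0, 2.5])
group_b = ([3, 4, 5], [3.5, 4.0, 4.5])

# Gain-score analysis: compare mean change between groups
gain_a = mean(group_a[1]) - mean(group_a[0])
gain_b = mean(group_b[1]) - mean(group_b[0])
gain_difference = gain_b - gain_a

# ANCOVA: compare post-test means adjusted to the grand pre-test mean
b_w = pooled_within_slope([group_a, group_b])
grand_pre = mean(group_a[0] + group_b[0])
adj_a = mean(group_a[1]) - b_w * (mean(group_a[0]) - grand_pre)
adj_b = mean(group_b[1]) - b_w * (mean(group_b[0]) - grand_pre)
ancova_difference = adj_b - adj_a

print(f"gain-score difference = {gain_difference:.2f}")
print(f"ANCOVA adjusted difference = {ancova_difference:.2f} (pooled slope = {b_w:.2f})")
```

The two analyses answer different questions, which is why the table recommends choosing between them with a causal diagram rather than by comparing p-values.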
17. Quick Reference Cheat Sheet
Core ANCOVA Formulas
| Formula | Description |
|---|---|
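| $b_w = \dfrac{SP_W}{SS_{W(X)}}$ | Pooled within-group regression coefficient ($SP_W$: within-group sum of cross-products; $SS_{W(X)}$: within-group covariate SS) |
| $\bar{Y}_j^{adj} = \bar{Y}_j - b_w(\bar{X}_j - \bar{X})$ | Adjusted group mean ($\bar{X}$: grand covariate mean) |
| $SS_{error}^{adj} = SS_{W(Y)} - \dfrac{SP_W^2}{SS_{W(X)}}$ | Adjusted error SS |
| $SS_{total}^{adj} = SS_{T(Y)} - \dfrac{SP_T^2}{SS_{T(X)}}$ | Adjusted total SS (subscript $T$: total, ignoring groups) |
| $SS_{between}^{adj} = SS_{total}^{adj} - SS_{error}^{adj}$ | Adjusted between-groups SS |
| $SS_{cov} = \dfrac{SP_W^2}{SS_{W(X)}}$ | Covariate SS (reduction in error) |
| $df_{between} = k - 1$; $df_{cov} = 1$; $df_{error} = N - k - 1$ | Degrees of freedom |
| $MS_{between}^{adj} = SS_{between}^{adj} / df_{between}$ | Adjusted between-groups MS |
| $MS_{error}^{adj} = SS_{error}^{adj} / df_{error}$ | Adjusted error MS |
| $F = MS_{between}^{adj} / MS_{error}^{adj}$ | ANCOVA F-ratio for group effect |
| $F_{cov} = SS_{cov} / MS_{error}^{adj}$ | F-ratio for covariate |
| $SE(\bar{Y}_j^{adj}) = \sqrt{MS_{error}^{adj}\left[\dfrac{1}{n_j} + \dfrac{(\bar{X}_j - \bar{X})^2}{SS_{W(X)}}\right]}$ | SE of adjusted mean |
| $p = P(F_{df_{between},\,df_{error}} \geq F)$ | p-value for group effect |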
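The core computations for a one-way ANCOVA with a single covariate can be traced end to end in plain Python. This is a sketch of the classical sums-of-squares route under the usual textbook definitions, with hypothetical data and function names; it is not DataStatPro's code and it omits the p-value, which requires an F distribution function.

```python
def one_way_ancova(groups):
    """groups: list of (covariate_values, dv_values) pairs, one per group."""
    k = len(groups)
    all_x = [v for x, _ in groups for v in x]
    all_y = [v for _, y in groups for v in y]
    n_total = len(all_x)

    def sums(x, y):
        """(SS_x, SS_y, SP_xy) around the means of the supplied values."""
        mx, my = sum(x) / len(x), sum(y) / len(y)
        return (sum((a - mx) ** 2 for a in x),
                sum((b - my) ** 2 for b in y),
                sum((a - mx) * (b - my) for a, b in zip(x, y)))

    ss_x_t, ss_y_t, sp_t = sums(all_x, all_y)          # total (ignoring groups)
    within = [sums(x, y) for x, y in groups]           # per group
    ss_x_w = sum(w[0] for w in within)
    ss_y_w = sum(w[1] for w in within)
    sp_w = sum(w[2] for w in within)

    b_w = sp_w / ss_x_w                                # pooled within-group slope
    ss_cov = sp_w ** 2 / ss_x_w                        # covariate SS
    ss_error_adj = ss_y_w - ss_cov                     # adjusted error SS
    ss_total_adj = ss_y_t - sp_t ** 2 / ss_x_t         # adjusted total SS
    ss_between_adj = ss_total_adj - ss_error_adj       # adjusted between-groups SS

    df_between, df_error = k - 1, n_total - k - 1
    ms_error_adj = ss_error_adj / df_error
    return {
        "b_w": b_w,
        "F_group": (ss_between_adj / df_between) / ms_error_adj,
        "F_covariate": ss_cov / ms_error_adj,
        "df": (df_between, df_error),
    }


# Hypothetical data for two groups of four
result = one_way_ancova([
    ([1, 2, 3, 4], [2, 3, 5, 6]),
    ([2, 3, 4, 5], [5, 6, 8, 9]),
])
print(result)
```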
Effect Size Formulas
| Formula | Description |
|---|---|
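| $\eta_p^2 = \dfrac{SS_{between}^{adj}}{SS_{between}^{adj} + SS_{error}^{adj}}$ | Partial eta squared (biased) |
| $\omega_p^2 = \dfrac{SS_{between}^{adj} - df_{between}\,MS_{error}^{adj}}{SS_{between}^{adj} + SS_{error}^{adj} + MS_{error}^{adj}}$ | Partial omega squared (preferred) |
| $\varepsilon_p^2 = \dfrac{SS_{between}^{adj} - df_{between}\,MS_{error}^{adj}}{SS_{between}^{adj} + SS_{error}^{adj}}$ | Partial epsilon squared |
| $f = \sqrt{\dfrac{\eta_p^2}{1 - \eta_p^2}}$ | Cohen's $f$ (from $\eta_p^2$) |
| $\eta_p^2 = \dfrac{F \cdot df_{between}}{F \cdot df_{between} + df_{error}}$ | $\eta_p^2$ from $F$ |
| $\omega_p^2 \approx \dfrac{df_{between}(F - 1)}{df_{between}(F - 1) + N}$ | $\omega_p^2$ from $F$ (approximate) |
| $d_{ij} = \dfrac{\bar{Y}_i^{adj} - \bar{Y}_j^{adj}}{s_{pooled}}$ | Cohen's $d$ for pairwise comparison |
| $s_{pooled} = \sqrt{MS_{error}^{adj}}$ | Pooled within-group $SD$ |
| $\eta_{p,cov}^2 = \dfrac{SS_{cov}}{SS_{cov} + SS_{error}^{adj}}$ | Partial $\eta^2$ for covariate |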
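The partial effect sizes can all be derived from the adjusted sums of squares in the source table. The sketch below uses one common set of definitions (several variants of partial omega squared appear in the literature, so treat the exact form as an assumption), with hypothetical input values.

```python
import math


def partial_effect_sizes(ss_between_adj, ss_error_adj, df_between, df_error):
    """Partial eta, omega and epsilon squared plus Cohen's f for the group effect."""
    ms_error = ss_error_adj / df_error
    ss_total_adj = ss_between_adj + ss_error_adj
    bias_corrected = ss_between_adj - df_between * ms_error

    eta_p2 = ss_between_adj / ss_total_adj
    omega_p2 = bias_corrected / (ss_total_adj + ms_error)  # assumed variant
    epsilon_p2 = bias_corrected / ss_total_adj
    cohen_f = math.sqrt(eta_p2 / (1 - eta_p2))
    return {"eta_p2": eta_p2, "omega_p2": omega_p2,
            "epsilon_p2": epsilon_p2, "f": cohen_f}


# Hypothetical adjusted sums of squares: 3 groups, error df = 35
sizes = partial_effect_sizes(ss_between_adj=30.0, ss_error_adj=70.0,
                             df_between=2, df_error=35)
for name, value in sizes.items():
    print(f"{name} = {value:.3f}")
```

The bias-corrected indices come out smaller than partial eta squared (about 0.25 and 0.26 versus 0.30 here), which is the reason the tutorial treats partial eta squared as the biased estimate.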
ANCOVA Source Table Template
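| Source | SS | $df$ | MS | $F$ | $p$ |
|---|---|---|---|---|---|
| Covariate(s) | $SS_{cov}$ | $c$ | $MS_{cov}$ | $F_{cov}$ | [value] |
| Between groups (adjusted) | $SS_{between}^{adj}$ | $k - 1$ | $MS_{between}^{adj}$ | $F$ | [value] |
| Error (adjusted) | $SS_{error}^{adj}$ | $N - k - c$ | $MS_{error}^{adj}$ | | |
| Total (adjusted) | $SS_{total}^{adj}$ | $N - c - 1$ | | | |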
ANCOVA vs. ANOVA Comparison
| Feature | One-Way ANOVA | ANCOVA |
|---|---|---|
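| Covariate | None | $c \geq 1$ continuous covariates |
| Error $df$ | $N - k$ | $N - k - c$ |
| Error SS | $SS_{W(Y)}$ | $SS_{error}^{adj}$ |
| Group means tested | Observed | Adjusted |
| Additional assumptions | None | Homogeneity of slopes; linearity; covariate independence |
| Power vs. ANOVA | Baseline | Higher when the covariate is correlated with the DV within groups |
| Effect size metric | $\eta^2$, $\omega^2$ | $\eta_p^2$, $\omega_p^2$ (partial) |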
| Post-hoc tests | On observed means | On adjusted means |
ANCOVA Reporting Checklist
| Item | Required |
|---|---|
| Statement of ANCOVA variant used (standard vs. heteroscedastic) | ✅ Always |
| Covariate(s) named with justification for inclusion | ✅ Always |
| Homogeneity of regression slopes test result | ✅ Always |
| Levene's test on adjusted residuals | ✅ Always |
| Shapiro-Wilk on adjusted residuals | ✅ When group sizes are small |
| Covariate $F$, $df$, $p$, $b_w$, CI, and partial $\eta^2$ | ✅ Always |
| Group $F(df_{between}, df_{error})$, exact $p$ | ✅ Always |
| $\omega_p^2$ with 95% CI (primary partial effect size) | ✅ Always |
| $\eta_p^2$ (labelled as biased, partial) | ✅ When journals require it |
| Both unadjusted AND adjusted group means | ✅ Always |
| SDs (unadjusted) and SEs (adjusted) for all groups | ✅ Always |
| Sample sizes per group | ✅ Always |
| Post-hoc test name and correction method | ✅ When omnibus significant |
| Post-hoc tests applied to adjusted means | ✅ When omnibus significant |
| All pairwise adjusted mean differences with and | ✅ When omnibus significant |
| 95% CI for each pairwise adjusted mean difference | ✅ Recommended |
| Covariate balance test result | ✅ Always |
| Linearity check result | ✅ Recommended |
| Cook's distance / influential observations check | ✅ Recommended |
| alongside | ✅ Recommended |
| Cohen's $f$ for power analysis reference | ✅ When reporting power |
| Sensitivity analysis (min detectable effect) | ✅ For null results |
| Acknowledgement of covariate limitations (reliability, residual confounding) | ✅ For observational studies |
| Johnson-Neyman region if slopes heterogeneous | ✅ When slopes violated |
| Scatterplot with per-group regression lines | ✅ Strongly recommended |
| Adjusted means plot (EMM plot) with 95% CIs | ✅ Strongly recommended |
APA 7th Edition Reporting Templates
Standard ANCOVA (significant result):
"A one-way ANCOVA was conducted to examine the effect of [IV] on [DV], with [covariate name] included as a covariate [rationale: e.g., 'to control for pre-existing differences in...']. The homogeneity of regression slopes assumption was met ($F(df_1, df_2) =$ [value], $p =$ [value]). Levene's test on adjusted residuals indicated [equal / unequal] variances ($F(df_1, df_2) =$ [value], $p =$ [value]). The covariate was [significantly / not significantly] related to the outcome after controlling for group, $F(1, df_{error}) =$ [value], $p =$ [value], $b_w =$ [value] [95% CI: LB, UB], $\eta_p^2 =$ [value]. After controlling for [covariate], there was a significant effect of [IV] on [DV], $F(df_{between}, df_{error}) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB], indicating a [small / medium / large] effect. Adjusted means were: [Group 1] ($M_{adj} =$ [value], $SE =$ [value]), [Group 2] ($M_{adj} =$ [value], $SE =$ [value]), [etc.]. [Post-hoc test] comparisons on adjusted means revealed that [describe pairwise results]."
Heteroscedastic ANCOVA (unequal variances):
"Due to significant heterogeneity of variance on adjusted residuals (Levene's $F =$ [value], $p =$ [value]), a heteroscedastic ANCOVA using HC3 variance estimation was applied. The test revealed a [significant / non-significant] effect of [IV] on [DV] after controlling for [covariate], $F(df_{between}, df_{error}) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB]. Games-Howell post-hoc comparisons on adjusted means revealed [describe results]."
Violated homogeneity of slopes (Johnson-Neyman):
"Preliminary testing indicated that the homogeneity of regression slopes assumption was violated ($F(df_1, df_2) =$ [value], $p =$ [value]), indicating a significant [IV] × [covariate] interaction. Consequently, standard ANCOVA was not conducted. Johnson-Neyman analysis revealed that [group difference] was statistically significant when [covariate name] was below [lower J-N value] or above [upper J-N value], but not for [covariate] values in the range [lower, upper]."
Non-significant result with sensitivity analysis:
"A one-way ANCOVA revealed no significant effect of [IV] on [DV] after controlling for [covariate], $F(df_{between}, df_{error}) =$ [value], $p =$ [value], $\omega_p^2 =$ [value] [95% CI: LB, UB]. Given the sample sizes ($n =$ [value] per group), this study could detect partial effects of $\omega_p^2 \geq$ [value] ($f \geq$ [value]) with 80% power. The observed $\omega_p^2 =$ [value] falls below this detection threshold."
Conversion Formulas
| From | To | Formula |
|---|---|---|
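| $F$, $df_{between}$, $df_{error}$ | $\eta_p^2$ | $\eta_p^2 = \dfrac{F \cdot df_{between}}{F \cdot df_{between} + df_{error}}$ |
| $F$, $df_{between}$, $N$ | $\omega_p^2$ (approx.) | $\omega_p^2 \approx \dfrac{df_{between}(F - 1)}{df_{between}(F - 1) + N}$ |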
| ANOVA + | ANCOVA (approx.) | |
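| Pairwise adj. difference | $d$ | $d = \dfrac{\bar{Y}_i^{adj} - \bar{Y}_j^{adj}}{\sqrt{MS_{error}^{adj}}}$ |
| $d$ | $g$ (Hedges') | $g = d\left(1 - \dfrac{3}{4\,df_{error} - 1}\right)$ |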
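When only a published $F$ and its degrees of freedom are available, the standard conversions can be scripted directly. The sketch below uses the widely cited formulas for partial eta squared from $F$, the approximate partial omega squared from $F$ and $N$, and the small-sample correction from $d$ to Hedges' $g$; the function names and inputs are hypothetical.

```python
def eta_p2_from_f(f_stat, df_between, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_error)


def omega_p2_from_f(f_stat, df_between, n_total):
    """Approximate partial omega squared; negative whenever F < 1."""
    excess = df_between * (f_stat - 1)
    return excess / (excess + n_total)


def hedges_g(d, df_error):
    """Apply the usual small-sample correction factor to Cohen's d."""
    return d * (1 - 3 / (4 * df_error - 1))


# Hypothetical reported result: F(2, 35) = 7.5 with N = 39
eta = eta_p2_from_f(7.5, 2, 35)
omega = omega_p2_from_f(7.5, 2, 39)

# An F below 1 gives a negative estimate, conventionally reported as 0
omega_small = omega_p2_from_f(0.5, 2, 39)
omega_reported = max(0.0, omega_small)

g = hedges_g(0.8, 35)
print(f"eta_p2 = {eta:.3f}, omega_p2 = {omega:.3f}")
print(f"omega_p2 for F = 0.5: {omega_small:.3f} (report {omega_reported:.1f})")
print(f"Hedges' g = {g:.3f}")
```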
Power Gain from Covariate Reference
| $r_{XY}$ | $r_{XY}^2$ | Error variance retained | Power multiplier (approx.) |
|---|---|---|---|
Power multiplier relative to ANOVA (ignores the $df$ cost of estimating the covariate slope).
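The relationship behind this reference can be sketched numerically. The code below assumes that the error variance retained is $1 - r^2$, where $r$ is the pooled within-group covariate-DV correlation, and that the "multiplier" is $1 / (1 - r^2)$, the factor by which the noncentrality parameter (equivalently, the effective sample size) grows. Both the interpretation of the multiplier and the listed correlations are assumptions for illustration.

```python
def covariate_gain(r_within):
    """Return (error variance retained, approximate multiplier) for a given correlation."""
    retained = 1 - r_within ** 2
    return retained, 1 / retained


# Illustrative within-group correlations between covariate and DV
for r in (0.1, 0.3, 0.5, 0.7):
    retained, multiplier = covariate_gain(r)
    print(f"r = {r:.1f}  retained = {retained:.2f}  multiplier = {multiplier:.2f}")
```

Under these assumptions a correlation of 0.5 removes a quarter of the error variance, so the gain is modest until the correlation is fairly strong.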
Assumption Checks Reference
| Assumption | Test | Action if Violated |
|---|---|---|
| Homogeneity of regression slopes | Group × covariate interaction F-test | Johnson-Neyman; moderated regression |
| Normality (adjusted residuals) | Shapiro-Wilk, Q-Q plot | Quade test; ranked ANCOVA; transform DV |
| Homoscedasticity (adjusted residuals) | Levene's, Brown-Forsythe | Heteroscedastic ANCOVA (HC3) + Games-Howell |
| Independence | Design review | Multilevel ANCOVA |
| Linearity | Scatterplot; residual vs. CV; polynomial F | Add $X^2$ term; transform covariate |
| Covariate independence from treatment | Covariate balance test; timing check | Use pre-treatment covariates only; acknowledge confounding |
| Covariate reliability | Report a reliability coefficient (e.g., Cronbach's $\alpha$) | Reliability-corrected ANCOVA; report as limitation |
| No influential outliers | Cook's $D$; leverage; studentised residuals | Investigate; report sensitivity; Quade test |
Post-Hoc Test Selection Guide (ANCOVA)
| Condition | Recommended Test | Controls FWER |
|---|---|---|
| Balanced, equal adj. variances | Tukey HSD on adjusted means | ✅ Exactly |
| Unbalanced, equal adj. variances | Tukey-Kramer on adjusted means | ✅ Approximately |
| Unequal adj. variances | Games-Howell on adjusted means | ✅ Approximately |
| All groups vs. one control | Dunnett's on adjusted means | ✅ Optimal |
| Any design, conservative | Bonferroni on adjusted means | ✅ Conservative |
| Any design, less conservative | Holm-Bonferroni on adjusted means | ✅ Sequential |
| Pre-planned specific contrasts | Planned contrasts on adjusted means | ✅ Reduced |
| Heterogeneous slopes | Johnson-Neyman analysis | N/A (regions, not FWER) |
| Non-parametric DV | Dunn + Holm on rank residuals | ✅ Sequential |
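Of the corrections listed, Holm-Bonferroni is simple enough to apply by hand to the p-values from pairwise comparisons on adjusted means. A minimal sketch of the step-down adjustment, with hypothetical raw p-values:

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply by the number of hypotheses still in play, cap at 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p can never fall below an earlier one
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons on adjusted means
raw_p = [0.010, 0.040, 0.030]
adjusted = holm_adjust(raw_p)
for raw, adj_p in zip(raw_p, adjusted):
    print(f"raw p = {raw:.3f}  Holm-adjusted p = {adj_p:.3f}")
```

Here only the first comparison stays below .05 after adjustment (0.030), while the other two are both adjusted to 0.060.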
Degrees of Freedom Reference
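| Source | $df$ (1 covariate) | $df$ ($c$ covariates) |
|---|---|---|
| Between groups (adjusted) | $k - 1$ | $k - 1$ |
| Covariate | $1$ | $c$ |
| Error (adjusted) | $N - k - 1$ | $N - k - c$ |
| Total (adjusted) | $N - 2$ | $N - c - 1$ |
| Slopes interaction test | $(k - 1,\ N - 2k)$ | $(c(k - 1),\ N - k(c + 1))$ |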
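The bookkeeping for degrees of freedom is easy to get wrong with several covariates, so a small helper can serve as a cross-check. This sketch assumes the standard one-way layout with $k$ groups, $N$ observations, and $c$ covariates; the function name is hypothetical.

```python
def ancova_df(n_total, k_groups, n_covariates=1):
    """Degrees of freedom for a one-way ANCOVA and its slopes-homogeneity test."""
    return {
        "between": k_groups - 1,
        "covariate": n_covariates,
        "error": n_total - k_groups - n_covariates,
        "total_adjusted": n_total - n_covariates - 1,
        # (numerator, denominator) for the group x covariate interaction test
        "slopes_test": (n_covariates * (k_groups - 1),
                        n_total - k_groups * (n_covariates + 1)),
    }


# Hypothetical design: 60 participants, 3 groups, 1 covariate
df = ancova_df(n_total=60, k_groups=3, n_covariates=1)
print(df)
```

A useful sanity check is that the between-groups and error degrees of freedom sum to the adjusted total.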
Cohen's Benchmarks — ANCOVA Partial Effect Sizes
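| Label | $\eta_p^2$ | $\omega_p^2$ | $\varepsilon_p^2$ | Cohen's $f$ |
|---|---|---|---|---|
| Small | $0.01$ | $0.01$ | $0.01$ | $0.10$ |
| Medium | $0.06$ | $0.06$ | $0.06$ | $0.25$ |
| Large | $0.14$ | $0.14$ | $0.14$ | $0.40$ |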
These benchmarks are identical to ANOVA benchmarks for partial effect sizes but apply to the covariate-adjusted variance proportion.
This tutorial provides a comprehensive foundation for understanding, conducting, and reporting ANCOVA and its alternatives within the DataStatPro application. For further reading, consult Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018) for accessible applied coverage of ANCOVA; Maxwell, Delaney & Kelley's "Designing Experiments and Analyzing Data" (3rd ed., 2018) for rigorous methodological depth including regression slopes homogeneity and planned contrasts; Rutherford's "Introducing ANOVA and ANCOVA: A GLM Approach" (2001) for a focused GLM-framework treatment; Wilcox's "Introduction to Robust Estimation and Hypothesis Testing" (4th ed., 2017) for the Quade test and robust ANCOVA alternatives; Miller & Chapman (2001) in the Journal of Abnormal Psychology for a lucid discussion of misuse of ANCOVA in non-randomised designs; Senn (2006) in Statistics in Medicine for Lord's Paradox and the ANCOVA vs. gain score debate; Bauer & Curran (2005) in Multivariate Behavioral Research for probing interactions and Johnson-Neyman regions; and Lakens (2013) in Frontiers in Psychology for the $\eta_p^2$ vs. $\omega_p^2$ discussion applied to ANCOVA. For feature requests or support, contact the DataStatPro team.