ANCOVA: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of covariance adjustment all the way through the mathematics, assumptions, effect sizes, post-hoc testing, non-parametric alternatives, interpretation, reporting, and practical usage of the Analysis of Covariance (ANCOVA) within the DataStatPro application. Whether you are encountering ANCOVA for the first time or seeking a rigorous, unified understanding of covariate-adjusted between-groups inference, this guide builds your knowledge systematically from the ground up.

Prerequisites and Background Concepts
What is ANCOVA?
The Mathematics Behind ANCOVA
Assumptions of ANCOVA
Variants of ANCOVA
Using the ANCOVA Calculator Component
Full Step-by-Step Procedure
Effect Sizes for ANCOVA
Post-Hoc Tests and Planned Contrasts
Confidence Intervals
Power Analysis and Sample Size Planning
Non-Parametric Alternative: Quade and Ranked ANCOVA
Advanced Topics
Worked Examples
Common Mistakes and How to Avoid Them
Troubleshooting
Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into ANCOVA, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 From One-Way ANOVA to ANCOVA

One-Way ANOVA tests whether group means on a dependent variable (DV) differ beyond what chance alone would produce. However, ANOVA assumes that groups are equivalent on all variables except the independent variable (IV) — an assumption satisfied by random assignment in experiments, but rarely in observational research.

ANCOVA extends ANOVA by:

Statistically controlling for one or more continuous covariates (CVs) that are correlated with the DV.
Removing covariate-related variance from the error term, increasing statistical power.
Adjusting group means to what they would be if all groups had identical covariate scores — producing adjusted means (also called estimated marginal means).

1.2 What is a Covariate?

A covariate (also called a concomitant variable) is a continuous variable that:

Is correlated with the DV.
Is not manipulated by the researcher (it is measured, not assigned).
Is measured before the treatment or is logically prior to the treatment effect.

Examples:

Pre-test score as a covariate when the DV is a post-test score.
Age as a covariate when the DV is a cognitive outcome.
Baseline anxiety when the DV is post-treatment anxiety.
IQ when the DV is academic achievement.

The covariate should be chosen on theoretical grounds, not selected post-hoc because it "improves" the results. Including irrelevant covariates reduces power by consuming degrees of freedom.

1.3 The Two Goals of ANCOVA

ANCOVA serves two distinct but related purposes, and understanding which goal applies to your study is critical for correct interpretation:

Goal 1 — Increase Statistical Power (Experimental Designs)

In randomised experiments, groups are equal on average at baseline (including on the covariate). Including a covariate reduces $MS_{error}$ by removing variance explained by the covariate from the residual. This shrinks the denominator of the F-ratio, increasing power to detect treatment effects.

Goal 2 — Statistical Control (Quasi-Experimental and Observational Designs)

In non-randomised designs, groups may differ on the covariate at baseline. ANCOVA adjusts group means to a common covariate value, providing a partial statistical control for pre-existing differences. However, this control is imperfect and cannot fully substitute for randomisation.

⚠️ These two goals have different interpretational requirements. For Goal 1 (randomised experiments), ANCOVA assumptions are easily met and interpretation is straightforward. For Goal 2 (observational designs), the assumption of covariate independence from group membership is violated by design, requiring careful interpretational caveats about residual confounding.

1.4 The Regression Foundation of ANCOVA

ANCOVA is a special case of the General Linear Model (GLM):

$Y_i = \mu + \tau_j + \beta(X_i - \bar{X}) + \varepsilon_i$

Where:

$Y_i$ is the DV score for participant $i$ .
$\tau_j$ is the effect of group $j$ ( $\sum_j \tau_j = 0$ ).
$X_i$ is the covariate score for participant $i$ .
$\bar{X}$ is the grand mean of the covariate.
$\beta$ is the common within-group regression coefficient (slope) of $Y$ on $X$ .
$\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ is the residual error.

This is equivalent to a multiple regression model with group dummy codes and the covariate as predictors. The F-test for the group effect in ANCOVA tests whether groups differ after partialling out the covariate.
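
The equivalence with regression can be checked numerically. The sketch below is a minimal illustration on hypothetical toy data (the arrays `x`, `y` and `group` are invented for this example). It assumes NumPy is available and uses reference-group dummy coding rather than the sum-to-zero $\tau_j$ of the model above; the covariate slope is the same under either coding.

```python
import numpy as np

# Hypothetical toy data: three groups of four participants each.
x = np.array([1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6], dtype=float)    # covariate
y = np.array([2, 3, 5, 6, 4, 6, 7, 9, 7, 8, 10, 11], dtype=float)  # DV
group = np.repeat([0, 1, 2], 4)

# Design matrix: intercept, two dummy codes (group 0 is the reference),
# and the grand-mean-centred covariate.
X = np.column_stack([
    np.ones_like(x),
    (group == 1).astype(float),
    (group == 2).astype(float),
    x - x.mean(),
])

# Ordinary least squares fit of the ANCOVA model.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, d1, d2, beta = coef

print("common slope:", beta)
print("adjusted mean of reference group:", intercept)
print("adjusted differences from reference:", d1, d2)
```

With this coding the intercept is the adjusted mean of the reference group, and `d1` and `d2` are adjusted mean differences from it.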

1.5 Variance Partitioning in ANCOVA

ANCOVA partitions the total sum of squares differently from ANOVA:

$SS_{total} = SS_{between(adj)} + SS_{covariate} + SS_{within(adj)}$

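The covariate "absorbs" variance from the error term. The adjusted within-groups SS ( $SS_{within(adj)}$ ) is smaller than the unadjusted $SS_{within}$ , leading to a smaller $MS_{error}$ and greater power, provided the covariate is genuinely correlated with the DV. The additive identity above holds exactly when $SS_{covariate}$ is the sequential (covariate-entered-first) sum of squares, $SP_{XY(T)}^2/SS_{XX(T)}$ ; the within-group covariate SS used for the covariate F-test in Section 3.7 does not, in general, complete the sum.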

1.6 Adjusted Means

The core output of ANCOVA is the adjusted group mean — the estimated group mean after removing the linear effect of the covariate:

$\bar{Y}_{j(adj)} = \bar{Y}_j - \hat{\beta}(\bar{X}_j - \bar{X}_{..})$

Where:

$\bar{Y}_j$ is the observed (unadjusted) mean of group $j$ .
$\hat{\beta}$ is the pooled within-group regression coefficient.
$\bar{X}_j$ is the mean covariate score in group $j$ .
$\bar{X}_{..}$ is the grand mean of the covariate.

The adjusted mean represents what the group mean would have been if all groups had the same average covariate score ( $\bar{X}_{..}$ ). These are also called Estimated Marginal Means (EMMs) and are the primary means for interpretation and post-hoc testing in ANCOVA.

1.7 The Homogeneity of Regression Slopes Assumption

Unlike ANOVA, ANCOVA carries a critical additional assumption: the within-group regression slope of $Y$ on $X$ must be the same in all $K$ groups. If the slope varies across groups, the covariate adjustment is non-uniform, the ANCOVA F-test is invalid, and an interaction model (covariate × group) is more appropriate. This is the most commonly violated and overlooked ANCOVA assumption.

1.8 ANCOVA vs. Gain Score Analysis

A common alternative to ANCOVA for pre-post designs is to compute gain scores (post − pre) and run a one-way ANOVA. The choice between ANCOVA and gain score analysis depends on the reliability and variability of the pre-test:

ANCOVA is preferred when the pre-test is highly reliable and when groups have different pre-test means (common in quasi-experimental designs).
Gain score ANOVA is preferred when pre-test scores are unreliable (Lord's paradox arises under certain conditions).
Both approaches are valid for fully randomised experiments; ANCOVA generally has more power.

2. What is ANCOVA?

2.1 The Core Idea

Analysis of Covariance (ANCOVA) is a parametric inferential procedure that combines one-way ANOVA with linear regression. It tests whether the adjusted means of $K \geq 2$ independent groups are simultaneously equal, after statistically controlling for one or more continuous covariates.

The ANCOVA omnibus null hypothesis:

$H_0: \mu_{1(adj)} = \mu_{2(adj)} = \cdots = \mu_{K(adj)}$

$H_1: \text{At least one adjusted mean } \mu_{j(adj)} \text{ differs from the others}$

The adjusted means are population means evaluated at the grand mean of the covariate:

$\mu_{j(adj)} = \mu_j - \beta(\mu_{X_j} - \mu_X)$

2.2 What ANCOVA Tests and Does Not Test

ANCOVA tells you:

Whether adjusted group mean differences are larger than expected by chance, after accounting for the covariate (omnibus test).
How much of the covariate-adjusted outcome variance is explained by group membership ( $\omega^2_p$ , $\eta^2_p$ ).
The direction and statistical significance of the covariate's relationship with the DV.
What group means would look like if groups were equated on the covariate.

ANCOVA does NOT tell you:

Which specific adjusted group means differ (requires post-hoc tests on adjusted means).
Whether the treatment caused the DV change in non-randomised designs (residual confounding may remain).
The effect size for individual pairwise adjusted mean differences (requires Cohen's $d_{adj}$ ).
Whether the homogeneity of regression slopes assumption holds (must be tested separately).

2.3 Design Requirements

For a one-way between-subjects ANCOVA, the design must satisfy:

One continuous DV (interval or ratio scale).
One categorical IV with $K \geq 2$ levels (groups).
One or more continuous covariates (interval or ratio scale), measured prior to or independently of the treatment.
Different participants in each group (independent samples).
Each participant contributes exactly one score to exactly one group.

2.4 ANCOVA in Context

Situation	Test
$K \geq 2$ groups, no covariates, normal, equal variances	One-Way ANOVA
$K \geq 2$ groups, no covariates, normal, unequal variances	Welch's One-Way ANOVA
$K \geq 2$ groups, one or more continuous covariates, normal	ANCOVA
$K \geq 2$ groups, covariate, unequal slopes across groups	ANCOVA with interaction (Johnson-Neyman)
$K \geq 2$ groups, covariate, non-normal or ordinal DV	Quade test / Ranked ANCOVA
$\geq 2$ IVs, one or more covariates	Factorial ANCOVA
$K \geq 2$ conditions, repeated measures, covariate	ANCOVA with repeated measures
Continuous IV, continuous DV, continuous moderator	Moderated regression
$\geq 2$ DVs, one or more covariates	MANCOVA

2.5 Real-World Applications

Field	Example Application	IV (Levels)	Covariate	DV
Clinical Psychology	CBT vs. BA vs. Waitlist	3 therapy conditions	Pre-treatment PHQ-9	Post-treatment PHQ-9
Education	3 teaching methods	3 conditions	Pre-test score	Post-test score
Medicine	3 drug doses vs. placebo	4 groups	Baseline BP	Post-treatment BP
Neuroscience	3 sleep conditions	3 groups	Age	Reaction time
HR/OB	3 leadership styles	3 groups	Job experience	Productivity
Nutrition	4 diet types	4 groups	Baseline weight	Weight loss
Marketing	5 ad formats	5 groups	Prior brand attitude	Purchase intent
Epidemiology	3 intervention programmes	3 groups	SES score	Health outcome

3. The Mathematics Behind ANCOVA

3.1 Notation

Symbol	Meaning
$K$	Number of groups
$n_j$	Sample size in group $j$
$N = \sum_{j=1}^K n_j$	Total sample size
$p$	Number of covariates
$x_{ij}$	Covariate score for participant $i$ in group $j$ (single covariate)
$y_{ij}$	DV score for participant $i$ in group $j$
$\bar{x}_j$	Mean covariate score in group $j$
$\bar{x}_{..}$	Grand mean of the covariate
$\bar{y}_j$	Observed (unadjusted) mean of DV in group $j$
$\bar{y}_{j(adj)}$	Adjusted mean of DV in group $j$
$\hat{\beta}$	Pooled within-group regression slope of $Y$ on $X$
$\beta_j$	Within-group slope in group $j$ (for homogeneity test)
$MS_{error(adj)}$	Adjusted within-groups mean square (error after covariate removal)

3.2 The Within-Group Regression Coefficient

ANCOVA uses the pooled within-group regression coefficient $\hat{\beta}$ , computed from the pooled within-group sums of cross-products:

Within-group sum of squares for the covariate:

$SS_{XX(W)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2$

Within-group sum of cross-products (covariate × DV):

$SP_{XY(W)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)(y_{ij} - \bar{y}_j)$

Pooled within-group regression coefficient:

$\hat{\beta} = \frac{SP_{XY(W)}}{SS_{XX(W)}}$

This slope represents the average linear relationship between the covariate and DV within groups, pooled across all $K$ groups. It is equivalent to the slope obtained from a regression of $Y$ on $X$ with all between-group variance removed.
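
As a rough illustration of the three formulas above, the following sketch accumulates the within-group sums and returns the pooled slope. The function name `pooled_within_slope` and the toy data are hypothetical, chosen only for this example.

```python
def pooled_within_slope(groups):
    """groups: list of (x_values, y_values) pairs, one pair per group."""
    ss_xx_w = 0.0  # within-group sum of squares of the covariate
    sp_xy_w = 0.0  # within-group sum of cross-products
    for xs, ys in groups:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        # Deviations are taken from each group's own means, not the grand means.
        ss_xx_w += sum((x - x_bar) ** 2 for x in xs)
        sp_xy_w += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sp_xy_w / ss_xx_w, ss_xx_w, sp_xy_w


# Hypothetical toy data: (covariate scores, DV scores) for three groups.
groups = [
    ([1, 2, 3, 4], [2, 3, 5, 6]),
    ([2, 3, 4, 5], [4, 6, 7, 9]),
    ([3, 4, 5, 6], [7, 8, 10, 11]),
]
beta_hat, ss_xx_w, sp_xy_w = pooled_within_slope(groups)
print(beta_hat, ss_xx_w, sp_xy_w)
```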

3.3 Adjusted Group Means

The adjusted group mean for group $j$ :

$\bar{y}_{j(adj)} = \bar{y}_j - \hat{\beta}(\bar{x}_j - \bar{x}_{..})$

Interpretation: The adjusted mean is the estimated group mean when the group's covariate mean equals the grand mean of the covariate. It answers: "What would Group $j$ 's mean DV score be if they had, on average, the same covariate score as the entire sample?"

Adjusted grand mean:

$\bar{y}_{..(adj)} = \frac{\sum_{j=1}^K n_j \bar{y}_{j(adj)}}{N} = \bar{y}_{..}$

The adjusted grand mean equals the unadjusted grand mean (covariate adjustment preserves the overall mean).
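
The adjustment and the grand-mean identity can be sketched as follows. This is a minimal example on hypothetical toy data; `adjusted_means` is an illustrative helper, not a DataStatPro function.

```python
def adjusted_means(groups):
    """Return (list of adjusted means, pooled slope) for (xs, ys) groups."""
    n_total = sum(len(xs) for xs, _ in groups)
    x_grand = sum(sum(xs) for xs, _ in groups) / n_total
    ss_xx_w = sp_xy_w = 0.0
    group_means = []
    for xs, ys in groups:
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        ss_xx_w += sum((x - x_bar) ** 2 for x in xs)
        sp_xy_w += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        group_means.append((x_bar, y_bar))
    beta_hat = sp_xy_w / ss_xx_w
    # Shift each observed mean along the pooled slope to the grand covariate mean.
    adj = [y_bar - beta_hat * (x_bar - x_grand) for x_bar, y_bar in group_means]
    return adj, beta_hat


# Hypothetical toy data: (covariate scores, DV scores) for three groups.
groups = [
    ([1, 2, 3, 4], [2, 3, 5, 6]),
    ([2, 3, 4, 5], [4, 6, 7, 9]),
    ([3, 4, 5, 6], [7, 8, 10, 11]),
]
adj, beta_hat = adjusted_means(groups)

# The n-weighted mean of the adjusted means should match the raw grand mean.
sizes = [len(xs) for xs, _ in groups]
adj_grand = sum(n * m for n, m in zip(sizes, adj)) / sum(sizes)
raw_grand = sum(sum(ys) for _, ys in groups) / sum(sizes)
print(adj, adj_grand, raw_grand)
```

In this toy data the groups with lower covariate means are adjusted upward and those with higher covariate means downward, which narrows the gap between the observed means.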

3.4 Sum of Squares Decomposition in ANCOVA

ANCOVA involves computing adjusted sums of squares by removing the linear effect of the covariate from both the total and within-group SS.

Total sum of squares for DV (unadjusted):

$SS_{YY(T)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_{..})^2$

Within-group sum of squares for DV (unadjusted):

$SS_{YY(W)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2 = \sum_{j=1}^K(n_j-1)s_{y_j}^2$

Total sum of squares for covariate:

$SS_{XX(T)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})^2$

Total sum of cross-products:

$SP_{XY(T)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})$

Adjusted within-groups SS (error after covariate removal):

$SS_{within(adj)} = SS_{YY(W)} - \frac{SP_{XY(W)}^2}{SS_{XX(W)}}$

The second term is the reduction in error SS due to the covariate. It equals $\hat{\beta}^2 \times SS_{XX(W)}$ , the sum of squares explained by the within-group regression of $Y$ on $X$ .

Adjusted total SS:

$SS_{total(adj)} = SS_{YY(T)} - \frac{SP_{XY(T)}^2}{SS_{XX(T)}}$

Adjusted between-groups SS:

$SS_{between(adj)} = SS_{total(adj)} - SS_{within(adj)}$

Verification:

$SS_{total(adj)} = SS_{between(adj)} + SS_{within(adj)}$

Note: In ANCOVA, only the adjusted sums of squares add up in this way. $SS_{between(adj)}$ is obtained by subtraction; it is not the unadjusted between-groups SS of the DV, and mixing adjusted and unadjusted terms breaks the identity.
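
The whole decomposition can be sketched in a few lines. The helper `ancova_ss` and the toy data below are hypothetical and cover the single-covariate case only.

```python
def ancova_ss(groups):
    """Adjusted sums of squares for a one-covariate ANCOVA.

    groups: list of (x_values, y_values) pairs, one pair per group.
    """
    xs_all = [x for xs, _ in groups for x in xs]
    ys_all = [y for _, ys in groups for y in ys]
    n = len(xs_all)
    x_g, y_g = sum(xs_all) / n, sum(ys_all) / n

    # Total sums: deviations from the grand means.
    ss_xx_t = sum((x - x_g) ** 2 for x in xs_all)
    ss_yy_t = sum((y - y_g) ** 2 for y in ys_all)
    sp_xy_t = sum((x - x_g) * (y - y_g) for x, y in zip(xs_all, ys_all))

    # Within-group sums: deviations from each group's own means.
    ss_xx_w = ss_yy_w = sp_xy_w = 0.0
    for xs, ys in groups:
        x_b, y_b = sum(xs) / len(xs), sum(ys) / len(ys)
        ss_xx_w += sum((x - x_b) ** 2 for x in xs)
        ss_yy_w += sum((y - y_b) ** 2 for y in ys)
        sp_xy_w += sum((x - x_b) * (y - y_b) for x, y in zip(xs, ys))

    within_adj = ss_yy_w - sp_xy_w ** 2 / ss_xx_w
    total_adj = ss_yy_t - sp_xy_t ** 2 / ss_xx_t
    return {
        "within_adj": within_adj,
        "total_adj": total_adj,
        "between_adj": total_adj - within_adj,  # obtained by subtraction
    }


# Hypothetical toy data: (covariate scores, DV scores) for three groups.
groups = [
    ([1, 2, 3, 4], [2, 3, 5, 6]),
    ([2, 3, 4, 5], [4, 6, 7, 9]),
    ([3, 4, 5, 6], [7, 8, 10, 11]),
]
ss = ancova_ss(groups)
print(ss)
```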

3.5 Degrees of Freedom

Source	$df$
Between groups (adjusted)	$K - 1$
Covariate	$p$ (number of covariates)
Within groups / Error (adjusted)	$N - K - p$
Total (adjusted)	$N - 1 - p$

The key difference from ANOVA: the error $df$ is reduced by $p$ (one per covariate), because each covariate costs one degree of freedom to estimate its slope. This is why including irrelevant covariates (low correlation with DV) reduces power despite removing some variance.

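Power gain from covariate requires:

$R^2_{XY(W)}\,(N - K) > p$

where $R^2_{XY(W)}$ is the squared (multiple) within-group correlation between the covariate(s) and the DV. Simplified: the covariates must explain a larger share of the within-group variance than the share of error $df$ they consume, so that $MS_{error(adj)}$ falls below the unadjusted $MS_{error}$ . For a single covariate ( $p = 1$ ), any $|r_{XY}| > 0$ provides a gain when $N$ is large enough.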

Break-even correlation for a single covariate:

$r^2_{break-even} = \frac{1}{N - K}$

For $N = 60$ , $K = 3$ : $r_{break-even} = \sqrt{1/57} \approx 0.132$ . Any $|r_{XY}| > 0.132$ within groups yields higher power from ANCOVA than from one-way ANOVA.
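
The break-even arithmetic can be reproduced with a short sketch. The function names are hypothetical; the second helper compares the two error mean squares directly, assuming a single covariate, so that a ratio below 1 means the covariate has paid for its degree of freedom.

```python
import math


def break_even_r(n_total, k_groups):
    """Smallest |r| (single covariate) at which ANCOVA matches ANOVA's MS_error."""
    return math.sqrt(1.0 / (n_total - k_groups))


def ms_error_ratio(r_within, n_total, k_groups):
    """MS_error(adj) / MS_error for one covariate with within-group correlation r."""
    df_anova = n_total - k_groups
    df_ancova = n_total - k_groups - 1
    return (1.0 - r_within ** 2) * df_anova / df_ancova


r_min = break_even_r(60, 3)
print(round(r_min, 3))
print(ms_error_ratio(r_min, 60, 3))   # equals 1 at the break-even point
print(ms_error_ratio(0.50, 60, 3))    # below 1: the covariate helps
```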

3.6 Mean Squares and the F-Ratio

Adjusted between-groups mean square:

$MS_{between(adj)} = \frac{SS_{between(adj)}}{K-1}$

Adjusted within-groups mean square (adjusted error variance):

$MS_{error(adj)} = \frac{SS_{within(adj)}}{N - K - p}$

The ANCOVA F-statistic:

$F = \frac{MS_{between(adj)}}{MS_{error(adj)}}$

Under $H_0$ : $F \sim F_{K-1,\;N-K-p}$

p-value:

$p = P(F_{K-1,\;N-K-p} \geq F_{obs})$
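
A minimal sketch of the F-ratio and its p-value follows, assuming SciPy is installed (`scipy.stats.f.sf` gives the upper-tail probability of the F distribution). The function `ancova_f_test` and the input sums of squares are hypothetical values for illustration.

```python
from scipy import stats


def ancova_f_test(ss_between_adj, ss_within_adj, k_groups, n_total, p_covariates=1):
    """Return (F, df_between, df_error, p_value) for the adjusted group effect."""
    df_b = k_groups - 1
    df_e = n_total - k_groups - p_covariates  # each covariate costs one error df
    ms_b = ss_between_adj / df_b
    ms_e = ss_within_adj / df_e
    f_obs = ms_b / ms_e
    p_value = stats.f.sf(f_obs, df_b, df_e)   # P(F >= f_obs)
    return f_obs, df_b, df_e, p_value


# Hypothetical adjusted sums of squares (K = 3 groups, N = 12, one covariate).
ss_within_adj = 33 - 22 ** 2 / 15
ss_total_adj = 83 - 42 ** 2 / 23
ss_between_adj = ss_total_adj - ss_within_adj

f_obs, df_b, df_e, p_value = ancova_f_test(ss_between_adj, ss_within_adj, 3, 12)
print(f_obs, df_b, df_e, p_value)
```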

3.7 The F-Test for the Covariate

ANCOVA also produces an F-test for the covariate itself:

$F_{cov} = \frac{SS_{covariate}/p}{MS_{error(adj)}}$

Where:

$SS_{covariate} = \hat{\beta}^2 \times SS_{XX(W)} = \frac{SP_{XY(W)}^2}{SS_{XX(W)}}$

This tests whether the pooled within-group regression slope $\hat{\beta}$ is significantly different from zero. A non-significant covariate F-test suggests the covariate is not linearly related to the DV within groups, and including it may reduce power.

3.8 The ANCOVA Source Table

Source	SS	$df$	MS	$F$	$p$
Covariate ( $X$ )	$SS_{cov}$	$p$	$MS_{cov} = SS_{cov}/p$	$MS_{cov}/MS_{err(adj)}$	$P(F \geq F_{cov})$
Between groups (adjusted)	$SS_{B(adj)}$	$K-1$	$MS_{B(adj)}$	$MS_{B(adj)}/MS_{err(adj)}$	$P(F \geq F_{obs})$
Error (adjusted)	$SS_{W(adj)}$	$N-K-p$	$MS_{err(adj)}$
Total (adjusted)	$SS_{T(adj)}$	$N-1-p$

3.9 Computing the Pooled Within-Group Correlation

The pooled within-group correlation between the covariate and DV:

$r_{XY(W)} = \frac{SP_{XY(W)}}{\sqrt{SS_{XX(W)} \times SS_{YY(W)}}}$

This correlation quantifies the linear relationship between covariate and DV within groups, pooled across groups. It determines how much variance the covariate removes from the error term.

Proportion of within-group variance explained by covariate:

$r^2_{XY(W)} = \frac{SP_{XY(W)}^2}{SS_{XX(W)} \times SS_{YY(W)}}$

Proportional reduction in error SS:

$\frac{SS_{YY(W)} - SS_{within(adj)}}{SS_{YY(W)}} = r^2_{XY(W)}$
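
The identity between the squared pooled correlation and the proportional reduction in error SS can be confirmed with a few lines. The three within-group sums below are hypothetical values used only for illustration.

```python
import math

# Hypothetical pooled within-group sums.
ss_xx_w = 15.0   # covariate sum of squares
ss_yy_w = 33.0   # DV sum of squares
sp_xy_w = 22.0   # sum of cross-products

r_w = sp_xy_w / math.sqrt(ss_xx_w * ss_yy_w)

# Error SS after removing the covariate, and the share that was removed.
ss_within_adj = ss_yy_w - sp_xy_w ** 2 / ss_xx_w
reduction = (ss_yy_w - ss_within_adj) / ss_yy_w

print(r_w, r_w ** 2, reduction)
```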

3.10 Adjusted Standard Error for Adjusted Means

The standard error of an adjusted group mean:

$SE_{j(adj)} = \sqrt{MS_{error(adj)}\left(\frac{1}{n_j} + \frac{(\bar{x}_j - \bar{x}_{..})^2}{SS_{XX(W)}}\right)}$

The second term inside the square root captures additional uncertainty from the fact that the covariate mean of group $j$ may deviate from the grand mean. When $\bar{x}_j = \bar{x}_{..}$ , this term vanishes and $SE_{j(adj)} = \sqrt{MS_{error(adj)}/n_j}$ .
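
A small sketch of this standard error, with a hypothetical helper name and hypothetical input values, shows how the second term grows as a group's covariate mean moves away from the grand mean:

```python
import math


def se_adjusted_mean(ms_error_adj, n_j, x_bar_j, x_grand, ss_xx_w):
    """Standard error of one adjusted group mean (single covariate)."""
    distance_term = (x_bar_j - x_grand) ** 2 / ss_xx_w
    return math.sqrt(ms_error_adj * (1.0 / n_j + distance_term))


# Hypothetical values: MS_error(adj), group size, covariate means, SS_XX(W).
ms_error_adj = (33 - 22 ** 2 / 15) / 8
se_far = se_adjusted_mean(ms_error_adj, 4, 2.5, 3.5, 15.0)   # group mean off-centre
se_mid = se_adjusted_mean(ms_error_adj, 4, 3.5, 3.5, 15.0)   # group mean at grand mean

print(se_far, se_mid)
```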

3.11 Multiple Covariates

With $p$ covariates, the ANCOVA model becomes:

$Y_i = \mu + \tau_j + \beta_1(X_{1i} - \bar{X}_1) + \beta_2(X_{2i} - \bar{X}_2) + \cdots + \beta_p(X_{pi} - \bar{X}_p) + \varepsilon_i$

The adjusted SS are computed using matrix algebra (the GLM framework):

$SS_{within(adj)} = SS_{YY(W)} - \mathbf{b}'\mathbf{S}_{XX(W)}\mathbf{b}$

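Where $\mathbf{b} = \mathbf{S}_{XX(W)}^{-1}\mathbf{s}_{XY(W)}$ is the vector of pooled within-group regression coefficients, $\mathbf{S}_{XX(W)}$ is the pooled within-group sums-of-squares-and-cross-products (SSCP) matrix of the covariates, and $\mathbf{s}_{XY(W)}$ is the vector of pooled within-group sums of cross-products between the covariates and the DV. (Covariance matrices give the same $\mathbf{b}$ , but the SSCP form is needed for the result to be a sum of squares.)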

DataStatPro handles multiple covariates automatically using matrix algebra in its GLM engine.
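
The matrix form can be sketched with NumPy as follows. This is an illustration on simulated data, not a description of DataStatPro's internal engine; the helper `centre_within` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)           # hypothetical simulated data
group = np.repeat([0, 1, 2], 10)         # K = 3 groups, n_j = 10
X = rng.normal(size=(30, 2))             # p = 2 covariates
y = 1.0 * group + X @ np.array([0.8, -0.5]) + rng.normal(size=30)


def centre_within(a, group):
    """Subtract each group's own mean from its rows."""
    out = np.asarray(a, dtype=float).copy()
    for g in np.unique(group):
        mask = group == g
        out[mask] -= out[mask].mean(axis=0)
    return out


Xw = centre_within(X, group)
yw = centre_within(y, group)

S_xx = Xw.T @ Xw                  # within-group SSCP matrix of the covariates
s_xy = Xw.T @ yw                  # within-group cross-products with the DV
b = np.linalg.solve(S_xx, s_xy)   # pooled within-group slopes

ss_yy_w = yw @ yw
ss_within_adj = ss_yy_w - b @ S_xx @ b
df_error = len(y) - 3 - X.shape[1]       # N - K - p

print(b, ss_within_adj, ss_within_adj / df_error)
```

The same error SS is obtained as the residual sum of squares from a regression of the DV on group dummy codes plus both covariates.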

3.12 Computing Effect Sizes from the ANCOVA Table

Partial eta squared (from F):

$\eta^2_p = \frac{F \cdot df_B}{F \cdot df_B + df_{error(adj)}}$

Partial omega squared (bias-corrected, preferred):

$\omega^2_p \approx \frac{(F-1)(K-1)}{(F-1)(K-1) + N}$

Exact partial omega squared (from SS):

$\omega^2_p = \frac{SS_{B(adj)} - (K-1)MS_{error(adj)}}{SS_{T(adj)} + MS_{error(adj)}}$
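
The two F-based formulas translate directly into code. The sketch below uses hypothetical function names and a hypothetical result ( $F = 10$ with $K = 3$ , $N = 12$ , one covariate).

```python
def partial_eta_squared(f_obs, df_between, df_error):
    """Partial eta squared recovered from the F-ratio and its df."""
    return f_obs * df_between / (f_obs * df_between + df_error)


def partial_omega_squared(f_obs, k_groups, n_total):
    """Approximate partial omega squared from F, K and N."""
    num = (f_obs - 1.0) * (k_groups - 1)
    return num / (num + n_total)


# Hypothetical result: F(2, 8) = 10 with N = 12.
eta_p = partial_eta_squared(10.0, 2, 8)
omega_p = partial_omega_squared(10.0, 3, 12)
print(eta_p, omega_p)
```

Partial omega squared is the smaller of the two here, as expected from a bias-corrected estimate.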

4. Assumptions of ANCOVA

ANCOVA carries all the assumptions of one-way ANOVA plus four additional covariate-specific assumptions (Sections 4.5 to 4.8). Violating the covariate assumptions is more consequential than violating standard ANOVA assumptions.

4.1 Normality of Residuals

The adjusted residuals $e_{ij} = y_{ij} - \bar{y}_j - \hat{\beta}(x_{ij} - \bar{x}_j)$ (equivalently, $y_{ij} - \bar{y}_{j(adj)} - \hat{\beta}(x_{ij} - \bar{x}_{..})$ ) must be normally distributed within each group.

How to check:

Shapiro-Wilk test on adjusted residuals ( $H_0$ : residuals are normal).
Q-Q plot of adjusted residuals: points should follow the diagonal.
Histograms per group of adjusted residuals.
Skewness and kurtosis of adjusted residuals: $|z_{skew}| < 2$ , $|z_{kurt}| < 7$ .

Robustness: ANCOVA is robust to mild non-normality, especially with balanced designs and $n_j \geq 20$ per group. The robustness applies to the F-test for the group effect; tests on the covariate coefficient are also robust with moderate $N$ .

When violated: Use the Quade test (non-parametric ANCOVA) or ranked ANCOVA as described in Section 12. Consider log or square root transformations for right-skewed DVs.

4.2 Homogeneity of Variance (Homoscedasticity)

The adjusted within-group variances must be equal across all $K$ groups:

$\sigma^2_{1(adj)} = \sigma^2_{2(adj)} = \cdots = \sigma^2_{K(adj)}$

This applies to the residuals after removing the covariate effect.

How to check:

Levene's test on the adjusted residuals (most commonly used).
Brown-Forsythe test on adjusted residuals (more robust to non-normality).
Variance ratio rule: If $s^2_{max(adj)}/s^2_{min(adj)} > 4$ , heterogeneity is potentially problematic.

When violated: Use a heteroscedastic ANCOVA (Welch-type adjustment) or non-parametric alternatives. Report the violation and the robust alternative results alongside standard ANCOVA results.

4.3 Independence of Observations

All observations must be independent within and across groups. This is a design assumption that cannot be tested from data.

Common violations:

Clustered data (students nested in classrooms).
Repeated measurements from the same participant.
Social network dependencies among participants.

When violated: Use multilevel ANCOVA (covariate in mixed-effects model) or repeated measures ANCOVA.

4.4 Interval Scale of Measurement

Both the DV and the covariate(s) must be measured on at least an interval scale. The covariate must be continuous or at minimum have many ordered categories.

When violated for DV: Use the Quade test or ranked ANCOVA.

When violated for covariate: If the covariate is binary (0/1), include it as a categorical factor rather than a continuous covariate (use two-way ANOVA or ANCOVA with the binary variable as a blocking factor).

4.5 Homogeneity of Regression Slopes ⭐ CRITICAL

The most important ANCOVA-specific assumption: the within-group regression slope of $Y$ on $X$ must be the same in all $K$ groups:

$\beta_1 = \beta_2 = \cdots = \beta_K = \beta$

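Why this matters: ANCOVA uses a single pooled slope $\hat{\beta}$ to adjust all group means. If the true slopes differ across groups, the single adjustment is incorrect for at least some groups, and the size (even the sign) of the group difference depends on the covariate value at which it is evaluated, so a single set of adjusted means is not interpretable.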

Conceptually: The homogeneity of regression slopes assumption requires that the covariate-DV relationship is parallel across groups. Heterogeneous slopes indicate a covariate × group interaction — the effect of the covariate on the DV differs depending on group membership.

How to check:

Test the covariate × group interaction in a separate model:

$Y_i = \mu + \tau_j + \beta(X_i - \bar{X}) + \gamma_j(X_i - \bar{X}) + \varepsilon_i$

where $\gamma_j$ represents the deviation of group $j$ 's slope from the common slope.
The interaction $F$ -test ( $H_0$ : all $\gamma_j = 0$ ) tests homogeneity of slopes.

$F_{interaction} = \frac{(SS_{within(adj,\; homogeneous)} - SS_{within(adj,\; heterogeneous)})/(K-1)}{MS_{error(heterogeneous)}}$

Rule: If the interaction $F$ is significant at $p < .05$ , the homogeneity of regression slopes assumption is violated and standard ANCOVA should not be used.
Visual check: Plot the regression lines of $Y$ on $X$ separately for each group. Lines should be approximately parallel.
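
The interaction F-ratio above can be sketched for a single covariate as follows. The helper name and toy data are hypothetical; the sketch assumes the heterogeneous-slopes error has $N - 2K$ degrees of freedom (one intercept and one slope per group).

```python
def slopes_homogeneity_f(groups):
    """F-ratio for equal within-group slopes (one covariate).

    groups: list of (x_values, y_values) pairs, one pair per group.
    Returns (F, df_numerator, df_denominator).
    """
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)
    ss_xx_w = ss_yy_w = sp_xy_w = 0.0
    ss_het = 0.0  # error SS when every group gets its own slope
    for xs, ys in groups:
        x_b, y_b = sum(xs) / len(xs), sum(ys) / len(ys)
        ss_xx = sum((x - x_b) ** 2 for x in xs)
        ss_yy = sum((y - y_b) ** 2 for y in ys)
        sp_xy = sum((x - x_b) * (y - y_b) for x, y in zip(xs, ys))
        ss_het += ss_yy - sp_xy ** 2 / ss_xx
        ss_xx_w, ss_yy_w, sp_xy_w = ss_xx_w + ss_xx, ss_yy_w + ss_yy, sp_xy_w + sp_xy
    ss_hom = ss_yy_w - sp_xy_w ** 2 / ss_xx_w  # error SS with one pooled slope
    df_num, df_den = k - 1, n_total - 2 * k
    f_int = ((ss_hom - ss_het) / df_num) / (ss_het / df_den)
    return f_int, df_num, df_den


# Hypothetical toy data: (covariate scores, DV scores) for three groups.
groups = [
    ([1, 2, 3, 4], [2, 3, 5, 6]),
    ([2, 3, 4, 5], [4, 6, 7, 9]),
    ([3, 4, 5, 6], [7, 8, 10, 11]),
]
f_int, df_num, df_den = slopes_homogeneity_f(groups)
print(f_int, df_num, df_den)
```

For this toy data the ratio is below 1, so there is no evidence against parallel slopes; the p-value would come from the F distribution with the returned degrees of freedom.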

When violated:

Do not use standard ANCOVA.
Use Johnson-Neyman analysis (see Section 13.3) to identify regions of the covariate where groups differ significantly.
Use moderated regression with the group × covariate interaction term.
Report the interaction as a finding in its own right.

4.6 Independence of Covariate and Treatment (Group Membership)

For ANCOVA to provide valid adjusted means, the covariate must be independent of (i.e., not caused by) the treatment. Specifically:

In randomised experiments: The covariate must be measured before random assignment. A pre-test score measured before treatment satisfies this assumption. Post-randomisation covariates (measured after treatment begins) may be influenced by the treatment, making adjustment inappropriate.
In observational studies: Groups systematically differ on the covariate by design (e.g., younger vs. older participants). ANCOVA adjusts for this, but the adjustment is conditional on the model being correctly specified.

How to check:

Verify that the covariate was measured before treatment.
Check whether groups differ significantly on the covariate: run a one-way ANOVA with the covariate as the DV and the IV as the grouping factor.
- In randomised experiments: this test should be non-significant (covariate balance is expected from randomisation). A significant result may indicate randomisation failure.
- In observational studies: this test will typically be significant (groups differ on the covariate by design). ANCOVA provides statistical control, not a substitute for randomisation.

When violated (covariate influenced by treatment):

Removing the variance explained by a post-treatment covariate may remove part of the treatment effect itself — over-controlling bias.
Use causal diagram analysis (DAG) to determine appropriate covariate adjustment.

4.7 Linearity of Covariate-DV Relationship

ANCOVA assumes the relationship between the covariate and DV is linear within each group. Non-linear relationships are not fully removed by the linear adjustment, leaving residual covariate variance in the error term.

How to check:

Scatterplots of $Y$ vs. $X$ within each group: relationship should appear linear.
Residual plots: Plot adjusted residuals vs. covariate scores. A U-shaped or inverted-U pattern indicates non-linearity.
Polynomial test: Add $X^2$ to the ANCOVA model and test whether it contributes significantly. If $F(X^2)$ is significant, the linear assumption may be violated.

When violated:

Add a quadratic term $X^2$ to the ANCOVA model (polynomial ANCOVA).
Use spline regression for flexible non-linear adjustment.
Apply a transformation to the covariate (e.g., log $X$ for right-skewed covariates).

4.8 Reliability of the Covariate

ANCOVA assumes the covariate is measured without error. In practice, all psychological and behavioural measures contain measurement error. Measurement error in the covariate causes incomplete adjustment — residual confounding remains even after ANCOVA.

Consequences of covariate unreliability:

Under-adjustment: The adjusted means do not fully reflect what group means would be at a common true covariate score.
In randomised experiments: a small loss of power (measurement error in the covariate leaves some adjustable variance in the error term).
In observational studies: systematic bias in adjusted means. Measurement error attenuates $\hat{\beta}$ toward zero, so the adjustment is too small in both directions: with a positive slope, groups with higher covariate means keep adjusted means that are too high, and groups with lower covariate means keep adjusted means that are too low. This can produce misleading conclusions about treatment effects.

Remedy:

Use reliability-corrected ANCOVA (Porter & Raudenbush, 1987).
Report the reliability (Cronbach's $\alpha$ , test-retest $r$ ) of the covariate.
Acknowledge unreliability as a limitation when it is low ( $\alpha < 0.80$ ).

4.9 Absence of Influential Outliers

Outliers on the covariate or DV can distort $\hat{\beta}$ and the adjusted means substantially.

How to check:

Cook's distance for each observation in the ANCOVA regression model: $D_i > 1$ flags influential observations.
Leverage values ( $h_{ii}$ ): High leverage points have extreme covariate scores that pull the regression slope.
Studentised deleted residuals: $|t_i| > 3$ flags potential outliers on the DV after covariate adjustment.
Scatterplots of $Y$ vs. $X$ with group labels to visually identify extreme points.
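
Leverage and Cook's distance can be computed from the hat matrix of the ANCOVA regression model. The following is a minimal NumPy sketch on hypothetical toy data, using the textbook formula $D_i = e_i^2 h_{ii} / [q \cdot MS_{error} (1 - h_{ii})^2]$ , where $q$ is the number of model parameters (a symbol introduced here for the example).

```python
import numpy as np

# Hypothetical toy data: three groups of four participants each.
x = np.array([1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6], dtype=float)
y = np.array([2, 3, 5, 6, 4, 6, 7, 9, 7, 8, 10, 11], dtype=float)
group = np.repeat([0, 1, 2], 4)

# ANCOVA design matrix: intercept, two dummy codes, centred covariate.
X = np.column_stack([
    np.ones_like(x),
    (group == 1).astype(float),
    (group == 2).astype(float),
    x - x.mean(),
])

H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
h = np.diag(H)                           # leverage values h_ii
resid = y - H @ y                        # adjusted residuals
n_params = X.shape[1]
mse = resid @ resid / (len(y) - n_params)

cooks_d = resid ** 2 * h / (n_params * mse * (1 - h) ** 2)
flagged = np.where(cooks_d > 1)[0]       # indices with D_i > 1

print(np.round(h, 3))
print(np.round(cooks_d, 3))
print("flagged observations:", flagged)
```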

4.10 Assumption Summary Table

Assumption	Description	How to Check	Remedy if Violated
Normality	Adjusted residuals $\sim \mathcal{N}(0, \sigma^2)$	Shapiro-Wilk, Q-Q plot	Quade test; transform DV
Homoscedasticity	Equal adjusted within-group variances	Levene's (on residuals)	Heteroscedastic ANCOVA
Independence	Observations independent within and across groups	Design review	Multilevel ANCOVA
Interval scale (DV & CV)	Both DV and covariate have equal-interval properties	Measurement theory	Ranked ANCOVA; Quade test
Homogeneity of regression slopes	Same $\beta$ in all groups	Interaction F-test; parallel scatterplots	Johnson-Neyman; moderated regression
Independence of covariate and treatment	Covariate not caused by treatment	Covariate balance test; timing of measurement	Use pre-treatment covariates only
Linearity	Linear $Y$ -on- $X$ relationship within groups	Scatterplots; residual plots; polynomial test	Add $X^2$ ; transform covariate
Covariate reliability	Covariate measured without substantial error	Report reliability coefficient	Reliability-corrected ANCOVA
No outliers	No extreme influential observations	Cook's $D$ , leverage, studentised residuals	Investigate; report sensitivity analysis

5. Variants of ANCOVA

5.1 Standard One-Way ANCOVA

The default ANCOVA with a single continuous covariate, assuming homogeneity of regression slopes, normality of adjusted residuals, and homoscedasticity. Uses the pooled within-group regression slope for adjustment. This is appropriate when all assumptions are met.

5.2 ANCOVA with Multiple Covariates

When two or more covariates are available and each contributes unique variance to the DV, including all of them in ANCOVA maximises power. The mathematical extension uses multiple regression within the GLM framework (Section 3.11).

Guidelines for multiple covariates:

Include only theoretically motivated covariates chosen before data collection.
Avoid including covariates that are highly correlated with each other (multicollinearity degrades slope estimation).
Each additional covariate costs one $df_{error}$ ; the power benefit must exceed this cost ( $r^2_{X_p Y(W)} > 1/(N-K-p+1)$ ).
With $p$ covariates and small $N$ , ANCOVA can become over-parameterised.

5.3 Welch-Type Heteroscedastic ANCOVA

Analogous to Welch's one-way ANOVA, this variant relaxes the assumption of equal adjusted within-group variances. It uses group-specific variance estimates in the F-test denominator, with Welch-Satterthwaite degrees of freedom correction.

DataStatPro implements heteroscedastic ANCOVA using the HC3 heteroscedasticity-consistent variance estimator for the group effect test.

Use when: Levene's test on adjusted residuals is significant (especially with unequal $n_j$ ).

5.4 ANCOVA with Categorical Covariate (Blocking Factor)

When the "covariate" is categorical (e.g., site, school, gender), it functions as a blocking factor rather than a continuous covariate. This is handled as a two-way ANOVA (main effect of group + main effect of block) rather than ANCOVA.

DataStatPro handles this automatically: selecting a categorical variable as a covariate prompts the user to reclassify it as a blocking factor in a factorial ANOVA design.

5.5 ANCOVA for Pre-Post Designs (Pre-Test as Covariate)

The most common ANCOVA application in clinical and educational research:

DV = post-test score
Covariate = pre-test score (same measure, administered before treatment)
IV = treatment group

This design removes pre-existing individual differences (captured by the pre-test) from the error term, isolating treatment effects on change from baseline while controlling for regression to the mean.

Advantages over gain score analysis:

More power when the pre-post correlation is moderate to high ( $r > 0.50$ ).
Corrects for regression to the mean more effectively than gain scores.
Adjusted means are interpretable as "what the post-test would be if groups had equal pre-test scores."

5.6 Johnson-Neyman ANCOVA (Heterogeneous Slopes)

When the homogeneity of regression slopes assumption is violated (significant group × covariate interaction), Johnson-Neyman analysis identifies the region of the covariate where the group difference is statistically significant and the region where it is not.

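The Johnson-Neyman boundary point(s) are the covariate values at which the group difference transitions from significant to non-significant. For $K = 2$ groups with separately estimated intercepts $a_1, a_2$ and slopes $b_1, b_2$ , the estimated group difference at covariate value $X$ is $\hat{D}(X) = (a_1 - a_2) + (b_1 - b_2)X$ , and the boundaries are the solutions of:

$\hat{D}(X_{JN})^2 = t_{crit}^2 \cdot SE^2_{\hat{D}(X_{JN})}$

Because both sides are quadratic in $X$ , there are at most two boundary points.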

DataStatPro computes Johnson-Neyman regions numerically and displays them as a floodlight plot (significance region shaded along the covariate axis).
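
For two groups the boundaries can also be found in closed form by solving a quadratic. The sketch below is one possible implementation under stated assumptions, not DataStatPro's numerical routine: each group gets its own least-squares line, the residual variance is pooled over both groups with $N - 4$ degrees of freedom, and the critical $t$ is supplied by the caller. All function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line for one group: intercept, slope and summary terms."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_xx
    a = y_bar - b * x_bar
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, n, x_bar, ss_xx, rss


def johnson_neyman(g1, g2, t_crit):
    """Covariate values where the two-group difference is exactly at t_crit."""
    a1, b1, n1, xb1, ss1, rss1 = fit_line(*g1)
    a2, b2, n2, xb2, ss2, rss2 = fit_line(*g2)
    ms_res = (rss1 + rss2) / (n1 + n2 - 4)   # pooled residual variance
    k = t_crit ** 2 * ms_res
    # Coefficients of A*x^2 + 2*B*x + C = D(x)^2 - t_crit^2 * SE^2(D(x)).
    A = (b1 - b2) ** 2 - k * (1 / ss1 + 1 / ss2)
    B = (a1 - a2) * (b1 - b2) + k * (xb1 / ss1 + xb2 / ss2)
    C = (a1 - a2) ** 2 - k * (1 / n1 + 1 / n2 + xb1 ** 2 / ss1 + xb2 ** 2 / ss2)
    disc = B * B - A * C
    if A == 0 or disc < 0:
        return []                            # no real boundary points
    root = math.sqrt(disc)
    return sorted([(-B - root) / A, (-B + root) / A])


# Hypothetical data: group 1 rises with the covariate, group 2 is roughly flat.
g1 = ([1, 2, 3, 4, 5, 6], [2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
g2 = ([1, 2, 3, 4, 5, 6], [5.0, 5.2, 4.9, 5.1, 5.0, 4.8])

# 2.306 is the two-tailed .05 critical t for N - 4 = 8 degrees of freedom.
bounds = johnson_neyman(g1, g2, t_crit=2.306)
print(bounds)
```

In this example the two lines cross near the middle of the covariate range, so the group difference is non-significant between the two boundaries and significant outside them.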

5.7 Choosing Between Variants

Condition	Recommended Test
Normal, equal slopes, equal variances	Standard ANCOVA
Normal, equal slopes, unequal variances	Heteroscedastic ANCOVA (HC3)
Non-normal, small $n_j$	Quade test or ranked ANCOVA
Unequal slopes across groups	Johnson-Neyman analysis / moderated regression
Multiple theoretically-motivated covariates	ANCOVA with multiple covariates
Pre-post design, same DV measured twice	ANCOVA (pre-test as covariate)
Categorical covariate	Two-way ANOVA (blocking factor)

6. Using the ANCOVA Calculator Component

The ANCOVA Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, visualising, and reporting ANCOVA and its alternatives.

Step-by-Step Guide

Step 1 — Select "ANCOVA"

From the "Test Type" dropdown, choose:

ANCOVA (Standard): Equal regression slopes and equal variances assumed.
ANCOVA (Heteroscedastic): HC3 variance-corrected; no equal variance assumption.
ANCOVA (Auto): Runs standard ANCOVA, then switches to heteroscedastic if Levene's test on adjusted residuals is significant.
Quade Test: Non-parametric ANCOVA alternative.

Step 2 — Input Method

Choose how to provide the data:

Raw data (long format): Three or more columns — DV values, group membership, and covariate(s). DataStatPro computes all statistics, runs all assumption checks, and generates full output automatically.
Raw data (wide format): One column per group for DV, plus covariate column(s). DataStatPro converts to long format.
Summary statistics: Enter $K$ , $n_j$ , $\bar{y}_j$ , $s_{y_j}$ , $\bar{x}_j$ , $s_{x_j}$ , and within-group covariance $s_{xy_j}$ for each group. Full assumption checks (especially homogeneity of slopes) are limited; inferential statistics and effect sizes are computed.

Step 3 — Specify Variables

Dependent Variable (DV): Select the continuous outcome column.
Independent Variable (IV / Group): Select the categorical grouping column.
Covariate(s): Select one or more continuous covariate columns. For multiple covariates, select all relevant columns; DataStatPro enters them simultaneously in the GLM.
Group Labels: Enter descriptive names for each group level.

Step 4 — Select Assumption Checks

DataStatPro automatically runs and displays:

✅ Homogeneity of regression slopes test (group × covariate interaction F-test) — most critical ANCOVA assumption.
✅ Shapiro-Wilk test on adjusted residuals.
✅ Levene's test on adjusted residuals for homoscedasticity.
✅ Brown-Forsythe test on adjusted residuals.
✅ Q-Q plot of adjusted residuals.
✅ Scatterplot of DV vs. covariate with separate regression lines per group (visual check for parallel slopes).
✅ Linearity check: Residual vs. covariate plot; optional polynomial test.
✅ Cook's distance and leverage plot for influential observations.
✅ Covariate balance test (one-way ANOVA with covariate as DV).
✅ Variance ratio ( $s^2_{max(adj)}/s^2_{min(adj)}$ ) with warning if $> 4$ .

Step 5 — Select Post-Hoc Tests

When the omnibus F is significant, select post-hoc tests on adjusted means:

Tukey HSD on adjusted means (balanced designs, equal variances; default).
Tukey-Kramer on adjusted means (unbalanced, equal variances).
Games-Howell on adjusted means (unequal variances; use with heteroscedastic ANCOVA).
Bonferroni on adjusted means (conservative; any design).
Holm-Bonferroni on adjusted means (less conservative than Bonferroni).
Dunnett's on adjusted means (all vs. one control group).
Scheffé on adjusted means (all possible contrasts).
Custom planned contrasts on adjusted means (specify contrast weights $c_j$ ).

Step 6 — Select Effect Sizes

✅ Partial $\omega^2_p$ (bias-corrected; primary for group effect).
✅ Partial $\varepsilon^2_p$ (alternative bias correction).
✅ Partial $\eta^2_p$ (biased; provided for comparison and journal requirements).
✅ Cohen's $f_p$ (for power analysis).
✅ $\eta^2_p$ for covariate (variance explained by covariate, partialling group).
✅ 95% CIs for $\omega^2_p$ and $\eta^2_p$ via non-central F-distribution.
✅ Cohen's $d_{adj,jk}$ with 95% CI for each post-hoc pairwise comparison on adjusted means.

Step 7 — Select Display Options

✅ Full ANCOVA source table with $F$ , df, $p$ (covariate row + group row + error row).
✅ Unadjusted and adjusted means table ( $n_j$ , $\bar{y}_j$ , $\bar{y}_{j(adj)}$ , $SE_{j(adj)}$ , 95% CI per group).
✅ Covariate statistics ( $\hat{\beta}$ , $SE_\beta$ , $t$ , $p$ , $r_{XY(W)}$ ).
✅ Effect size table ( $\omega^2_p$ , $\varepsilon^2_p$ , $\eta^2_p$ , $f_p$ ) with CIs.
✅ Post-hoc comparison table (adjusted mean differences, $SE$ , adjusted $p$ , $d_{adj,jk}$ , 95% CI).
✅ Assumption test results panel (colour-coded: green/yellow/red).
✅ Scatterplot with per-group regression lines (parallel slopes check).
✅ Adjusted means plot with 95% CIs (EMM plot).
✅ Q-Q plot of adjusted residuals.
✅ Cook's distance plot.
✅ Unadjusted vs. adjusted means comparison plot.
✅ Johnson-Neyman floodlight plot (if slopes are heterogeneous).
✅ Power curve: power vs. $n$ for observed $\omega^2_p$ .
✅ APA 7th edition-compliant results paragraph (auto-generated).

Step 8 — Run the Analysis

Click "Run ANCOVA". DataStatPro will:

Test homogeneity of regression slopes; warn if violated and offer Johnson-Neyman analysis as an alternative.
Compute the full ANCOVA source table (adjusted SS, adjusted MS, F-ratios, p-values).
Run all assumption tests and display colour-coded warnings.
Compute all adjusted group means ( $\bar{y}_{j(adj)}$ ) and their standard errors.
Compute all effect sizes with exact non-central F-based CIs.
Run all selected post-hoc tests on adjusted means with adjusted p-values and individual $d_{adj,jk}$ .
Generate all visualisations.
Auto-generate the APA-compliant results paragraph.

7. Full Step-by-Step Procedure

7.1 Complete Computational Procedure

This section walks through every computational step for ANCOVA, from raw data to a complete APA-style conclusion. A single covariate ( $p = 1$ ) is assumed.

Given: $K$ groups, DV $y_{ij}$ , covariate $x_{ij}$ , $i = 1,\ldots,n_j$ , $j = 1,\ldots,K$ . Total $N = \sum_j n_j$ .

Step 1 — State the Hypotheses

$H_0: \mu_{1(adj)} = \mu_{2(adj)} = \cdots = \mu_{K(adj)}$

$H_1:$ At least one adjusted population mean $\mu_{j(adj)}$ differs from the others.

Choose $\alpha$ (default: $.05$ ).

Step 2 — Compute Descriptive Statistics per Group

For each group $j$ , compute for both $y_{ij}$ and $x_{ij}$ :

$\bar{y}_j = \frac{1}{n_j}\sum_{i=1}^{n_j}y_{ij}, \quad s_{y_j} = \sqrt{\frac{\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_j)^2}{n_j-1}}$

$\bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j}x_{ij}, \quad s_{x_j} = \sqrt{\frac{\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2}{n_j-1}}$

$r_{xy_j} = \frac{\sum_i(x_{ij}-\bar{x}_j)(y_{ij}-\bar{y}_j)}{(n_j-1)s_{x_j}s_{y_j}}$

Grand means:

$\bar{y}_{..} = \frac{\sum_j n_j\bar{y}_j}{N}, \quad \bar{x}_{..} = \frac{\sum_j n_j\bar{x}_j}{N}$

Step 3 — Check Assumption: Homogeneity of Regression Slopes

Fit the full interaction model:

$y_{ij} = \mu + \tau_j + \beta x_{ij} + \gamma_j x_{ij} + \varepsilon_{ij}$

Test $H_0$ : all $\gamma_j = 0$ using the interaction F-test.

If $p_{interaction} < .05$ : stop standard ANCOVA; use Johnson-Neyman or moderated regression instead. Report this finding.

If $p_{interaction} \geq .05$ : proceed with standard ANCOVA.
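As a minimal sketch of this check, the interaction F can be obtained by comparing the residual SS of a common-slope fit with the residual SS of separate per-group slopes. The toy data (three groups of three) and the helper names `sscp` and `slopes_f` are illustrative assumptions, not DataStatPro internals. The p-value additionally needs an F-distribution routine, which is omitted here.

```python
# Toy data: group -> (covariate values x, DV values y). Illustrative only.
GROUPS = {
    "A": ([1, 2, 3], [1, 3, 2]),
    "B": ([2, 3, 4], [3, 4, 8]),
    "C": ([3, 4, 5], [6, 9, 9]),
}

def sscp(x, y):
    """Return (SS_xx, SS_yy, SP_xy) about the means of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_xx = sum((a - mx) ** 2 for a in x)
    ss_yy = sum((b - my) ** 2 for b in y)
    sp_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return ss_xx, ss_yy, sp_xy

def slopes_f(groups):
    """F-test of equal slopes: common-slope model vs. separate-slope model."""
    k = len(groups)
    n_total = sum(len(x) for x, _ in groups.values())
    parts = [sscp(x, y) for x, y in groups.values()]
    ss_xx_w = sum(p[0] for p in parts)
    ss_yy_w = sum(p[1] for p in parts)
    sp_w = sum(p[2] for p in parts)
    # Residual SS with one pooled slope (standard ANCOVA error term).
    rss_common = ss_yy_w - sp_w ** 2 / ss_xx_w
    # Residual SS when every group gets its own slope.
    rss_separate = sum(p[1] - p[2] ** 2 / p[0] for p in parts)
    df1, df2 = k - 1, n_total - 2 * k
    f = ((rss_common - rss_separate) / df1) / (rss_separate / df2)
    return f, df1, df2

f_slopes, df1, df2 = slopes_f(GROUPS)
print(f"F({df1}, {df2}) = {f_slopes:.3f}")
```

For these toy data the common-slope residual SS is 8.5 and the separate-slope residual SS is 4.5, giving $F(2, 3) \approx 1.33$ .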

Step 4 — Compute Within-Group Sums of Squares and Cross-Products

$SS_{YY(W)} = \sum_{j=1}^K(n_j-1)s_{y_j}^2$

$SS_{XX(W)} = \sum_{j=1}^K(n_j-1)s_{x_j}^2$

$SP_{XY(W)} = \sum_{j=1}^K(n_j-1)s_{x_j}s_{y_j}r_{xy_j}$

Step 5 — Compute Total Sums of Squares and Cross-Products

$SS_{YY(T)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{..})^2$

$SS_{XX(T)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{..})^2$

$SP_{XY(T)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{..})(y_{ij}-\bar{y}_{..})$

Step 6 — Compute Between-Group Sums of Squares and Cross-Products

$SS_{YY(B)} = SS_{YY(T)} - SS_{YY(W)}$

$SS_{XX(B)} = SS_{XX(T)} - SS_{XX(W)}$

$SP_{XY(B)} = SP_{XY(T)} - SP_{XY(W)}$

Step 7 — Compute the Pooled Within-Group Regression Coefficient

$\hat{\beta} = \frac{SP_{XY(W)}}{SS_{XX(W)}}$
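Steps 4 and 7 can be sketched in a few lines. The data set and the function name `within_sscp` are hypothetical and chosen so the sums are easy to verify by hand; the cross-product is accumulated directly from deviations, which is algebraically the same as $(n_j-1)s_{x_j}s_{y_j}r_{xy_j}$ .

```python
from statistics import mean, variance

# Toy data: group -> (covariate values x, DV values y). Illustrative only.
GROUPS = {
    "A": ([1, 2, 3], [1, 3, 2]),
    "B": ([2, 3, 4], [3, 4, 8]),
    "C": ([3, 4, 5], [6, 9, 9]),
}

def within_sscp(groups):
    """Pooled within-group SS_XX, SS_YY and SP_XY."""
    ss_xx = ss_yy = sp_xy = 0.0
    for x, y in groups.values():
        n = len(x)
        ss_xx += (n - 1) * variance(x)   # (n_j - 1) * s_x^2
        ss_yy += (n - 1) * variance(y)   # (n_j - 1) * s_y^2
        mx, my = mean(x), mean(y)
        sp_xy += sum((a - mx) * (b - my) for a, b in zip(x, y))
    return ss_xx, ss_yy, sp_xy

ss_xx_w, ss_yy_w, sp_xy_w = within_sscp(GROUPS)
beta_hat = sp_xy_w / ss_xx_w             # pooled within-group slope
print(ss_xx_w, ss_yy_w, sp_xy_w, beta_hat)
```

Here $SS_{XX(W)} = 6$ , $SS_{YY(W)} = 22$ , $SP_{XY(W)} = 9$ , so $\hat{\beta} = 1.5$ .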

Step 8 — Compute Adjusted Sums of Squares

Adjusted within-groups SS (error):

$SS_{within(adj)} = SS_{YY(W)} - \frac{SP_{XY(W)}^2}{SS_{XX(W)}} = SS_{YY(W)} - \hat{\beta} \cdot SP_{XY(W)}$

Adjusted total SS:

$SS_{total(adj)} = SS_{YY(T)} - \frac{SP_{XY(T)}^2}{SS_{XX(T)}}$

Adjusted between-groups SS:

$SS_{between(adj)} = SS_{total(adj)} - SS_{within(adj)}$

Covariate SS:

$SS_{covariate} = SS_{YY(W)} - SS_{within(adj)} = \hat{\beta} \cdot SP_{XY(W)}$

Step 9 — Compute Degrees of Freedom

$df_{between} = K-1$

$df_{covariate} = 1$ (for single covariate; $p$ in general)

$df_{error} = N - K - 1$ (for single covariate; $N-K-p$ in general)

$df_{total} = N - 2$ (for single covariate; $N-1-p$ in general)

Step 10 — Compute Mean Squares and F-Ratios

$MS_{between(adj)} = SS_{between(adj)}/(K-1)$

$MS_{error(adj)} = SS_{within(adj)}/(N-K-1)$

$MS_{covariate} = SS_{covariate}/1$

$F_{group} = MS_{between(adj)}/MS_{error(adj)}$ with $df = (K-1,\;N-K-1)$

$F_{covariate} = MS_{covariate}/MS_{error(adj)}$ with $df = (1,\;N-K-1)$
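Steps 4 to 10 can be strung together as follows. This is a sketch for a single covariate under the formulas above; the toy data and the names `sscp` and `ancova_table` are assumptions made for illustration, and p-values are left out because they require an F-distribution routine.

```python
# Toy data: group -> (covariate values x, DV values y). Illustrative only.
GROUPS = {
    "A": ([1, 2, 3], [1, 3, 2]),
    "B": ([2, 3, 4], [3, 4, 8]),
    "C": ([3, 4, 5], [6, 9, 9]),
}

def sscp(x, y):
    """Return (SS_xx, SS_yy, SP_xy) about the means of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (
        sum((a - mx) ** 2 for a in x),
        sum((b - my) ** 2 for b in y),
        sum((a - mx) * (b - my) for a, b in zip(x, y)),
    )

def ancova_table(groups):
    """One-covariate ANCOVA source table (no p-values)."""
    k = len(groups)
    all_x = [v for x, _ in groups.values() for v in x]
    all_y = [v for _, y in groups.values() for v in y]
    n_total = len(all_x)
    within = [sscp(x, y) for x, y in groups.values()]
    ss_xx_w, ss_yy_w, sp_w = (sum(p[i] for p in within) for i in range(3))
    ss_xx_t, ss_yy_t, sp_t = sscp(all_x, all_y)
    ss_within_adj = ss_yy_w - sp_w ** 2 / ss_xx_w        # Step 8, error
    ss_total_adj = ss_yy_t - sp_t ** 2 / ss_xx_t         # Step 8, total
    ss_between_adj = ss_total_adj - ss_within_adj
    ss_cov = ss_yy_w - ss_within_adj
    df_between, df_error = k - 1, n_total - k - 1        # Step 9
    ms_error = ss_within_adj / df_error                  # Step 10
    return {
        "ss_between_adj": ss_between_adj,
        "ss_cov": ss_cov,
        "ss_within_adj": ss_within_adj,
        "df_between": df_between,
        "df_error": df_error,
        "ms_error": ms_error,
        "f_group": (ss_between_adj / df_between) / ms_error,
        "f_cov": ss_cov / ms_error,
    }

table = ancova_table(GROUPS)
for key, value in table.items():
    print(f"{key:>15}: {value:.4f}")
```

For the toy data this gives $SS_{between(adj)} = 6.75$ , $SS_{covariate} = 13.5$ , $SS_{within(adj)} = 8.5$ , and $MS_{error(adj)} = 1.7$ on 5 df.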

Step 11 — Compute Adjusted Group Means

$\bar{y}_{j(adj)} = \bar{y}_j - \hat{\beta}(\bar{x}_j - \bar{x}_{..})$

$SE_{j(adj)} = \sqrt{MS_{error(adj)}\left(\frac{1}{n_j} + \frac{(\bar{x}_j-\bar{x}_{..})^2}{SS_{XX(W)}}\right)}$
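The adjustment in Step 11 is a one-line computation per group. The sketch below uses hypothetical summary values ( $\hat{\beta} = 1.5$ , $MS_{error(adj)} = 1.7$ , $SS_{XX(W)} = 6$ , three groups of $n_j = 3$ ); the function name `adjusted_means` is likewise an assumption.

```python
import math

# Hypothetical summary statistics: group -> (n_j, covariate mean, DV mean).
GROUP_STATS = {"A": (3, 2.0, 2.0), "B": (3, 3.0, 5.0), "C": (3, 4.0, 8.0)}
BETA_HAT = 1.5        # pooled within-group slope
MS_ERROR_ADJ = 1.7    # adjusted error mean square
SS_XX_W = 6.0         # pooled within-group SS of the covariate

def adjusted_means(stats, beta, ms_error, ss_xx_w):
    """Return group -> (adjusted mean, standard error)."""
    n_total = sum(n for n, _, _ in stats.values())
    grand_x = sum(n * xb for n, xb, _ in stats.values()) / n_total
    result = {}
    for name, (n, x_bar, y_bar) in stats.items():
        adj = y_bar - beta * (x_bar - grand_x)
        se = math.sqrt(ms_error * (1 / n + (x_bar - grand_x) ** 2 / ss_xx_w))
        result[name] = (adj, se)
    return result

emm = adjusted_means(GROUP_STATS, BETA_HAT, MS_ERROR_ADJ, SS_XX_W)
for name, (adj, se) in emm.items():
    print(f"{name}: adjusted mean = {adj:.2f}, SE = {se:.3f}")
```

The observed means 2, 5, and 8 shrink to adjusted means 3.5, 5.0, and 6.5 because the groups differ on the covariate; group B, whose covariate mean equals the grand mean, has the smallest standard error.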

Step 12 — Check Remaining Assumptions

Run Shapiro-Wilk on adjusted residuals.
Run Levene's test on adjusted residuals.
Run linearity check (residual vs. covariate plot).
Inspect Cook's distances for influential observations.
Run covariate balance test (ANOVA with covariate as DV).

Step 13 — Compute Effect Sizes

Partial eta squared:

$\eta^2_p = \frac{SS_{between(adj)}}{SS_{between(adj)} + SS_{within(adj)}}$

Partial omega squared (bias-corrected, preferred):

$\omega^2_p = \frac{SS_{between(adj)} - (K-1)MS_{error(adj)}}{SS_{between(adj)} + SS_{within(adj)} + MS_{error(adj)}}$

Partial epsilon squared:

$\varepsilon^2_p = \frac{SS_{between(adj)} - (K-1)MS_{error(adj)}}{SS_{between(adj)} + SS_{within(adj)}}$

Cohen's $f_p$ :

$f_p = \sqrt{\frac{\omega^2_p}{1-\omega^2_p}}$
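The four Step 13 formulas translate directly into code. This sketch takes the adjusted sums of squares as inputs; the numeric values are hypothetical and the function name `partial_effect_sizes` is an assumption. A negative $\omega^2_p$ is treated as 0 when computing $f_p$ so the square root is defined.

```python
import math

def partial_effect_sizes(ss_between_adj, ss_within_adj, k_groups, n_total):
    """Partial eta^2, omega^2, epsilon^2 and Cohen's f for one covariate."""
    df_between = k_groups - 1
    ms_error = ss_within_adj / (n_total - k_groups - 1)
    numerator = ss_between_adj - df_between * ms_error
    eta2 = ss_between_adj / (ss_between_adj + ss_within_adj)
    omega2 = numerator / (ss_between_adj + ss_within_adj + ms_error)
    epsilon2 = numerator / (ss_between_adj + ss_within_adj)
    omega2_for_f = max(omega2, 0.0)      # negative estimates are treated as 0
    f_p = math.sqrt(omega2_for_f / (1 - omega2_for_f))
    return {"eta2_p": eta2, "omega2_p": omega2, "epsilon2_p": epsilon2, "f_p": f_p}

# Hypothetical inputs: SS_between(adj) = 6.75, SS_within(adj) = 8.5, K = 3, N = 9.
es = partial_effect_sizes(6.75, 8.5, k_groups=3, n_total=9)
for name, value in es.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the estimates are ordered as expected: $\omega^2_p \approx 0.198 \leq \varepsilon^2_p \approx 0.220 \leq \eta^2_p \approx 0.443$ .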

Step 14 — Compute 95% CI for $\omega^2_p$

Using the non-central F-distribution with $df_1 = K-1$ and $df_2 = N-K-1$ . DataStatPro performs this computation numerically (see Section 10.4 for details).

Step 15 — Conduct Post-Hoc Tests on Adjusted Means (if F significant)

Select the appropriate post-hoc test (Section 9). Compute pairwise differences of adjusted means, standard errors, adjusted p-values, and individual Cohen's $d_{adj,jk}$ for each pair.

Step 16 — Interpret and Report

Combine all results into an APA-compliant report (Section 13.9).

8. Effect Sizes for ANCOVA

8.1 Partial vs. Total Effect Sizes in ANCOVA

In ANCOVA, effect sizes are partial — they express the proportion of variance explained by group membership after removing the variance explained by the covariate. Partial effect sizes are appropriate because the covariate is not the substantive effect of interest; it is only a control variable.

Important distinction:

Total $\eta^2$ (non-partial): Proportion of total variance in $Y$ explained by group membership. This ignores the covariate and is equivalent to the ANOVA effect size calculated as if the covariate were not in the model.
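Partial $\eta^2_p$ : Proportion of DV variance not explained by the covariate that is explained by group membership. Typically $\geq$ total $\eta^2$ when groups have similar covariate means; it can be smaller when the adjustment removes a large covariate-driven group difference.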
Always report partial effect sizes for ANCOVA and label them explicitly as partial.

8.2 Partial Eta Squared ( $\eta^2_p$ ) — Common but Biased

$\eta^2_p = \frac{SS_{between(adj)}}{SS_{between(adj)} + SS_{within(adj)}}$

$\eta^2_p$ is the proportion of adjusted DV variance accounted for by group membership. It is the most commonly reported ANCOVA effect size (default in SPSS and most software).

Critical limitation: $\eta^2_p$ is positively biased, overestimating the true population partial effect, particularly in small samples with many groups.

From F (direct computation):

$\eta^2_p = \frac{F \cdot df_B}{F \cdot df_B + df_{error(adj)}}$

⚠️ Report $\eta^2_p$ only when explicitly required by a journal or for historical comparison. Always report $\omega^2_p$ (or $\varepsilon^2_p$ ) as the primary effect size and label $\eta^2_p$ as "biased" in your manuscript.

8.3 Partial Omega Squared ( $\omega^2_p$ ) — Preferred

$\omega^2_p = \frac{SS_{between(adj)} - (K-1)MS_{error(adj)}}{SS_{between(adj)} + SS_{within(adj)} + MS_{error(adj)}}$

$\omega^2_p$ is the bias-corrected estimate of the population partial proportion of variance explained. It is the recommended primary effect size for ANCOVA.

Properties:

Can be slightly negative in small samples when the true effect is zero — report as 0 by convention.
Always $\leq \eta^2_p$ .
Converges to $\eta^2_p$ as $N \to \infty$ .
Accounts for the reduction in $df_{error}$ due to the covariate.

From F (approximate):

$\omega^2_p \approx \frac{(F-1)(K-1)}{(F-1)(K-1) + N}$
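When only a published $F$ and its degrees of freedom are available, the two "from F" conversions above can be applied as in this sketch. The function names and the example values ( $F = 3.375/1.7$ , $K = 3$ , $N = 9$ ) are hypothetical.

```python
def eta2_p_from_f(f_value, df_between, df_error):
    """Partial eta squared recovered exactly from F and its df."""
    return f_value * df_between / (f_value * df_between + df_error)

def omega2_p_from_f(f_value, k_groups, n_total):
    """Approximate partial omega squared from F, K and N."""
    signal = (f_value - 1) * (k_groups - 1)
    return signal / (signal + n_total)

F_GROUP = 3.375 / 1.7   # hypothetical F(2, 5) for the group effect
eta2 = eta2_p_from_f(F_GROUP, df_between=2, df_error=5)
omega2 = omega2_p_from_f(F_GROUP, k_groups=3, n_total=9)
print(f"eta2_p = {eta2:.4f}, omega2_p (approx.) = {omega2:.4f}")
```

The $\eta^2_p$ conversion reproduces the SS-based value exactly, while the $\omega^2_p$ conversion is only approximate: here it gives about 0.180, against 0.198 from the SS-based formula with $SS_{between(adj)} = 6.75$ and $SS_{within(adj)} = 8.5$ . The gap shrinks as $N$ grows.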

8.4 Partial Epsilon Squared ( $\varepsilon^2_p$ ) — Alternative Correction

$\varepsilon^2_p = \frac{SS_{between(adj)} - (K-1)MS_{error(adj)}}{SS_{between(adj)} + SS_{within(adj)}}$

Properties:

Always: $\omega^2_p \leq \varepsilon^2_p \leq \eta^2_p$ .
Slightly less bias-correction than $\omega^2_p$ .
Computationally simpler (no $MS_{error(adj)}$ in denominator).

8.5 Cohen's $f_p$ — For Power Analysis

$f_p = \sqrt{\frac{\eta^2_p}{1-\eta^2_p}}$ or $f_p = \sqrt{\frac{\omega^2_p}{1-\omega^2_p}}$

Cohen's $f_p$ is used as the effect size input for ANCOVA power analysis. It represents the ratio of between-groups adjusted SD to within-groups adjusted SD.

Benchmarks: Small = 0.10, Medium = 0.25, Large = 0.40 (Cohen, 1988).

8.6 Cohen's $d_{adj}$ for Pairwise Comparisons

For each pairwise comparison of adjusted means, report Cohen's $d_{adj}$ :

$d_{adj,jk} = \frac{\bar{y}_{j(adj)} - \bar{y}_{k(adj)}}{\sqrt{MS_{error(adj)}}}$

Using $\sqrt{MS_{error(adj)}}$ as the standardiser is preferred because it is the ANCOVA-based estimate of the common within-group SD after covariate adjustment.

95% CI for the pairwise adjusted mean difference:

$(\bar{y}_{j(adj)} - \bar{y}_{k(adj)}) \pm t_{\alpha/2,\;N-K-1} \times \sqrt{MS_{error(adj)}\left(\frac{1}{n_j} + \frac{1}{n_k} + \frac{(\bar{x}_j-\bar{x}_k)^2}{SS_{XX(W)}}\right)}$

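Note: The standard error contains an extra term, $(\bar{x}_j-\bar{x}_k)^2/SS_{XX(W)}$ , that has no ANOVA counterpart; it reflects the added uncertainty when the two groups differ on the covariate. When the covariate is correlated with the DV, the interval is often still narrower than the ANOVA interval, because $MS_{error(adj)}$ is smaller than the unadjusted error mean square.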
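A small sketch of the pairwise computation follows. All inputs are hypothetical summary values, the function name `pairwise_adjusted` is an assumption, and the critical value $t_{.025,\;5} = 2.5706$ is taken from a standard t table rather than computed.

```python
import math

def pairwise_adjusted(adj_j, adj_k, n_j, n_k, xbar_j, xbar_k,
                      ms_error, ss_xx_w, t_crit):
    """Cohen's d_adj and the CI for a difference of two adjusted means."""
    diff = adj_j - adj_k
    se = math.sqrt(
        ms_error * (1 / n_j + 1 / n_k + (xbar_j - xbar_k) ** 2 / ss_xx_w)
    )
    d_adj = diff / math.sqrt(ms_error)          # standardised by sqrt(MS_error(adj))
    return d_adj, se, (diff - t_crit * se, diff + t_crit * se)

# Hypothetical inputs: adjusted means 3.5 vs 5.0, n = 3 each,
# covariate means 2 vs 3, MS_error(adj) = 1.7, SS_XX(W) = 6, df_error = 5.
d_adj, se_diff, ci = pairwise_adjusted(
    3.5, 5.0, 3, 3, 2.0, 3.0, ms_error=1.7, ss_xx_w=6.0, t_crit=2.5706
)
print(f"d_adj = {d_adj:.3f}, SE = {se_diff:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these values $d_{adj} \approx -1.15$ and the 95% CI is roughly $(-4.56,\;1.56)$ . A large standardised difference can still have an interval spanning zero when $n_j$ is this small.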

8.7 Variance Explained by the Covariate

The partial $\eta^2$ for the covariate quantifies how much of the adjusted DV variance the covariate explains:

$\eta^2_{p,cov} = \frac{SS_{covariate}}{SS_{covariate} + SS_{within(adj)}}$

This is the squared partial correlation between the covariate and DV, partialling the group effect.

With a single covariate it equals the proportion of within-group variance explained (squared pooled within-group correlation), because $SS_{covariate} + SS_{within(adj)} = SS_{YY(W)}$ :

$r^2_{XY(W)} = \frac{SP_{XY(W)}^2}{SS_{XX(W)} \cdot SS_{YY(W)}}$
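With one covariate, the two expressions above give the same number, which the following sketch checks on hypothetical within-group sums ( $SS_{XX(W)} = 6$ , $SS_{YY(W)} = 22$ , $SP_{XY(W)} = 9$ ). The function names are assumptions.

```python
def eta2_p_covariate(ss_xx_w, ss_yy_w, sp_xy_w):
    """Partial eta squared for the covariate, built from the Step 8 quantities."""
    ss_cov = sp_xy_w ** 2 / ss_xx_w
    ss_within_adj = ss_yy_w - ss_cov
    return ss_cov / (ss_cov + ss_within_adj)

def r2_within(ss_xx_w, ss_yy_w, sp_xy_w):
    """Squared pooled within-group correlation."""
    return sp_xy_w ** 2 / (ss_xx_w * ss_yy_w)

eta2_cov = eta2_p_covariate(6.0, 22.0, 9.0)
r2_w = r2_within(6.0, 22.0, 9.0)
print(f"eta2_p(cov) = {eta2_cov:.4f}, r2_within = {r2_w:.4f}")
```

Both evaluate to $81/132 \approx 0.614$ .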

8.8 Comparison of ANOVA vs. ANCOVA Effect Sizes

ANCOVA effect sizes ( $\omega^2_p$ , $\eta^2_p$ ) are generally larger than ANOVA effect sizes ( $\omega^2$ , $\eta^2$ ) on the same data, because the denominator (error variance) is reduced by covariate adjustment. This inflated appearance is appropriate — it reflects the genuine increase in precision from including the covariate. However, it is not valid to compare effect sizes across ANOVA and ANCOVA analyses without acknowledging this difference.

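Effect-size gain from the covariate (approximate):

$\frac{f_{ANCOVA}}{f_{ANOVA}} \approx \sqrt{\frac{1}{1-r^2_{XY(W)}}} \quad \text{(for large }N\text{)}$

For $r_{XY(W)} = 0.70$ ( $r^2 = 0.49$ ): ratio $\approx \sqrt{1/0.51} = 1.40$ , so the standardised effect (Cohen's $f$ ) is approximately 40% larger and the non-centrality parameter is roughly doubled ( $1/0.51 \approx 1.96$ ). Power itself is bounded by 1, so the resulting gain in power depends on $N$ , $K$ , and $\alpha$ .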

9. Post-Hoc Tests and Planned Contrasts

9.1 Post-Hoc Tests in ANCOVA Are Applied to Adjusted Means

The critical distinction between ANOVA and ANCOVA post-hoc testing is that in ANCOVA, all pairwise comparisons are made on the adjusted means ( $\bar{y}_{j(adj)}$ ), not the observed means. Using observed means for post-hoc tests after a significant ANCOVA F-test is incorrect and invalidates the comparison.

Standard errors for pairwise adjusted mean differences include an additional term for covariate mean differences between groups (Section 8.6).

9.2 Tukey HSD on Adjusted Means

Tukey's HSD for ANCOVA balanced designs:

$\text{HSD}_{jk} = q_{K,\;N-K-1,\;\alpha} \times \sqrt{\frac{MS_{error(adj)}}{2}\left(\frac{1}{n_j} + \frac{1}{n_k} + \frac{(\bar{x}_j-\bar{x}_k)^2}{SS_{XX(W)}}\right)}$

The studentised range distribution uses $df_2 = N-K-1$ (ANCOVA error df) rather than $N-K$ (ANOVA error df).

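For unequal sample sizes (Tukey-Kramer extension), the same expression is used with the group-specific $n_j$ and $n_k$ ; it is written equivalently as: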

$\text{HSD}_{jk} = \frac{q_{K,\;N-K-1,\;\alpha}}{\sqrt{2}} \times \sqrt{MS_{error(adj)}\left(\frac{1}{n_j} + \frac{1}{n_k} + \frac{(\bar{x}_j-\bar{x}_k)^2}{SS_{XX(W)}}\right)}$

Declare groups $j$ and $k$ different if $|\bar{y}_{j(adj)} - \bar{y}_{k(adj)}| > \text{HSD}_{jk}$ .

9.3 Games-Howell on Adjusted Means

When Levene's test on adjusted residuals is significant, use Games-Howell with group-specific variance estimates:

$t_{jk} = \frac{\bar{y}_{j(adj)} - \bar{y}_{k(adj)}}{\sqrt{s^2_{j(adj)}/n_j + s^2_{k(adj)}/n_k}}$

with Welch-Satterthwaite df.
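The statistic and its Welch-Satterthwaite df can be sketched as below. The inputs are hypothetical group summaries, the function name is an assumption, and the final comparison against the studentised range critical value ( $q/\sqrt{2}$ ) is omitted because it needs a table or a statistics library.

```python
import math

def games_howell_t(mean_j, var_j, n_j, mean_k, var_k, n_k):
    """Games-Howell t and Welch-Satterthwaite df for one pair of groups."""
    vj, vk = var_j / n_j, var_k / n_k           # squared SE contribution per group
    t = (mean_j - mean_k) / math.sqrt(vj + vk)
    df = (vj + vk) ** 2 / (vj ** 2 / (n_j - 1) + vk ** 2 / (n_k - 1))
    return t, df

# Hypothetical adjusted means 12 and 10, residual variances 4 and 9, n = 10 each.
t_jk, df_jk = games_howell_t(12.0, 4.0, 10, 10.0, 9.0, 10)
print(f"t = {t_jk:.3f}, df = {df_jk:.2f}")
```

The df of about 15.7 falls between $\min(n_j, n_k) - 1 = 9$ and $n_j + n_k - 2 = 18$ , as it always does for this approximation.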

9.4 Bonferroni and Holm-Bonferroni on Adjusted Means

Bonferroni: Compare each pairwise adjusted-mean p-value to $\alpha^* = \alpha/m$ where $m = K(K-1)/2$ .

Holm-Bonferroni (preferred): Sequential procedure applied to sorted p-values from adjusted mean comparisons. Uniformly more powerful than Bonferroni.
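Both corrections are easy to express as adjusted p-values, which can then be compared directly with $\alpha$ . A minimal sketch, with hypothetical p-values for the three pairwise comparisons of $K = 3$ groups:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def holm(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)   # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

raw_p = [0.01, 0.04, 0.03]     # hypothetical unadjusted pairwise p-values
p_bonf = bonferroni(raw_p)
p_holm = holm(raw_p)
print("Bonferroni:", [round(p, 4) for p in p_bonf])
print("Holm:      ", [round(p, 4) for p in p_holm])
```

Every Holm-adjusted p-value is less than or equal to its Bonferroni counterpart, which is why Holm never rejects fewer hypotheses.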

9.5 Dunnett's Test on Adjusted Means

When the primary interest is comparing $K-1$ experimental groups to a single control group, Dunnett's test on adjusted means provides optimal power:

$t_{j,control} = \frac{\bar{y}_{j(adj)} - \bar{y}_{control(adj)}}{\sqrt{MS_{error(adj)}\left(\frac{1}{n_j} + \frac{1}{n_{control}} + \frac{(\bar{x}_j-\bar{x}_{control})^2}{SS_{XX(W)}}\right)}}$

Compared against Dunnett's critical values with $df_{error} = N-K-1$ .

9.6 Planned Contrasts on Adjusted Means

For pre-planned comparisons, the contrast on adjusted means is:

$\hat{\psi}_{adj} = \sum_{j=1}^K c_j \bar{y}_{j(adj)}$

with $\sum_j c_j = 0$ .

Contrast SS and F:

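$SS_{\psi(adj)} = \frac{\hat{\psi}_{adj}^2}{\sum_j c_j^2/n_j + \left(\sum_j c_j\bar{x}_j\right)^2/SS_{XX(W)}}$

(For a pairwise contrast, $c = (1, -1)$ , the denominator reduces to $1/n_j + 1/n_k + (\bar{x}_j-\bar{x}_k)^2/SS_{XX(W)}$ , the term used in Section 8.6.)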

$F_{\psi(adj)} = SS_{\psi(adj)}/MS_{error(adj)}$ with $df = (1,\;N-K-1)$

For orthogonal contrasts on adjusted means, the orthogonality condition is:

$\sum_{j=1}^K \frac{c_j c'_j}{n_j} + \frac{\left(\sum_j c_j\bar{x}_j\right)\left(\sum_j c'_j\bar{x}_j\right)}{SS_{XX(W)}} = 0$

(Slightly different from ANOVA orthogonality due to the covariate term. Because $\sum_j c_j = 0$ , $\sum_j c_j\bar{x}_j = \sum_j c_j(\bar{x}_j-\bar{x}_{..})$ .)

DataStatPro computes these automatically when contrast weights are entered.
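A sketch of a planned contrast on adjusted means is given below. It uses the variance form in which the covariate enters as $\left(\sum_j c_j\bar{x}_j\right)^2/SS_{XX(W)}$ , which reduces to the pairwise standard error of Section 8.6 for $c = (1, -1)$ . The summary values and the function name are hypothetical.

```python
def contrast_on_adjusted_means(weights, adj_means, n, x_bars, ms_error, ss_xx_w):
    """Return (psi_hat, SS_psi, F_psi) for a contrast on adjusted means."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    psi = sum(c * m for c, m in zip(weights, adj_means))
    sample_term = sum(c * c / nj for c, nj in zip(weights, n))
    covariate_term = sum(c * xb for c, xb in zip(weights, x_bars)) ** 2 / ss_xx_w
    ss_psi = psi ** 2 / (sample_term + covariate_term)
    return psi, ss_psi, ss_psi / ms_error

# Hypothetical summaries for three groups (A, B, C).
ADJ_MEANS = [3.5, 5.0, 6.5]
N_J = [3, 3, 3]
X_BARS = [2.0, 3.0, 4.0]

psi_ab, ss_ab, f_ab = contrast_on_adjusted_means(
    [1, -1, 0], ADJ_MEANS, N_J, X_BARS, ms_error=1.7, ss_xx_w=6.0
)
psi_ac, ss_ac, f_ac = contrast_on_adjusted_means(
    [1, 0, -1], ADJ_MEANS, N_J, X_BARS, ms_error=1.7, ss_xx_w=6.0
)
print(f"A vs B: psi = {psi_ab}, SS = {ss_ab:.3f}, F(1, 5) = {f_ab:.3f}")
print(f"A vs C: psi = {psi_ac}, SS = {ss_ac:.3f}, F(1, 5) = {f_ac:.3f}")
```

For the A vs. B contrast, $F_{\psi} \approx 1.588$ equals the square of the pairwise $t$ statistic ( $-1.5/1.190$ ), as it should for a single-df contrast.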

10. Confidence Intervals

10.1 95% CI for Each Adjusted Group Mean

The 95% CI for the adjusted population mean $\mu_{j(adj)}$ :

$\bar{y}_{j(adj)} \pm t_{\alpha/2,\;N-K-1} \times \sqrt{MS_{error(adj)}\left(\frac{1}{n_j} + \frac{(\bar{x}_j-\bar{x}_{..})^2}{SS_{XX(W)}}\right)}$

Note: When $\bar{x}_j = \bar{x}_{..}$ (group $j$ has the same covariate mean as the grand mean), the CI simplifies to the ANOVA formula. When groups differ on the covariate, the CI is wider — reflecting additional uncertainty in the adjustment.

10.2 95% CI for Pairwise Adjusted Mean Differences

The 95% CI for $\mu_{j(adj)} - \mu_{k(adj)}$ :

$(\bar{y}_{j(adj)}-\bar{y}_{k(adj)}) \pm t_{\alpha/2,\;N-K-1} \times \sqrt{MS_{error(adj)}\left(\frac{1}{n_j} + \frac{1}{n_k} + \frac{(\bar{x}_j-\bar{x}_k)^2}{SS_{XX(W)}}\right)}$

Using Tukey-adjusted critical values (simultaneous CIs):

Replace $t_{\alpha/2,\;N-K-1}$ with $q_{K,\;N-K-1,\;\alpha}/\sqrt{2}$ .

10.3 95% CI for the Regression Coefficient $\hat{\beta}$

The CI for the pooled within-group slope:

$\hat{\beta} \pm t_{\alpha/2,\;N-K-1} \times \sqrt{\frac{MS_{error(adj)}}{SS_{XX(W)}}}$
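The slope interval needs only three summary quantities and a critical value. In this sketch the inputs are hypothetical and $t_{.025,\;5} = 2.5706$ is a table value supplied by hand; the function name is an assumption.

```python
import math

def slope_ci(beta_hat, ms_error, ss_xx_w, t_crit):
    """SE, t statistic and CI for the pooled within-group slope."""
    se = math.sqrt(ms_error / ss_xx_w)
    return se, beta_hat / se, (beta_hat - t_crit * se, beta_hat + t_crit * se)

# Hypothetical inputs: beta_hat = 1.5, MS_error(adj) = 1.7, SS_XX(W) = 6, df = 5.
se_beta, t_beta, ci_beta = slope_ci(1.5, 1.7, 6.0, t_crit=2.5706)
print(f"SE = {se_beta:.4f}, t = {t_beta:.3f}, "
      f"95% CI = ({ci_beta[0]:.3f}, {ci_beta[1]:.3f})")
```

As a consistency check, $t^2 \approx 7.94$ matches $F_{covariate} = SS_{covariate}/MS_{error(adj)} = 13.5/1.7$ for the same hypothetical data.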

10.4 95% CI for $\omega^2_p$

Using the non-central F-distribution (computed numerically by DataStatPro).

The non-centrality parameter: $\hat{\lambda} = (F_{group}-1) \times df_B$

Find $\lambda_L$ , $\lambda_U$ such that:

$P(F_{K-1,\;N-K-1}(\lambda_L) \geq F_{obs}) = 0.025$

$P(F_{K-1,\;N-K-1}(\lambda_U) \leq F_{obs}) = 0.025$

Convert to $\eta^2_p$ : $\eta^2_{p,L} = \lambda_L/(\lambda_L + N)$ ; $\eta^2_{p,U} = \lambda_U/(\lambda_U + N)$ .

Then apply bias correction to obtain $\omega^2_p$ bounds.

10.5 CI Width and the Covariate

For a given error mean square, the CI for an adjusted mean is wider than the ANOVA-style interval based on $1/n_j$ alone, unless the group's covariate mean equals the grand covariate mean. The additional squared standard error from the covariate term is:

$\Delta SE_{j(adj)}^2 = \frac{MS_{error(adj)} \cdot (\bar{x}_j - \bar{x}_{..})^2}{SS_{XX(W)}}$

This additional uncertainty vanishes in randomised experiments where random assignment ensures $\bar{x}_j \approx \bar{x}_{..}$ on average. In observational studies, large covariate mean differences between groups produce substantially wider CIs for adjusted means.

11. Power Analysis and Sample Size Planning

11.1 Power Advantage of ANCOVA over ANOVA

The power advantage of ANCOVA over one-way ANOVA depends on the within-group correlation between the covariate and DV:

Effective sample size multiplier:

$N_{eff} = \frac{N}{1 - r^2_{XY(W)}}$

ANCOVA with $N$ participants has equivalent power to ANOVA with $N_{eff}$ participants. For $r_{XY(W)} = 0.60$ : $N_{eff} = N/0.64 = 1.56N$ — ANCOVA with 100 participants has the same power as ANOVA with 156 participants.

However, ANCOVA costs one $df_{error}$ per covariate, slightly reducing this gain. With one covariate, the net effect on the error mean square is:

$\frac{MS_{error(adj)}}{MS_{error(ANOVA)}} = \left(1-r^2_{XY(W)}\right) \cdot \frac{N-K}{N-K-1}$

This ratio is below 1, so ANCOVA has the smaller error term and is generally the more powerful test, whenever $|r_{XY(W)}| > r_{break-even} = \sqrt{1/(N-K)}$ .
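These quantities are simple enough to tabulate before data collection. The sketch below computes the effective sample size, the break-even correlation, and the ratio of error mean squares implied by the Step 8 formulas, $(1-r^2_{XY(W)})(N-K)/(N-K-1)$ . Function names and inputs are hypothetical.

```python
import math

def effective_n(n_total, r_within):
    """ANOVA sample size with the same error variance as ANCOVA with n_total."""
    return n_total / (1 - r_within ** 2)

def error_ms_ratio(n_total, k_groups, r_within):
    """MS_error(adj) / MS_error(ANOVA) for a single covariate."""
    return (1 - r_within ** 2) * (n_total - k_groups) / (n_total - k_groups - 1)

def break_even_r(n_total, k_groups):
    """Smallest |r| at which the covariate pays for its lost error df."""
    return math.sqrt(1 / (n_total - k_groups))

n_eff = effective_n(100, 0.60)
r_min = break_even_r(90, 3)
ratio_at_break_even = error_ms_ratio(90, 3, r_min)
print(f"N_eff = {n_eff:.2f}, break-even r = {r_min:.3f}, "
      f"MS ratio at break-even = {ratio_at_break_even:.3f}")
```

For $N = 90$ and $K = 3$ the break-even correlation is only about 0.107, so even a weakly related covariate is worth including.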

11.2 A Priori Power Analysis for ANCOVA

Non-centrality parameter for ANCOVA:

$\lambda = f^2_p \cdot N$

Where $f^2_p = \omega^2_p/(1-\omega^2_p)$ is computed from the partial effect size.

Power computation (exact, using non-central F):

$\text{Power} = P\!\left(F_{K-1,\;K(n-1)-1}(\lambda) > F_{crit}\right)$

Where $F_{crit} = F_{\alpha,\;K-1,\;K(n-1)-1}$ and $\lambda = f^2_p \times Kn$ .

Note: The critical F uses $df_{error} = N-K-1$ (ANCOVA df), not $N-K$ (ANOVA df).

11.3 Required Sample Size

For ANCOVA power analysis, specify:

Expected $\omega^2_p$ (or Cohen's $f_p$ ) for the group effect on adjusted means.
Expected within-group correlation $r_{XY(W)}$ between covariate and DV.
Number of groups $K$ .
Number of covariates $p$ .
Desired power ( $1-\beta$ ; typically 0.80 or 0.90).
Significance level $\alpha$ (typically .05).

Required $n$ per group for 80% power ( $\alpha = .05$ , one covariate):

$f_p$	$\omega^2_p$	$r_{XY(W)}$	$K=3$	$K=4$	$K=5$
0.10	0.010	0.30	315	268	236
0.10	0.010	0.60	211	180	159
0.25	0.059	0.30	50	43	38
0.25	0.059	0.60	34	29	26
0.40	0.138	0.30	20	17	15
0.40	0.138	0.60	13	12	10
0.50	0.200	0.50	11	10	9
0.60	0.265	0.50	8	7	6

All values are $n$ per group. Total $N = n \times K$ . Higher $r_{XY(W)}$ requires fewer participants because the covariate removes more error variance.

11.4 Determining Effect Size Inputs

From prior ANOVA literature (convert ANOVA $\omega^2$ to ANCOVA $\omega^2_p$ ):

If a prior ANOVA found $\omega^2_{ANOVA}$ and you plan to add a covariate with expected $r^2_{XY(W)}$ :

$\omega^2_p \approx \frac{\omega^2_{ANOVA}}{1 - r^2_{XY(W)}}$ (approximate)

Because the covariate removes within-group variance from the denominator, the partial effect size is amplified.

From pilot data: Run ANCOVA on the pilot sample and use the observed $\omega^2_p$ (with appropriate shrinkage, as pilot estimates are noisy).

From theory: Specify minimum practically meaningful differences between adjusted means and estimate $\sigma_{adj} = \sqrt{MS_{error(adj)}} \approx \sigma\sqrt{1-r^2_{XY(W)}}$ .

11.5 Sensitivity Analysis for ANCOVA

The minimum detectable partial $f_p$ for a given $N$ , $K$ , $p$ , and 80% power:

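$f_{p,min} \approx \sqrt{\frac{7.849}{N - p}}$ (approximate; the constant $7.849 = (1.960 + 0.842)^2$ comes from the normal approximation for a single-df test, so for $K > 2$ the omnibus test requires a somewhat larger $f_p$ )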

For $N = 90$ , $K = 3$ , $p = 1$ : $f_{p,min} \approx \sqrt{7.849/89} = 0.297$

Corresponding $\omega^2_{p,min} \approx 0.081$ .
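The approximation can be reproduced with the standard library, which also shows where the constant comes from: $(z_{1-\alpha/2} + z_{power})^2$ under a normal approximation. The function name is an assumption, and the result should be read as a rough lower bound rather than an exact non-central F calculation.

```python
import math
from statistics import NormalDist

def min_detectable_f(n_total, n_covariates, alpha=0.05, power=0.80):
    """Normal-approximation minimum detectable f_p and omega^2_p."""
    z = NormalDist()
    constant = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    f_min = math.sqrt(constant / (n_total - n_covariates))
    omega2_min = f_min ** 2 / (1 + f_min ** 2)
    return constant, f_min, omega2_min

constant, f_min, omega2_min = min_detectable_f(n_total=90, n_covariates=1)
print(f"constant = {constant:.3f}, f_min = {f_min:.3f}, omega2_min = {omega2_min:.3f}")
```

Changing `power` to 0.90 raises the constant to about 10.5, which shows how sharply the detectable effect depends on the target power.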

⚠️ Report a sensitivity analysis for null ANCOVA results. Relative to ANOVA at the same $N$ , the covariate lowers the minimum detectable effect by shrinking the error variance, while the lost error df slightly offsets this. The study should be powered for the partial effect size, which may be larger than the corresponding ANOVA effect.

12. Non-Parametric Alternative: Quade and Ranked ANCOVA

12.1 When to Use Non-Parametric ANCOVA Alternatives

The Quade test and ranked ANCOVA are appropriate when:

The DV is ordinal or severely non-normally distributed.
There are extreme outliers on the DV that cannot be removed.
The normality assumption for ANCOVA is severely violated with small $n_j < 15$ .
Homoscedasticity of adjusted residuals is seriously violated.

12.2 The Quade Test

The Quade test (Quade, 1967) is a non-parametric extension of ANCOVA for ranked data with a single covariate.

Procedure:

Step 1 — Rank the covariate: Rank all $N$ covariate scores from 1 to $N$ . Assign average ranks for ties.

Step 2 — Rank the DV: Rank all $N$ DV scores from 1 to $N$ in the same way, ignoring group membership.

Step 3 — Compute residuals: Regress the DV ranks on the covariate ranks across all $N$ participants (ignoring group) and save the residuals $e_{ij}$ .

Step 4 — Compute the Quade F-statistic: Run a one-way ANOVA on the residuals with group as the factor:

$F_Q = \frac{SS_{B(e)}/(K-1)}{SS_{W(e)}/(N-K)}$

Where:

$SS_{B(e)} = \sum_{j=1}^K n_j(\bar{e}_j - \bar{e}_{..})^2$

$SS_{W(e)} = \sum_{j=1}^K\sum_{i=1}^{n_j}(e_{ij} - \bar{e}_j)^2$

Step 5 — p-value:

$F_Q \sim F_{K-1,\;N-K}$ approximately for large $N$ .

Effect size:

$\eta^2_Q = \frac{SS_{B(e)}}{SS_{B(e)} + SS_{W(e)}}$ (on the rank-residual scale; analogous to $\eta^2_H$ for Kruskal-Wallis)

12.3 Ranked ANCOVA (General Approach)

A more flexible non-parametric alternative is to:

Rank both the DV and the covariate (Conover & Iman, 1982).
Run standard ANCOVA on the ranks using the ranked covariate as the covariate and the ranked DV as the dependent variable.
The resulting F-test is approximately distribution-free for large samples.

This approach is simpler to implement, available in DataStatPro, and handles multiple covariates naturally.
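The rank-transform approach can be sketched with the standard library alone: rank the pooled covariate and DV (average ranks for ties), then apply the one-covariate ANCOVA formulas of Section 7.1 to the ranks. The toy data and all function names are illustrative assumptions, and the p-value lookup is omitted.

```python
# Toy data: group -> (covariate values x, DV values y). Illustrative only.
GROUPS = {
    "A": ([1, 2, 3], [1, 3, 2]),
    "B": ([2, 3, 4], [3, 4, 8]),
    "C": ([3, 4, 5], [6, 9, 9]),
}

def average_ranks(values):
    """Ranks from 1 to N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def sscp(x, y):
    """Return (SS_xx, SS_yy, SP_xy) about the means of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (
        sum((a - mx) ** 2 for a in x),
        sum((b - my) ** 2 for b in y),
        sum((a - mx) * (b - my) for a, b in zip(x, y)),
    )

def ancova_f(groups):
    """Group F from a one-covariate ANCOVA (adjusted SS approach)."""
    k = len(groups)
    all_x = [v for x, _ in groups.values() for v in x]
    all_y = [v for _, y in groups.values() for v in y]
    n_total = len(all_x)
    within = [sscp(x, y) for x, y in groups.values()]
    ss_xx_w, ss_yy_w, sp_w = (sum(p[i] for p in within) for i in range(3))
    ss_xx_t, ss_yy_t, sp_t = sscp(all_x, all_y)
    ss_within_adj = ss_yy_w - sp_w ** 2 / ss_xx_w
    ss_between_adj = (ss_yy_t - sp_t ** 2 / ss_xx_t) - ss_within_adj
    return (ss_between_adj / (k - 1)) / (ss_within_adj / (n_total - k - 1))

def ranked_ancova_f(groups):
    """Rank x and y over the pooled sample, then run ANCOVA on the ranks."""
    names = list(groups)
    rank_x = average_ranks([v for g in names for v in groups[g][0]])
    rank_y = average_ranks([v for g in names for v in groups[g][1]])
    ranked, start = {}, 0
    for g in names:
        n = len(groups[g][0])
        ranked[g] = (rank_x[start:start + n], rank_y[start:start + n])
        start += n
    return ancova_f(ranked), ranked

f_rank, ranked_groups = ranked_ancova_f(GROUPS)
print(f"Ranked ANCOVA F(2, 5) = {f_rank:.3f}")
```

Ranking is done once over all $N$ observations, not within groups, so group differences in location are preserved on the rank scale.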

Advantages over Quade test:

Handles multiple covariates.
Group and covariate are fitted in a single model (no separate residualisation or blocking step).
Compatible with all standard ANCOVA post-hoc procedures applied to ranked data.

Limitations:

Less powerful than Quade test for small samples.
Effect size interpretation is on the rank scale.

12.4 Post-Hoc Tests for Quade and Ranked ANCOVA

After a significant Quade test, use Dunn-type pairwise comparisons on the rank residuals with Holm-Bonferroni correction. DataStatPro computes these automatically.

After a significant ranked ANCOVA, apply standard ANCOVA post-hoc tests to the ranked adjusted means with appropriate corrections.

12.5 Efficiency of Non-Parametric ANCOVA

For normal data, ranked ANCOVA has ARE $\approx 3/\pi \approx 0.955$ relative to standard ANCOVA — negligible loss. For non-normal data (especially heavy-tailed or skewed distributions), ranked ANCOVA can be substantially more powerful.

13. Advanced Topics

13.1 ANCOVA as a Special Case of the General Linear Model

ANCOVA is a restricted version of the GLM:

$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

Where $\mathbf{X}$ is the design matrix containing group indicator columns (effect-coded or dummy-coded) and the covariate column(s). In matrix form:

$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$

$SS_{between(adj)} = \hat{\boldsymbol{\beta}}_\tau' \mathbf{X}_\tau' \mathbf{M}_{cov} \mathbf{X}_\tau \hat{\boldsymbol{\beta}}_\tau$

Where $\mathbf{M}_{cov}$ is the residual maker matrix after partialling the covariate.

This GLM framework naturally handles:

Unbalanced designs (unequal $n_j$ ).
Multiple covariates.
Factorial ANCOVA (multiple IVs with covariates).
MANCOVA (multiple DVs with covariates).
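The GLM view can be illustrated by fitting two least-squares models and differencing their residual SS: the adjusted between-groups SS is the drop in residual SS when group indicators are added to a covariate-only model. The sketch below solves the normal equations in pure Python; the toy data, dummy coding, and function names are assumptions for illustration, and a production implementation would use a numerically stabler solver.

```python
def solve(a, b):
    """Solve the square system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(design, y):
    """Return (coefficients, residual SS) via the normal equations."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, rss

# Toy data in long format (illustrative only).
group = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1, 2, 3, 2, 3, 4, 3, 4, 5]
y = [1, 3, 2, 3, 4, 8, 6, 9, 9]

# Full model: intercept, dummy for B, dummy for C, covariate.
full = [[1.0, float(g == "B"), float(g == "C"), float(xi)] for g, xi in zip(group, x)]
# Reduced model: intercept and covariate only.
reduced = [[1.0, float(xi)] for xi in x]

beta_full, rss_full = ols(full, y)
_, rss_reduced = ols(reduced, y)
ss_between_adj = rss_reduced - rss_full
print(f"slope = {beta_full[3]:.3f}, RSS_full = {rss_full:.3f}, "
      f"SS_between(adj) = {ss_between_adj:.3f}")
```

The full-model residual SS (8.5) and the difference (6.75) match $SS_{within(adj)}$ and $SS_{between(adj)}$ from the hand formulas of Section 7.1 applied to the same data.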

13.2 Type I, II, and III Sums of Squares in ANCOVA

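In a factorial ANOVA with equal cell sizes, Types I, II, and III SS coincide. In ANCOVA they can differ even with equal $n_j$ , because group membership and the covariate are correlated whenever the groups differ in their covariate means:

Type I SS (sequential): Order-dependent; each term is adjusted only for the terms entered before it, so the group effect is covariate-adjusted only if the covariate is entered first. Not recommended for ANCOVA.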
Type II SS: Tests the group effect after the covariate, ignoring the group × covariate interaction (assumes no interaction). Recommended when no interaction is present.
Type III SS (partial): Tests the group effect after the covariate and all other terms in the model. Recommended as default for ANCOVA — matches the interpretation of "adjusted group means." DataStatPro uses Type III SS by default.

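⚠️ Always report which type of SS was used. Defaults differ across software: SPSS and DataStatPro use Type III, whereas in R the base anova() function reports Type I (sequential) SS and car::Anova() defaults to Type II (Type III is available via type = 3). Using Type I SS with the group term entered before the covariate yields a group test that is not covariate-adjusted.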

13.3 Johnson-Neyman Analysis for Heterogeneous Slopes

When the homogeneity of regression slopes assumption is violated, Johnson-Neyman (J-N) analysis identifies the specific values of the covariate $X$ at which the group difference in $Y$ transitions between statistically significant and non-significant.

For $K = 2$ groups with slopes $\hat{\beta}_1$ and $\hat{\beta}_2$ :

The adjusted mean difference as a function of $X$ is:

$\Delta\hat{Y}(X) = (\bar{y}_{1(adj)} - \bar{y}_{2(adj)}) + (\hat{\beta}_1 - \hat{\beta}_2)(X - \bar{X})$

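The J-N boundary $X_{JN}$ is where $|t(X_{JN})| = t_{crit,\;\alpha/2}$ , i.e. where $[\Delta\hat{Y}(X)]^2 = t^2_{crit}\,\widehat{\text{Var}}[\Delta\hat{Y}(X)]$ . Writing $d_0 = \bar{y}_{1(adj)} - \bar{y}_{2(adj)}$ , $d_1 = \hat{\beta}_1 - \hat{\beta}_2$ , and $u = X - \bar{X}$ , this is the quadratic $Au^2 + Bu + C = 0$ with solutions:

$X_{JN} = \bar{X} + \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}$

$A = d_1^2 - t^2_{crit}V_{11}, \quad B = 2(d_0 d_1 - t^2_{crit}V_{01}), \quad C = d_0^2 - t^2_{crit}V_{00}$

(Where $V_{00} = \widehat{\text{Var}}(d_0)$ , $V_{11} = \widehat{\text{Var}}(d_1)$ , and $V_{01} = \widehat{\text{Cov}}(d_0, d_1)$ are variance-covariance terms from the heterogeneous-slopes ANCOVA model.)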
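The boundary condition $|t(X_{JN})| = t_{crit}$ can be solved as a quadratic in $u = X - \bar{X}$ . The sketch below assumes the difference is written as $d_0 + d_1 u$ with sampling variance $V_{00} + 2uV_{01} + u^2V_{11}$ ; these symbol names, the function name, and the numeric inputs are illustrative assumptions rather than DataStatPro output.

```python
import math

def jn_boundaries(d0, d1, v00, v01, v11, t_crit, x_bar=0.0):
    """Covariate values where (d0 + d1*u)^2 = t_crit^2 * Var(d0 + d1*u)."""
    a = d1 ** 2 - t_crit ** 2 * v11
    b = 2 * (d0 * d1 - t_crit ** 2 * v01)
    c = d0 ** 2 - t_crit ** 2 * v00
    if a == 0:                           # degenerate case: the equation is linear
        return [] if b == 0 else [x_bar - c / b]
    disc = b ** 2 - 4 * a * c
    if disc < 0:                         # no real boundary in the covariate range
        return []
    roots = sorted((-b + sign * math.sqrt(disc)) / (2 * a) for sign in (-1, 1))
    return [x_bar + u for u in roots]

# Hypothetical estimates: difference of 1 at the covariate mean, slope
# difference of 1, Var(d0) = 1, Var(d1) = 0.04, zero covariance, t_crit = 2.
bounds = jn_boundaries(d0=1.0, d1=1.0, v00=1.0, v01=0.0, v11=0.04, t_crit=2.0)
print([round(x, 3) for x in bounds])
```

Whether the significant region lies between or outside the two boundaries depends on the sign of $A$ , so the result is best checked by evaluating $t(X)$ at a point on each side.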

DataStatPro computes J-N regions numerically and displays:

Floodlight plot: The covariate range is displayed on the x-axis; regions where the group difference is significant are shaded.
Region of significance: Exact J-N boundary values with 95% confidence bands.

13.4 ANCOVA with Multiple Covariates: Stepwise vs. Simultaneous Entry

When multiple covariates are available:

Simultaneous entry (recommended): All $p$ theoretically-motivated covariates are entered together in a single ANCOVA model. This is appropriate when all covariates are chosen based on theory before data collection.

Hierarchical entry: Covariates are entered in theoretically-motivated blocks (e.g., demographic covariates first, then psychological covariates). Tests whether each block explains incremental variance after prior blocks.

Stepwise entry (not recommended): Automated variable selection based on statistical criteria (e.g., forward, backward, stepwise). This inflates Type I error, produces unstable models, and overfits the sample. DataStatPro does not support stepwise ANCOVA but supports hierarchical block entry.

13.5 Lord's Paradox

Lord's Paradox (Lord, 1967) is a famous conceptual puzzle arising in pre-post ANCOVA designs. It demonstrates that two different but seemingly valid analyses can produce contradictory conclusions:

Analysis 1 (gain scores): No significant difference in change scores between groups.
Analysis 2 (ANCOVA with pre-test as covariate): Significant group difference after adjusting for pre-test.

Resolution: The two analyses answer different questions:

Gain score analysis: Do groups change by the same absolute amount?
ANCOVA: Would groups differ on the post-test if they had the same pre-test score?

In randomised experiments with equal pre-test means, both analyses give equivalent results. In observational studies with pre-existing group differences, the two analyses address fundamentally different causal questions. Use directed acyclic graphs (DAGs) to determine which question is scientifically appropriate.
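A deterministic toy example makes the divergence concrete. In the hypothetical data below, neither group changes on average, so the gain-score difference is zero, yet the within-group slope of post on pre is 0.5, so ANCOVA reports a non-zero adjusted difference. The data and function names are invented for illustration.

```python
from statistics import mean

# Hypothetical pre/post scores; the groups start at different pre-test levels.
GROUPS = {
    "group_1": {"pre": [1, 2, 3], "post": [1.5, 2.0, 2.5]},
    "group_2": {"pre": [4, 5, 6], "post": [4.5, 5.0, 5.5]},
}

def gain_difference(groups):
    """Difference in mean gain (post - pre), group_2 minus group_1."""
    gains = {
        g: mean([post - pre for pre, post in zip(d["pre"], d["post"])])
        for g, d in groups.items()
    }
    return gains["group_2"] - gains["group_1"]

def ancova_adjusted_difference(groups):
    """Adjusted post-test difference using the pooled within-group slope."""
    sp = ss = 0.0
    for d in groups.values():
        m_pre, m_post = mean(d["pre"]), mean(d["post"])
        sp += sum((a - m_pre) * (b - m_post) for a, b in zip(d["pre"], d["post"]))
        ss += sum((a - m_pre) ** 2 for a in d["pre"])
    beta = sp / ss
    g1, g2 = groups["group_1"], groups["group_2"]
    raw_diff = mean(g2["post"]) - mean(g1["post"])
    pre_diff = mean(g2["pre"]) - mean(g1["pre"])
    return raw_diff - beta * pre_diff, beta

gain_diff = gain_difference(GROUPS)
adj_diff, beta_hat = ancova_adjusted_difference(GROUPS)
print(f"gain-score difference = {gain_diff}, "
      f"ANCOVA adjusted difference = {adj_diff} (slope = {beta_hat})")
```

The two results (0 versus 1.5) are both arithmetically correct; they differ because the gain-score analysis implicitly fixes the slope at 1, while ANCOVA estimates it from the data.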

13.6 ANCOVA in Randomised vs. Observational Studies

Feature	Randomised Experiment	Observational Study
Purpose of covariate	Increase power	Statistical control for confounding
Covariate-group independence	Expected under randomisation (chance imbalance only)	Not guaranteed; groups often differ on the covariate
Interpretation of adjusted means	Interpretable causal effect	Conditional association; residual confounding possible
Measurement error consequences	Minor power reduction	Systematic bias (under-adjustment)
Homogeneity of slopes importance	Standard check	Critical; more likely to be violated
Validity of causal inference	Strong (covariate adds precision)	Weak without strong assumptions

13.7 ANCOVA for Factorial Designs (Factorial ANCOVA)

When there are two or more IVs and one or more covariates, Factorial ANCOVA (Two-Way ANCOVA when there are exactly two IVs) extends the model:

$Y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \gamma(X_{ijk} - \bar{X}) + \varepsilon_{ijk}$

Where $Y_{ijk}$ = score of observation $i$ in cell $(j, k)$ , $\alpha_j$ = effect of IV$_1$ level $j$ , $\beta_k$ = effect of IV$_2$ level $k$ , $(\alpha\beta)_{jk}$ = interaction, $\gamma$ = covariate slope, and $\bar{X}$ = grand mean of the covariate.

The homogeneity of regression slopes assumption requires the covariate slope to be homogeneous across all $j \times k$ cells, not just across levels of one IV.

DataStatPro handles factorial ANCOVA within its GLM engine; see the Factorial ANOVA tutorial for the base factorial framework.

13.8 Bayesian ANCOVA

Bayesian ANCOVA computes Bayes Factors comparing models that do and do not include the group effect, while including the covariate in both models:

$BF_{10} = \frac{P(\text{data} \mid \text{group effect + covariate})}{P(\text{data} \mid \text{covariate only})}$

This tests whether there is evidence for group differences beyond what the covariate explains. DataStatPro implements Bayesian ANCOVA using the BayesFactor method (Rouder et al., 2012) with Cauchy priors on standardised group effects ( $r = \sqrt{2}/2$ ).

Reporting: "A Bayesian ANCOVA (Cauchy prior, $r = \sqrt{2}/2$ ) with [covariate name] as covariate provided [strong/moderate/anecdotal] evidence for [the group effect / the null hypothesis] after covariate adjustment, $BF_{10} =$ [value]."

13.9 Reporting ANCOVA According to APA 7th Edition

Full minimum reporting set (APA 7th ed.):

Statement of which test (standard ANCOVA or heteroscedastic), the covariate(s), and the rationale for including each covariate.
Homogeneity of regression slopes test result.
Levene's test result on adjusted residuals.
Covariate $F$ -test, $\beta$ coefficient, and partial $\eta^2$ for the covariate.
Group $F(df_B, df_{error}) =$ [value], $p =$ [value].
$\omega^2_p =$ [value] [95% CI: LB, UB].
Which effect size was computed ( $\omega^2_p$ not just "partial effect size").
Both unadjusted AND adjusted group means, SDs, and SEs for all $K$ groups.
Post-hoc test results on adjusted means with adjusted p-values and $d_{adj,jk}$ per pair.
95% CI for each significant pairwise adjusted mean difference.

14. Worked Examples

Example 1: Pre-Post ANCOVA — Therapy Type on Depression

A clinical researcher randomly assigns $N = 90$ participants to three therapy conditions: CBT ( $n_1 = 30$ ), Behavioural Activation (BA; $n_2 = 30$ ), and Waitlist Control (WL; $n_3 = 30$ ). Pre-treatment PHQ-9 scores are recorded as the covariate; post-treatment PHQ-9 scores are the DV.

Descriptive statistics:

Group	$n_j$	$\bar{x}_j$ (pre)	$s_{x_j}$	$\bar{y}_j$ (post)	$s_{y_j}$	$r_{xy_j}$
CBT	30	18.20	3.80	9.80	4.20	$0.72$
BA	30	17.90	4.10	11.40	4.60	$0.68$
WL	30	18.40	3.90	16.30	5.10	$0.61$

Grand means: $\bar{x}_{..} = (30\times18.20 + 30\times17.90 + 30\times18.40)/90 = 18.167$

$\bar{y}_{..} = (30\times9.80 + 30\times11.40 + 30\times16.30)/90 = 12.500$

Homogeneity of regression slopes test:

Testing group × covariate interaction: $F(2, 84) = 0.47$ , $p = .626$ — slopes are homogeneous. Proceed with standard ANCOVA.

Within-group SS and cross-products:

$SS_{XX(W)} = 29(3.80^2) + 29(4.10^2) + 29(3.90^2)$

$= 29(14.44) + 29(16.81) + 29(15.21) = 418.76 + 487.49 + 441.09 = 1347.34$

$SP_{XY(W)} = 29(3.80)(4.20)(0.72) + 29(4.10)(4.60)(0.68) + 29(3.90)(5.10)(0.61)$

$= 29(11.491) + 29(12.835) + 29(12.143) = 333.24 + 372.22 + 352.15 = 1057.61$

$SS_{YY(W)} = 29(4.20^2) + 29(4.60^2) + 29(5.10^2)$

$= 29(17.64) + 29(21.16) + 29(26.01) = 511.56 + 613.64 + 754.29 = 1879.49$

Pooled within-group regression coefficient:

$\hat{\beta} = 1057.61/1347.34 = 0.7849$

Adjusted within-groups SS:

$SS_{within(adj)} = 1879.49 - (1057.61)^2/1347.34 = 1879.49 - 830.46 = 1049.03$
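The within-group sums, pooled slope, and adjusted within-groups SS can be recomputed directly from the summary table. A minimal sketch (variable names are illustrative); because it carries full precision through every product, its results can differ slightly from the hand-rounded figures above.

```python
# Summary statistics from the Example 1 table: (n, s_x, s_y, r_xy) per group.
groups = {
    "CBT": (30, 3.80, 4.20, 0.72),
    "BA": (30, 4.10, 4.60, 0.68),
    "WL": (30, 3.90, 5.10, 0.61),
}

# Pooled within-group sums of squares and cross-products.
ss_xx_w = sum((n - 1) * sx ** 2 for n, sx, sy, r in groups.values())
ss_yy_w = sum((n - 1) * sy ** 2 for n, sx, sy, r in groups.values())
sp_xy_w = sum((n - 1) * sx * sy * r for n, sx, sy, r in groups.values())

# Pooled within-group slope and adjusted within-groups SS.
beta_hat = sp_xy_w / ss_xx_w
ss_within_adj = ss_yy_w - sp_xy_w ** 2 / ss_xx_w

print(f"SS_XX(W) = {ss_xx_w:.2f}, SS_YY(W) = {ss_yy_w:.2f}, SP_XY(W) = {sp_xy_w:.2f}")
print(f"beta_hat = {beta_hat:.4f}, SS_within(adj) = {ss_within_adj:.2f}")
```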

Total SS and cross-products:

$SS_{YY(T)} = \sum(y_{ij}-\bar{y}_{..})^2 = 2567.69$ (computed from raw data)

$SS_{XX(T)} = \sum(x_{ij}-\bar{x}_{..})^2 = 1372.17$

$SP_{XY(T)} = \sum(x_{ij}-\bar{x}_{..})(y_{ij}-\bar{y}_{..}) = 1068.43$

Adjusted total SS:

$SS_{total(adj)} = 2567.69 - (1068.43)^2/1372.17 = 2567.69 - 831.39 = 1736.30$

Adjusted between-groups SS:

$SS_{between(adj)} = 1736.30 - 1049.03 = 687.27$

Covariate SS:

$SS_{covariate} = (1057.61)^2/1347.34 = 830.46$

Degrees of freedom:

$df_{between} = 2$ ; $df_{covariate} = 1$ ; $df_{error} = 90-3-1 = 86$

Mean squares and F-ratios:

$MS_{between(adj)} = 687.27/2 = 343.64$

$MS_{error(adj)} = 1049.03/86 = 12.198$

$MS_{covariate} = 830.46/1 = 830.46$

$F_{group} = 343.64/12.198 = 28.17$ , $p < .001$

$F_{covariate} = 830.46/12.198 = 68.08$ , $p < .001$
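The step from sums of squares to F-ratios is plain arithmetic and can be checked in a few lines. This sketch takes the SS values from the hand calculation above as given inputs; p-values are omitted because they require an F-distribution routine from a statistics library.

```python
# Sums of squares taken from the hand calculation above.
ss_between_adj = 687.27
ss_within_adj = 1049.03
ss_covariate = 830.46
n_total, k_groups, p_covariates = 90, 3, 1

# Degrees of freedom.
df_between = k_groups - 1
df_covariate = p_covariates
df_error = n_total - k_groups - p_covariates

# Mean squares.
ms_between = ss_between_adj / df_between
ms_error = ss_within_adj / df_error
ms_covariate = ss_covariate / df_covariate

# F-ratios, both tested against the adjusted error term.
f_group = ms_between / ms_error
f_covariate = ms_covariate / ms_error

print(f"F_group({df_between}, {df_error}) = {f_group:.2f}")
print(f"F_covariate({df_covariate}, {df_error}) = {f_covariate:.2f}")
```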

ANCOVA source table:

Source	SS	$df$	MS	$F$	$p$
Covariate (Pre-PHQ-9)	$830.46$	$1$	$830.46$	$68.08$	$< .001$
Group (Therapy)	$687.27$	$2$	$343.64$	$28.17$	$< .001$
Error	$1049.03$	$86$	$12.198$
Total (adjusted)	$1736.30$	$88$

Adjusted group means:

$\bar{y}_{1(adj)} = 9.80 - 0.7849(18.20-18.167) = 9.80 - 0.026 = 9.774$

$\bar{y}_{2(adj)} = 11.40 - 0.7849(17.90-18.167) = 11.40 + 0.210 = 11.610$

$\bar{y}_{3(adj)} = 16.30 - 0.7849(18.40-18.167) = 16.30 - 0.183 = 16.117$
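The same adjustment can be written as a one-line formula applied to each group. A minimal sketch using the slope and group means from this example; the grand covariate mean is taken as the simple average of the group means, which is valid here only because the group sizes are equal.

```python
beta_hat = 0.7849  # pooled within-group slope from the hand calculation

# (pre-test mean, post-test mean) per group, from the descriptive table.
group_means = {
    "CBT": (18.20, 9.80),
    "BA": (17.90, 11.40),
    "WL": (18.40, 16.30),
}

# Equal n per group, so the grand mean is the average of the group means.
grand_x = sum(x for x, _ in group_means.values()) / len(group_means)

# Adjusted mean: observed post-test mean shifted to the grand covariate mean.
adjusted_means = {
    group: y - beta_hat * (x - grand_x) for group, (x, y) in group_means.items()
}

for group, value in adjusted_means.items():
    print(f"{group}: adjusted mean = {value:.3f}")
```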

Standard errors of adjusted means:

$SE_{j(adj)} = \sqrt{12.198\left(\frac{1}{30} + \frac{(\bar{x}_j-18.167)^2}{1347.34}\right)}$

CBT: $SE_{1(adj)} = \sqrt{12.198(0.0333+0.0000)} = \sqrt{0.406} = 0.637$

BA: $SE_{2(adj)} = \sqrt{12.198(0.0333+0.0000)} \approx 0.637$

WL: $SE_{3(adj)} = \sqrt{12.198(0.0333+0.0000)} \approx 0.637$

(Adjustments are minimal here because pre-test means are nearly equal — expected in a randomised design.)
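To see how little the covariate term contributes here, the standard errors can be computed without rounding the second term to zero. A minimal sketch using the values from this example; small differences in the third decimal relative to the hand calculation come from rounding.

```python
from math import sqrt

ms_error = 12.198   # adjusted error mean square
ss_xx_w = 1347.34   # pooled within-group SS of the covariate
n_per_group = 30

pre_means = {"CBT": 18.20, "BA": 17.90, "WL": 18.40}
grand_x = sum(pre_means.values()) / len(pre_means)  # equal n per group

se_adjusted = {}
for group, x_bar in pre_means.items():
    sampling_term = 1 / n_per_group
    covariate_term = (x_bar - grand_x) ** 2 / ss_xx_w  # grows with covariate imbalance
    se_adjusted[group] = sqrt(ms_error * (sampling_term + covariate_term))
    print(f"{group}: covariate term = {covariate_term:.6f}, SE = {se_adjusted[group]:.3f}")
```

In an observational design with large covariate imbalance, the covariate term is no longer negligible and the standard errors of the adjusted means grow accordingly.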

Effect sizes:

$\eta^2_p = 687.27/(687.27+1049.03) = 687.27/1736.30 = 0.396$

$\omega^2_p = (687.27 - 2\times12.198)/(1736.30+12.198) = 662.87/1748.50 = 0.379$

$\varepsilon^2_p = (687.27 - 2\times12.198)/1736.30 = 662.87/1736.30 = 0.382$

$f_p = \sqrt{0.379/0.621} = \sqrt{0.610} = 0.781$
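The four partial effect sizes follow from the same three quantities, so they are convenient to compute together. A minimal sketch; the function name is illustrative, and negative $\omega^2_p$ values are truncated at zero before computing $f_p$, following the convention of reporting negative estimates as 0.

```python
from math import sqrt


def partial_effect_sizes(ss_between_adj, ss_within_adj, df_between, df_error):
    """Partial eta^2, omega^2, epsilon^2 and Cohen's f for the group effect."""
    ms_error = ss_within_adj / df_error
    corrected = ss_between_adj - df_between * ms_error  # bias-corrected numerator
    eta2_p = ss_between_adj / (ss_between_adj + ss_within_adj)
    omega2_p = corrected / (ss_between_adj + ss_within_adj + ms_error)
    epsilon2_p = corrected / (ss_between_adj + ss_within_adj)
    omega2_for_f = max(omega2_p, 0.0)  # negative estimates are treated as 0
    f_p = sqrt(omega2_for_f / (1.0 - omega2_for_f))
    return {"eta2_p": eta2_p, "omega2_p": omega2_p, "epsilon2_p": epsilon2_p, "f_p": f_p}


# Example 1 values.
effects = partial_effect_sizes(687.27, 1049.03, df_between=2, df_error=86)
for name, value in effects.items():
    print(f"{name} = {value:.3f}")
```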

95% CI for $\omega^2_p$ (non-central F, $F_{obs}=28.17$ , $df_1=2$ , $df_2=86$ ):

$\hat{\lambda} = (28.17-1)\times2 = 54.34$

95% CI for $\lambda$ : $[31.48, 83.20]$ (DataStatPro numerical)

$\omega^2_{p,L} = 0.261$ ; $\omega^2_{p,U} = 0.482$

Comparison with one-way ANOVA (without covariate):

From the One-Way ANOVA tutorial, the same data gave $F(2,87)=15.96$ , $\omega^2=0.249$ .

ANCOVA gives $F(2,86)=28.17$ , $\omega^2_p=0.379$ — the pre-test covariate ( $r_{XY(W)} \approx 0.67$ pooled) substantially increased power and effect size.

Post-hoc tests (Tukey HSD on adjusted means):

$q_{3,\;86,\;.05} = 3.369$

$\text{HSD}_{jk} = 3.369 \times \sqrt{12.198(1/30+1/30+(\bar{x}_j-\bar{x}_k)^2/1347.34)/2}$

For CBT vs. BA: $\bar{x}_1-\bar{x}_2 = 0.30$ , $\text{HSD}_{12} \approx 3.369 \times 0.641 = 2.160$

Comparison	Adj. Diff	$SE_{adj}$	$q$	$p_{adj}$	$d_{adj,jk}$	95% CI
CBT vs. BA	$1.836$	$0.641$	$2.864$	$.136$	$0.526$	$[-0.32, 3.99]$
CBT vs. WL	$6.343$	$0.641$	$9.896$	$< .001$	$1.815$	$[4.18, 8.50]$
BA vs. WL	$4.507$	$0.641$	$7.031$	$< .001$	$1.290$	$[2.35, 6.67]$

Where $d_{adj,jk} = |\bar{y}_{j(adj)}-\bar{y}_{k(adj)}|/\sqrt{12.198} = |\cdot|/3.493$
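The pairwise table can be rebuilt from the adjusted means, the error term, and the critical value quoted above. A minimal sketch that takes $q_{3,\;86,\;.05} = 3.369$ as given rather than computing it; the standard errors it produces are about 0.638, so the derived $q$ statistics and interval limits can differ from the tabled values in the second decimal.

```python
from itertools import combinations
from math import sqrt

ms_error = 12.198
ss_xx_w = 1347.34
q_crit = 3.369      # critical value quoted in the text, not computed here
n_per_group = 30

pre_means = {"CBT": 18.20, "BA": 17.90, "WL": 18.40}
adj_means = {"CBT": 9.774, "BA": 11.610, "WL": 16.117}

pairwise = {}
for g1, g2 in combinations(adj_means, 2):
    diff = adj_means[g2] - adj_means[g1]
    # Tukey-style SE of the adjusted difference (note the division by 2).
    se = sqrt(
        ms_error
        * (2 / n_per_group + (pre_means[g1] - pre_means[g2]) ** 2 / ss_xx_w)
        / 2
    )
    half_width = q_crit * se
    pairwise[(g1, g2)] = {
        "diff": diff,
        "se": se,
        "q": abs(diff) / se,
        "d_adj": abs(diff) / sqrt(ms_error),
        "ci": (diff - half_width, diff + half_width),
    }

for pair, row in pairwise.items():
    low, high = row["ci"]
    print(f"{pair[0]} vs. {pair[1]}: diff = {row['diff']:.3f}, q = {row['q']:.2f}, "
          f"d = {row['d_adj']:.3f}, 95% CI [{low:.2f}, {high:.2f}]")
```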

Assumption checks:

Shapiro-Wilk (adjusted residuals): $W = 0.981$ , $p = .259$ — normality not violated.

Levene's test (adjusted residuals): $F(2,87) = 0.71$ , $p = .494$ — homoscedasticity holds.

Covariate balance: $F(2,87) = 0.14$ , $p = .872$ — groups do not differ on pre-PHQ-9 (expected from randomisation). ✅

APA write-up: "A one-way ANCOVA was conducted with therapy type (CBT, BA, Waitlist Control) as the independent variable, post-treatment PHQ-9 as the dependent variable, and pre-treatment PHQ-9 as the covariate. The homogeneity of regression slopes assumption was met ( $F(2, 84) = 0.47$ , $p = .626$ ), and Levene's test indicated equal variances across groups on the adjusted residuals ( $F(2, 87) = 0.71$ , $p = .494$ ). The covariate (pre-treatment PHQ-9) was significantly related to the outcome after controlling for therapy type, $F(1, 86) = 68.08$ , $p < .001$ , $\eta^2_p = .442$ . After controlling for pre-treatment depression, there was a significant effect of therapy type on post-treatment PHQ-9, $F(2, 86) = 28.17$ , $p < .001$ , $\omega^2_p = 0.379$ [95% CI: 0.261, 0.482], indicating a large effect. Adjusted post-treatment means were: CBT ( $M_{adj} = 9.77$ , $SE = 0.64$ ), BA ( $M_{adj} = 11.61$ , $SE = 0.64$ ), and Waitlist Control ( $M_{adj} = 16.12$ , $SE = 0.64$ ). Tukey HSD post-hoc comparisons on adjusted means revealed that both CBT and BA produced significantly lower adjusted post-treatment scores than the Waitlist Control (both $p < .001$ ), with large effects ( $d_{CBT-WL} = 1.82$ [95% CI: 1.19, 2.44]; $d_{BA-WL} = 1.29$ [95% CI: 0.67, 1.90]). CBT and BA did not differ significantly, $d = 0.53$ [95% CI: $-0.09$, 1.15], $p = .136$ ."

Example 2: Observational ANCOVA — Teaching Methods with SES Covariate

An educational researcher compares three teaching methods (Lecture, Flipped, Project-based) on standardised test scores in a non-randomised study. Socioeconomic Status (SES) index (0–100) is included as a covariate because it is known to correlate with academic outcomes and groups were self-selected into schools with different SES distributions.

$n_j = 35$ per group; $N = 105$ ; $K = 3$ .

Descriptive statistics:

Group	$n_j$	$\bar{x}_j$ (SES)	$s_{x_j}$	$\bar{y}_j$ (score)	$s_{y_j}$	$r_{xy_j}$
Lecture	35	42.80	12.40	64.20	9.80	$0.53$
Flipped	35	58.60	11.90	72.40	10.20	$0.57$
Project	35	71.30	10.80	78.10	11.40	$0.49$

$\bar{x}_{..} = (35\times42.80+35\times58.60+35\times71.30)/105 = 57.567$

Covariate balance test:

ANOVA with SES as DV: $F(2,102) = 23.84$ , $p < .001$ — groups differ significantly on SES. This is expected in an observational design. ANCOVA will adjust for these pre-existing SES differences.

Homogeneity of regression slopes: $F(2, 99) = 1.18$ , $p = .311$ — slopes are homogeneous. Standard ANCOVA is appropriate.

Pooled within-group sums:

$SS_{XX(W)} = 34(12.40^2)+34(11.90^2)+34(10.80^2)$

$= 34(153.76)+34(141.61)+34(116.64) = 5227.84+4814.74+3965.76 = 14008.34$

$SP_{XY(W)} = 34(12.40)(9.80)(0.53)+34(11.90)(10.20)(0.57)+34(10.80)(11.40)(0.49)$

$= 34(64.426)+34(69.177)+34(60.328) = 2190.48+2352.02+2051.15 = 6593.65$

$SS_{YY(W)} = 34(9.80^2)+34(10.20^2)+34(11.40^2)$

$= 34(96.04)+34(104.04)+34(129.96) = 3265.36+3537.36+4418.64 = 11221.36$

$\hat{\beta} = 6593.65/14008.34 = 0.4707$

$SS_{within(adj)} = 11221.36 - (6593.65)^2/14008.34 = 11221.36 - 3103.24 = 8118.12$

Unadjusted between-groups SS: $SS_{YY(T)} = 12846.91$ ; $SS_{between} = 12846.91-11221.36 = 1625.55$

Total adjusted SS (computed from full data): $SS_{total(adj)} = 9204.87$

$SS_{between(adj)} = 9204.87 - 8118.12 = 1086.75$

$df_{error} = 105-3-1 = 101$ ; $df_B = 2$

$MS_{between(adj)} = 1086.75/2 = 543.38$

$MS_{error(adj)} = 8118.12/101 = 80.377$

$F_{group} = 543.38/80.377 = 6.76$ , $p = .002$

Compare with unadjusted ANOVA:

$MS_{between} = 1625.55/2 = 812.78$ ; $MS_{within} = 11221.36/102 = 110.013$ ; $F_{ANOVA} = 7.39$ , $p = .001$

Note: Here the unadjusted F is slightly larger because the observed group differences on the DV are partly inflated by SES differences. ANCOVA removes the SES contribution, giving a more accurate (though still significant) test.

Adjusted group means:

$\bar{y}_{1(adj)} = 64.20 - 0.4707(42.80-57.567) = 64.20 + 6.947 = 71.147$

$\bar{y}_{2(adj)} = 72.40 - 0.4707(58.60-57.567) = 72.40 - 0.486 = 71.914$

$\bar{y}_{3(adj)} = 78.10 - 0.4707(71.30-57.567) = 78.10 - 6.462 = 71.638$

Effect sizes:

$\eta^2_p = 1086.75/(1086.75+8118.12) = 0.118$

$\omega^2_p = (1086.75-2\times80.377)/(1086.75+8118.12+80.377) = 925.996/9285.247 = 0.100$

$f_p = \sqrt{0.100/0.900} = 0.333$

Post-hoc tests (Tukey HSD on adjusted means):

The adjusted means (71.15, 71.91, and 71.64) are very close to one another.

Comparison	Adj. Diff	$p_{adj}$	$d_{adj,jk}$	95% CI
Lecture vs. Flipped	$0.767$	$.898$	$0.086$	$[-3.42, 4.95]$
Lecture vs. Project	$0.491$	$.960$	$0.055$	$[-3.69, 4.67]$
Flipped vs. Project	$0.276$	$.989$	$0.031$	$[-3.90, 4.46]$

After controlling for SES, none of the pairwise adjusted mean differences are statistically significant. The apparent differences in observed means were largely due to pre-existing SES differences between the self-selected school groups.

APA write-up: "A one-way ANCOVA was conducted to examine the effect of teaching method (Lecture, Flipped, Project-based) on standardised test scores, with SES index as the covariate to control for pre-existing socioeconomic differences between schools. As expected in this non-randomised design, groups differed significantly on the covariate ( $F(2, 102) = 23.84$ , $p < .001$ ). The homogeneity of regression slopes assumption was satisfied ( $F(2, 99) = 1.18$ , $p = .311$ ). The covariate was significantly related to test scores, $F(1, 101) = 38.61$ , $p < .001$ , $\eta^2_p = .277$ . After controlling for SES, there was a statistically significant effect of teaching method, $F(2, 101) = 6.76$ , $p = .002$ , $\omega^2_p = 0.100$ [95% CI: 0.022, 0.196], indicating a medium effect. However, Tukey HSD post-hoc comparisons on the SES-adjusted means revealed no significant pairwise differences between any teaching method pair (all $p > .89$ , $d_{adj}$ range: 0.03–0.09). The adjusted means were nearly identical across methods (Lecture: $M_{adj} = 71.15$ ; Flipped: $M_{adj} = 71.91$ ; Project: $M_{adj} = 71.64$ ). These findings indicate that the observed differences in unadjusted test scores were largely attributable to SES differences between schools rather than to teaching method differences per se."

Example 3: Violated Homogeneity of Slopes — Johnson-Neyman Analysis

A researcher examines the effect of three exercise programmes (Aerobic, Resistance, Combined) on depression scores, using baseline fitness level as a covariate.

$n_j = 25$ per group; $N = 75$ .

Homogeneity of regression slopes test:

$F(2, 69) = 4.82$ , $p = .011$ — slopes are significantly heterogeneous.

Standard ANCOVA is not appropriate. Regression lines are not parallel across groups.

DataStatPro automatically switches to Johnson-Neyman analysis.

The scatterplot shows:

Aerobic programme: strong positive slope ( $\hat{\beta}_1 = 0.84$ ) — less fit individuals benefit more.
Resistance programme: weak slope ( $\hat{\beta}_2 = 0.12$ ) — similar benefit regardless of fitness.
Combined programme: moderate slope ( $\hat{\beta}_3 = 0.51$ ).

Johnson-Neyman regions:

Fitness scores $< 42.3$ : Combined and Aerobic programmes are significantly better than Resistance.
Fitness scores $42.3$ – $68.7$ : No significant programme differences.
Fitness scores $> 68.7$ : Resistance programme significantly outperforms Aerobic for highly fit individuals.
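Why the direction of a programme difference can flip across the covariate range follows directly from the non-parallel lines. The sketch below uses the Aerobic and Resistance slopes reported above together with hypothetical intercepts (the example does not report them). It locates only the crossover point of the two regression lines; it does not compute the Johnson-Neyman significance boundaries, which also depend on the standard error of the difference at each covariate value.

```python
SLOPE_AEROBIC = 0.84         # reported in the example
SLOPE_RESISTANCE = 0.12      # reported in the example
INTERCEPT_AEROBIC = 5.0      # hypothetical, for illustration only
INTERCEPT_RESISTANCE = 45.0  # hypothetical, for illustration only


def predicted_difference(fitness):
    """Predicted depression score, Aerobic minus Resistance, at a fitness level."""
    return (INTERCEPT_AEROBIC - INTERCEPT_RESISTANCE) + (
        SLOPE_AEROBIC - SLOPE_RESISTANCE
    ) * fitness


# Fitness value at which the two regression lines intersect.
crossover = (INTERCEPT_RESISTANCE - INTERCEPT_AEROBIC) / (
    SLOPE_AEROBIC - SLOPE_RESISTANCE
)

for fitness in (30, 55, 80):
    print(f"fitness {fitness}: Aerobic - Resistance = {predicted_difference(fitness):+.1f}")
print(f"lines cross at fitness = {crossover:.1f}")
```

A negative difference means lower predicted depression under Aerobic; a positive one favours Resistance. No single adjusted mean difference can summarise this pattern, which is why a region-based analysis is used instead of standard ANCOVA.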

APA write-up: "Preliminary testing indicated that the homogeneity of regression slopes assumption was violated ( $F(2, 69) = 4.82$ , $p = .011$ ), indicating a significant exercise programme × fitness interaction. Consequently, standard ANCOVA was not conducted. Johnson-Neyman analysis was performed to identify regions of the fitness covariate where programme differences were statistically significant. Results indicated that for individuals with fitness scores below 42.3, the Combined and Aerobic programmes produced significantly lower depression scores than Resistance training. For fitness scores above 68.7, the Resistance programme was significantly more effective than the Aerobic programme. No significant programme differences emerged for fitness scores in the range 42.3–68.7 (representing approximately 54% of the sample). These findings suggest that optimal exercise programme selection depends on baseline fitness level."

Example 4: Non-Significant Result with Sensitivity Analysis

A nutritionist tests whether three diets (Mediterranean, Low-carb, Standard) produce different weight loss over 12 weeks, controlling for baseline BMI. $n_j = 20$ per group ( $N = 60$ ; $K = 3$ ; $p = 1$ ).

Results: $F(2, 56) = 2.14$ , $p = .128$ , $\omega^2_p = 0.025$ [95% CI: 0.000, 0.115].

Homogeneity of slopes: $F(2, 54) = 0.88$ , $p = .421$ ✅

Levene's (adjusted residuals): $F(2, 57) = 1.23$ , $p = .299$ ✅

Covariate $F$ : $F(1, 56) = 11.42$ , $p = .001$ — baseline BMI is significantly related to weight loss ✅

Sensitivity analysis:

$f_{p,min} = \sqrt{7.849/(60-1)} = \sqrt{0.133} = 0.365$

$\omega^2_{p,min} = 0.365^2/(1+0.365^2) = 0.133/1.133 = 0.118$

The study had 80% power only for effects of $\omega^2_p \geq 0.118$ . The observed $\omega^2_p = 0.025$ is well below this threshold, so the study was underpowered for small effects.
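The sensitivity calculation can be wrapped in a small helper. A minimal sketch that mirrors the two formulas above; the noncentrality value 7.849 is taken from the text as given, and the function name is illustrative.

```python
from math import sqrt


def minimum_detectable_effect(lam, n_total):
    """Smallest f_p and omega^2_p detectable for a given noncentrality value."""
    f_min = sqrt(lam / (n_total - 1))
    omega2_min = f_min ** 2 / (1 + f_min ** 2)
    return f_min, omega2_min


f_min, omega2_min = minimum_detectable_effect(lam=7.849, n_total=60)
observed_omega2 = 0.025

print(f"f_p,min = {f_min:.3f}, omega^2_p,min = {omega2_min:.3f}")
print("observed effect below detection threshold:", observed_omega2 < omega2_min)
```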

APA write-up: "A one-way ANCOVA with diet type as the IV, 12-week weight loss as the DV, and baseline BMI as the covariate revealed no significant effect of diet type after controlling for BMI, $F(2, 56) = 2.14$ , $p = .128$ , $\omega^2_p = 0.025$ [95% CI: 0.000, 0.115]. Baseline BMI was a significant predictor of weight loss, $F(1, 56) = 11.42$ , $p = .001$ . All ANCOVA assumptions were satisfied. This study had 80% power to detect effects of $f_p \geq 0.365$ ( $\omega^2_p \geq 0.118$ ); the observed effect ( $\omega^2_p = 0.025$ ) falls below this detection threshold, indicating the study was underpowered to detect small diet effects. A larger sample ( $n \geq 85$ per group for 80% power at $f_p = 0.25$ ) would be required to draw conclusions about small-to-medium diet differences after BMI adjustment."

15. Common Mistakes and How to Avoid Them

Mistake 1: Not Testing the Homogeneity of Regression Slopes

Problem: Running ANCOVA without testing whether the within-group regression slopes are equal across groups. If slopes differ substantially, the ANCOVA F-test is invalid because a single pooled slope cannot adequately represent the covariate-DV relationship across all groups.

Solution: Always run the interaction F-test (group × covariate) before standard ANCOVA. Report the interaction test result. If it is significant ( $p < .05$ ), do not use standard ANCOVA — use Johnson-Neyman analysis or moderated regression instead. DataStatPro flags this automatically with a red warning when the interaction is significant.
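The interaction test compares the residual SS of a model with one pooled slope against a model with a separate slope per group. A minimal standard-library sketch for a single covariate; the function name and the small data set are hypothetical, and the p-value (which needs an F-distribution routine) is left out.

```python
from statistics import fmean


def slopes_homogeneity_f(groups):
    """groups: list of (x_values, y_values) pairs, one per group.

    Returns (F, df1, df2) for the group x covariate interaction.
    """
    ss_xx, sp_xy, ss_yy = [], [], []
    for x, y in groups:
        mx, my = fmean(x), fmean(y)
        ss_xx.append(sum((a - mx) ** 2 for a in x))
        sp_xy.append(sum((a - mx) * (b - my) for a, b in zip(x, y)))
        ss_yy.append(sum((b - my) ** 2 for b in y))
    # Residual SS with one pooled slope (the standard ANCOVA model).
    sse_common = sum(ss_yy) - sum(sp_xy) ** 2 / sum(ss_xx)
    # Residual SS with a separate slope in every group (interaction model).
    sse_separate = sum(
        syy - sp ** 2 / sxx for sxx, sp, syy in zip(ss_xx, sp_xy, ss_yy)
    )
    k = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    df1, df2 = k - 1, n_total - 2 * k
    f_stat = ((sse_common - sse_separate) / df1) / (sse_separate / df2)
    return f_stat, df1, df2


# Hypothetical data: same covariate values and noise pattern in each group.
x = [1, 2, 3, 4, 5, 6]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]


def make_group(intercept, slope):
    return x, [intercept + slope * xi + e for xi, e in zip(x, noise)]


f_parallel, df1, df2 = slopes_homogeneity_f(
    [make_group(0, 1.0), make_group(2, 1.0), make_group(4, 1.0)]
)
f_divergent, _, _ = slopes_homogeneity_f(
    [make_group(0, 0.2), make_group(2, 1.0), make_group(4, 1.8)]
)
print(f"parallel slopes:  F({df1}, {df2}) = {f_parallel:.2f}")
print(f"divergent slopes: F({df1}, {df2}) = {f_divergent:.2f}")
```

The error degrees of freedom are $N - 2K$, matching the $F(2, 84)$, $F(2, 99)$, and $F(2, 69)$ tests in the worked examples.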

Mistake 2: Interpreting Adjusted Means Without Reporting Unadjusted Means

Problem: Reporting only adjusted (covariate-controlled) means without reporting observed (unadjusted) means. Readers need both to understand how the covariate changed the picture.

Solution: Always report a table containing both unadjusted ( $\bar{y}_j$ , $SD_j$ ) and adjusted means ( $\bar{y}_{j(adj)}$ , $SE_{j(adj)}$ ) for each group. Note the direction and magnitude of adjustment. Large adjustments signal that groups differed substantially on the covariate.

Mistake 3: Selecting Covariates Based on the Data (Post-Hoc Covariate Selection)

Problem: Including a covariate because it improves the significance of the group effect, or because it reduces a significant result to non-significance. This is p-hacking and produces results that do not replicate.

Solution: Specify covariates on theoretical grounds before data collection. Pre-register the covariate choice. If exploratory analyses suggest additional covariates, report them as exploratory and cross-validate in an independent sample.

Mistake 4: Using ANCOVA in Non-Randomised Designs as a Full Substitute for Randomisation

Problem: Claiming that ANCOVA "removes all confounding" in observational studies, yielding valid causal comparisons. ANCOVA controls only for the measured covariate and only if that covariate is measured without error and the linearity/slopes assumptions are met. Unmeasured confounders and measurement error in the covariate leave residual confounding.

Solution: Explicitly acknowledge the limitations of ANCOVA as a statistical control in non-randomised designs. Use causal language cautiously ("after adjusting for..." rather than "controlling for confounding from..."). Report reliability of the covariate. Conduct sensitivity analyses (e.g., E-value analysis for unmeasured confounding).

Mistake 5: Running Post-Hoc Tests on Unadjusted Means After ANCOVA

Problem: After a significant ANCOVA omnibus F, comparing groups using observed (unadjusted) means in pairwise t-tests or ANOVA post-hoc tests. This ignores the covariate adjustment and produces incorrect comparisons.

Solution: Always run post-hoc tests on adjusted means using the ANCOVA error term ( $MS_{error(adj)}$ , $df = N-K-p$ ). DataStatPro automatically applies post-hoc tests to adjusted means when run after ANCOVA.

Mistake 6: Including a Covariate That is Caused by the Treatment

Problem: Using a post-treatment variable (measured after the treatment began) as an ANCOVA covariate. If the treatment affected the covariate, adjusting for it removes part of the treatment effect — over-controlling bias that can reverse or eliminate genuine treatment effects.

Solution: Only include covariates measured before treatment began (or measured concurrently but logically independent of the treatment). Use causal diagrams (DAGs) to identify appropriate adjustment sets.

Mistake 7: Reporting Partial Effect Sizes Without Labelling Them as Partial

Problem: Reporting $\eta^2_p = 0.35$ without labelling it as "partial" or distinguishing it from total $\eta^2$ . In ANCOVA, $\eta^2_p$ is at least as large as total $\eta^2$ , and usually larger, because the covariate variance is removed from the denominator. Readers familiar only with ANOVA $\eta^2$ will overestimate the effect.

Solution: Always label partial effect sizes explicitly: " $\omega^2_p =$ [value] (partial)" and note that this is the proportion of covariate-adjusted DV variance explained by group membership. Report total $\eta^2$ when comparing across ANOVA and ANCOVA analyses.
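The size of the gap between the two indices is easy to see with the Example 1 values. This sketch assumes that total $\eta^2$ is defined as the adjusted between-groups SS divided by the unadjusted total SS of the DV, $SS_{YY(T)}$.

```python
# Values from Example 1.
ss_between_adj = 687.27
ss_within_adj = 1049.03
ss_yy_total = 2567.69  # unadjusted total SS of the DV

# Partial: covariate-explained variance is excluded from the denominator.
eta2_partial = ss_between_adj / (ss_between_adj + ss_within_adj)

# Total (assumed definition): all DV variance stays in the denominator.
eta2_total = ss_between_adj / ss_yy_total

print(f"partial eta^2 = {eta2_partial:.3f}")
print(f"total eta^2   = {eta2_total:.3f}")
```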

Mistake 8: Failing to Report the Covariate's Effect

Problem: Reporting only the group F-test from ANCOVA and ignoring the covariate F-test and slope. The covariate result is important for validating the ANCOVA approach (a non-significant covariate suggests it should not have been included) and for quantifying the covariate's relationship with the DV.

Solution: Always report: the covariate $F$ -test, degrees of freedom, p-value, the regression coefficient $\hat{\beta}$ with its 95% CI, and the partial $\eta^2$ for the covariate. A non-significant covariate F-test ( $p > .20$ ) is a warning sign that the covariate may not be worth including.

Mistake 9: Applying ANCOVA When Groups Differ Substantially on the Covariate (Extrapolation Risk)

Problem: In observational studies where groups differ substantially on the covariate, the adjusted means are estimated at the grand mean of the covariate — a value that may lie outside the observed range of one or more groups. This constitutes extrapolation beyond the data, producing adjusted means that are model-dependent and potentially meaningless.

Solution: Check whether the grand mean covariate value $\bar{x}_{..}$ falls within the observed range of each group's covariate scores. If not, consider restricting the analysis to participants within the common support region (matching or trimming). Report the covariate range for each group and acknowledge extrapolation concerns when they arise.
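The range check described above can be automated. A minimal sketch with a hypothetical helper name and made-up SES scores; it flags any group whose observed covariate range does not contain the grand mean.

```python
def groups_outside_support(covariate_by_group):
    """Return groups whose observed covariate range excludes the grand mean."""
    all_values = [v for values in covariate_by_group.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    flagged = [
        group
        for group, values in covariate_by_group.items()
        if not (min(values) <= grand_mean <= max(values))
    ]
    return grand_mean, flagged


# Hypothetical SES scores for three self-selected groups.
ses_scores = {
    "Lecture": [30, 38, 44, 51],
    "Flipped": [47, 55, 61, 70],
    "Project": [62, 68, 74, 83],
}

grand_mean, flagged = groups_outside_support(ses_scores)
print(f"grand covariate mean = {grand_mean:.1f}")
print("adjusted means extrapolate for:", flagged)
```

A non-empty list means the adjusted mean for those groups is evaluated at a covariate value that no member of the group actually had.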

Mistake 10: Not Reporting Both Unadjusted and Adjusted Analyses for Observational Studies

Problem: In observational research, reporting only the ANCOVA results without showing the unadjusted ANOVA results makes it impossible for readers to assess how much the covariate changed the conclusions.

Solution: Report both unadjusted (ANOVA) and adjusted (ANCOVA) results in parallel, including both observed and adjusted group means. Explicitly describe the direction and magnitude of covariate adjustment and discuss what it implies about the group difference.

16. Troubleshooting

Problem	Likely Cause	Solution
Homogeneity of slopes test is significant	True group × covariate interaction; different relationships in different groups	Use Johnson-Neyman analysis; report interaction as a finding; do not use standard ANCOVA
Adjusted means are outside the observed range	Large covariate mean differences between groups; extrapolation	Check common support; restrict to overlap region; acknowledge extrapolation
Non-significant group F in ANCOVA but significant in ANOVA	Covariate removes variance that mediated or confounded group differences	Report both analyses; discuss whether adjustment is appropriate (randomised vs. observational)
Significant group F in ANCOVA but not in ANOVA	Covariate reduces error variance substantially; ANCOVA is more powerful	Expected outcome; ANCOVA result is preferred when covariate assumptions are met
Adjusted means barely differ from unadjusted means	Groups have similar covariate means (especially in RCTs); little adjustment needed	This is fine and expected in randomised designs; ANCOVA still increases power via error reduction
$\omega^2_p$ is negative	True partial effect near zero; small sample; bias correction overshoots	Report as 0 by convention; note small or negligible effect; increase $N$
Covariate F is non-significant	Covariate not linearly related to DV within groups	Consider whether covariate was correctly measured; adding it consumes $df$ without power gain — consider removing it
Very large $\hat{\beta}$ with wide CI	Small $SS_{XX(W)}$ (little within-group covariate variation)	Check covariate distribution; if groups are very similar on covariate, the slope is poorly estimated
Cook's distance flags many influential points	Extreme covariate or DV values; potentially meaningful outliers	Investigate each flagged observation; report analyses with and without influential points
Levene's test significant on adjusted residuals	Heteroscedastic groups after covariate adjustment	Use heteroscedastic ANCOVA (HC3); use Games-Howell for post-hoc tests
Groups overlap perfectly on adjusted means despite significant F	Very small pairwise differences with high power; omnibus driven by overall pattern	Report all pairwise comparisons; some effects may be very small but statistically significant with large $N$
Ranked ANCOVA and standard ANCOVA give contradictory results	Non-normality or outliers distorting parametric results; ranked analysis more robust	Report both; prefer ranked ANCOVA when normality is violated; investigate outliers
Post-hoc tests all non-significant despite omnibus $p < .001$	Effect distributed across many similar pairwise differences; no single pair drives it	Inspect all adjusted means; the omnibus test can be sensitive to a pattern of small consistent differences across many pairs
Multiple covariates produce collinearity warnings	Covariates are highly intercorrelated; redundant information	Remove the least theoretically important correlated covariate; or use dimension reduction (PCA) as a single composite covariate
Adjusted $SE$ much larger than unadjusted $SE$	Large covariate mean differences between groups; additional adjustment uncertainty	This is expected in unbalanced observational designs; report both SEs; acknowledge wide CIs
ANCOVA and gain score analysis reach different conclusions	Lord's Paradox; pre-test means differ between groups	Use causal diagram to determine which analysis addresses the research question; report both with explicit interpretation of each

17. Quick Reference Cheat Sheet

Core ANCOVA Formulas

Formula	Description
$\hat{\beta} = SP_{XY(W)}/SS_{XX(W)}$	Pooled within-group regression coefficient
$\bar{y}_{j(adj)} = \bar{y}_j - \hat{\beta}(\bar{x}_j-\bar{x}_{..})$	Adjusted group mean
$SS_{within(adj)} = SS_{YY(W)} - SP_{XY(W)}^2/SS_{XX(W)}$	Adjusted error SS
$SS_{total(adj)} = SS_{YY(T)} - SP_{XY(T)}^2/SS_{XX(T)}$	Adjusted total SS
$SS_{between(adj)} = SS_{total(adj)} - SS_{within(adj)}$	Adjusted between-groups SS
$SS_{covariate} = SP_{XY(W)}^2/SS_{XX(W)}$	Covariate SS (reduction in error)
$df_B = K-1$ ; $df_{err} = N-K-p$ ; $df_{cov} = p$	Degrees of freedom
$MS_{B(adj)} = SS_{B(adj)}/(K-1)$	Adjusted between-groups MS
$MS_{err(adj)} = SS_{W(adj)}/(N-K-p)$	Adjusted error MS
$F = MS_{B(adj)}/MS_{err(adj)}$	ANCOVA F-ratio for group effect
$F_{cov} = MS_{cov}/MS_{err(adj)}$	F-ratio for covariate
$SE_{j(adj)} = \sqrt{MS_{err(adj)}(1/n_j+(\bar{x}_j-\bar{x}_{..})^2/SS_{XX(W)})}$	SE of adjusted mean
$p = P(F_{K-1,\;N-K-p} \geq F_{obs})$	p-value for group effect

Effect Size Formulas

Formula	Description
$\eta^2_p = SS_{B(adj)}/(SS_{B(adj)}+SS_{W(adj)})$	Partial eta squared (biased)
$\omega^2_p = (SS_{B(adj)}-(K-1)MS_{err})/(SS_{B(adj)}+SS_{W(adj)}+MS_{err})$	Partial omega squared (preferred)
$\varepsilon^2_p = (SS_{B(adj)}-(K-1)MS_{err})/(SS_{B(adj)}+SS_{W(adj)})$	Partial epsilon squared
$f_p = \sqrt{\omega^2_p/(1-\omega^2_p)}$	Cohen's $f_p$ (from $\omega^2_p$ )
$\eta^2_p = F\cdot df_B/(F\cdot df_B + df_{err})$	$\eta^2_p$ from $F$
$\omega^2_p \approx (F-1)df_B/((F-1)df_B + N)$	$\omega^2_p$ from $F$ (approximate)
$d_{adj,jk} = (\bar{y}_{j(adj)}-\bar{y}_{k(adj)})/\sqrt{MS_{err(adj)}}$	Cohen's $d_{adj}$ for pairwise
$r^2_{XY(W)} = SP_{XY(W)}^2/(SS_{XX(W)}\cdot SS_{YY(W)})$	Pooled within-group $r^2$
$\eta^2_{p,cov} = SS_{cov}/(SS_{cov}+SS_{W(adj)})$	Partial $\eta^2$ for covariate

ANCOVA Source Table Template

Source	SS	$df$	MS	$F$	$p$
Covariate(s)	$SS_{cov}$	$p$	$MS_{cov}$	$MS_{cov}/MS_{err}$	[value]
Between groups (adjusted)	$SS_{B(adj)}$	$K-1$	$MS_{B(adj)}$	$MS_{B(adj)}/MS_{err}$	[value]
Error (adjusted)	$SS_{W(adj)}$	$N-K-p$	$MS_{err(adj)}$
Total (adjusted)	$SS_{T(adj)}$	$N-1-p$

ANCOVA vs. ANOVA Comparison

Feature	One-Way ANOVA	ANCOVA
Covariate	None	$p \geq 1$ continuous covariates
Error $df$	$N-K$	$N-K-p$
Error SS	$SS_{YY(W)}$	$SS_{YY(W)} - SP^2_{XY(W)}/SS_{XX(W)}$
Group means tested	Observed $\bar{y}_j$	Adjusted $\bar{y}_{j(adj)}$
Additional assumptions	None	Homogeneity of slopes; linearity; covariate independence
Power vs. ANOVA	Baseline	Higher when the covariate is correlated with the DV within groups
Effect size metric	$\omega^2$	$\omega^2_p$ (partial)
Post-hoc tests	On observed means	On adjusted means

ANCOVA Reporting Checklist

Item	Required
Statement of ANCOVA variant used (standard vs. heteroscedastic)	✅ Always
Covariate(s) named with justification for inclusion	✅ Always
Homogeneity of regression slopes test result	✅ Always
Levene's test on adjusted residuals	✅ Always
Shapiro-Wilk on adjusted residuals	✅ When $n_j < 50$
Covariate $F$ , $df$ , $p$ , $\hat{\beta}$ , CI, and partial $\eta^2$	✅ Always
Group $F(df_B, df_{err})$ , exact $p$	✅ Always
$\omega^2_p$ with 95% CI (primary partial effect size)	✅ Always
$\eta^2_p$ (labelled as biased, partial)	✅ When journals require it
Both unadjusted AND adjusted group means	✅ Always
SDs (unadjusted) and SEs (adjusted) for all groups	✅ Always
Sample sizes per group	✅ Always
Post-hoc test name and correction method	✅ When omnibus $F$ significant
Post-hoc tests applied to adjusted means	✅ When omnibus $F$ significant
All pairwise adjusted mean differences with $p_{adj}$ and $d_{adj,jk}$	✅ When omnibus $F$ significant
95% CI for each pairwise adjusted mean difference	✅ Recommended
Covariate balance test result	✅ Always
Linearity check result	✅ Recommended
Cook's distance / influential observations check	✅ Recommended
$\varepsilon^2_p$ alongside $\omega^2_p$	✅ Recommended
Cohen's $f_p$ for power analysis reference	✅ When reporting power
Sensitivity analysis (min detectable effect)	✅ For null results
Acknowledgement of covariate limitations (reliability, residual confounding)	✅ For observational studies
Johnson-Neyman region if slopes heterogeneous	✅ When slopes violated
Scatterplot with per-group regression lines	✅ Strongly recommended
Adjusted means plot (EMM plot) with 95% CIs	✅ Strongly recommended

APA 7th Edition Reporting Templates

Standard ANCOVA (significant result):

"A one-way ANCOVA was conducted to examine the effect of [IV] on [DV], with [covariate name] included as a covariate [rationale: e.g., 'to control for pre-existing differences in...']. The homogeneity of regression slopes assumption was met ( $F([K-1], [N-2K]) =$ [value], $p =$ [value]). Levene's test on adjusted residuals indicated [equal / unequal] variances ( $F([K-1], [N-K]) =$ [value], $p =$ [value]). The covariate was [significantly / not significantly] related to the outcome after controlling for group, $F(1, [N-K-1]) =$ [value], $p =$ [value], $\hat{\beta} =$ [value] [95% CI: LB, UB], $\eta^2_p =$ [value]. After controlling for [covariate], there was a significant effect of [IV] on [DV], $F([K-1], [N-K-1]) =$ [value], $p =$ [value], $\omega^2_p =$ [value] [95% CI: LB, UB], indicating a [small / medium / large] effect. Adjusted means were: [Group 1] ( $M_{adj} =$ [value], $SE =$ [value]), [Group 2] ( $M_{adj} =$ [value], $SE =$ [value]), [etc.]. [Post-hoc test] comparisons on adjusted means revealed that [describe pairwise results]."

Heteroscedastic ANCOVA (unequal variances):

"Due to significant heterogeneity of variance on adjusted residuals (Levene's $F([K-1], [N-K]) =$ [value], $p =$ [value]), a heteroscedastic ANCOVA using HC3 variance estimation was applied. The test revealed a [significant / non-significant] effect of [IV] on [DV] after controlling for [covariate], $F([K-1], [df_W]) =$ [value], $p =$ [value], $\omega^2_p =$ [value] [95% CI: LB, UB]. Games-Howell post-hoc comparisons on adjusted means revealed [describe results]."

Violated homogeneity of slopes (Johnson-Neyman):

"Preliminary testing indicated that the homogeneity of regression slopes assumption was violated ( $F([K-1], [N-2K]) =$ [value], $p =$ [value]), indicating a significant [IV] × [covariate] interaction. Consequently, standard ANCOVA was not conducted. Johnson-Neyman analysis revealed that [group difference] was statistically significant when [covariate name] was below [lower J-N value] or above [upper J-N value], but not for [covariate] values in the range [lower, upper]."

Non-significant result with sensitivity analysis:

"A one-way ANCOVA revealed no significant effect of [IV] on [DV] after controlling for [covariate], $F([K-1], [N-K-1]) =$ [value], $p =$ [value], $\omega^2_p =$ [value] [95% CI: LB, UB]. Given the sample sizes ( $n_j =$ [value] per group), this study had 80% power to detect partial effects of $f_p \geq$ [value] ( $\omega^2_p \geq$ [value]). The observed $\omega^2_p =$ [value] falls below this detection threshold."

Conversion Formulas

From	To	Formula
$F$ , $df_B$ , $df_{err}$	$\eta^2_p$	$\eta^2_p = F\cdot df_B/(F\cdot df_B+df_{err})$
$F$ , $df_B$ , $N$	$\omega^2_p$ (approx.)	$\omega^2_p \approx (F-1)df_B/((F-1)df_B+N)$
$\eta^2_p$	$f_p$	$f_p = \sqrt{\eta^2_p/(1-\eta^2_p)}$
$\omega^2_p$	$f_p$	$f_p = \sqrt{\omega^2_p/(1-\omega^2_p)}$
$f_p$	$\eta^2_p$	$\eta^2_p = f_p^2/(1+f_p^2)$
ANOVA $\omega^2$ + $r^2_{XY(W)}$	ANCOVA $\omega^2_p$ (approx.)	$\omega^2_p \approx \omega^2/(1-r^2_{XY(W)})$
Pairwise adj. difference	$d_{adj,jk}$	$d_{adj,jk} = \Delta\bar{y}_{adj}/\sqrt{MS_{err(adj)}}$
$d_{adj,jk}$	$g_{adj,jk}$ (Hedges')	$g_{adj,jk} = d_{adj,jk}\times(1-3/(4(N-K-p)-1))$
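The conversions in the table above can be sketched as small helper functions. This is a minimal illustration using only the Python standard library; the function names are hypothetical and are not part of DataStatPro.

```python
import math


def eta2p_from_f(f_stat, df_b, df_err):
    """Partial eta squared from F, df_B and df_err."""
    return f_stat * df_b / (f_stat * df_b + df_err)


def omega2p_from_f(f_stat, df_b, n_total):
    """Approximate partial omega squared from F, df_B and total N.

    The result is negative whenever F < 1.
    """
    num = (f_stat - 1.0) * df_b
    return num / (num + n_total)


def cohen_f_from_proportion(prop):
    """Cohen's f_p from eta^2_p or omega^2_p.

    math.sqrt raises ValueError if prop is negative.
    """
    return math.sqrt(prop / (1.0 - prop))


def eta2p_from_cohen_f(f_p):
    """Partial eta squared recovered from Cohen's f_p."""
    return f_p ** 2 / (1.0 + f_p ** 2)


def d_adj(diff_adj, ms_err_adj):
    """Standardised pairwise difference between two adjusted means."""
    return diff_adj / math.sqrt(ms_err_adj)


def hedges_g_adj(d, n_total, k_groups, p_covariates=1):
    """Small-sample correction of d_adj using error df = N - K - p."""
    df_err = n_total - k_groups - p_covariates
    return d * (1.0 - 3.0 / (4.0 * df_err - 1.0))


# Illustrative inputs (made up): F = 5.0 with K = 3 groups, N = 60, one covariate.
print(eta2p_from_f(5.0, 2, 56))
print(omega2p_from_f(5.0, 2, 60))
print(hedges_g_adj(d_adj(4.0, 25.0), 60, 3))
```

Because $f_p^2 = \eta^2_p/(1-\eta^2_p) = F \cdot df_B/df_{err}$, converting $\eta^2_p$ to $f_p$ and back should return the starting value, which is a quick sanity check on any implementation.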

Power Gain from Covariate Reference

$r_{XY(W)}$	$r^2_{XY(W)}$	Error variance retained	Power multiplier (approx.)
$0.10$	$0.010$	$99.0\%$	$1.005$
$0.20$	$0.040$	$96.0\%$	$1.021$
$0.30$	$0.090$	$91.0\%$	$1.048$
$0.40$	$0.160$	$84.0\%$	$1.091$
$0.50$	$0.250$	$75.0\%$	$1.155$
$0.60$	$0.360$	$64.0\%$	$1.250$
$0.70$	$0.490$	$51.0\%$	$1.400$
$0.80$	$0.640$	$36.0\%$	$1.667$
$0.90$	$0.810$	$19.0\%$	$2.294$

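Power multiplier $\approx 1/\sqrt{1-r^2_{XY(W)}}$ relative to ANOVA (ignores the one error $df$ lost per covariate). It is the approximate factor by which the standardised effect size ( $f$ or $d$ ) grows, because the error standard deviation shrinks by $\sqrt{1-r^2_{XY(W)}}$; it is not a multiplier on the power value itself.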
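The columns of this reference table follow directly from $r_{XY(W)}$, so they can be recomputed for any correlation of interest. The sketch below assumes nothing beyond the formula stated above; the function name `covariate_gain` is illustrative.

```python
import math


def covariate_gain(r_within):
    """Return (r^2, error variance retained, multiplier) for a pooled
    within-group covariate-DV correlation r_within."""
    r2 = r_within ** 2
    retained = 1.0 - r2                     # share of error variance left after adjustment
    multiplier = 1.0 / math.sqrt(retained)  # ignores the df spent on the covariate
    return r2, retained, multiplier


for r in (0.10, 0.30, 0.50, 0.70, 0.90):
    r2, retained, mult = covariate_gain(r)
    print(f"r = {r:.2f}  r2 = {r2:.3f}  retained = {retained:.1%}  multiplier = {mult:.3f}")
```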

Assumption Checks Reference

Assumption	Test	Action if Violated
Homogeneity of regression slopes	Group × covariate interaction F-test	Johnson-Neyman; moderated regression
Normality (adjusted residuals)	Shapiro-Wilk, Q-Q plot	Quade test; ranked ANCOVA; transform DV
Homoscedasticity (adjusted residuals)	Levene's, Brown-Forsythe	Heteroscedastic ANCOVA (HC3) + Games-Howell
Independence	Design review	Multilevel ANCOVA
Linearity	Scatterplot; residual vs. CV; polynomial F	Add $X^2$ ; transform covariate
Covariate independence from treatment	Covariate balance test; timing check	Use pre-treatment covariates only; acknowledge confounding
Covariate reliability	Report $\alpha$ or $r_{tt}$	Reliability-corrected ANCOVA; report as limitation
No influential outliers	Cook's $D$ ; leverage; studentised residuals	Investigate; report sensitivity; Quade test

Post-Hoc Test Selection Guide (ANCOVA)

Condition	Recommended Test	Controls FWER
Balanced, equal adj. variances	Tukey HSD on adjusted means	✅ Exactly
Unbalanced, equal adj. variances	Tukey-Kramer on adjusted means	✅ Approximately
Unequal adj. variances	Games-Howell on adjusted means	✅ Approximately
All groups vs. one control	Dunnett's on adjusted means	✅ Optimal
Any design, conservative	Bonferroni on adjusted means	✅ Conservative
Any design, less conservative	Holm-Bonferroni on adjusted means	✅ Sequential
Pre-planned specific contrasts	Planned contrasts on adjusted means	✅ Reduced $m$
Heterogeneous slopes	Johnson-Neyman analysis	N/A (regions, not FWER)
Non-parametric DV	Dunn + Holm on rank residuals	✅ Sequential
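To make the difference between the Bonferroni and Holm-Bonferroni rows concrete, the following sketch adjusts a set of raw pairwise $p$-values both ways. It is a plain-Python illustration, not DataStatPro's implementation; the function names and the example $p$-values are made up.

```python
def bonferroni_adjust(p_values):
    """Multiply every p-value by the number of comparisons m, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.010, 0.040, 0.030]  # hypothetical p-values for three pairwise comparisons
print(bonferroni_adjust(raw))
print(holm_adjust(raw))
```

Holm's adjusted values are never larger than Bonferroni's, which is why the table describes it as the less conservative choice while still controlling the familywise error rate.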

Degrees of Freedom Reference

Source	$df$ (1 covariate)	$df$ ( $p$ covariates)
Between groups (adjusted)	$K-1$	$K-1$
Covariate	$1$	$p$
Error (adjusted)	$N-K-1$	$N-K-p$
Total (adjusted)	$N-2$	$N-1-p$
Slopes interaction test	$K-1$	$p(K-1)$
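The degrees-of-freedom table can be expressed as one small function, which is handy for checking a source table by hand. This is a sketch with hypothetical names; the error $df$ for the slopes test, $N - K(p+1)$, is the general form of the $N-2K$ used in the reporting templates for a single covariate.

```python
def ancova_df(n_total, k_groups, p_covariates=1):
    """Degrees of freedom for a one-way ANCOVA with p covariates."""
    error_adj = n_total - k_groups - p_covariates
    if error_adj <= 0:
        raise ValueError("N must exceed K + p for the error term to be estimable")
    return {
        "between_adj": k_groups - 1,
        "covariate": p_covariates,
        "error_adj": error_adj,
        "total_adj": n_total - 1 - p_covariates,
        "slopes_test": p_covariates * (k_groups - 1),
        # Full model with separate slopes: K intercepts plus p * K slopes.
        "slopes_test_error": n_total - k_groups * (p_covariates + 1),
    }


df = ancova_df(n_total=60, k_groups=3, p_covariates=1)
print(df)
# The adjusted total is the sum of the adjusted between-groups and error df.
assert df["between_adj"] + df["error_adj"] == df["total_adj"]
```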

Cohen's Benchmarks — ANCOVA Partial Effect Sizes

Label	$\omega^2_p$	$\varepsilon^2_p$	$\eta^2_p$	$f_p$
Small	$0.01$	$0.01$	$0.01$	$0.10$
Medium	$0.06$	$0.06$	$0.06$	$0.25$
Large	$0.14$	$0.14$	$0.14$	$0.40$

These benchmarks are identical to ANOVA benchmarks for partial effect sizes but apply to the covariate-adjusted variance proportion.
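For reporting, the benchmark table can be turned into a simple lookup. The sketch below treats each benchmark as the lower bound of its label, which is one common reading and an assumption here; the "below small" label and the function name are likewise illustrative choices, not terms from the table.

```python
# Lower bounds for (small, medium, large), taken from the benchmark table.
BENCHMARKS = {
    "omega2p": (0.01, 0.06, 0.14),
    "epsilon2p": (0.01, 0.06, 0.14),
    "eta2p": (0.01, 0.06, 0.14),
    "f_p": (0.10, 0.25, 0.40),
}


def benchmark_label(value, kind="omega2p"):
    """Map an effect size onto Cohen's verbal labels (assumed lower bounds)."""
    small, medium, large = BENCHMARKS[kind]
    if value >= large:
        return "large"
    if value >= medium:
        return "medium"
    if value >= small:
        return "small"
    return "below small"


print(benchmark_label(0.07))         # a partial omega squared of 0.07
print(benchmark_label(0.30, "f_p"))  # a Cohen's f_p of 0.30
```

Such labels are a convenience for the "[small / medium / large]" slot in the reporting templates; they do not replace interpreting the effect in the context of the research area.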

This tutorial provides a comprehensive foundation for understanding, conducting, and reporting ANCOVA and its alternatives within the DataStatPro application. For further reading, consult Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018) for accessible applied coverage of ANCOVA; Maxwell, Delaney & Kelley's "Designing Experiments and Analyzing Data" (3rd ed., 2018) for rigorous methodological depth, including homogeneity of regression slopes and planned contrasts; Rutherford's "Introducing ANOVA and ANCOVA: A GLM Approach" (2001) for a focused GLM-framework treatment; Wilcox's "Introduction to Robust Estimation and Hypothesis Testing" (4th ed., 2017) for the Quade test and robust ANCOVA alternatives; Miller & Chapman (2001) in the Journal of Abnormal Psychology for a lucid discussion of the misuse of ANCOVA in non-randomised designs; Senn (2006) in Statistics in Medicine for Lord's Paradox and the ANCOVA vs. gain score debate; Bauer & Curran (2005) in Multivariate Behavioral Research for probing interactions and Johnson-Neyman regions; and Lakens (2013) in Frontiers in Psychology for the $\omega^2$ vs. $\eta^2$ discussion applied to ANCOVA. For feature requests or support, contact the DataStatPro team.
