Multivariate Linear Models: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of Multivariate Linear Models (MLM) all the way through advanced estimation, hypothesis testing, model diagnostics, coefficient interpretation, and practical usage within the DataStatPro application. Whether you are a complete beginner or an experienced analyst, this guide is structured to build your understanding step by step.
Table of Contents
- Prerequisites and Background Concepts
- What are Multivariate Linear Models?
- The Mathematical Framework
- Model Specification and Design Matrices
- Assumptions of Multivariate Linear Models
- Parameter Estimation
- Hypothesis Testing and Inference
- Multivariate Test Statistics
- Effect Size Measures
- Model Fit and Evaluation
- Model Diagnostics and Residuals
- Interpretation of Coefficients
- Variable Selection and Model Comparison
- Special Cases and Connections to Other Methods
- Using the Multivariate Linear Models Component
- Computational and Formula Details
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into Multivariate Linear Models, it is helpful to be familiar with the following foundational concepts. Do not worry if you are not — each concept is briefly explained here.
1.1 Ordinary Least Squares (OLS) Regression
Ordinary Least Squares (OLS) regression models a single continuous response $y$ as a linear function of predictors $x_1, \dots, x_{p-1}$:

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$$

In matrix form:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

The OLS estimator minimises the sum of squared residuals:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}}\,\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$$
Multivariate Linear Models extend this framework to the case where there are multiple response variables measured simultaneously.
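As a quick refresher, the univariate OLS estimator can be computed directly from the normal equations. A minimal NumPy sketch on simulated data (all names and values here are illustrative, not part of DataStatPro):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n = 50 observations, intercept plus 2 predictors
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS via the normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against the numerically stabler least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With moderate noise the estimate lands close to the generating coefficients; `np.linalg.solve` avoids forming an explicit inverse.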
1.2 Matrices and Linear Algebra
Key matrix operations used throughout this tutorial:
- Matrix transpose $\mathbf{A}^\top$: Swaps rows and columns.
- Matrix inverse $\mathbf{A}^{-1}$: Exists when $\mathbf{A}$ is square and full rank; $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$.
- Trace $\operatorname{tr}(\mathbf{A})$: Sum of diagonal elements.
- Determinant $|\mathbf{A}|$: Scalar summarising the matrix; zero if $\mathbf{A}$ is singular.
- Kronecker product $\mathbf{A} \otimes \mathbf{B}$: Block matrix of all pairwise products of elements of $\mathbf{A}$ with $\mathbf{B}$.
- Vec operator $\operatorname{vec}(\mathbf{A})$: Stacks the columns of $\mathbf{A}$ into a single column vector.
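Each of these operations has a direct NumPy equivalent; a small sketch with arbitrary example matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.array([[1.0, 4.0],
              [5.0, 6.0]])

At   = A.T                       # transpose
Ainv = np.linalg.inv(A)          # inverse (A is full rank)
tr   = np.trace(A)               # 2 + 3 = 5
det  = np.linalg.det(A)          # 2*3 - 1*0 = 6
K    = np.kron(A, B)             # Kronecker product, shape (4, 4)
vecA = A.reshape(-1, order="F")  # vec: stack columns -> [2, 0, 1, 3]
```

`order="F"` (column-major) is what makes `reshape` match the mathematical vec operator.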
1.3 The Multivariate Normal Distribution
The multivariate normal distribution $N_q(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ has density:

$$f(\mathbf{y}) = (2\pi)^{-q/2}\,|\boldsymbol{\Sigma}|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu})\right\}$$

Where $\boldsymbol{\mu}$ is the mean vector and $\boldsymbol{\Sigma}$ is the covariance matrix. MLMs assume that the response vectors follow this distribution conditionally on the predictors.
1.4 Covariance and Correlation Matrices
The covariance matrix $\boldsymbol{\Sigma}$ ($q \times q$) for $q$ response variables contains:
- Variances on the diagonal.
- Covariances off-diagonal.
The correlation matrix standardises covariances:

$$R_{jk} = \frac{\Sigma_{jk}}{\sqrt{\Sigma_{jj}\,\Sigma_{kk}}}$$
1.5 Eigenvalues and Eigenvectors
For a square matrix $\mathbf{A}$, eigenvalue-eigenvector pairs $(\lambda, \mathbf{v})$ satisfy $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$. They are critical for computing multivariate test statistics and understanding the structure of relationships in MLMs.
1.6 The Hat Matrix
In OLS regression, the hat (projection) matrix:

$$\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$$

Projects the response vector onto the column space of $\mathbf{X}$: $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$. The hat matrix diagonal elements $h_{ii}$ measure the leverage of each observation. The same matrix plays the same role in MLMs, now projecting each response variable: $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$.
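A short NumPy sketch of the hat matrix and its defining properties (symmetric, idempotent, trace equal to the number of model columns); the data are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y            # projection of y onto col(X)
leverage = np.diag(H)    # the h_ii leverage values
```

The assertions below check the projection properties rather than any particular fitted values.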
2. What are Multivariate Linear Models?
2.1 The Core Idea
A Multivariate Linear Model (MLM) — also called multivariate multiple regression or the general multivariate linear model — simultaneously models the linear relationship between a set of predictor variables (independent variables) and multiple continuous response variables (dependent variables).
Where univariate multiple regression has a single response (an $n \times 1$ vector $\mathbf{y}$), the multivariate linear model has a response matrix $\mathbf{Y}$ ($n \times q$) with $q$ response variables measured on the same $n$ observations.

The model is:

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$$

Where:
- $\mathbf{Y}$ ($n \times q$) = matrix of response variables.
- $\mathbf{X}$ ($n \times p$) = design matrix of predictors (including a column of ones for the intercept).
- $\mathbf{B}$ ($p \times q$) = matrix of regression coefficients (one row per predictor, one column per response).
- $\mathbf{E}$ ($n \times q$) = matrix of errors.
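To make the dimensions concrete, here is a minimal NumPy simulation of $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$ with correlated errors across the two responses (all sizes and coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 100, 3, 2   # observations, predictors (incl. intercept), responses

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B = np.array([[1.0, -1.0],    # intercept row
              [2.0,  0.5],    # effect of x1 on (y1, y2)
              [0.0,  1.5]])   # effect of x2 on (y1, y2)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])  # errors correlated across responses
E = rng.multivariate_normal(np.zeros(q), Sigma, size=n)

Y = X @ B + E   # the model Y = XB + E; Y is n x q
```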
2.2 Why Use MLM Instead of Separate Regressions?
A natural question is: "Why not simply run separate univariate regressions, one for each response?" There are several compelling reasons to prefer the multivariate approach:
| Reason | Explanation |
|---|---|
| Controls familywise Type I error | Running $q$ separate tests inflates the overall error rate. MLM tests all responses simultaneously at a single $\alpha$. |
| Exploits correlations among responses | Separate regressions ignore that responses are correlated. MLM uses the full error covariance structure, improving efficiency. |
| Detects multivariate effects | Predictors may not significantly relate to any single response but may have a significant combined effect on the response set. MLM detects these patterns. |
| Produces joint confidence regions | MLM yields joint confidence regions for multiple coefficients simultaneously — something separate regressions cannot provide. |
| Tests multivariate hypotheses directly | Hypotheses about linear combinations of coefficients across responses (e.g., "Does $x_j$ affect $y_1$ and $y_2$ equally?") can be tested directly in MLM. |
| More powerful when responses are correlated | When responses share common error variance, accounting for correlations can increase power compared to separate analyses. |
2.3 The Generality of the Multivariate Linear Model
The multivariate linear model is remarkably general. Many seemingly distinct statistical procedures are special cases:
| Special Case | Description |
|---|---|
| Univariate Multiple Regression | $q = 1$ (single response) |
| Multivariate ANOVA (MANOVA) | $\mathbf{X}$ contains only categorical predictors (dummy-coded) |
| Multivariate ANCOVA (MANCOVA) | $\mathbf{X}$ contains both categorical and continuous predictors |
| Seemingly Unrelated Regression (SUR) | $q$ regression equations with different predictors per equation |
| Profile Analysis | $\mathbf{X}$ contains group indicators; $\mathbf{Y}$ contains repeated measures |
| Canonical Correlation Analysis | Tests overall association between predictor set and response set |
| Discriminant Function Analysis | $\mathbf{X}$ contains group indicators; follows from the MLM eigenstructure |
2.4 Real-World Applications
Multivariate Linear Models are used across a wide range of applied fields:
- Neuroscience: Predicting multiple brain region activation levels simultaneously from experimental conditions, participant characteristics, or cognitive task measures.
- Environmental Science: Modelling multiple water quality indicators (pH, dissolved oxygen, turbidity, nitrate) jointly as functions of environmental predictors (rainfall, temperature, land use).
- Finance: Predicting multiple asset returns simultaneously from macroeconomic factors (interest rates, inflation, GDP growth).
- Clinical Medicine: Relating treatment variables to multiple physiological endpoints (blood pressure, cholesterol, glucose, BMI) simultaneously.
- Educational Research: Modelling multiple academic outcomes (reading, mathematics, science, writing) as joint functions of student and school characteristics.
- Agricultural Science: Predicting multiple crop yield components (grain weight, harvest index, protein content) from soil, climate, and management variables.
- Marketing Research: Modelling multiple consumer attitude dimensions simultaneously from demographic and psychographic predictors.
- Psychometrics: Relating latent constructs to multiple observed scale scores while accounting for the shared measurement structure.
3. The Mathematical Framework
3.1 The Multivariate Linear Model
The core model is:

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$$

Each row (observation $i$) satisfies:

$$\mathbf{y}_i^\top = \mathbf{x}_i^\top\mathbf{B} + \boldsymbol{\varepsilon}_i^\top, \qquad \boldsymbol{\varepsilon}_i \sim N_q(\mathbf{0}, \boldsymbol{\Sigma})$$

Where:
- $\mathbf{x}_i$ ($p \times 1$) = predictor vector for observation $i$ (including intercept).
- $\boldsymbol{\Sigma}$ ($q \times q$) = error covariance matrix (assumed common to all observations).
- Errors are independent across observations: $\operatorname{Cov}(\boldsymbol{\varepsilon}_i, \boldsymbol{\varepsilon}_{i'}) = \mathbf{0}$ for $i \neq i'$.

Written differently, the distribution of each row is:

$$\mathbf{y}_i \mid \mathbf{x}_i \sim N_q\!\left(\mathbf{B}^\top\mathbf{x}_i,\; \boldsymbol{\Sigma}\right)$$
3.2 The Coefficient Matrix
The coefficient matrix $\mathbf{B}$ has dimensions $p \times q$:
- The $k$-th column $\mathbf{b}_k$ contains the coefficients for the $k$-th response variable — identical to what you would get from running univariate OLS regression of $\mathbf{y}_k$ on $\mathbf{X}$.
- The $j$-th row $\boldsymbol{\beta}_{(j)}$ (for $j = 1, \dots, p$) contains the effects of predictor $j$ on all $q$ response variables simultaneously — the multivariate coefficient vector for predictor $j$.
3.3 The Error Structure
The error matrix $\mathbf{E}$ has the following properties:

$$E[\mathbf{E}] = \mathbf{0}, \qquad \operatorname{Cov}(\boldsymbol{\varepsilon}_i) = \boldsymbol{\Sigma}, \qquad \boldsymbol{\varepsilon}_i \text{ independent of } \boldsymbol{\varepsilon}_{i'} \ (i \neq i')$$

This means the errors are identically and independently distributed (i.i.d.) multivariate normal. Equivalently, using the vec operator:

$$\operatorname{vec}(\mathbf{E}) \sim N_{nq}\!\left(\mathbf{0},\; \boldsymbol{\Sigma} \otimes \mathbf{I}_n\right)$$

The Kronecker product captures the covariance structure: observations are independent (block-diagonal structure from $\mathbf{I}_n$) but responses within each observation may be correlated (captured by $\boldsymbol{\Sigma}$).
3.4 The Conditional Mean and Variance
For a new observation with predictor vector $\mathbf{x}_0$:

$$E[\mathbf{y}_0 \mid \mathbf{x}_0] = \mathbf{B}^\top\mathbf{x}_0, \qquad \operatorname{Cov}(\mathbf{y}_0 \mid \mathbf{x}_0) = \boldsymbol{\Sigma}$$

The predicted response vector is:

$$\hat{\mathbf{y}}_0 = \hat{\mathbf{B}}^\top\mathbf{x}_0$$

The predicted response matrix for all observations:

$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}} = \mathbf{H}\mathbf{Y}$$

Where $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$ is the hat matrix.
3.5 The General Linear Hypothesis
The power of the multivariate linear model framework lies in the ability to test very general hypotheses of the form:

$$H_0: \mathbf{C}\mathbf{B}\mathbf{M} = \mathbf{D}$$

Where:
- $\mathbf{C}$ ($c \times p$) = hypothesis (contrast) matrix for predictors — specifies which linear combinations of rows of $\mathbf{B}$ are being tested.
- $\mathbf{M}$ ($q \times m$) = response transformation matrix — specifies which linear combinations of columns of $\mathbf{B}$ (i.e., response variables or contrasts among responses) are being tested.
- $\mathbf{D}$ ($c \times m$) = hypothesised value matrix (usually $\mathbf{0}$ for testing no effect).
This general framework subsumes all the special cases mentioned in Section 2.3.
Examples of hypotheses:
| Hypothesis | $\mathbf{C}$ | $\mathbf{M}$ |
|---|---|---|
| Test all predictors simultaneously | $[\mathbf{0} \mid \mathbf{I}_{p-1}]$ (omitting intercept row) | $\mathbf{I}_q$ |
| Test effect of predictor $j$ on all DVs | $\mathbf{e}_j^\top$ (unit vector) | $\mathbf{I}_q$ |
| Test effect of all predictors on DV $k$ | $[\mathbf{0} \mid \mathbf{I}_{p-1}]$ | $\mathbf{e}_k$ (unit vector) |
| Test if $x_j$ has equal effects on $y_1$ and $y_2$ | $\mathbf{e}_j^\top$ | $(1, -1, 0, \dots, 0)^\top$ |
| MANOVA group effect | Group contrast matrix | $\mathbf{I}_q$ |
4. Model Specification and Design Matrices
4.1 The Design Matrix
The design matrix $\mathbf{X}$ ($n \times p$) encodes all predictor information. The first column is typically a column of ones for the intercept:

$$\mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{n,p-1} \end{pmatrix}$$
The design matrix can accommodate:
- Continuous predictors: Entered directly as columns.
- Categorical predictors: Encoded as dummy variables ($g - 1$ columns for $g$ categories).
- Interaction terms: Products of two or more predictor columns.
- Polynomial terms: Powers of continuous predictors ($x^2$, $x^3$, etc.).
- Mixed designs: Combinations of continuous and categorical predictors.
4.2 Dummy Coding for Categorical Predictors
For a categorical predictor with $g$ categories, create $g - 1$ dummy variables using a reference category. For a three-level factor (A, B, C) with A as reference:
| Category | $d_1$ | $d_2$ |
|---|---|---|
| A (reference) | 0 | 0 |
| B | 1 | 0 |
| C | 0 | 1 |
The coefficient for $d_1$ in column $k$ of $\mathbf{B}$ represents the difference in the mean of $y_k$ between groups B and A, holding other predictors constant.
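Reference-coded dummies are straightforward to build by hand; a small NumPy sketch for a three-level factor like the one above (the data vector itself is made up for illustration):

```python
import numpy as np

# Three-level factor with A as the reference category
levels = np.array(["A", "B", "C", "B", "A", "C"])

d1 = (levels == "B").astype(float)  # 1 for B, else 0
d2 = (levels == "C").astype(float)  # 1 for C, else 0

# Design matrix: intercept + two dummies (reference coding)
X = np.column_stack([np.ones(len(levels)), d1, d2])
```

Row patterns match the coding table: A maps to (1, 0, 0), B to (1, 1, 0), C to (1, 0, 1).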
4.3 The Response Matrix
The response matrix $\mathbf{Y}$ ($n \times q$) contains all $q$ response variables:

$$\mathbf{Y} = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1q} \\ y_{21} & y_{22} & \cdots & y_{2q} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nq} \end{pmatrix}$$
Each column is a separate response variable. Each row is the complete response profile for one observation.
4.4 Centering and Scaling
Centering predictors (subtracting their means) is generally recommended to:
- Make the intercept interpretable (the predicted response at the mean of all predictors).
- Reduce multicollinearity between main effects and interactions.
- Improve numerical stability of matrix computations.
Standardising predictors (centering + scaling to unit variance) additionally makes regression coefficients comparable across predictors with different units and scales — the standardised coefficients represent the change in the response (in standard deviation units of the response) for a one-standard-deviation change in each predictor.
4.5 Interaction Terms
An interaction term is formed as the element-wise product of the two predictor columns:

$$x_{\text{int},i} = x_{ij}\,x_{ik}$$

Including an interaction in the model allows the effect of $x_j$ on the response vector to depend on the value of $x_k$ (and vice versa). For categorical × continuous interactions, the interaction allows the regression slope to differ across groups.
5. Assumptions of Multivariate Linear Models
5.1 Linearity
Assumption: The relationship between each predictor and each response is linear (on the scale of the model), holding all other predictors constant:

$$E[y_{ik} \mid \mathbf{x}_i] = \mathbf{x}_i^\top\mathbf{b}_k$$
How to check: Partial residual plots (component-plus-residual plots) for each predictor-response combination; scatter plots of residuals against each predictor.
Consequences of violation: Biased and inconsistent coefficient estimates; poor prediction.
Remedies: Polynomial terms, log transformations of predictors or responses, spline terms, generalised additive models.
5.2 Multivariate Normality of Errors
Assumption: The error vectors follow a multivariate normal distribution:

$$\boldsymbol{\varepsilon}_i \sim N_q(\mathbf{0}, \boldsymbol{\Sigma})$$
How to check:
- Mardia's tests of multivariate skewness and kurtosis on residuals.
- Q-Q plots of squared Mahalanobis distances of residuals vs. $\chi^2_q$ quantiles.
- Univariate normality checks (histograms, Q-Q plots, Shapiro-Wilk) on each column of $\hat{\mathbf{E}}$.
Consequences of violation: OLS estimates remain unbiased and consistent (Gauss-Markov theorem extends to MLM) but inference (hypothesis tests, confidence intervals) based on the multivariate normal assumption is affected, particularly with small samples.
Remedies: Transformations (log, Box-Cox) of skewed responses; robust estimation; bootstrap inference.
5.3 Independence of Observations
Assumption: The error vectors are independent across observations:

$$\operatorname{Cov}(\boldsymbol{\varepsilon}_i, \boldsymbol{\varepsilon}_{i'}) = \mathbf{0} \quad \text{for } i \neq i'$$
How to check: Consider the study design. Look for temporal autocorrelation (Durbin-Watson statistics per response), spatial correlation, or clustering structure.
Consequences of violation: Biased standard errors, invalid hypothesis tests, false significance.
Remedies: Mixed models (multilevel MLM) for clustered data; GEE for repeated measures; time-series corrections for temporal dependence.
5.4 Homoscedasticity (Constant Error Covariance)
Assumption: The error covariance matrix is the same for all observations — it does not depend on the values of the predictors or the fitted values:

$$\operatorname{Cov}(\boldsymbol{\varepsilon}_i) = \boldsymbol{\Sigma} \quad \text{for all } i$$
How to check:
- Plots of squared residuals vs. fitted values for each response.
- Levene's test or Bartlett's test for each response variable.
- Box's M test if comparing across groups defined by a categorical predictor.
Consequences of violation: OLS estimates remain unbiased but are no longer BLUE (Best Linear Unbiased Estimator); standard errors are biased.
Remedies: Weighted least squares (if the heteroscedasticity pattern is known); heteroscedasticity-consistent (HC) standard errors; transformations of responses.
5.5 No Perfect Multicollinearity Among Predictors
Assumption: No predictor variable is a perfect linear combination of other predictor variables. Equivalently, $\mathbf{X}^\top\mathbf{X}$ must be invertible (full rank).
How to check: Variance Inflation Factor (VIF) for each predictor: $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the R² from regressing $x_j$ on all other predictors. VIF > 10 is a common threshold for concern.
Consequences of violation: $(\mathbf{X}^\top\mathbf{X})^{-1}$ does not exist; coefficient estimates are undefined or numerically unstable with inflated standard errors.
Remedies: Remove redundant predictors; use ridge regression or PLS; combine highly correlated predictors into a composite.
5.6 No Influential Outliers
Assumption: No single observation has undue influence on the estimated coefficient matrix $\hat{\mathbf{B}}$.
How to check: Cook's distance (multivariate extension), hat matrix diagonal $h_{ii}$, DFFITS, standardised residuals.
Consequences of violation: Estimated coefficients may be heavily distorted by a single atypical observation.
Remedies: Investigate outlying observations for data errors; use robust regression; report sensitivity analyses with and without influential points.
5.7 Sufficient Sample Size
Recommendation: For MLM to be reliable:
- $n > p + q$ (absolute minimum to estimate $\mathbf{B}$ and $\boldsymbol{\Sigma}$).
- Roughly 15–20 observations per predictor (rule of thumb for adequate precision of coefficient estimates).
- The within-group covariance matrix requires $n - p \geq q$ for invertibility.
6. Parameter Estimation
6.1 The Ordinary Least Squares (OLS) Estimator
The OLS estimator for $\mathbf{B}$ minimises the total sum of squared residuals simultaneously across all response variables:

$$\hat{\mathbf{B}} = \arg\min_{\mathbf{B}}\,\operatorname{tr}\!\left[(\mathbf{Y} - \mathbf{X}\mathbf{B})^\top(\mathbf{Y} - \mathbf{X}\mathbf{B})\right]$$

Taking the matrix derivative and setting to zero:

$$\frac{\partial}{\partial \mathbf{B}}\operatorname{tr}\!\left[(\mathbf{Y} - \mathbf{X}\mathbf{B})^\top(\mathbf{Y} - \mathbf{X}\mathbf{B})\right] = -2\,\mathbf{X}^\top(\mathbf{Y} - \mathbf{X}\mathbf{B}) = \mathbf{0}$$

Solving the normal equations $\mathbf{X}^\top\mathbf{X}\,\mathbf{B} = \mathbf{X}^\top\mathbf{Y}$:

$$\hat{\mathbf{B}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}$$

Critically: This is simply the matrix of coefficients obtained by running $q$ separate OLS regressions — one for each column of $\mathbf{Y}$. The multivariate OLS estimator is column-by-column identical to the univariate OLS estimator applied $q$ times.
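This equivalence is easy to verify numerically. A NumPy sketch on simulated data, comparing the matrix formula with column-by-column least squares (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 80, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B = rng.normal(size=(p, q))
Y = X @ B + rng.normal(scale=0.5, size=(n, q))

# Multivariate OLS in one shot: B_hat = (X'X)^{-1} X'Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Column-by-column univariate OLS gives exactly the same coefficients
B_cols = np.column_stack([
    np.linalg.lstsq(X, Y[:, k], rcond=None)[0] for k in range(q)
])
```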
6.2 Properties of the OLS Estimator
Under the standard assumptions:
Unbiasedness:

$$E[\hat{\mathbf{B}}] = \mathbf{B}$$

Covariance structure:

$$\operatorname{Cov}\!\left(\hat{\mathbf{b}}_j, \hat{\mathbf{b}}_k\right) = \sigma_{jk}\,(\mathbf{X}^\top\mathbf{X})^{-1}$$

Where $\sigma_{jk}$ is the $(j, k)$ element of $\boldsymbol{\Sigma}$ (the error covariance between responses $j$ and $k$). In full:

$$\operatorname{Cov}\!\left[\operatorname{vec}(\hat{\mathbf{B}})\right] = \boldsymbol{\Sigma} \otimes (\mathbf{X}^\top\mathbf{X})^{-1}$$

Gauss-Markov (multivariate): Among all linear unbiased estimators, $\hat{\mathbf{B}}$ has the minimum covariance matrix (in the Loewner partial order) — it is BLUE.
Maximum Likelihood Equivalence: Under multivariate normality, the OLS estimator is also the maximum likelihood estimator (MLE) of $\mathbf{B}$.
6.3 Estimation of the Error Covariance Matrix
The unbiased estimator of the error covariance matrix is:

$$\hat{\boldsymbol{\Sigma}} = \frac{\hat{\mathbf{E}}^\top\hat{\mathbf{E}}}{n - p}$$

Where $\hat{\mathbf{E}} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}}$ is the $n \times q$ matrix of OLS residuals.
The MLE of $\boldsymbol{\Sigma}$ (biased but with maximum likelihood properties) is:

$$\hat{\boldsymbol{\Sigma}}_{\text{ML}} = \frac{\hat{\mathbf{E}}^\top\hat{\mathbf{E}}}{n}$$

The unbiased estimator is generally preferred for inference.
Degrees of freedom: The residual SSCP matrix $\hat{\mathbf{E}}^\top\hat{\mathbf{E}}$ has $n - p$ degrees of freedom. Invertibility requires $n - p \geq q$.
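A short NumPy sketch of both estimators on simulated data (the sizes and the true $\boldsymbol{\Sigma}$ are illustrative); note the two estimates differ only by the factor $n/(n-p)$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 200, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
Y = X @ rng.normal(size=(p, q)) \
    + rng.multivariate_normal(np.zeros(q), Sigma, size=n)

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ B_hat                       # residual matrix E_hat, n x q

Sigma_unbiased = resid.T @ resid / (n - p)  # divide by n - p
Sigma_mle      = resid.T @ resid / n        # MLE: divide by n
```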
6.4 Seemingly Unrelated Regression (SUR) and GLS
When the predictor matrices differ across response variables (i.e., each response $k$ has its own design matrix $\mathbf{X}_k$), standard OLS on each response separately is not the most efficient estimator. Seemingly Unrelated Regression (SUR) stacks the equations as $\operatorname{vec}(\mathbf{Y}) = \mathbf{X}^*\boldsymbol{\beta} + \operatorname{vec}(\mathbf{E})$ with block-diagonal $\mathbf{X}^* = \operatorname{diag}(\mathbf{X}_1, \dots, \mathbf{X}_q)$ and applies Generalised Least Squares (GLS) using the full covariance structure:

$$\hat{\boldsymbol{\beta}}_{\text{GLS}} = \left[\mathbf{X}^{*\top}\!\left(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_n\right)\mathbf{X}^*\right]^{-1}\mathbf{X}^{*\top}\!\left(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_n\right)\operatorname{vec}(\mathbf{Y})$$
When all responses share the same predictor matrix (the standard MLM case), SUR and OLS give identical estimates. The GLS advantage only manifests when predictor matrices differ across equations.
6.5 The Maximum Likelihood Estimator
The log-likelihood for the multivariate linear model is:

$$\ell(\mathbf{B}, \boldsymbol{\Sigma}) = -\frac{nq}{2}\log(2\pi) - \frac{n}{2}\log|\boldsymbol{\Sigma}| - \frac{1}{2}\operatorname{tr}\!\left[\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \mathbf{X}\mathbf{B})^\top(\mathbf{Y} - \mathbf{X}\mathbf{B})\right]$$

Maximising over $\mathbf{B}$ gives $\hat{\mathbf{B}}_{\text{ML}} = \hat{\mathbf{B}}_{\text{OLS}}$ (as stated above). Maximising over $\boldsymbol{\Sigma}$ gives $\hat{\boldsymbol{\Sigma}}_{\text{ML}} = \hat{\mathbf{E}}^\top\hat{\mathbf{E}}/n$.
The maximised log-likelihood is:

$$\ell_{\max} = -\frac{nq}{2}\log(2\pi) - \frac{n}{2}\log\left|\hat{\boldsymbol{\Sigma}}_{\text{ML}}\right| - \frac{nq}{2}$$
7. Hypothesis Testing and Inference
7.1 The General Linear Hypothesis Framework
The unified framework for hypothesis testing in MLM is the General Linear Hypothesis:

$$H_0: \mathbf{C}\mathbf{B}\mathbf{M} = \mathbf{D}$$

For testing $\mathbf{D} = \mathbf{0}$ (the most common case), the Hypothesis SSCP matrix is:

$$\mathbf{H} = (\mathbf{C}\hat{\mathbf{B}}\mathbf{M})^\top\left[\mathbf{C}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{C}^\top\right]^{-1}(\mathbf{C}\hat{\mathbf{B}}\mathbf{M})$$

The Error SSCP matrix (the same for all hypotheses):

$$\mathbf{E}_{\text{SSCP}} = \mathbf{M}^\top\hat{\mathbf{E}}^\top\hat{\mathbf{E}}\,\mathbf{M}$$

When $\mathbf{M} = \mathbf{I}_q$, $\mathbf{E}_{\text{SSCP}} = \hat{\mathbf{E}}^\top\hat{\mathbf{E}}$.
Test statistics are based on the eigenvalues of $\mathbf{E}_{\text{SSCP}}^{-1}\mathbf{H}$ (see Section 8).
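A NumPy sketch of this SSCP construction for a single-row hypothesis (that one predictor has no effect on any response); the simulated data and contrast are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 120, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B = np.array([[0.0, 0.0],
              [1.0, 0.5],   # only x1 has an effect
              [0.0, 0.0]])
Y = X @ B + rng.normal(size=(n, q))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ B_hat
XtX_inv = np.linalg.inv(X.T @ X)

# H0: row 2 of B (the x1 effects) is zero -> C selects that row, M = I_q
C = np.array([[0.0, 1.0, 0.0]])
CB = C @ B_hat                                         # 1 x q
H_sscp = CB.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ CB  # hypothesis SSCP
E_sscp = resid.T @ resid                               # error SSCP

# Test statistics are functions of the eigenvalues of E^{-1} H
eigvals = np.linalg.eigvals(np.linalg.solve(E_sscp, H_sscp)).real
```

Because the contrast has rank 1, only one eigenvalue is nonzero here.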
7.2 Testing Individual Predictor Effects (Row of )
To test whether predictor $j$ has any effect on the set of responses ($H_0: \boldsymbol{\beta}_{(j)} = \mathbf{0}$, where $\boldsymbol{\beta}_{(j)}$ is the $j$-th row of $\mathbf{B}$):
Set $\mathbf{C} = \mathbf{e}_j^\top$ (row vector with 1 in position $j$, zeros elsewhere) and $\mathbf{M} = \mathbf{I}_q$.
This produces a multivariate test of whether $x_j$ has zero effect on all responses simultaneously. The four multivariate test statistics (Section 8) are applied.
7.3 Testing a Subset of Predictors
To test whether a subset of predictors jointly contributes to the model ($H_0: \mathbf{C}\mathbf{B} = \mathbf{0}$ for a $\mathbf{C}$ that selects the corresponding rows of $\mathbf{B}$):
The hypothesis SSCP matrix with $df_h = \operatorname{rank}(\mathbf{C})$ is compared to the error SSCP matrix with $df_e = n - p$.
7.4 Testing Specific Linear Combinations of Responses
To test whether a predictor has differential effects on different responses (e.g., $H_0: \beta_{j1} = \beta_{j2}$):
Set $\mathbf{C} = \mathbf{e}_j^\top$ and $\mathbf{M} = (1, -1, 0, \dots, 0)^\top$.
The transformed response is $\mathbf{Y}\mathbf{M} = \mathbf{y}_1 - \mathbf{y}_2$; the test becomes a univariate $F$-test on the difference.
7.5 Univariate Tests Within the MLM Framework
For each response variable individually, the standard univariate $t$-test for predictor $j$ on response $k$ is obtained by setting $\mathbf{M} = \mathbf{e}_k$ and applying the Wald test:

$$t_{jk} = \frac{\hat{\beta}_{jk}}{\operatorname{SE}(\hat{\beta}_{jk})}$$

Where $\operatorname{SE}^2(\hat{\beta}_{jk}) = \hat{\sigma}_{kk}\left[(\mathbf{X}^\top\mathbf{X})^{-1}\right]_{jj}$ is the estimated variance of $\hat{\beta}_{jk}$.
7.6 Wald Tests for Individual Coefficients
For a single coefficient $\beta_{jk}$ (predictor $j$, response $k$):

$$t = \frac{\hat{\beta}_{jk}}{\operatorname{SE}(\hat{\beta}_{jk})} \;\sim\; t_{n-p} \quad \text{under } H_0: \beta_{jk} = 0$$

Where:

$$\operatorname{SE}(\hat{\beta}_{jk}) = \sqrt{\hat{\sigma}_{kk}\left[(\mathbf{X}^\top\mathbf{X})^{-1}\right]_{jj}}$$

A $100(1 - \alpha)\%$ confidence interval for $\beta_{jk}$:

$$\hat{\beta}_{jk} \pm t_{1-\alpha/2,\,n-p}\,\operatorname{SE}(\hat{\beta}_{jk})$$
7.7 Simultaneous Confidence Regions
A key advantage of MLM is the ability to form joint confidence regions for multivariate coefficient vectors. The joint confidence region for the $j$-th row of $\mathbf{B}$ (all $q$ responses simultaneously) is an ellipsoid:

$$\left\{\boldsymbol{\beta}_{(j)}:\; \frac{(\hat{\boldsymbol{\beta}}_{(j)} - \boldsymbol{\beta}_{(j)})^\top\hat{\boldsymbol{\Sigma}}^{-1}(\hat{\boldsymbol{\beta}}_{(j)} - \boldsymbol{\beta}_{(j)})}{\left[(\mathbf{X}^\top\mathbf{X})^{-1}\right]_{jj}} \leq \frac{q\,(n - p)}{n - p - q + 1}\,F_{1-\alpha;\,q,\,n-p-q+1}\right\}$$
7.8 Testing the Overall Model
The overall model test asks whether any predictor (excluding the intercept) has a significant effect on any response:

$$H_0: \mathbf{B}_{(-1)} = \mathbf{0}$$

Where $\mathbf{B}_{(-1)}$ is $\mathbf{B}$ with the intercept row removed. This is tested with $\mathbf{C} = [\mathbf{0} \mid \mathbf{I}_{p-1}]$ (omitting the intercept column) and $\mathbf{M} = \mathbf{I}_q$.
8. Multivariate Test Statistics
All multivariate tests in MLM are based on the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_s$ of $\mathbf{E}_{\text{SSCP}}^{-1}\mathbf{H}$, where $s = \min(c, m)$, $c$ = rank of $\mathbf{C}$ (number of hypothesis df), $m$ = number of transformed responses (columns of $\mathbf{M}$).
8.1 Wilks' Lambda ($\Lambda$)

$$\Lambda = \frac{|\mathbf{E}_{\text{SSCP}}|}{|\mathbf{E}_{\text{SSCP}} + \mathbf{H}|} = \prod_{i=1}^{s}\frac{1}{1 + \lambda_i}$$

Range: $0 \leq \Lambda \leq 1$; smaller values indicate stronger effects.
F-approximation (Rao):

$$F = \frac{1 - \Lambda^{1/t}}{\Lambda^{1/t}}\cdot\frac{df_2}{df_1}, \qquad df_1 = mc, \qquad df_2 = rt - \frac{mc - 2}{2}$$

Where:

$$t = \sqrt{\frac{m^2c^2 - 4}{m^2 + c^2 - 5}} \ \ (\text{set } t = 1 \text{ when the denominator is non-positive}), \qquad r = (n - p) - \frac{m - c + 1}{2}$$

Exact F when $m = 1$, $m = 2$, $c = 1$, or $c = 2$.
8.2 Pillai's Trace ($V$)

$$V = \operatorname{tr}\!\left[\mathbf{H}(\mathbf{H} + \mathbf{E}_{\text{SSCP}})^{-1}\right] = \sum_{i=1}^{s}\frac{\lambda_i}{1 + \lambda_i}$$

Range: $0 \leq V \leq s$.
F-approximation:

$$F = \frac{(2u + s + 1)\,V}{(2b + s + 1)\,(s - V)}, \qquad df_1 = s(2b + s + 1), \qquad df_2 = s(2u + s + 1)$$

Where $b = \frac{|c - m| - 1}{2}$ and $u = \frac{(n - p) - m - 1}{2}$.
Most robust to violations of assumptions.
8.3 Hotelling-Lawley Trace ($T$)

$$T = \operatorname{tr}\!\left[\mathbf{E}_{\text{SSCP}}^{-1}\mathbf{H}\right] = \sum_{i=1}^{s}\lambda_i$$

Range: $T \geq 0$.
F-approximation:

$$F = \frac{2(su + 1)\,T}{s^2(2b + s + 1)}, \qquad df_1 = s(2b + s + 1), \qquad df_2 = 2(su + 1)$$

With $b$ and $u$ as in Section 8.2. Most powerful when effects are concentrated on a single dimension.
8.4 Roy's Largest Root ($\theta$)

$$\theta = \frac{\lambda_1}{1 + \lambda_1}$$

Range: $0 \leq \theta \leq 1$.
The usual F approximation based on $\lambda_1$ alone is an upper bound on the true F statistic, so the resulting $p$-value is a lower bound. Exact tables are needed for precise $p$-values.
8.5 Comparison of the Four Statistics in MLM Context
| Statistic | Rejected When | Best For | Robustness |
|---|---|---|---|
| Wilks' $\Lambda$ | $\Lambda$ small | General; effects spread | Moderate |
| Pillai's $V$ | $V$ large | Effects spread; small $n$; unequal covariances | Highest |
| Hotelling-Lawley $T$ | $T$ large | Effects on one dimension | Lowest |
| Roy's $\theta$ | $\theta$ large | Dominant single dimension | Lowest |
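Given the eigenvalues of $\mathbf{E}_{\text{SSCP}}^{-1}\mathbf{H}$, all four statistics are one-liners. A NumPy sketch with made-up eigenvalues from a hypothetical two-dimensional test:

```python
import numpy as np

# Illustrative eigenvalues of E^{-1} H
lam = np.array([0.8, 0.2])

wilks     = np.prod(1.0 / (1.0 + lam))     # product of 1/(1 + lambda_i)
pillai    = np.sum(lam / (1.0 + lam))      # sum of lambda_i/(1 + lambda_i)
hotelling = np.sum(lam)                    # sum of lambda_i
roy       = lam.max() / (1.0 + lam.max())  # theta = lambda_1/(1 + lambda_1)
```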
8.6 Likelihood Ratio Test
An alternative formulation: under $H_0$, the likelihood ratio statistic is:

$$LR = \left(\frac{|\hat{\boldsymbol{\Sigma}}_{\text{full}}|}{|\hat{\boldsymbol{\Sigma}}_{\text{reduced}}|}\right)^{n/2} = \Lambda^{n/2}$$

The corresponding test statistic:

$$-\left[(n - p) - \frac{m - c + 1}{2}\right]\log\Lambda \;\sim\; \chi^2_{mc}$$

Asymptotically under $H_0$. The multiplier in brackets is Bartlett's correction to the likelihood ratio test.
9. Effect Size Measures
9.1 Multivariate Eta-Squared ($\eta^2_{\text{mult}}$)
The most widely reported multivariate effect size for each hypothesis test:
From Wilks' Lambda:

$$\eta^2_{\text{mult}} = 1 - \Lambda^{1/s}$$

Where $s = \min(c, m)$ is defined as in Section 8.
From Pillai's Trace:

$$\eta^2_{\text{mult}} = \frac{V}{s}$$
Benchmarks:
| Effect Size | $\eta^2_{\text{mult}}$ |
|---|---|
| Small | $\approx 0.01$ |
| Medium | $\approx 0.06$ |
| Large | $\approx 0.14$ |
9.2 Multivariate Omega-Squared ($\omega^2_{\text{mult}}$)
A less biased (corrected) multivariate effect size:

$$\omega^2_{\text{mult}} = \frac{df_1\,(F - 1)}{df_1\,(F - 1) + n}$$

Applied to the $F$-approximation from the multivariate tests. It can be computed separately from each test statistic's $F$ approximation.
9.3 Univariate Effect Sizes per Response
For each response variable $y_k$, report the univariate $R^2_k$:

$$R^2_k = 1 - \frac{SS_{E,k}}{SS_{T,k}}$$

And univariate partial $\eta^2$ for each predictor-response combination:

$$\eta^2_{jk} = \frac{t_{jk}^2}{t_{jk}^2 + (n - p)}$$
9.4 Standardised Coefficients
Standardised regression coefficients ($\beta^*$) are obtained by standardising both predictors and responses before fitting, or equivalently by rescaling:

$$\hat{\beta}^*_{jk} = \hat{\beta}_{jk}\,\frac{s_{x_j}}{s_{y_k}}$$

Where $s_{x_j}$ and $s_{y_k}$ are the standard deviations of predictor $x_j$ and response $y_k$. Standardised coefficients represent the change in $y_k$ (in SD units) for a one-SD change in $x_j$, facilitating comparison across predictors and responses.
9.5 Canonical Correlations
From the eigenvalues $\lambda_i$ of $\mathbf{E}_{\text{SSCP}}^{-1}\mathbf{H}$, the canonical correlations between the predictor and response spaces are:

$$r_i = \sqrt{\frac{\lambda_i}{1 + \lambda_i}}$$

The canonical $r_i^2$ represents the proportion of variance in the $i$-th canonical response variate explained by the corresponding canonical predictor variate. These describe the multivariate association structure.
9.6 Multivariate R² (Generalisations)
Several generalisations of $R^2$ to the multivariate case exist:
Pillai's Trace-based R²:

$$R^2_V = \frac{V}{s}$$

Wilks' Lambda-based R²:

$$R^2_\Lambda = 1 - \Lambda$$

Trace correlation coefficient (average across canonical variates):

$$r_T^2 = \frac{1}{s}\sum_{i=1}^{s}\frac{\lambda_i}{1 + \lambda_i}$$
10. Model Fit and Evaluation
10.1 Univariate R² for Each Response
Report the standard $R^2_k$ for each of the $q$ response regressions:

$$R^2_k = 1 - \frac{\sum_{i}\hat{e}_{ik}^2}{\sum_{i}\left(y_{ik} - \bar{y}_k\right)^2}$$

Adjusted R² penalises for model complexity:

$$R^2_{\text{adj},k} = 1 - \left(1 - R^2_k\right)\frac{n - 1}{n - p}$$
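A NumPy sketch computing $R^2$ and adjusted $R^2$ for every response column at once on simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, q = 100, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ rng.normal(size=(p, q)) + rng.normal(scale=0.5, size=(n, q))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ B_hat

ss_res = (resid ** 2).sum(axis=0)                 # per-response residual SS
ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)  # per-response total SS
r2 = 1.0 - ss_res / ss_tot                        # one R-squared per response
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p)     # adjusted R-squared
```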
10.2 Trace of the Residual SSCP
The trace of the residual SSCP matrix measures total residual variation across all responses:

$$\operatorname{tr}\!\left(\hat{\mathbf{E}}^\top\hat{\mathbf{E}}\right) = \sum_{k=1}^{q}\sum_{i=1}^{n}\hat{e}_{ik}^2$$
This is the multivariate analogue of the residual sum of squares.
10.3 The Determinant Criterion ($|\hat{\boldsymbol{\Sigma}}|$)
The generalised variance (determinant of the estimated error covariance matrix) is a scalar measure of total residual variation that accounts for correlations among responses. Smaller values indicate better overall fit.
Log-determinant criterion:

$$\log\left|\hat{\boldsymbol{\Sigma}}\right|$$

This appears in the log-likelihood and information criteria.
10.4 AIC and BIC for Multivariate Models
Akaike Information Criterion:

$$\text{AIC} = n\log\left|\hat{\boldsymbol{\Sigma}}_{\text{ML}}\right| + nq\,(1 + \log 2\pi) + 2k$$

Or using the maximised log-likelihood:

$$\text{AIC} = -2\,\ell_{\max} + 2k$$

Where $k = pq + \frac{q(q+1)}{2}$ is the total number of free parameters (coefficients plus unique elements of $\boldsymbol{\Sigma}$).
Bayesian Information Criterion:

$$\text{BIC} = -2\,\ell_{\max} + k\log n$$
Lower AIC/BIC indicates a better model relative to complexity. Use AIC for predictive accuracy; BIC for parsimony.
10.5 Model Comparison via Likelihood Ratio Test
To compare a full model ($p$ predictors) vs. a reduced model ($p - r$ predictors, dropping the $r$ predictors specified by $\mathbf{C}$):

$$LR = n\log\frac{|\hat{\boldsymbol{\Sigma}}_{\text{reduced}}|}{|\hat{\boldsymbol{\Sigma}}_{\text{full}}|} \;\sim\; \chi^2_{rq} \quad \text{(asymptotically)}$$

Or equivalently using Wilks' Lambda for the hypothesis being tested.
10.6 Comparing Univariate and Multivariate Fit
An important diagnostic is to compare the multivariate model fit with the fit from separate univariate regressions:
- Same coefficient estimates: $\hat{\mathbf{B}}$ equals the column-stacked univariate OLS estimates.
- Difference in inference: MLM uses the joint error covariance structure for more powerful tests of multivariate hypotheses.
- Gain from MLM: Greatest when responses are strongly correlated and hypotheses about joint effects are of primary interest.
11. Model Diagnostics and Residuals
11.1 The Residual Matrix
The residual matrix is:

$$\hat{\mathbf{E}} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$$

Each row $\hat{\boldsymbol{\varepsilon}}_i^\top$ is the residual vector for observation $i$. A well-fitting model produces residuals that resemble multivariate white noise.
11.2 Univariate Residual Diagnostics (Per Response)
For each response $y_k$, extract the $k$-th column of $\hat{\mathbf{E}}$ and apply standard univariate regression diagnostics:
Residuals vs. Fitted Values: Plot $\hat{e}_{ik}$ against $\hat{y}_{ik}$. Expect a random scatter around zero with no pattern.
Normal Q-Q Plot: Plot ordered residuals against normal quantiles. Points should fall near the diagonal.
Scale-Location Plot: Plot $\sqrt{|\hat{e}_{ik}|}$ against $\hat{y}_{ik}$. A horizontal band indicates homoscedasticity.
Residuals vs. Predictor Plots: Plot $\hat{e}_{ik}$ against each $x_j$. Patterns indicate non-linearity or omitted variable bias.
11.3 Multivariate Residual Diagnostics
11.3.1 Mahalanobis Distance of Residuals
The squared Mahalanobis distance of the residual vector for observation $i$:

$$d_i^2 = \hat{\boldsymbol{\varepsilon}}_i^\top\hat{\boldsymbol{\Sigma}}^{-1}\hat{\boldsymbol{\varepsilon}}_i$$

Under the model, $d_i^2$ approximately follows a $\chi^2_q$ distribution. A large $d_i^2$ indicates that observation $i$ is poorly fitted on the combined response profile.
Chi-squared Q-Q plot: Plot the ordered $d_i^2$ against $\chi^2_q$ quantiles. Departures from linearity indicate non-normality of multivariate residuals or outliers.
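A NumPy sketch of the residual Mahalanobis distances on simulated data (sizes are illustrative). With $\hat{\boldsymbol{\Sigma}}$ estimated on $n - p$ degrees of freedom, the distances average $(n-p)q/n$, close to $q$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, q = 300, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ rng.normal(size=(p, q)) + rng.multivariate_normal(
    np.zeros(q), np.array([[1.0, 0.4], [0.4, 1.0]]), size=n)

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - p)

# Squared Mahalanobis distance of each residual row vector
d2 = np.einsum("ij,jk,ik->i", resid, np.linalg.inv(Sigma_hat), resid)
```

Sorting `d2` and plotting against `scipy.stats.chi2.ppf` quantiles gives the chi-squared Q-Q plot described above.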
11.3.2 Standardised Multivariate Residuals
Standardise each residual vector by the estimated error covariance:

$$\mathbf{z}_i = \hat{\boldsymbol{\Sigma}}^{-1/2}\hat{\boldsymbol{\varepsilon}}_i$$

Under correct specification, $\mathbf{z}_i$ is approximately $N_q(\mathbf{0}, \mathbf{I}_q)$.
11.3.3 Cross-Response Residual Correlation
Compute the correlation matrix of the columns of $\hat{\mathbf{E}}$:

$$\mathbf{R}_{\hat{E}} = \mathbf{D}^{-1/2}\,\hat{\boldsymbol{\Sigma}}\,\mathbf{D}^{-1/2}$$

Where $\mathbf{D} = \operatorname{diag}(\hat{\sigma}_{11}, \dots, \hat{\sigma}_{qq})$. This estimates the within-observation correlation structure of the errors. High correlations indicate that the responses share substantial common error variance.
11.4 Leverage and Influence
Leverage (diagonal $h_{ii}$ of $\mathbf{H}$): The same hat matrix applies to all responses in MLM. High leverage ($h_{ii} > 2p/n$) indicates an observation with unusual predictor values.
Multivariate Cook's Distance:

$$D_i = \frac{1}{p}\operatorname{tr}\!\left[\left(\hat{\mathbf{B}} - \hat{\mathbf{B}}_{(i)}\right)^\top\!\left(\mathbf{X}^\top\mathbf{X}\right)\left(\hat{\mathbf{B}} - \hat{\mathbf{B}}_{(i)}\right)\hat{\boldsymbol{\Sigma}}^{-1}\right]$$

Where $\hat{\mathbf{B}}_{(i)}$ is the coefficient matrix estimated without observation $i$. An efficient formula avoids refitting:

$$D_i = \frac{h_{ii}}{p\,(1 - h_{ii})^2}\;\hat{\boldsymbol{\varepsilon}}_i^\top\hat{\boldsymbol{\Sigma}}^{-1}\hat{\boldsymbol{\varepsilon}}_i$$

Observations with $D_i > 4/n$ (or, more conservatively, $D_i > 1$) warrant investigation.
DFFITS (Multivariate):

$$\text{DFFITS}_i = \sqrt{\frac{h_{ii}}{1 - h_{ii}}\;\hat{\boldsymbol{\varepsilon}}_i^\top\hat{\boldsymbol{\Sigma}}^{-1}\hat{\boldsymbol{\varepsilon}}_i}$$
11.5 Testing for Multivariate Outliers
Bonferroni-corrected Mahalanobis distance test: Compare $d_i^2$ to the critical value:

$$\chi^2_{q}\!\left(1 - \frac{\alpha}{n}\right)$$

(Bonferroni-corrected at the overall $\alpha$ level; for instance, with $\alpha = 0.05$ and $n = 100$, use the $\chi^2_q(1 - 0.0005)$ quantile.)
Robust Mahalanobis distances: Use the Minimum Covariance Determinant (MCD) or Minimum Volume Ellipsoid (MVE) estimates of location and scatter, which are resistant to masking effects (outliers hiding other outliers).
11.6 Checking Multicollinearity Among Predictors
Variance Inflation Factor (VIF) for each predictor (the same for all responses, since $\mathbf{X}$ is shared):

$$\text{VIF}_j = \frac{1}{1 - R_j^2}$$

Where $R_j^2$ is the $R^2$ from regressing $x_j$ on all other predictors.
| VIF | Interpretation |
|---|---|
| $\text{VIF} < 5$ | Low multicollinearity |
| $5 \leq \text{VIF} < 10$ | Moderate multicollinearity (concern) |
| $\text{VIF} \geq 10$ | Severe multicollinearity (serious problem) |
Condition number of $\mathbf{X}^\top\mathbf{X}$: Ratio of largest to smallest eigenvalue. Large values (a common rule of thumb is above roughly 30) indicate potentially problematic multicollinearity.
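A NumPy sketch of the VIF computation, with one deliberately near-collinear predictor (the helper `vif` is illustrative, not a DataStatPro function):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

def vif(X, j):
    """VIF for column j: regress x_j on the remaining columns."""
    others = np.delete(X, j, axis=1)
    xj = X[:, j]
    coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
    resid = xj - others @ coef
    r2 = 1.0 - resid @ resid / ((xj - xj.mean()) @ (xj - xj.mean()))
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(1, X.shape[1])]  # skip the intercept
```

The collinear pair (x1, x2) should flag as severe, while the independent x3 stays near 1.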
11.7 Checking Homoscedasticity
For each response , check homoscedasticity using:
- Breusch-Pagan test: Tests whether the squared residuals $\hat{e}_{ik}^2$ are predicted by the fitted values.
- White's test: A more general test for heteroscedasticity.
- Scale-Location plot: Visual assessment.
For the multivariate setting, Box's M test can be adapted to test whether the error covariance matrix is the same across subgroups defined by categorical predictors.
12. Interpretation of Coefficients
12.1 Interpreting the Coefficient Matrix
The coefficient matrix $\mathbf{B}$ has dimensions $p \times q$:
- Column $k$ of $\mathbf{B}$: The univariate regression coefficients for response $y_k$ on all predictors. Interpretation is identical to univariate regression: $\beta_{jk}$ is the expected change in $y_k$ for a one-unit increase in $x_j$, holding all other predictors constant.
- Row $j$ of $\mathbf{B}$: The multivariate coefficient vector for predictor $x_j$ — the simultaneous effect of $x_j$ on all $q$ responses. This row is tested jointly in multivariate hypothesis tests.
12.2 Ceteris Paribus Interpretation
Each coefficient $\beta_{jk}$ represents the partial effect of $x_j$ on $y_k$:

$$\beta_{jk} = \frac{\partial\,E[y_k \mid \mathbf{x}]}{\partial x_j}$$

This is the expected change in the mean of $y_k$ for a one-unit increase in $x_j$, with all other predictors held constant.
12.3 Comparing Coefficients Across Responses
A key feature of MLM is the ability to compare the effects of a predictor across multiple responses. If the responses are on the same scale (or are standardised), directly compare $\hat{\beta}_{j1}, \hat{\beta}_{j2}, \dots, \hat{\beta}_{jq}$ for predictor $x_j$.
To formally test whether the effect of $x_j$ on $y_k$ equals its effect on $y_{k'}$:

$$H_0: \beta_{jk} = \beta_{jk'}$$

Where $\mathbf{C} = \mathbf{e}_j^\top$ and $\mathbf{M} = \mathbf{e}_k - \mathbf{e}_{k'}$ (contrast between responses $k$ and $k'$).
12.4 Interpreting Interactions
For a model with predictors $x_1$, $x_2$, and their interaction $x_1 x_2$:

$$E[y_k] = \beta_{0k} + \beta_{1k}x_1 + \beta_{2k}x_2 + \beta_{3k}x_1 x_2$$

The interaction coefficient $\beta_{3k}$ represents the modification of the effect of $x_1$ on $y_k$ per unit change in $x_2$ (and vice versa):

$$\frac{\partial\,E[y_k]}{\partial x_1} = \beta_{1k} + \beta_{3k}x_2$$
12.5 Interpreting Dummy Variables for Categorical Predictors
For a categorical predictor with $g$ levels (reference = Level 1):
The coefficient for dummy variable $d_{l-1}$ in the $k$-th response equation represents the adjusted mean difference between Level $l$ and the reference Level 1 on response $y_k$, controlling for all other predictors.
12.6 The Profile of Effects Across Responses
The multivariate coefficient vector for predictor $x_j$ is the $j$-th row of $\mathbf{B}$:

$$\boldsymbol{\beta}_{(j)} = \left(\beta_{j1}, \beta_{j2}, \dots, \beta_{jq}\right)$$

Plotting this profile (coefficient value vs. response variable) visualises the pattern of effects of $x_j$ across the response set. Parallel or similar profiles across predictors indicate a common underlying structure; divergent profiles suggest differential effects.
13. Variable Selection and Model Comparison
13.1 Criteria for Variable Selection
Variable selection in MLM must account for the joint effect on all responses simultaneously. Several criteria are available:
Multivariate AIC/BIC: Compare models using AIC or BIC (see Section 10.4). Select the model with the lowest criterion value.
Sequential Likelihood Ratio Tests: Add predictors one at a time and test whether each addition significantly improves the joint model fit using the $F$ approximation to Wilks' Lambda.
Multivariate Mallow's $C_p$:

$$C_p = \sum_{k=1}^{q}\left[\frac{SS_{E,k}(\text{subset})}{\hat{\sigma}_{kk}(\text{full})} - n + 2\,p_{\text{subset}}\right]$$

Where the sum is over all $q$ responses.
13.2 All-Subsets Selection
Evaluate all possible subsets of predictors and select based on:
- Minimum multivariate AIC or BIC.
- Minimum multivariate Mallow's $C_p$.
- Maximum multivariate adjusted R².
Computational note: For large $p$, all-subsets selection becomes infeasible. Use stepwise procedures or regularised alternatives.
13.3 Stepwise Variable Selection
Forward selection:
- Start with the intercept-only model.
- At each step, add the predictor that produces the greatest reduction in multivariate AIC (or the most significant Wilks' Lambda test).
- Stop when no addition improves the criterion.
Backward elimination:
- Start with the full model.
- At each step, remove the predictor whose removal produces the least deterioration in multivariate fit (highest p-value for Wilks' Lambda test or smallest AIC increase).
- Stop when removing any predictor significantly worsens fit.
Bidirectional: Combine forward and backward at each step.
⚠️ Stepwise selection using p-values suffers from multiple testing inflation, instability, and optimistic bias. AIC/BIC-based stepwise is preferred. Always validate the final model on held-out data and interpret with caution.
13.4 Regularised Estimation for High-Dimensional Settings
When $p$ is large relative to $n$, standard OLS estimation becomes unstable. Regularisation methods extend to the multivariate setting:
Multivariate Ridge Regression:

$$\hat{\mathbf{B}}_{\text{ridge}} = \left(\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I}\right)^{-1}\mathbf{X}^\top\mathbf{Y}$$

The ridge penalty shrinks all coefficients toward zero, stabilising estimation under multicollinearity.
Multivariate Lasso (Group Lasso): Applies an $\ell_1$-type penalty structure that encourages sparse solutions. The group lasso applies the penalty at the row level of $\mathbf{B}$ — entire rows (effects of a predictor on all responses) are driven to zero, performing group variable selection.
Penalty:

$$\lambda\sum_{j=1}^{p}\left\|\boldsymbol{\beta}_{(j)}\right\|_2$$

Where $\|\boldsymbol{\beta}_{(j)}\|_2$ is the Euclidean norm of the $j$-th row of $\mathbf{B}$.
13.5 Cross-Validation for Model Selection
$K$-fold cross-validation for MLM:
- Divide the $n$ observations into $K$ folds.
- For each fold $k$: fit the model on all data except fold $k$; predict fold $k$.
- Compute the cross-validated prediction error:

$$\text{CV} = \frac{1}{n}\sum_{i=1}^{n}\left\|\mathbf{y}_i - \hat{\mathbf{y}}_{(-i)}\right\|^2$$

Or using the trace criterion:

$$\text{CV} = \frac{1}{n}\operatorname{tr}\!\left[\left(\mathbf{Y} - \hat{\mathbf{Y}}_{\text{CV}}\right)^\top\left(\mathbf{Y} - \hat{\mathbf{Y}}_{\text{CV}}\right)\right]$$
Select the model minimising the cross-validated prediction error.
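The cross-validation procedure above can be sketched in a few lines of NumPy. The simulated data, fold count, and seed are illustrative assumptions.

```python
import numpy as np

def cv_mspe(X, Y, k=5, seed=0):
    """K-fold cross-validated mean squared prediction error for an MLM.
    Each held-out row contributes ||y_i - yhat_i||^2; the total is
    averaged over all n observations."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).permutation(n)
    sse = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        B = np.linalg.lstsq(X[train], Y[train], rcond=None)[0]
        sse += np.sum((Y[fold] - X[fold] @ B) ** 2)
    return sse / n

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
Y = X @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(60, 2))
err = cv_mspe(X, Y, k=5)
```

To compare candidate models, run `cv_mspe` on each candidate design matrix and keep the one with the smallest error.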
14. Special Cases and Connections to Other Methods
14.1 Univariate Multiple Regression ($m = 1$)
When $m = 1$, the response matrix $\mathbf{Y}$ reduces to a vector $\mathbf{y}$ and the coefficient matrix $\mathbf{B}$ reduces to a vector $\boldsymbol{\beta}$. All MLM formulas reduce to standard OLS:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$$
All multivariate test statistics reduce to the standard $F$-test.
14.2 MANOVA as a Special Case
When all predictors in $\mathbf{X}$ are indicator variables (dummy coding of group membership) and no continuous covariates are included, the MLM reduces to MANOVA. The hypothesis $\mathbf{L}\mathbf{B} = \mathbf{0}$, with $\mathbf{L}$ specifying group contrasts, is the MANOVA test for group mean vector differences.
The SSCP matrices:
- $\mathbf{H}$ (between-group)
- $\mathbf{E}$ (within-group)
14.3 MANCOVA as a Special Case
When $\mathbf{X}$ contains both group indicators (categorical) and continuous covariates, the MLM reduces to MANCOVA. The MANCOVA test for group differences (after adjusting for covariates) corresponds to the MLM hypothesis test where $\mathbf{L}$ specifies contrasts among group effects.
The "adjustment" for covariates happens automatically within the MLM framework — the covariate columns of $\mathbf{X}$ absorb the variation they explain, leaving purer group comparisons.
14.4 Canonical Correlation Analysis (CCA)
Canonical Correlation Analysis studies the association between two sets of variables: predictors $\mathbf{x}$ and responses $\mathbf{y}$. It finds linear combinations:
$$u = \mathbf{a}^\top \mathbf{x}, \qquad v = \mathbf{b}^\top \mathbf{y}$$
that maximise $\operatorname{corr}(u, v)$. The canonical correlations are the square roots of the eigenvalues of:
$$\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}\boldsymbol{\Sigma}_{xx}^{-1}\boldsymbol{\Sigma}_{xy}$$
In the MLM framework, CCA provides insight into the overall association structure between the predictor and response sets. The eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$ in the MLM test of the full model are directly related to the squared canonical correlations.
14.5 Discriminant Function Analysis
When $\mathbf{X}$ consists of group indicators, the eigenvectors of $\mathbf{E}^{-1}\mathbf{H}$ define the linear discriminant functions — the linear combinations of responses that maximally separate the groups. DFA is therefore a post-hoc analysis tool following a significant MANOVA (and thus a significant MLM with categorical predictors).
14.6 Repeated Measures and Profile Analysis
When the $m$ response variables represent the same variable measured at $m$ time points on the same observations, the MLM becomes a repeated-measures (profile analysis) model. The response transformation matrix $\mathbf{M}$ is used to form contrasts among time points:
- Flatness: $\mathbf{M}$ set to a first-difference matrix (columns contrasting adjacent time points, e.g. $(1, -1, 0, \dots)^\top$) tests whether the profile is flat (no time effect).
- Parallelism: Tests whether the time profiles are parallel across groups.
- Levels: Tests whether average levels differ across groups.
14.7 Seemingly Unrelated Regression (SUR)
When each response variable has its own distinct set of predictors (not the shared $\mathbf{X}$ of standard MLM), the Seemingly Unrelated Regression model applies. Standard OLS applied separately to each equation is unbiased but inefficient; GLS using the cross-equation error covariances (Zellner's FGLS estimator) is asymptotically more efficient.
15. Using the Multivariate Linear Models Component
The Multivariate Linear Models component in the DataStatPro application provides a full end-to-end workflow for fitting, evaluating, and interpreting multivariate regression models.
Step-by-Step Guide
Step 1 — Select Dataset Choose the dataset from the "Dataset" dropdown. The dataset should contain at least two continuous response variables and one or more predictor variables.
Step 2 — Select Response Variables (Y) Select two or more continuous response variables from the "Response Variables (Y)" panel. These represent the jointly modelled outcomes.
💡 Select response variables that are theoretically related and expected to be influenced by the same set of predictors. Responses that share common error variance benefit most from the multivariate framework.
Step 3 — Select Predictor Variables (X) Select one or more predictor variables from the "Predictor Variables (X)" dropdown. Predictors can be:
- Continuous: Entered directly.
- Categorical: The application automatically creates dummy variables; you will be prompted to select the reference category.
Step 4 — Configure Interactions (Optional) Specify interaction terms by selecting pairs of predictor variables. The application creates the product terms and adds them to the design matrix.
Step 5 — Configure Polynomial Terms (Optional) For continuous predictors, specify polynomial degrees (e.g., quadratic: $x^2$, cubic: $x^3$) to model curvilinear relationships.
Step 6 — Configure Centering and Scaling Select preprocessing for predictors:
- No centering/scaling
- Mean centering only (recommended for interpretability of intercepts)
- Standardisation (mean centering + unit variance scaling) (recommended for comparing coefficients across predictors)
Step 7 — Specify Hypothesis Tests (Optional) Define custom hypotheses of the form $\mathbf{L}\mathbf{B}\mathbf{M} = \mathbf{0}$:
- $\mathbf{L}$: Contrast matrix for predictors (specify as a matrix of coefficients).
- $\mathbf{M}$: Response transformation matrix (default $\mathbf{I}_m$ for all responses).
Pre-built hypothesis options:
- Overall model test (all predictors, all responses).
- Individual predictor tests (one row of $\mathbf{B}$ at a time).
- Subset of predictors (user-specified group of rows).
- Equality of effects across responses (e.g., $\beta_{j1} = \beta_{j2}$).
Step 8 — Select Confidence Level Choose the confidence level for confidence intervals (default: 95%).
Step 9 — Select Display Options Choose which outputs to display:
- ✅ Coefficient Matrix Table (with SE, $t$-values, p-values, CIs per response)
- ✅ Standardised Coefficient Matrix
- ✅ Error Covariance Matrix (with correlation matrix)
- ✅ Multivariate Test Statistics Table (Wilks', Pillai's, Hotelling-Lawley, Roy's)
- ✅ Univariate ANOVA/Regression Tables (per response)
- ✅ Univariate R² and Adjusted R² (per response)
- ✅ Multivariate Effect Sizes (multivariate partial $\eta^2$, canonical $r^2$)
- ✅ Residual Diagnostics Plots (per response and multivariate)
- ✅ Mahalanobis Distance Q-Q Plot
- ✅ Leverage and Cook's Distance Plot
- ✅ VIF Table
- ✅ Predicted vs. Observed Plots (per response)
- ✅ Coefficient Profile Plot
- ✅ Correlation Structure of Residuals
- ✅ AIC / BIC
Step 10 — Run the Analysis Click "Run Multivariate Linear Model". The application will:
- Construct the design matrix $\mathbf{X}$ (with dummy coding, interaction, and polynomial terms).
- Compute $\hat{\mathbf{B}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}$.
- Compute residuals and estimate $\hat{\boldsymbol{\Sigma}}$.
- Compute all four multivariate test statistics for each specified hypothesis.
- Compute univariate regression statistics for each response.
- Compute effect sizes, leverage, Cook's distances, VIFs.
- Run multivariate normality tests on residuals.
- Generate all selected visualisations and tables.
16. Computational and Formula Details
16.1 Step-by-Step Computation
Step 1: Construct the design matrix $\mathbf{X}$ ($n \times (q+1)$)
Include a column of ones (for the intercept) as the first column, followed by the predictor columns (with dummy coding for categorical predictors).
Step 2: Verify rank condition
Check that $\operatorname{rank}(\mathbf{X}) = q + 1$ (full column rank). If not, the model is under-identified — remove linearly dependent columns.
Step 3: Compute $(\mathbf{X}^\top\mathbf{X})^{-1}$
Using Cholesky decomposition (more numerically stable than direct inversion): factor $\mathbf{X}^\top\mathbf{X} = \mathbf{R}^\top\mathbf{R}$ and solve triangular systems rather than forming the inverse explicitly.
Step 4: Compute the hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top$
For large $n$, avoid storing $\mathbf{H}$ explicitly; compute leverages as row-wise quadratic forms: $h_{ii} = \mathbf{x}_i^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_i$.
Step 5: Compute the coefficient matrix $\hat{\mathbf{B}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}$
Step 6: Compute fitted values and residuals: $\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}}$, $\hat{\mathbf{E}} = \mathbf{Y} - \hat{\mathbf{Y}}$
Step 7: Estimate the error covariance matrix: $\hat{\boldsymbol{\Sigma}} = \hat{\mathbf{E}}^\top\hat{\mathbf{E}} / (n - q - 1)$
Step 8: Compute standard errors for each coefficient
For coefficient $\hat{\beta}_{jk}$:
$$\operatorname{SE}(\hat{\beta}_{jk}) = \sqrt{[(\mathbf{X}^\top\mathbf{X})^{-1}]_{jj}\,\hat{\sigma}_{kk}}$$
Where $\hat{\sigma}_{kk}$ is the estimated error variance for response $k$.
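The eight steps above can be sketched end-to-end in NumPy on simulated data. This is an illustrative sketch (`np.linalg.inv` is used for brevity where a Cholesky solve would be preferred in production):

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, m = 40, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])     # Step 1: design matrix
assert np.linalg.matrix_rank(X) == q + 1                        # Step 2: rank condition
Y = X @ rng.normal(size=(q + 1, m)) + 0.5 * rng.normal(size=(n, m))

XtX_inv = np.linalg.inv(X.T @ X)                                # Step 3
lev = np.einsum('ij,jk,ik->i', X, XtX_inv, X)                   # Step 4: leverages h_ii
B_hat = XtX_inv @ X.T @ Y                                       # Step 5
E = Y - X @ B_hat                                               # Step 6: residuals
Sigma_hat = E.T @ E / (n - q - 1)                               # Step 7
SE = np.sqrt(np.outer(np.diag(XtX_inv), np.diag(Sigma_hat)))    # Step 8: SE(beta_jk)
```

Two quick sanity checks on any implementation: the residuals are orthogonal to the design columns ($\mathbf{X}^\top\hat{\mathbf{E}} = \mathbf{0}$), and the leverages sum to $q + 1$.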
16.2 Computing the Hypothesis SSCP Matrix
For hypothesis $H_0: \mathbf{L}\mathbf{B}\mathbf{M} = \mathbf{0}$:
Step 1: Compute $\hat{\mathbf{B}}\mathbf{M}$ — the transformed coefficient matrix ($(q+1) \times v$, where $v$ is the number of columns of $\mathbf{M}$).
Step 2: Compute $\mathbf{L}\hat{\mathbf{B}}\mathbf{M}$ — the estimated contrast ($g \times v$, where $g$ is the number of rows of $\mathbf{L}$).
Step 3: Compute the hypothesis SSCP:
$$\mathbf{H} = (\mathbf{L}\hat{\mathbf{B}}\mathbf{M})^\top \left[\mathbf{L}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{L}^\top\right]^{-1} (\mathbf{L}\hat{\mathbf{B}}\mathbf{M})$$
Step 4: Compute the error SSCP:
$$\mathbf{E} = \mathbf{M}^\top \hat{\mathbf{E}}^\top \hat{\mathbf{E}} \mathbf{M}$$
Step 5: Compute the eigenvalues $\lambda_1, \dots, \lambda_s$ of $\mathbf{H}\mathbf{E}^{-1}$.
Step 6: Compute test statistics (Wilks', Pillai's, Hotelling-Lawley, Roy's) from the eigenvalues.
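The six steps can be sketched in NumPy for the overall model test (all non-intercept rows of $\mathbf{B}$, all responses). The simulated data are illustrative; the final assertions below exploit the identity $\Lambda = \prod_i (1+\lambda_i)^{-1} = |\mathbf{E}| / |\mathbf{H} + \mathbf{E}|$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, m = 50, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
Y = X @ rng.normal(size=(q + 1, m)) + rng.normal(size=(n, m))

B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
Resid = Y - X @ B_hat

L = np.hstack([np.zeros((q, 1)), np.eye(q)])    # overall test: drop the intercept row
M = np.eye(m)                                    # keep all responses

LBM = L @ B_hat @ M                              # Steps 1-2: estimated contrast
middle = np.linalg.inv(L @ np.linalg.inv(X.T @ X) @ L.T)
H = LBM.T @ middle @ LBM                         # Step 3: hypothesis SSCP
E = M.T @ (Resid.T @ Resid) @ M                  # Step 4: error SSCP

eig = np.linalg.eigvals(H @ np.linalg.inv(E)).real   # Step 5: eigenvalues
wilks = np.prod(1.0 / (1.0 + eig))               # Step 6: the four statistics
pillai = np.sum(eig / (1.0 + eig))
hotelling = np.sum(eig)
roy = np.max(eig)
```

Because $\mathbf{H}$ is positive semi-definite and $\mathbf{E}$ positive definite, the eigenvalues are real and non-negative (the `.real` strips tiny imaginary round-off from the non-symmetric product).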
16.3 Rao's F-Approximation to Wilks' Lambda
Given $\Lambda$, $g$ (rank of $\mathbf{L}$), $v$ (columns of $\mathbf{M}$), and error df $v_e = n - q - 1$:
$$t = \sqrt{\frac{v^2 g^2 - 4}{v^2 + g^2 - 5}}, \qquad F = \frac{1 - \Lambda^{1/t}}{\Lambda^{1/t}} \cdot \frac{df_2}{df_1}$$
with $df_1 = v g$ and $df_2 = \left[v_e + g - (v + g + 1)/2\right] t - (v g - 2)/2$.
This approximation is exact when $v \le 2$ or $g \le 2$.
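A direct transcription of Rao's approximation (a sketch; the function name and the guard for the degenerate $t$ case are illustrative choices):

```python
import numpy as np

def rao_f(wilks, v, g, v_e):
    """Rao's F-approximation for Wilks' Lambda.
    v   = number of (transformed) responses, g = rank of L,
    v_e = error df (n - q - 1).  Returns (F, df1, df2)."""
    t = 1.0
    if v ** 2 + g ** 2 - 5 > 0:
        t = np.sqrt((v ** 2 * g ** 2 - 4) / (v ** 2 + g ** 2 - 5))
    df1 = v * g
    df2 = (v_e + g - (v + g + 1) / 2.0) * t - (v * g - 2) / 2.0
    lam_t = wilks ** (1.0 / t)
    return (1.0 - lam_t) / lam_t * df2 / df1, df1, df2

# Sanity check: with v = 1 the approximation is exact and reproduces the
# usual univariate F = ((1 - Lambda) / Lambda) * (v_e / g).
F, df1, df2 = rao_f(0.5, v=1, g=3, v_e=20)
```

With $v = 1$, $g = 3$, $v_e = 20$ and $\Lambda = 0.5$, the formula yields $F = 20/3$ on $(3, 20)$ df, matching the univariate identity.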
16.4 Confidence Ellipse for Two Coefficients
The joint $100(1-\alpha)\%$ confidence ellipse for $(\beta_{jk}, \beta_{j'k})$ (two coefficients in the same response equation $k$) is:
$$(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})^\top \mathbf{V}^{-1} (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \le 2\,\hat{\sigma}_{kk}\, F_{2,\,n-q-1,\,1-\alpha}$$
Where $\mathbf{V}$ is the $2 \times 2$ submatrix of $(\mathbf{X}^\top\mathbf{X})^{-1}$ corresponding to rows/columns $j$ and $j'$.
16.5 Prediction for New Observations
For a new observation $\mathbf{x}_0$ (a $(q+1) \times 1$ vector including the leading 1 for the intercept), the point prediction of the response vector is:
$$\hat{\mathbf{y}}_0 = \hat{\mathbf{B}}^\top \mathbf{x}_0$$
The prediction error covariance (accounting for both estimation uncertainty and future error variance):
$$\operatorname{Cov}(\mathbf{y}_0 - \hat{\mathbf{y}}_0) = (1 + h_0)\,\hat{\boldsymbol{\Sigma}}, \qquad h_0 = \mathbf{x}_0^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0$$
A $100(1-\alpha)\%$ prediction ellipsoid for $\mathbf{y}_0$:
$$(\mathbf{y}_0 - \hat{\mathbf{y}}_0)^\top \left[(1 + h_0)\hat{\boldsymbol{\Sigma}}\right]^{-1} (\mathbf{y}_0 - \hat{\mathbf{y}}_0) \le c_\alpha$$
Where $c_\alpha$ is the appropriate critical value (an $F$-based constant depending on $m$ and $n - q - 1$).
Marginal prediction intervals for individual response $k$:
$$\hat{y}_{0k} \pm t_{n-q-1,\,1-\alpha/2}\, \sqrt{(1 + h_0)\,\hat{\sigma}_{kk}}$$
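The prediction quantities above can be sketched in NumPy (simulated data and the new point `x0` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, q, m = 60, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
Y = X @ rng.normal(size=(q + 1, m)) + rng.normal(size=(n, m))

B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ B_hat
Sigma_hat = resid.T @ resid / (n - q - 1)

x0 = np.array([1.0, 0.5, -1.2])                 # new point, leading 1 = intercept
y0_hat = B_hat.T @ x0                            # point prediction (length m)
h0 = x0 @ np.linalg.inv(X.T @ X) @ x0            # leverage of the new point
cov_pred = (1.0 + h0) * Sigma_hat                # prediction error covariance
se = np.sqrt(np.diag(cov_pred))                  # marginal prediction SEs
```

Multiplying `se` by the appropriate $t$ quantile gives the marginal prediction intervals per response.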
16.6 Partitioning the SSCP Matrix
For the standard MLM, the total SSCP partitions as:
$$\mathbf{T} = \mathbf{H}_{\text{model}} + \mathbf{E}$$
Where:
$$\mathbf{T} = \mathbf{Y}^\top\mathbf{Y} - n\bar{\mathbf{y}}\bar{\mathbf{y}}^\top, \qquad \mathbf{H}_{\text{model}} = \hat{\mathbf{Y}}^\top\hat{\mathbf{Y}} - n\bar{\mathbf{y}}\bar{\mathbf{y}}^\top, \qquad \mathbf{E} = \hat{\mathbf{E}}^\top\hat{\mathbf{E}}$$
The diagonal elements give the familiar univariate sums of squares: $\text{SST}_k$, $\text{SSR}_k$, and $\text{SSE}_k$ for each response $k$.
17. Worked Examples
Example 1: Predicting Multiple Environmental Outcomes from Land Use Variables
Research Question: Do land use variables (% Urban cover, % Agricultural cover, Distance to nearest river in km) simultaneously predict multiple water quality indicators (pH, Dissolved Oxygen mg/L, Nitrate mg/L) across sampling sites?
Data: a sample of water sampling sites; $q = 3$ predictors; $m = 3$ responses.
Step 1: Descriptive Statistics
| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Urban (%) | 18.4 | 12.8 | 0.2 | 68.3 |
| Agriculture (%) | 41.2 | 19.6 | 3.1 | 88.7 |
| Distance (km) | 2.84 | 1.93 | 0.12 | 9.41 |
| pH | 7.21 | 0.48 | 5.82 | 8.34 |
| DO (mg/L) | 8.14 | 1.62 | 3.81 | 11.42 |
| Nitrate (mg/L) | 4.83 | 2.91 | 0.21 | 14.72 |
Step 2: Estimated Coefficient Matrix
Step 3: Standard Errors, t-values, and p-values (per response)
Response: pH
| Predictor | Estimate | SE | $t$ | $p$ | 95% CI |
|---|---|---|---|---|---|
| Intercept | 7.832 | 0.241 | 32.50 | < 0.001 | [7.352, 8.312] |
| Urban (%) | -0.012 | 0.005 | -2.40 | 0.019 | [-0.022, -0.002] |
| Agriculture (%) | -0.008 | 0.004 | -2.00 | 0.049 | [-0.016, 0.000] |
| Distance (km) | 0.061 | 0.031 | 1.97 | 0.053 | [-0.001, 0.123] |
, Adjusted
Response: Dissolved Oxygen (DO)
| Predictor | Estimate | SE | $t$ | $p$ | 95% CI |
|---|---|---|---|---|---|
| Intercept | 9.214 | 0.584 | 15.78 | < 0.001 | [8.051, 10.377] |
| Urban (%) | -0.031 | 0.012 | -2.58 | 0.012 | [-0.055, -0.007] |
| Agriculture (%) | -0.014 | 0.009 | -1.56 | 0.124 | [-0.032, 0.004] |
| Distance (km) | 0.218 | 0.075 | 2.91 | 0.005 | [0.069, 0.367] |
, Adjusted
Response: Nitrate (NO3)
| Predictor | Estimate | SE | $t$ | $p$ | 95% CI |
|---|---|---|---|---|---|
| Intercept | 1.041 | 0.812 | 1.28 | 0.204 | [-0.573, 2.655] |
| Urban (%) | 0.083 | 0.017 | 4.88 | < 0.001 | [0.049, 0.117] |
| Agriculture (%) | 0.041 | 0.013 | 3.15 | 0.002 | [0.015, 0.067] |
| Distance (km) | -0.312 | 0.104 | -3.00 | 0.004 | [-0.519, -0.105] |
, Adjusted
Step 4: Estimated Error Covariance and Correlation Matrices
The error correlation matrix reveals substantial within-site correlations among the water quality residuals, confirming that the multivariate framework is appropriate.
Step 5: Multivariate Hypothesis Tests
Overall model test ($H_0$: all non-intercept rows of $\mathbf{B}$ are zero — all three predictors, all three responses):
| Test Statistic | Value | $F$ | $df_1$ | $df_2$ | $p$ | $\eta^2_{\text{mult}}$ |
|---|---|---|---|---|---|---|
| Wilks' | 0.4027 | 10.842 | 9 | 187.4 | < 0.001 | 0.302 |
| Pillai's Trace | 0.6512 | 9.814 | 9 | 219 | < 0.001 | 0.287 |
| Hotelling-Lawley | 1.2483 | 12.842 | 9 | 178.0 | < 0.001 | 0.322 |
| Roy's Largest Root | 0.8941 | 21.727 | 3 | 76 | < 0.001 | 0.462 |
Interpretation: The overall model is highly significant ($p < 0.001$ on all four statistics). Land use variables jointly predict the combined water quality profile, with a large multivariate effect ($\eta^2_{\text{mult}} = 0.302$ from Wilks' Lambda).
Per-predictor multivariate tests:
| Predictor | Wilks' $\Lambda$ | $F$ | $p$ | $\eta^2_{\text{mult}}$ |
|---|---|---|---|---|
| Urban (%) | 0.6824 | 11.43 | < 0.001 | 0.317 |
| Agriculture (%) | 0.8241 | 5.27 | 0.002 | 0.176 |
| Distance (km) | 0.7613 | 7.74 | < 0.001 | 0.239 |
All three predictors have significant multivariate effects.
Step 6: Coefficient Profile Plot Interpretation
Plotting the rows of $\hat{\mathbf{B}}$ across the three responses reveals:
- Urban cover has a consistent negative effect on pH and DO (degrading oxygen-related water quality) but a positive effect on nitrate (nutrient loading from urban runoff).
- Distance to river has a consistent positive effect on DO (more oxygenated conditions further from main channel?) and a negative effect on nitrate (dilution/attenuation further from source).
Step 7: Prediction for a New Site
New site: Urban = 25%, Agriculture = 55%, Distance = 1.5 km.
Prediction intervals (95%):
For pH:
Example 2: MLM with Categorical and Continuous Predictors — Treatment, Gender, and Biomarker Outcomes
Research Question: Do treatment condition (Drug A, Drug B, Placebo) and gender (Male/Female), along with baseline age, predict a profile of three biomarker outcomes (Cholesterol mmol/L, Blood Pressure mmHg, Glucose mmol/L)?
Data: a sample of participants; $q = 4$ predictor columns (2 dummy variables for treatment, 1 dummy for gender, 1 continuous for age); $m = 3$ responses.
Design Matrix Structure:
Reference: Drug = Placebo, Gender = Male. Age is mean-centred.
Step 1: Coefficient Matrix (Selected)
| Predictor | Cholesterol $\hat{\beta}$ | SE | $p$ | BP $\hat{\beta}$ | SE | $p$ | Glucose $\hat{\beta}$ | SE | $p$ |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 5.41 | 0.18 | < .001 | 132.4 | 2.41 | < .001 | 5.82 | 0.19 | < .001 |
| Drug A | -0.84 | 0.22 | < .001 | -8.21 | 2.94 | 0.006 | -0.61 | 0.23 | 0.009 |
| Drug B | -0.41 | 0.22 | 0.064 | -4.83 | 2.94 | 0.103 | -0.28 | 0.23 | 0.225 |
| Female | -0.32 | 0.19 | 0.094 | -5.14 | 2.54 | 0.046 | -0.18 | 0.20 | 0.369 |
| Age | 0.031 | 0.008 | < .001 | 0.412 | 0.107 | < .001 | 0.028 | 0.008 | 0.001 |
Step 2: Multivariate Tests
Treatment effect ($H_0$: Drug A = Drug B = Placebo on all DVs):
| Test | Value | $F$ | $p$ | $\eta^2_{\text{mult}}$ |
|---|---|---|---|---|
| Wilks' | 0.7214 | 7.12 | < 0.001 | 0.158 |
| Pillai's Trace | 0.2912 | 6.78 | < 0.001 | 0.151 |
Treatment significantly and jointly predicts the biomarker profile ($p < 0.001$, medium-large effect $\eta^2_{\text{mult}} \approx 0.15$).
Gender effect ($H_0$: Male = Female on all DVs):
| Test | Value | $F$ | $p$ | $\eta^2_{\text{mult}}$ |
|---|---|---|---|---|
| Wilks' | 0.9124 | 3.62 | 0.015 | 0.088 |
| Pillai's Trace | 0.0876 | 3.62 | 0.015 | 0.088 |
Gender has a significant multivariate effect ($p = 0.015$, small-medium $\eta^2_{\text{mult}} = 0.088$), driven primarily by blood pressure differences.
Age effect ($H_0$: Age coefficient = 0 on all DVs):
| Test | Value | $F$ | $p$ | $\eta^2_{\text{mult}}$ |
|---|---|---|---|---|
| Wilks' | 0.8341 | 7.48 | < 0.001 | 0.166 |
| Pillai's Trace | 0.1659 | 7.48 | < 0.001 | 0.166 |
Age has a significant multivariate effect ($p < 0.001$, large $\eta^2_{\text{mult}} = 0.166$) — older participants have higher levels of all three biomarkers.
Step 3: Error Correlation Matrix
Substantial positive error correlations confirm the appropriateness of the multivariate approach.
Step 4: Comparing Drug A vs. Drug B (Planned Contrast)
Hypothesis: $\beta_{\text{Drug A},k} = \beta_{\text{Drug B},k}$ for all responses $k$ (i.e., Drug A and Drug B have identical profiles of effects).
Using $\mathbf{L} = [0,\ 1,\ -1,\ 0,\ 0]$ (Drug A minus Drug B, given the column order Intercept, Drug A, Drug B, Female, Age) and $\mathbf{M} = \mathbf{I}_3$:
The Wilks' Lambda test for this contrast is significant at the 5% level.
Drug A has a significantly different (stronger) effect profile compared to Drug B across the combined biomarker set. Examining individual coefficients: Drug A reduces cholesterol by 0.43 mmol/L more and blood pressure by 3.38 mmHg more than Drug B.
Conclusion: Drug A significantly and jointly reduces the three biomarker levels compared to Placebo (all $p < 0.01$), whereas Drug B's effects are not individually significant. Age is a strong positive predictor of all biomarkers (all $p \le 0.001$). Drug A's effects are significantly stronger than Drug B's across the combined biomarker profile.
Example 3: Profile of Cognitive Outcomes — Continuous Predictors
Research Question: Do IQ, years of education, and weekly exercise hours predict a profile of four cognitive assessment scores (Memory, Attention, Processing Speed, Executive Function) in older adults?
Data: a sample of older adults; $q = 3$ continuous predictors; $m = 4$ responses (all standardised T-scores, mean = 50, SD = 10).
Step 1: Standardised Coefficient Matrix
| Predictor | Memory | Attention | Processing Speed | Executive Function |
|---|---|---|---|---|
| IQ | 0.412** | 0.381** | 0.448** | 0.521** |
| Education | 0.284** | 0.211** | 0.142* | 0.318** |
| Exercise | 0.118* | 0.081 | 0.241** | 0.098 |
\*$p < 0.05$; \*\*$p < 0.01$
Step 2: Multivariate Tests
| Predictor | Wilks' $\Lambda$ | $F$ | $p$ | $\eta^2_{\text{mult}}$ |
|---|---|---|---|---|
| IQ | 0.5423 | 30.14 | < 0.001 | 0.458 |
| Education | 0.7841 | 9.82 | < 0.001 | 0.216 |
| Exercise | 0.9124 | 3.44 | 0.011 | 0.088 |
All three predictors significantly and jointly predict the cognitive profile. IQ has the largest multivariate effect ($\eta^2_{\text{mult}} = 0.458$, large); education has a medium effect ($0.216$); exercise has a small but significant effect ($0.088$).
Step 3: Differential Effects Across Responses
Test: Does IQ have equal effects on all four cognitive scores?
$H_0$: $\beta_{\text{IQ},\text{Memory}} = \beta_{\text{IQ},\text{Attention}} = \beta_{\text{IQ},\text{PS}} = \beta_{\text{IQ},\text{EF}}$
Using $\mathbf{L}$ selecting the IQ row and $\mathbf{M}$ a three-column difference matrix contrasting the responses:
The Wilks' Lambda test for this contrast is significant → IQ has differential effects across the cognitive domains. IQ most strongly predicts Executive Function and Processing Speed (standardised $\beta = 0.448$–$0.521$) compared to Memory and Attention ($\beta = 0.381$–$0.412$).
Test: Does exercise have equal effects on Processing Speed and Attention?
$H_0$: $\beta_{\text{Exercise},\text{PS}} = \beta_{\text{Exercise},\text{Attention}}$
Using $\mathbf{L}$ selecting the Exercise row and $\mathbf{M} = (0, -1, 1, 0)^\top$ (Processing Speed minus Attention):
The test is significant → Exercise has a significantly stronger effect on Processing Speed than Attention ($\beta = 0.241$ vs. $0.081$).
Step 4: Univariate R² Summary
| Response | $R^2$ | Adjusted $R^2$ | Primary Predictor |
|---|---|---|---|
| Memory | 0.348 | 0.334 | IQ ($\beta = 0.412$) |
| Attention | 0.291 | 0.276 | IQ ($\beta = 0.381$) |
| Processing Speed | 0.402 | 0.389 | IQ ($\beta = 0.448$) |
| Executive Function | 0.481 | 0.470 | IQ ($\beta = 0.521$) |
Conclusion: The three predictors jointly explain 29–48% of variance in individual cognitive scores. IQ is the dominant predictor across all cognitive domains, but exercise specifically benefits processing speed more than other domains. Education consistently predicts all cognitive outcomes.
18. Common Mistakes and How to Avoid Them
Mistake 1: Running Separate Univariate Regressions Without Multivariate Testing
Problem: Conducting separate OLS regressions and reporting results variable-by-variable without testing multivariate hypotheses. This ignores the correlational structure among responses, inflates the familywise Type I error rate, and misses effects that exist only on linear combinations of responses.
Solution: Use the MLM framework to obtain multivariate tests of each predictor's joint effect on all responses simultaneously. Report multivariate test statistics alongside univariate results. Apply Bonferroni or Holm corrections if separately reporting univariate results.
Mistake 2: Neglecting to Report and Interpret the Error Covariance Matrix
Problem: Fitting an MLM but reporting only univariate coefficients and ignoring $\hat{\boldsymbol{\Sigma}}$, missing a key piece of the analysis — the residual correlation structure among responses.
Solution: Always estimate and report $\hat{\boldsymbol{\Sigma}}$ (or at least its correlation form). Large within-observation error correlations are the key justification for using MLM rather than separate regressions. If error correlations are near zero, separate regressions are nearly equivalent to MLM for estimation.
Mistake 3: Confusing the Coefficient Matrix Interpretation Direction
Problem: Misinterpreting whether the $j$-th row or the $k$-th column of $\hat{\mathbf{B}}$ corresponds to predictor $j$ or response $k$.
Solution: Establish and maintain a consistent convention. In the standard MLM $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$: rows of $\mathbf{B}$ correspond to predictors (including the intercept); columns of $\mathbf{B}$ correspond to responses. Always label the rows and columns of $\hat{\mathbf{B}}$ clearly in output tables.
Mistake 4: Ignoring Residual Diagnostics Across All Responses
Problem: Checking diagnostic plots for only one or two response variables and assuming the model is adequate for all, when violations (non-linearity, heteroscedasticity, outliers) may affect only some responses.
Solution: Examine residual diagnostic plots for every response variable individually. Also conduct multivariate residual diagnostics (Mahalanobis distance Q-Q plot, multivariate normality tests). A violation in even one response requires remediation.
Mistake 5: Using MLM When Responses Are Unrelated
Problem: Applying MLM to a set of conceptually unrelated response variables simply because they were measured, producing a statistically valid but scientifically meaningless combined analysis.
Solution: Only combine responses into an MLM when they represent a coherent theoretical construct (e.g., multiple indicators of the same underlying outcome dimension, multiple time points of the same measure, multiple facets of the same construct). Unrelated responses should be modelled separately.
Mistake 6: Ignoring Multicollinearity Among Predictors
Problem: Failing to check VIFs, leading to inflated standard errors, unstable coefficient estimates, and unreliable hypothesis tests — especially problematic in MLM because the instability propagates across all response equations simultaneously.
Solution: Always compute VIFs before finalising the model. If any $\text{VIF} > 10$, consider removing one of the collinear predictors, creating a composite, using ridge regression, or PLS. Report VIFs in the methods section.
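The VIF computation can be sketched directly from its definition (a minimal sketch; the function name and simulated data are illustrative):

```python
import numpy as np

def vifs(X):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor column j on the
    remaining columns (plus an intercept). X should exclude the intercept."""
    n, q = X.shape
    out = np.empty(q)
    for j in range(q):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        xj = X[:, j]
        coef = np.linalg.lstsq(others, xj, rcond=None)[0]
        ss_res = np.sum((xj - others @ coef) ** 2)
        ss_tot = np.sum((xj - xj.mean()) ** 2)
        out[j] = ss_tot / ss_res                 # equals 1 / (1 - R_j^2)
    return out

rng = np.random.default_rng(5)
A = rng.normal(size=(200, 2))                    # two independent predictors
v_indep = vifs(A)                                # all close to 1
B = np.column_stack([A, A[:, 0] + 0.05 * rng.normal(size=200)])
v_collinear = vifs(B)                            # third column is a near-duplicate
```

Independent predictors give VIFs near 1, while the near-duplicate column drives its VIF far above the remediation threshold of 10.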
Mistake 7: Testing Multivariate Hypotheses Without Checking the Appropriate Contrast Matrices
Problem: Specifying incorrect or incomplete $\mathbf{L}$ and $\mathbf{M}$ matrices, leading to tests of unintended hypotheses. For example, forgetting that the intercept occupies the first row of $\mathbf{B}$ and incorrectly indexing predictors.
Solution: Always verify the $\mathbf{L}$ and $\mathbf{M}$ matrices by constructing them explicitly and checking that $\mathbf{L}\mathbf{B}\mathbf{M}$ produces the intended contrast. The DataStatPro application automatically constructs $\mathbf{L}$ and $\mathbf{M}$ from user specifications, but always review the hypothesis being tested.
Mistake 8: Selecting Variables Based Only on Univariate Significance
Problem: Dropping predictors from the MLM based solely on their univariate significance for individual responses (e.g., "agriculture is not significant for DO, so remove it"), potentially removing predictors that have significant multivariate effects.
Solution: Use multivariate criteria for variable selection: multivariate AIC/BIC, a likelihood ratio test for the full row of $\mathbf{B}$ (all responses simultaneously), or the multivariate $F$-test for the predictor's row. A predictor non-significant for one response may still significantly predict other responses or combinations thereof.
Mistake 9: Over-Interpreting Large Standardised Coefficients Without Confidence Intervals
Problem: Ranking predictor importance based on standardised coefficient magnitudes alone, without reporting confidence intervals, leading to overconfident conclusions about which predictors matter most.
Solution: Always report confidence intervals for standardised coefficients. A large standardised coefficient with a wide confidence interval indicates imprecision. Use effect sizes (multivariate partial $\eta^2$, canonical $r^2$) alongside coefficient estimates for importance assessment.
Mistake 10: Neglecting Multivariate Outlier Detection
Problem: Identifying and addressing univariate outliers (per response) but missing multivariate outliers — observations that are unusual on the combination of responses even if not extreme on any individual response.
Solution: Always compute Mahalanobis distances of the residuals and create a $d_i^2$ vs. $\chi^2_m$ quantile Q-Q plot. Investigate observations with $d_i^2$ exceeding an upper $\chi^2_m$ critical value for data errors or genuine anomalies. Report Cook's distances and the sensitivity of results to influential observations.
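Computing the squared Mahalanobis distances of the residuals is a one-liner once $\hat{\boldsymbol{\Sigma}}$ is available (a sketch on simulated data; the 99.9% $\chi^2_3$ cutoff of roughly 16.27 is one common flagging rule):

```python
import numpy as np

rng = np.random.default_rng(6)
n, q, m = 100, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])
Y = X @ rng.normal(size=(q + 1, m)) + rng.normal(size=(n, m))

B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
E = Y - X @ B_hat
Sigma_inv = np.linalg.inv(E.T @ E / (n - q - 1))
d2 = np.einsum('ij,jk,ik->i', E, Sigma_inv, E)   # squared Mahalanobis distances
# Flag points beyond the chi-square(m = 3) 99.9% quantile (about 16.27)
flagged = np.where(d2 > 16.27)[0]
```

A useful internal check: with the divisor $n - q - 1$ in $\hat{\boldsymbol{\Sigma}}$, the distances satisfy $\sum_i d_i^2 = (n - q - 1)\,m$ exactly. Plotting sorted `d2` against $\chi^2_m$ quantiles gives the Q-Q plot described above.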
19. Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| $(\mathbf{X}^\top\mathbf{X})^{-1}$ does not exist | Perfect multicollinearity among predictors; rank-deficient design matrix; $n < q + 1$ | Check for linearly dependent columns (e.g., sum of dummy variables = constant); remove redundant predictors; collect more data |
| $\hat{\boldsymbol{\Sigma}}$ is singular | $n - q - 1 < m$ (too few residual df); perfectly correlated responses | Increase $n$; reduce $m$ (remove highly correlated responses); use regularised estimator |
| Very large standard errors for some coefficients | Near-multicollinearity ($\text{VIF} > 10$); very small predictor variance | Check VIFs; standardise predictors; use ridge regression; remove one of pair of collinear predictors |
| Wilks' $\Lambda$ close to 1 (non-significant, large $p$) | No predictive signal; all responses uncorrelated with all predictors; data entry error | Verify data; check variable coding; consider whether the research question is plausible |
| Wilks' $\Lambda < 0$ or $\Lambda > 1$ | Computational error; near-singular $\mathbf{E}$; numerical overflow | Check for perfect multicollinearity among responses; verify $n - q - 1 > m$; inspect raw data |
| Mahalanobis distance Q-Q plot strongly curved | Severe multivariate non-normality of residuals; influential outliers | Identify and investigate outliers (Cook's distance, leverage); transform skewed responses (log, Box-Cox); use bootstrap inference |
| Significant multivariate test but no significant univariate tests | Effect exists on a linear combination of DVs not captured individually | Focus on discriminant function / canonical variate analysis; do not dismiss the multivariate result — it is valid |
| Significant univariate tests but non-significant multivariate test | Insufficient power for multivariate test with many DVs; DVs poorly correlated with group effects | Consider removing uninformative DVs; increase sample size; report both results with context |
| Residual plots show clear non-linearity for one DV | Omitted non-linear term for that response; wrong functional form | Add polynomial or spline term for the offending predictor-response combination; consider transformation of the response |
| Residual plots show heteroscedasticity for one DV | Variance of one response changes with fitted values | Apply response transformation (log, square root); use heteroscedasticity-consistent (HC) standard errors; consider weighted MLM |
| Very high within-observation residual correlations ($\lvert r \rvert > 0.95$) | Responses are near-duplicates of the same measurement | Drop or combine the redundant responses (e.g., into a composite score) |
| Cook's distance extremely large for one observation | Data entry error; genuine anomaly; highly influential leverage point | Verify data accuracy; report analysis with and without the influential observation; use robust estimation |
| AIC increases when adding a theoretically important predictor | Predictor explains negligible variance or adds noise | Report both models; note the theoretically motivated predictor regardless; consider whether $n$ is adequate to detect the hypothesised effect |
| $F$-approximations for Wilks', Pillai's, and Roy's give different conclusions | Complex eigenstructure (effects spread across multiple dimensions); small sample size | Report all four statistics; prioritise Pillai's Trace; investigate canonical variate structure |
20. Quick Reference Cheat Sheet
Core Formulas
| Formula | Description |
|---|---|
| $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}$ | Multivariate linear model |
| $\hat{\mathbf{B}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}$ | OLS/MLE coefficient estimator |
| $\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}}$ | Fitted values |
| $\hat{\mathbf{E}} = \mathbf{Y} - \hat{\mathbf{Y}}$ | Residual matrix |
| $\hat{\boldsymbol{\Sigma}} = \hat{\mathbf{E}}^\top\hat{\mathbf{E}}/(n - q - 1)$ | Error covariance estimator |
| $\operatorname{SE}(\hat{\beta}_{jk}) = \sqrt{[(\mathbf{X}^\top\mathbf{X})^{-1}]_{jj}\,\hat{\sigma}_{kk}}$ | Standard error of coefficient |
| $\mathbf{H} = (\mathbf{L}\hat{\mathbf{B}}\mathbf{M})^\top[\mathbf{L}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{L}^\top]^{-1}(\mathbf{L}\hat{\mathbf{B}}\mathbf{M})$ | Hypothesis SSCP matrix |
| $\Lambda = \lvert\mathbf{E}\rvert / \lvert\mathbf{H} + \mathbf{E}\rvert$ | Wilks' Lambda |
| $V = \operatorname{tr}[\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}]$ | Pillai's Trace |
| $T = \operatorname{tr}(\mathbf{H}\mathbf{E}^{-1})$ | Hotelling-Lawley Trace |
| $\theta = \lambda_{\max}(\mathbf{H}\mathbf{E}^{-1})$ | Roy's Largest Root |
| $\eta^2_{\text{mult}} = 1 - \Lambda^{1/s}$ | Multivariate partial eta-squared |
| $\hat{\mathbf{y}}_0 = \hat{\mathbf{B}}^\top\mathbf{x}_0$ | Point prediction for new observation |
| $d_i^2 = \hat{\mathbf{e}}_i^\top\hat{\boldsymbol{\Sigma}}^{-1}\hat{\mathbf{e}}_i$ | Mahalanobis distance of residuals |
Four Multivariate Test Statistics Summary
| Statistic | Formula | Range | Most Robust | Best When |
|---|---|---|---|---|
| Wilks' $\Lambda$ | $\prod_i (1+\lambda_i)^{-1}$ | $(0, 1]$ | Moderate | Standard analysis, effects spread |
| Pillai's Trace | $\sum_i \lambda_i/(1+\lambda_i)$ | $[0, s]$ | Highest | Robustness needed, small/unequal $n$ |
| Hotelling-Lawley | $\sum_i \lambda_i$ | $[0, \infty)$ | Lowest | Single dominant dimension |
| Roy's $\theta$ | $\lambda_{\max}$ | $[0, \infty)$ | Lowest | Theory predicts single dimension |
General Linear Hypothesis Guide
| Hypothesis | $\mathbf{L}$ | $\mathbf{M}$ | Use |
|---|---|---|---|
| All predictors, all responses | $[\mathbf{0}\ \ \mathbf{I}_q]$ (no intercept row) | $\mathbf{I}_m$ | Overall model test |
| Predictor $j$, all responses | Row selector $\mathbf{e}_j^\top$ | $\mathbf{I}_m$ | Effect of $x_j$ on all DVs |
| All predictors, response $k$ | $[\mathbf{0}\ \ \mathbf{I}_q]$ | Column selector $\mathbf{e}_k$ | Univariate model for $y_k$ |
| Predictor $j$, response $k$ | $\mathbf{e}_j^\top$ | $\mathbf{e}_k$ | Single coefficient test |
| $x_j$: equal effect on $y_1$ and $y_2$ | $\mathbf{e}_j^\top$ | $(1, -1, 0, \dots)^\top$ | Differential effect test |
| Group $a$ vs. group $b$, all DVs | Contrast row for groups | $\mathbf{I}_m$ | MANOVA pairwise comparison |
| Subset of predictors (rows $j_1, \dots, j_r$) | $r$-row selection matrix | $\mathbf{I}_m$ | Joint test of predictor subset |
Effect Size Benchmarks
| Effect Size | Small | Medium | Large |
|---|---|---|---|
| $\eta^2_{\text{mult}}$ (multivariate) | 0.01 | 0.06 | 0.14 |
| Partial $\eta^2$ (univariate) | 0.01 | 0.06 | 0.14 |
| Canonical $r^2$ | 0.01 | 0.09 | 0.25 |
| Univariate $R^2$ | 0.02 | 0.13 | 0.26 |
Assumption Diagnostics Summary
| Assumption | Check | Threshold | Remedy |
|---|---|---|---|
| Linearity | Partial residual plots | Visual pattern | Polynomial/spline terms |
| Multivariate normality | Mardia's tests, Q-Q plot | $p > 0.05$ | Transform responses; bootstrap |
| Independence | Study design; Durbin-Watson | $DW$ near 2 | Mixed models; GEE |
| Homoscedasticity | Scale-location plot; Breusch-Pagan | $p > 0.05$ | Transform responses; HC SEs |
| No multicollinearity | VIF | $\text{VIF} < 10$ | Remove/combine predictors; ridge |
| No outliers | Cook's $D$; Mahalanobis $d^2$ | $D < 4/n$; $d^2$ below $\chi^2_m$ cutoff | Investigate; robust estimation |
| Sufficient $n$ | Absolute minimum | $n - q - 1 > m$ | Collect more data; reduce $q$ or $m$ |
Model Selection Criteria
| Criterion | Formula | Prefer | Notes |
|---|---|---|---|
| AIC | $-2\ln L + 2k$ | Lower | Predictive accuracy |
| BIC | $-2\ln L + k\ln n$ | Lower | Parsimony |
| LRT ($\chi^2$) | $-2(\ln L_{\text{reduced}} - \ln L_{\text{full}})$ | $p > 0.05$ to retain reduced | Nested models only |
| CV MSPE | $\tfrac{1}{n}\sum_i \lVert\mathbf{y}_i - \hat{\mathbf{y}}_i^{(-k)}\rVert^2$ | Lower | Out-of-sample prediction |
Special Cases of the MLM
| Special Case | Conditions | Notes |
|---|---|---|
| Univariate regression | $m = 1$ | Standard OLS |
| MANOVA | = group indicators only | Groups compared on DVs |
| MANCOVA | = groups + covariates | Adjusted group comparisons |
| Profile analysis | = repeated measures; = difference matrix | Tests parallelism, levels, flatness |
| Canonical correlation | Test of all predictors, all responses | CCA = MLM eigenstructure |
| Discriminant analysis | = groups; post-hoc eigenvectors | DFA follows MANOVA |
Prediction Formulas
| Target | Formula |
|---|---|
| Point prediction | $\hat{\mathbf{y}}_0 = \hat{\mathbf{B}}^\top\mathbf{x}_0$ |
| Prediction SE (response $k$) | $\sqrt{(1 + h_0)\,\hat{\sigma}_{kk}}$ |
| 95% prediction interval (response $k$) | $\hat{y}_{0k} \pm t_{n-q-1,\,0.975}\sqrt{(1 + h_0)\,\hat{\sigma}_{kk}}$ |
| Leverage of new point | $h_0 = \mathbf{x}_0^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0$ |
| Prediction ellipsoid | $(\mathbf{y}_0 - \hat{\mathbf{y}}_0)^\top[(1+h_0)\hat{\boldsymbol{\Sigma}}]^{-1}(\mathbf{y}_0 - \hat{\mathbf{y}}_0) \le c_\alpha$ |
Multicollinearity Severity Guide
| VIF Range | Severity | Action |
|---|---|---|
| $\text{VIF} = 1$ | None | Proceed normally |
| $1 < \text{VIF} < 2$ | Low | Monitor; no action required |
| $2 \le \text{VIF} < 5$ | Moderate | Check; consider remediation |
| $5 \le \text{VIF} < 10$ | High | Remediation recommended |
| $\text{VIF} \ge 10$ | Severe | Remediation required |
This tutorial provides a comprehensive foundation for understanding, applying, and interpreting Multivariate Linear Models using the DataStatPro application. For further reading, consult Johnson & Wichern's "Applied Multivariate Statistical Analysis" (6th ed., Pearson, 2007), Rencher & Christensen's "Methods of Multivariate Analysis" (3rd ed., Wiley, 2012), or Anderson's "An Introduction to Multivariate Statistical Analysis" (3rd ed., Wiley, 2003). For feature requests or support, contact the DataStatPro team.