
Multivariate Analysis

Introduction to multivariate statistical techniques.

How to Perform Multivariate Analysis Using DataStatPro

Learning Objectives

By the end of this tutorial, you will be able to:

  • Distinguish dependence techniques from interdependence techniques
  • Assess data suitability and run a principal component analysis (PCA)
  • Perform hierarchical and k-means cluster analysis
  • Conduct and interpret a MANOVA with appropriate follow-up tests
  • Report multivariate results in publication-ready form

What is Multivariate Analysis?

Multivariate analysis involves statistical techniques that analyze multiple variables simultaneously to:

  • Understand relationships among many variables at once
  • Reduce a large set of measured variables to a few underlying dimensions
  • Classify or group observations
  • Test group differences on several outcomes jointly

Advantages of Multivariate Approaches

  • Account for correlations among variables that separate univariate tests ignore
  • Control the familywise Type I error rate across related outcomes
  • Can detect effects on combinations of variables that no single variable shows
  • Provide a more complete picture of complex phenomena

Types of Multivariate Techniques

Dependence Techniques

One or more variables depend on others

| Technique | Dependent Variables | Independent Variables | Purpose |
|---|---|---|---|
| Multiple Regression | 1 continuous | Multiple | Prediction, explanation |
| Logistic Regression | 1 binary/categorical | Multiple | Classification, prediction |
| MANOVA | Multiple continuous | 1+ categorical | Group differences |
| Discriminant Analysis | 1 categorical | Multiple continuous | Classification |
| Canonical Correlation | Multiple continuous | Multiple continuous | Relationship analysis |

Interdependence Techniques

No distinction between dependent/independent variables

| Technique | Data Type | Purpose |
|---|---|---|
| Principal Component Analysis (PCA) | Continuous | Dimension reduction |
| Factor Analysis | Continuous | Identify latent factors |
| Cluster Analysis | Any | Group similar observations |
| Multidimensional Scaling (MDS) | Similarity/distance | Spatial representation |
| Correspondence Analysis | Categorical | Association patterns |

Step-by-Step Guide: Principal Component Analysis (PCA)

When to Use PCA

Use PCA when you want to:

  • Reduce a large set of correlated variables to fewer components
  • Identify underlying structure in the data
  • Create composite scores for later analyses
  • Address multicollinearity before regression

Step 1: Data Preparation

  1. Access PCA Tools

    • Navigate to Advanced Analysis → Multivariate
    • Select Principal Component Analysis
  2. Data Requirements

    • Multiple continuous variables (typically 5+)
    • Adequate sample size (5-10 observations per variable)
    • Variables should be correlated (not independent)
    • Consider standardization for different scales
  3. Preliminary Checks

    • Examine correlation matrix
    • Check for missing data patterns
    • Assess normality (helpful but not required)
    • Identify outliers

Step 2: Assessing Suitability for PCA

Kaiser-Meyer-Olkin (KMO) Test

  1. Interpretation
    • KMO > 0.9: Excellent
    • KMO > 0.8: Good
    • KMO > 0.7: Adequate
    • KMO > 0.6: Mediocre
    • KMO > 0.5: Poor
    • KMO < 0.5: Unacceptable

Bartlett's Test of Sphericity

  1. Purpose
    • Tests if correlation matrix differs from identity matrix
    • Significant result (p < .05) indicates PCA is appropriate
    • Non-significant suggests variables are uncorrelated
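Both suitability checks can be computed directly from the correlation matrix. The sketch below assumes NumPy and SciPy are available (it is not DataStatPro's API, and the data are simulated stand-ins): KMO compares squared correlations to squared partial correlations, and Bartlett's test derives a chi-square from the determinant of the correlation matrix.

```python
# Sketch: KMO and Bartlett's test of sphericity with NumPy/SciPy.
# X is a hypothetical (n_samples, n_vars) data matrix.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated correlated data: 100 observations, 6 variables
base = rng.normal(size=(100, 2))
X = np.hstack([base + 0.5 * rng.normal(size=(100, 2)) for _ in range(3)])

n, p = X.shape
R = np.corrcoef(X, rowvar=False)             # correlation matrix

# KMO: squared correlations vs. squared partial correlations (off-diagonal)
R_inv = np.linalg.inv(R)
d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
partial = -R_inv / d                         # partial correlation matrix
off = ~np.eye(p, dtype=bool)
kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())

# Bartlett's test: chi-square from the determinant of R
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
p_value = stats.chi2.sf(chi2, df)

print(f"KMO = {kmo:.3f}, Bartlett chi2({df:.0f}) = {chi2:.1f}, p = {p_value:.4f}")
```

With strongly correlated simulated data like this, KMO should be well above the 0.5 floor and Bartlett's test clearly significant.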

Step 3: Extracting Components

Determining Number of Components

  1. Kaiser Criterion (Eigenvalue > 1)

    • Retain components with eigenvalues > 1.0
    • Most common but sometimes over-extracts
  2. Scree Plot

    • Plot eigenvalues in descending order
    • Look for "elbow" where slope levels off
    • Retain components before the elbow
  3. Percentage of Variance

    • Retain components explaining 70-80% of variance
    • Balance between parsimony and explanation
  4. Parallel Analysis

    • Compare eigenvalues to random data
    • More accurate than Kaiser criterion
    • Retain components above random baseline
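Parallel analysis is straightforward to sketch with NumPy: compare the observed eigenvalues to the 95th percentile of eigenvalues from random data of the same shape, and retain components that exceed that baseline. The function name and simulation below are hypothetical.

```python
# Sketch: Horn's parallel analysis — retain components whose eigenvalues
# exceed those obtained from random data of the same shape.
import numpy as np

def parallel_analysis(X, n_iter=100, percentile=95, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Observed eigenvalues of the correlation matrix, descending
    obs_eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    rand_eig = np.empty((n_iter, p))
    for i in range(n_iter):
        Z = rng.normal(size=(n, p))
        rand_eig[i] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
    threshold = np.percentile(rand_eig, percentile, axis=0)
    return int(np.sum(obs_eig > threshold))

# Example: two genuine dimensions embedded in 8 noisy indicators
rng = np.random.default_rng(1)
f = rng.normal(size=(300, 2))
X = np.repeat(f, 4, axis=1) + rng.normal(scale=0.7, size=(300, 8))
print(parallel_analysis(X))  # retains the two genuine components
```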

Step 4: Interpreting Components

Component Loadings

  1. Loading Interpretation

    • |Loading| > 0.7: Excellent
    • |Loading| > 0.6: Good
    • |Loading| > 0.5: Fair
    • |Loading| > 0.4: Acceptable minimum
    • |Loading| < 0.4: Poor
  2. Component Naming

    • Examine variables with high loadings
    • Identify common theme or construct
    • Name component based on content

Rotation Methods

  1. Orthogonal Rotation (Varimax)

    • Components remain uncorrelated
    • Maximizes variance of squared loadings
    • Easier interpretation
  2. Oblique Rotation (Promax, Oblimin)

    • Allows components to correlate
    • More realistic for psychological/social constructs
    • Provides pattern and structure matrices

Example: Personality Assessment

Scenario

Analyzing 20 personality items to identify underlying dimensions.

Data Preparation

Participant | Item1 | Item2 | ... | Item20
001         | 4     | 3     | ... | 5
002         | 2     | 4     | ... | 3
...

PCA Results

KMO = 0.85 (Good)
Bartlett's Test: χ² = 1247.3, p < .001

Component Eigenvalues:
PC1: 4.2 (21% variance)
PC2: 3.1 (15.5% variance)
PC3: 2.4 (12% variance)
PC4: 1.8 (9% variance)
PC5: 1.2 (6% variance)
Total: 63.5% variance explained

Component Interpretation

Component 1 - "Extraversion"
Item5 (Talkative): 0.78
Item12 (Outgoing): 0.74
Item18 (Social): 0.71

Component 2 - "Conscientiousness"
Item3 (Organized): 0.82
Item9 (Reliable): 0.76
Item15 (Punctual): 0.69
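The same workflow (standardize, extract, apply the Kaiser criterion, inspect loadings) can be sketched with scikit-learn. The data below are random stand-ins for the 20 personality items, so they will not reproduce the five-component solution reported above; everything here is illustrative, not DataStatPro output.

```python
# Sketch: PCA workflow with scikit-learn — standardize, fit,
# apply the Kaiser criterion, and compute component loadings.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
items = rng.normal(size=(200, 20))          # stand-in for N = 200, 20 items

Z = StandardScaler().fit_transform(items)   # standardizing = PCA on correlations
pca = PCA().fit(Z)

eigenvalues = pca.explained_variance_
n_retained = int(np.sum(eigenvalues > 1.0))  # Kaiser criterion
# Loadings = eigenvectors scaled by the square root of the eigenvalues
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

print(f"Components retained (eigenvalue > 1): {n_retained}")
print(f"Variance explained by first {n_retained}: "
      f"{pca.explained_variance_ratio_[:n_retained].sum():.1%}")
```

Note that on purely random data the Kaiser criterion will still retain several components, which illustrates its tendency to over-extract; parallel analysis (above) guards against this.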

Step-by-Step Guide: Cluster Analysis

When to Use Cluster Analysis

Use cluster analysis to:

  • Identify natural groupings in data without predefined labels
  • Segment customers, patients, or study participants
  • Explore structure in the data before further modeling
  • Build typologies or taxonomies

Types of Clustering

Hierarchical Clustering

  1. Agglomerative (Bottom-up)

    • Start with individual observations
    • Merge closest pairs iteratively
    • Creates dendrogram showing hierarchy
  2. Divisive (Top-down)

    • Start with all observations together
    • Split into smaller groups iteratively
    • Less common in practice

Non-Hierarchical Clustering

  1. K-Means Clustering

    • Specify number of clusters in advance
    • Minimizes within-cluster variance
    • Fast and efficient for large datasets
  2. Model-Based Clustering

    • Assumes clusters follow statistical distributions
    • Provides probability of cluster membership
    • Can handle different cluster shapes

Step 1: Distance Measures

For Continuous Variables

  1. Euclidean Distance

    d = √( Σ (xᵢ − yᵢ)² )
    
    • Most common measure
    • Sensitive to scale differences
    • Good for compact, spherical clusters
  2. Manhattan Distance

    d = Σ |xᵢ − yᵢ|
    
    • Less sensitive to outliers
    • Good for high-dimensional data
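Both measures are available in SciPy (a quick illustration with hypothetical points; remember to standardize variables first if they are on different scales):

```python
# Sketch: Euclidean and Manhattan (city-block) distances via SciPy.
import numpy as np
from scipy.spatial.distance import euclidean, cityblock

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

d_euc = euclidean(x, y)   # sqrt((1-4)^2 + (2-6)^2 + 0^2) = sqrt(25) = 5.0
d_man = cityblock(x, y)   # |1-4| + |2-6| + 0 = 7.0
print(d_euc, d_man)       # 5.0 7.0
```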

For Mixed Data Types

  1. Gower Distance
    • Handles continuous, ordinal, and nominal variables
    • Standardizes different variable types
    • Range: 0 to 1
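A minimal Gower distance is easy to sketch by hand (the helper below is hypothetical, not a library or DataStatPro function): continuous variables contribute range-normalized absolute differences, nominal variables contribute 0 on a match and 1 otherwise, and the result is the average across variables.

```python
# Sketch: a minimal Gower distance for mixed data types.
# Continuous parts: |a - b| / range; nominal parts: 0 if equal, else 1.
import numpy as np

def gower(a, b, is_nominal, ranges):
    parts = []
    for ai, bi, nom, r in zip(a, b, is_nominal, ranges):
        if nom:
            parts.append(0.0 if ai == bi else 1.0)
        else:
            parts.append(abs(ai - bi) / r)
    return float(np.mean(parts))

# Hypothetical records: age (range 60), income in $k (range 80), region (nominal)
a = (35, 50, "north")
b = (45, 90, "south")
print(gower(a, b, is_nominal=[False, False, True], ranges=[60, 80, None]))
# (10/60 + 40/80 + 1) / 3 ≈ 0.556
```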

Step 2: Linkage Methods (Hierarchical)

  1. Single Linkage (Nearest Neighbor)

    • Distance between closest points
    • Can create elongated clusters
    • Sensitive to outliers
  2. Complete Linkage (Farthest Neighbor)

    • Distance between farthest points
    • Creates compact, spherical clusters
    • Less sensitive to outliers
  3. Average Linkage

    • Average distance between all pairs
    • Compromise between single and complete
    • Generally good performance
  4. Ward's Method

    • Minimizes within-cluster sum of squares
    • Creates equal-sized, compact clusters
    • Often preferred choice
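The linkage methods above map directly onto SciPy's hierarchical clustering API. A sketch with simulated data (Ward's method, then cutting the tree into two clusters):

```python
# Sketch: agglomerative clustering with Ward's method via SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated simulated groups of 20 observations each
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
               rng.normal(5, 0.5, size=(20, 2))])

Z = linkage(X, method="ward")                    # merge history (dendrogram input)
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(np.bincount(labels)[1:])                   # cluster sizes
```

Passing `method="single"`, `"complete"`, or `"average"` to `linkage` switches among the other criteria described above; `scipy.cluster.hierarchy.dendrogram(Z)` plots the tree.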

Step 3: Determining Number of Clusters

Hierarchical Methods

  1. Dendrogram Inspection

    • Look for large jumps in fusion coefficients
    • Cut dendrogram at appropriate height
    • Visual interpretation required
  2. Elbow Method

    • Plot within-cluster sum of squares vs. number of clusters
    • Look for "elbow" where improvement slows
    • Balance fit and parsimony

Statistical Criteria

  1. Silhouette Analysis

    • Measures how well observations fit their clusters
    • Range: -1 to +1 (higher is better)
    • Average silhouette width indicates optimal k
  2. Gap Statistic

    • Compares within-cluster variation to random data
    • Choose k where gap is largest
    • More objective than visual methods
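The elbow and silhouette criteria can be compared side by side with scikit-learn's KMeans (simulated data with three true clusters; variable names are hypothetical):

```python
# Sketch: comparing candidate k values with inertia (elbow method)
# and the average silhouette width.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three well-separated simulated clusters of 50 points each
X = np.vstack([rng.normal(c, 0.4, size=(50, 2)) for c in (0, 3, 6)])

for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sil = silhouette_score(X, km.labels_)
    print(f"k={k}: inertia={km.inertia_:.1f}, silhouette={sil:.3f}")
# For this simulation the silhouette should peak at k = 3.
```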

Step-by-Step Guide: MANOVA (Multivariate ANOVA)

When to Use MANOVA

Use MANOVA when:

  • You have two or more related continuous dependent variables
  • One or more categorical independent variables define the groups
  • You want to test group differences on the outcomes jointly

Advantages over Multiple ANOVAs

  • Controls the familywise Type I error rate across related outcomes
  • Accounts for correlations among the dependent variables
  • Can detect group differences on combinations of DVs that separate ANOVAs miss

Step 1: Assumptions

Multivariate Normality

  1. Assessment
    • Check univariate normality for each DV
    • Use Mardia's test for multivariate normality
    • Examine Q-Q plots and histograms

Homogeneity of Covariance Matrices

  1. Box's M Test
    • Tests equality of covariance matrices
    • Sensitive to non-normality
    • Non-significant result preferred (p > .001)

Independence and Linearity

  1. Independence: Observations should be independent
  2. Linearity: Linear relationships among DVs
  3. No extreme outliers: Check Mahalanobis distance
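The Mahalanobis check mentioned above can be sketched with NumPy and SciPy: under multivariate normality, squared Mahalanobis distances follow a chi-square distribution with df equal to the number of variables, and a p < .001 cutoff is a common outlier criterion. The data below are simulated with one planted outlier.

```python
# Sketch: flagging multivariate outliers with Mahalanobis distance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [6.0, -6.0, 6.0]                  # planted outlier

mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean
# Squared Mahalanobis distance for every observation
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

cutoff = stats.chi2.ppf(0.999, df=X.shape[1])  # common p < .001 criterion
outliers = np.where(d2 > cutoff)[0]
print(outliers)  # should include index 0
```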

Step 2: Running MANOVA

  1. Test Statistics

    • Pillai's Trace: Most robust, recommended
    • Wilks' Lambda: Most common, good power
    • Hotelling's Trace: Sensitive to assumptions
    • Roy's Largest Root: Can be liberal
  2. Effect Size

    • Partial eta-squared (ηp²)
    • Multivariate effect size measures
    • Cohen's conventions apply
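The multivariate test statistics above all derive from the hypothesis (between-groups, H) and error (within-groups, E) sums-of-squares-and-cross-products matrices. A minimal NumPy sketch with simulated data (3 groups, 2 DVs; not DataStatPro output) computes two of them directly:

```python
# Sketch: Pillai's trace and Wilks' lambda from the H and E SSCP
# matrices of a one-way MANOVA.
import numpy as np

rng = np.random.default_rng(0)
# Three simulated groups of 50 with shifted means on 2 DVs
groups = [rng.normal(loc=m, size=(50, 2)) for m in ([0, 0], [1, 0.5], [2, 1])]

X = np.vstack(groups)
grand = X.mean(axis=0)

# Between-groups (H) and within-groups (E) SSCP matrices
H = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
        for g in groups)
E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

pillai = np.trace(H @ np.linalg.inv(H + E))      # Pillai's trace
wilks = np.linalg.det(E) / np.linalg.det(H + E)  # Wilks' lambda
print(f"Pillai = {pillai:.3f}, Wilks = {wilks:.3f}")
```

In practice you would use a statistics package for the F approximations and p-values; the sketch only shows where the statistics come from.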

Step 3: Follow-up Analyses

Univariate ANOVAs

  1. When Significant MANOVA
    • Examine which DVs differ between groups
    • Apply Bonferroni correction
    • Interpret with caution (loss of multivariate context)

Discriminant Analysis

  1. Purpose
    • Identify linear combinations that best separate groups
    • Understand nature of group differences
    • More informative than univariate follow-ups

Real-World Example: Educational Intervention Study

Scenario

Comparing three teaching methods on multiple learning outcomes: test scores, motivation, and engagement.

Design

  • Between-subjects factor: teaching method (three groups; N = 150, per the reported degrees of freedom)
  • Dependent variables: test scores, motivation, and engagement

Results

MANOVA Results:
Pillai's Trace = 0.34, F(6, 292) = 9.2, p < .001, ηp² = .17

Univariate Follow-ups:
Test Score: F(2, 147) = 12.4, p < .001, ηp² = .14
Motivation: F(2, 147) = 8.7, p < .001, ηp² = .11
Engagement: F(2, 147) = 15.2, p < .001, ηp² = .17

Discriminant Analysis:
Function 1 (68% variance): High engagement, moderate motivation
Function 2 (32% variance): High test scores, low motivation

Interpretation

The significant multivariate effect indicates that the teaching methods differ on the combined learning outcomes. Engagement shows the largest univariate effect (ηp² = .17), and the first discriminant function suggests the methods are separated primarily by engagement and motivation rather than by test scores alone.

Advanced Multivariate Techniques

Canonical Correlation Analysis

  1. Purpose

    • Analyze relationships between two sets of variables
    • Find linear combinations that maximize correlation
    • Extension of multiple regression to multiple DVs
  2. Example Applications

    • Academic predictors vs. success measures
    • Personality traits vs. job performance indicators
    • Environmental factors vs. health outcomes

Structural Equation Modeling (SEM)

  1. Capabilities

    • Test complex theoretical models
    • Include latent (unobserved) variables
    • Handle measurement error
    • Test mediation and moderation
  2. Components

    • Measurement model (factor analysis)
    • Structural model (path analysis)
    • Model fit assessment
    • Modification indices

Publication-Ready Reporting

PCA Results

"Principal component analysis was conducted on 20 personality items (N = 200). The Kaiser-Meyer-Olkin measure verified sampling adequacy (KMO = .85), and Bartlett's test of sphericity indicated correlations were suitable for PCA, χ²(190) = 1247.3, p < .001. Five components with eigenvalues > 1.0 were extracted, explaining 63.5% of the total variance. Varimax rotation revealed interpretable factors corresponding to the Big Five personality dimensions."

MANOVA Results

"A one-way MANOVA was conducted to examine group differences on three learning outcomes. Box's M test was non-significant (p = .08), supporting homogeneity of covariance matrices. The multivariate test revealed significant group differences, Pillai's Trace = .34, F(6, 292) = 9.2, p < .001, ηp² = .17. Follow-up univariate ANOVAs showed significant differences on all three outcomes (all ps < .001)."

APA Style Table

Table 1
Principal Component Analysis Results with Varimax Rotation

Item                    PC1    PC2    PC3    PC4    PC5    h²
Talkative              .78    .12    .05    .18    .09    .66
Outgoing               .74    .08    .15    .22    .14    .64
Social                 .71    .19    .11    .08    .26    .62
Organized              .15    .82    .09    .11    .05    .71
Reliable               .08    .76    .18    .14    .12    .65
Punctual               .22    .69    .05    .19    .08    .57

Eigenvalue             4.2    3.1    2.4    1.8    1.2
% Variance            21.0   15.5   12.0    9.0    6.0
Cumulative %          21.0   36.5   48.5   57.5   63.5

Note. Loadings > .40 are bolded. h² = communality.

Troubleshooting Common Issues

Problem: Low KMO or Non-significant Bartlett's Test

Solution: Check correlations, remove uncorrelated variables, increase sample size.

Problem: Difficult to Interpret Components/Factors

Solution: Try different rotation methods, extract different number of factors, examine residuals.

Problem: MANOVA Assumptions Violated

Solution: Transform variables, use robust methods, consider separate ANOVAs with correction.

Problem: Too Many/Few Clusters

Solution: Use multiple criteria, consider domain knowledge, validate with external criteria.

Frequently Asked Questions

Q: How many variables can I include in multivariate analysis?

A: Depends on sample size and technique. General rule: 5-10 observations per variable for PCA/FA, 20+ per group for MANOVA.

Q: Should I standardize variables before analysis?

A: Yes, if variables have different scales or units. Not necessary if all variables use same scale.

Q: Can I use multivariate techniques with missing data?

A: Some techniques handle missing data (e.g., maximum likelihood), others require complete cases or imputation.

Q: How do I validate multivariate results?

A: Use cross-validation, split-sample validation, or external criteria to confirm findings.

Q: What if my data don't meet multivariate normality?

A: Many techniques are robust to moderate violations. Consider transformations or robust alternatives.

Next Steps

After mastering basic multivariate analysis, consider exploring:

  • Structural equation modeling (SEM) for testing full theoretical models
  • Model-based and mixed-data clustering
  • Cross-validation workflows for confirming multivariate findings


This tutorial is part of DataStatPro's comprehensive statistical analysis guide. For more advanced techniques and personalized support, explore our Pro features.