
Exploratory Factor Analysis (EFA)

Comprehensive reference guide for exploratory factor analysis and latent structure discovery.

Exploratory Factor Analysis: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of Exploratory Factor Analysis (EFA) all the way through advanced interpretation, model evaluation, and practical usage within the DataStatPro application. Whether you are encountering factor analysis for the first time or looking to deepen your understanding, this guide builds your knowledge step by step.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is Exploratory Factor Analysis?
  3. The Mathematics Behind EFA
  4. Assumptions of EFA
  5. Types of Factor Analysis
  6. Using the EFA Component
  7. Factor Extraction Methods
  8. Determining the Number of Factors
  9. Factor Rotation
  10. Interpreting EFA Results
  11. Model Fit and Evaluation
  12. Worked Examples
  13. Common Mistakes and How to Avoid Them
  14. Troubleshooting
  15. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into Exploratory Factor Analysis, it is helpful to be familiar with the following foundational statistical and mathematical concepts. Each is briefly explained below.

1.1 Variables and Correlation

A variable is any measurable characteristic (e.g., age, anxiety score, test result). The correlation coefficient r between two variables X and Y measures the strength and direction of their linear relationship:

r_{XY} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}

In EFA, the correlation matrix R (or covariance matrix Σ) is the primary input — the goal is to explain the patterns of correlation among many observed variables.
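As a quick numerical check of the formula above, the coefficient can be computed directly from deviation scores (a minimal sketch with made-up numbers):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed from the deviation-score formula above."""
    dx, dy = x - x.mean(), y - y.mean()
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

# Hypothetical paired measurements
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
r = pearson_r(X, Y)   # agrees with np.corrcoef(X, Y)[0, 1]
```

The same value falls out of `np.corrcoef`, which is the usual way to build the full correlation matrix that EFA takes as input.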

1.2 Variance and Covariance

The variance of a variable X measures how spread out its values are:

\text{Var}(X) = \sigma^2_X = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2

The covariance between two variables X and Y measures how they vary together:

\text{Cov}(X, Y) = \sigma_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})

The relationship between correlation and covariance is:

r_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}
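The identity above is easy to verify numerically (a minimal sketch with simulated, hypothetical data):

```python
import numpy as np

# Numerical check of r = Cov(X, Y) / (sigma_X * sigma_Y)
rng = np.random.default_rng(42)
X = rng.normal(size=200)
Y = 0.6 * X + rng.normal(size=200)

cov_xy = np.cov(X, Y, ddof=1)[0, 1]                    # sample covariance
r_from_cov = cov_xy / (X.std(ddof=1) * Y.std(ddof=1))  # via the identity
r_direct = np.corrcoef(X, Y)[0, 1]                     # direct computation
```

Both routes give the same number, which is why EFA can work interchangeably from the covariance matrix of standardised variables or the correlation matrix.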

1.3 Standardisation (Z-Scores)

Standardising a variable transforms it so that it has a mean of 0 and a standard deviation of 1:

Z_i = \frac{X_i - \bar{X}}{\sigma_X}

EFA is typically performed on standardised variables (i.e., on the correlation matrix), which ensures that variables measured on different scales contribute equally to the analysis.
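A one-line sketch of standardisation (hypothetical values); the resulting z-scores have mean 0 and standard deviation 1 by construction:

```python
import numpy as np

# Standardising a hypothetical variable
X = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
Z = (X - X.mean()) / X.std(ddof=1)   # mean 0, SD 1 after the transform
```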

1.4 Linear Combinations

A linear combination of variables X_1, X_2, \dots, X_p is a weighted sum:

L = w_1 X_1 + w_2 X_2 + \dots + w_p X_p

Factors in EFA are linear combinations of the observed variables. Understanding this concept is fundamental to interpreting what a factor "is."

1.5 Eigenvalues and Eigenvectors

For a square matrix A, an eigenvector v and its corresponding eigenvalue λ satisfy:

\mathbf{A}\mathbf{v} = \lambda\mathbf{v}

In EFA, eigenvalues of the correlation matrix indicate how much variance each factor explains. A higher eigenvalue means the corresponding factor accounts for more of the total variance. Eigenvectors define the direction (loadings) of the factors.
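A minimal sketch of an eigen-decomposition of a small hypothetical correlation matrix; note that the eigenvalues of a p × p correlation matrix always sum to p:

```python
import numpy as np

# Hypothetical 3x3 correlation matrix
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

vals, vecs = np.linalg.eigh(R)          # eigh: symmetric matrices, ascending order
vals, vecs = vals[::-1], vecs[:, ::-1]  # sort descending, as in a scree plot

# Each pair satisfies R v = lambda v; the eigenvalues sum to p = 3
```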

1.6 Matrix Notation

EFA makes heavy use of matrix algebra. Key notation used throughout this tutorial:

| Symbol | Meaning |
| :----- | :------ |
| **R** | Correlation matrix (p × p) |
| **Λ** | Factor loading matrix (p × m) |
| **Ψ** | Unique variance (uniqueness) matrix (p × p, diagonal) |
| p | Number of observed variables |
| m | Number of extracted factors |
| n | Number of observations (sample size) |

2. What is Exploratory Factor Analysis?

2.1 The Core Idea

Imagine you administer a 20-item psychological questionnaire to 500 participants. Each item measures something slightly different, but you suspect that the 20 items do not all measure independent constructs — instead, you believe they reflect a smaller number of underlying latent dimensions (e.g., "anxiety", "depression", "wellbeing").

Exploratory Factor Analysis (EFA) is a statistical technique that identifies this smaller set of unobserved (latent) variables — called factors — that explain the pattern of correlations among a larger set of observed variables (also called manifest variables or indicators).

The key insight is: if several variables are highly correlated with each other, they probably share a common underlying cause (a factor).

2.2 Formal Definition

EFA is a dimension reduction and structure discovery technique. Given p observed variables X_1, X_2, …, X_p, EFA decomposes their variance into:

  1. Common variance (communality): Variance shared with other variables, attributable to the common factors.
  2. Unique variance (uniqueness): Variance specific to each variable, not shared with others (includes both specific variance and measurement error).

The goal is to represent the p observed variables as functions of m common factors, where m ≪ p.

2.3 Exploratory vs. Confirmatory Factor Analysis

It is crucial to understand the distinction between the two main types of factor analysis:

| Feature | Exploratory Factor Analysis (EFA) | Confirmatory Factor Analysis (CFA) |
| :------ | :-------------------------------- | :--------------------------------- |
| Purpose | Discover factor structure | Test a hypothesised factor structure |
| Prior theory required? | No — structure emerges from data | Yes — structure is pre-specified |
| Factor-variable assignments | Not pre-specified | Pre-specified by researcher |
| Cross-loadings | Allowed | Typically constrained to zero |
| Software output | Loadings, rotation, fit indices | Model fit, modification indices |
| Typical use | Scale development, early research | Scale validation, theory testing |
| DataStatPro feature | ✅ This tutorial | ❌ Separate module |

2.4 Real-World Applications

EFA is used across a wide range of disciplines — for example, psychology (developing questionnaires for constructs such as anxiety or depression), education (test and ability measurement), health research (patient-reported outcome measures), and marketing (attitude and satisfaction surveys).

2.5 The Fundamental Goal: Parsimony

The driving principle behind EFA is parsimony — explaining as much of the observed complexity as possible with as few underlying factors as necessary. A good factor solution accounts for a large proportion of the total variance in the observed variables using a small number of interpretable factors.


3. The Mathematics Behind EFA

3.1 The Common Factor Model

The mathematical foundation of EFA is the common factor model. For p standardised observed variables and m common factors, the model states that each observed variable X_j can be written as:

X_j = \lambda_{j1} F_1 + \lambda_{j2} F_2 + \dots + \lambda_{jm} F_m + \epsilon_j

Where:

- λ_jk is the loading of variable j on factor k.
- F_k is the k-th common factor.
- ε_j is the unique factor (specific variance plus measurement error) for variable j.

In matrix notation for all p variables simultaneously:

\mathbf{X} = \boldsymbol{\Lambda} \mathbf{F} + \boldsymbol{\epsilon}

Where:

- **X** is the p × 1 vector of observed variables.
- **Λ** is the p × m factor loading matrix.
- **F** is the m × 1 vector of common factors.
- **ε** is the p × 1 vector of unique factors.
3.2 Model Assumptions

The common factor model is built on the following mathematical assumptions:

  1. Factors are standardised: E(F_k) = 0 and Var(F_k) = 1.
  2. Unique factors have zero mean and are uncorrelated with the common factors: E(ε_j) = 0 and Cov(F, ε) = 0.
  3. Unique factors are mutually uncorrelated: Cov(ε_j, ε_j′) = 0 for j ≠ j′.
  4. (For orthogonal rotation): Factors are uncorrelated with each other: Cov(F_k, F_k′) = 0 for k ≠ k′.

3.3 The Fundamental Theorem of Factor Analysis

Under the above assumptions, the reproduced correlation matrix is:

\mathbf{R} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^T + \boldsymbol{\Psi}

Where:

- Λ Λ^T contains the correlations implied by the common factors.
- Ψ is the diagonal matrix of unique variances.

This equation is the heart of EFA. The goal is to find Λ and Ψ such that Λ Λ^T + Ψ approximates the observed correlation matrix R as closely as possible.

3.4 Communality and Uniqueness

For each observed variable X_j, the total standardised variance (= 1) is partitioned into:

\underbrace{1}_{\text{Total Variance}} = \underbrace{h_j^2}_{\text{Communality}} + \underbrace{\psi_j}_{\text{Uniqueness}}

Communality (h_j²): The proportion of variance in X_j explained by the common factors:

h_j^2 = \sum_{k=1}^{m} \lambda_{jk}^2 = \lambda_{j1}^2 + \lambda_{j2}^2 + \dots + \lambda_{jm}^2

Uniqueness (ψ_j): The proportion of variance in X_j NOT explained by the common factors:

\psi_j = 1 - h_j^2

| Value | Interpretation |
| :---- | :------------- |
| h_j² ≈ 1 | Almost all variance in X_j is shared with other variables (good indicator) |
| h_j² ≈ 0 | Variable shares almost no variance with others (poor indicator of any factor) |
| h_j² < 0.30 | Variable is a weak indicator — consider removing it |
| h_j² > 0.70 | Variable is a strong, reliable indicator of its factor |
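Given a loading matrix, communalities and uniquenesses are a one-line computation. A minimal sketch with hypothetical 2-factor loadings:

```python
import numpy as np

# Hypothetical 4-variable, 2-factor loading matrix Lambda (p x m)
Lam = np.array([[0.80, 0.10],
                [0.75, 0.05],
                [0.10, 0.70],
                [0.15, 0.65]])

h2 = (Lam ** 2).sum(axis=1)   # h_j^2: variance explained by the common factors
psi = 1.0 - h2                # psi_j: unique variance
```

Here every h_j² exceeds the 0.30 weak-indicator threshold from the table, so no variable would be flagged for removal.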

3.5 Factor Loadings Interpretation

A factor loading λ_jk is the correlation between observed variable X_j and factor F_k (under orthogonal rotation). Therefore, λ_jk² is the proportion of variance in X_j accounted for by factor F_k, and summing the squared loadings down a column gives the sum of squared loadings (SSL) — the total variance explained by factor F_k:

\text{SSL}_k = \sum_{j=1}^{p} \lambda_{jk}^2

\text{Proportion of Variance}_k = \frac{\text{SSL}_k}{p}

\text{Cumulative Variance} = \frac{\sum_{k=1}^{m} \text{SSL}_k}{p}

3.6 The Residual Matrix

After fitting the factor model, the residual matrix is the difference between the observed correlation matrix and the reproduced correlation matrix:

\mathbf{E} = \mathbf{R} - (\boldsymbol{\Lambda}\boldsymbol{\Lambda}^T + \boldsymbol{\Psi})

Ideally, residuals should be small (close to 0). Large residuals indicate that the factor model is not fitting the data well.
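A minimal sketch of the residual computation for a hypothetical 1-factor model (all numbers made up for illustration):

```python
import numpy as np

# Hypothetical 1-factor loadings and implied uniquenesses
Lam = np.array([[0.8], [0.7], [0.6]])
Psi = np.diag(1.0 - (Lam ** 2).sum(axis=1))   # uniquenesses on the diagonal

R_hat = Lam @ Lam.T + Psi                     # reproduced correlation matrix
R_obs = np.array([[1.00, 0.58, 0.46],
                  [0.58, 1.00, 0.44],
                  [0.46, 0.44, 1.00]])
E = R_obs - R_hat                             # residual matrix
```

In this toy case every off-diagonal residual is about ±0.02, which by the conventions in Section 11 would count as a good fit.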


4. Assumptions of EFA

EFA rests on several important assumptions. Violating these can lead to misleading or uninterpretable results.

4.1 Adequate Sample Size

EFA requires a sufficiently large sample for stable, reproducible factor solutions. Common guidelines:

| Rule of Thumb | Recommendation |
| :------------ | :------------- |
| Absolute minimum | n ≥ 100 |
| General guideline | n ≥ 200 |
| Subject-to-variable ratio | At least 5:1 (preferably 10:1 or higher) |
| MacCallum et al. (1999) | Larger communalities allow smaller samples |

⚠️ With small samples (n < 100), factor solutions are highly unstable and may not replicate. Always verify your solution with a new sample (cross-validation).

4.2 Adequate Correlations Among Variables (Factorability)

EFA only makes sense if the variables are sufficiently correlated with each other. If all variables are uncorrelated, there are no common factors to extract. Two formal tests assess factorability:

Bartlett's Test of Sphericity: Tests the null hypothesis that the correlation matrix is an identity matrix (all correlations = 0):

H_0: \mathbf{R} = \mathbf{I}

The test statistic is approximately chi-squared:

\chi^2 \approx -\left[(n-1) - \frac{2p+5}{6}\right] \ln|\mathbf{R}|

With degrees of freedom df = p(p − 1)/2.

A significant result (p < 0.05) indicates that correlations are sufficiently large for factor analysis to be appropriate.
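The statistic transcribes directly into code. A minimal sketch (hypothetical 3-variable correlation matrix, assumed n = 200; the chi-squared critical value at α = .05 for df = 3 is about 7.81):

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test statistic and df, per the formula above."""
    p = R.shape[0]
    chi2 = -((n - 1) - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return chi2, df

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
chi2, df = bartlett_sphericity(R, n=200)
significant = chi2 > 7.81   # reject H0: R = I at the .05 level (df = 3)
```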

Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: Compares the magnitude of observed correlations to partial correlations. For the overall sample:

\text{KMO} = \frac{\sum_{j \neq k} r_{jk}^2}{\sum_{j \neq k} r_{jk}^2 + \sum_{j \neq k} p_{jk}^2}

Where r_jk are the observed correlations and p_jk are the partial correlations between variables j and k after controlling for all other variables.

| KMO Value | Interpretation |
| :-------- | :------------- |
| ≥ 0.90 | Marvellous |
| 0.80 – 0.89 | Meritorious |
| 0.70 – 0.79 | Middling |
| 0.60 – 0.69 | Mediocre |
| 0.50 – 0.59 | Miserable |
| < 0.50 | Unacceptable — do not proceed with EFA |
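The KMO formula can be sketched in a few lines, with the partial correlations obtained from the inverse of the correlation matrix (a minimal illustration on a hypothetical matrix):

```python
import numpy as np

def kmo(R):
    """Overall KMO, per the formula above (partials from inv(R))."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    P = -Rinv / d                            # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)    # off-diagonal entries only
    r2, p2 = (R[off] ** 2).sum(), (P[off] ** 2).sum()
    return r2 / (r2 + p2)

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
overall_kmo = kmo(R)   # lands in the "mediocre" band for this toy matrix
```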

4.3 Continuous (or Approximately Continuous) Variables

EFA is most appropriate for continuous or ordinal variables with 5+ ordered categories treated as approximately continuous. For truly dichotomous or nominal variables, alternative methods such as tetrachoric/polychoric correlation matrices should be used as input.

4.4 Multivariate Normality

EFA estimation methods based on maximum likelihood (ML) assume multivariate normality of the observed variables. Other methods (e.g., Principal Axis Factoring) are more robust to violations of this assumption.

Multivariate normality can be assessed using Mardia's multivariate skewness and kurtosis tests, inspection of univariate distributions (histograms, Q-Q plots), and univariate normality tests such as Shapiro-Wilk.

4.5 Linearity

The common factor model assumes linear relationships between the observed variables and the underlying factors. Non-linear relationships between variables will not be captured by standard EFA.

4.6 No Extreme Multicollinearity or Singularity

While EFA requires meaningful correlations, perfect or near-perfect correlations (multicollinearity) are problematic because they make the correlation matrix singular (non-invertible), which prevents the mathematical computations required.

Check for this by examining the determinant of the correlation matrix (values very close to zero signal singularity) and the bivariate correlations (values above about 0.90 indicate problematic redundancy).

4.7 No Significant Outliers

Outliers can distort the correlation matrix and lead to factors that reflect extreme cases rather than the true underlying structure. Identify multivariate outliers using Mahalanobis distance:

D^2_i = (\mathbf{x}_i - \bar{\mathbf{x}})^T \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})

Where S is the sample covariance matrix. Under multivariate normality, D²_i follows a chi-squared distribution with p degrees of freedom.
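A minimal sketch of this check on simulated (hypothetical) data with one planted extreme case; the outlier should receive by far the largest distance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[0] = [6.0, 6.0, 6.0]                             # planted multivariate outlier

diff = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
D2 = np.einsum('ij,jk,ik->i', diff, S_inv, diff)   # D_i^2 for every row
outlier_index = int(D2.argmax())
```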


5. Types of Factor Analysis

5.1 Exploratory vs. Confirmatory

| Feature | EFA | CFA |
| :------ | :-- | :-- |
| Theory required | No | Yes |
| Factor structure | Data-driven | Pre-specified |
| All variables load on all factors | Yes (in principle) | No (constrained) |
| Used for | Scale development | Scale validation |

5.2 EFA vs. Principal Component Analysis (PCA)

EFA and PCA are frequently confused but are fundamentally different:

| Feature | EFA | PCA |
| :------ | :-- | :-- |
| Goal | Identify latent factors causing correlations | Maximally summarise variance into components |
| Model | X = ΛF + ε | C = WᵀX (no error term) |
| Unique variance | Explicitly modelled (Ψ) | Ignored (all variance is decomposed) |
| Factors/Components | Represent latent constructs | Are linear combinations of observed variables |
| Diagonal of R | Replaced with communality estimates | Kept as 1s |
| Interpretation | Causal/latent structure | Descriptive summary |
| When to use | Constructing scales, finding latent traits | Data compression, preprocessing for ML |

💡 Rule of thumb: Use EFA when you believe there are latent constructs causing the correlations among your variables. Use PCA when you simply want to reduce dimensionality without assuming underlying causes.

5.3 R-Mode vs. Q-Mode Factor Analysis

In R-mode analysis (the standard form), factors are extracted from the correlations among variables across cases. In Q-mode analysis, the data matrix is transposed and factors are extracted from the correlations among cases across variables, grouping similar cases rather than similar variables. The DataStatPro application implements standard R-mode EFA.


6. Using the EFA Component

The EFA component in DataStatPro provides a complete end-to-end workflow for performing exploratory factor analysis on your datasets.

Step-by-Step Guide

Step 1 — Select Dataset

Choose the dataset you wish to analyse from the "Dataset" dropdown. Ensure your dataset contains multiple numeric variables that you believe may reflect underlying latent constructs.

💡 Tip: EFA works best with variables that are theoretically related to each other. Running EFA on a random collection of unrelated variables will not produce meaningful factors.

Step 2 — Select Variables

Select the observed variables to include in the analysis from the "Variables" dropdown. All selected variables should be numeric (continuous or ordinal).

⚠️ Important: Include only variables you have a theoretical reason to include. Avoid including variables that are clearly unrelated to the constructs of interest, as they can distort the factor structure.

Step 3 — Select Extraction Method

Choose the method used to extract factors from the "Extraction Method" dropdown:

- Principal Axis Factoring (PAF) — analyses shared variance only; robust to non-normality.
- Maximum Likelihood (ML) — provides formal model fit statistics; assumes approximate multivariate normality.
- Principal Components (PC) — decomposes total variance; best suited to pure data reduction.

💡 Recommendation: Use PAF as the default. Use ML if you want model fit statistics and your data are approximately multivariate normal.

Step 4 — Select Number of Factors

Specify the number of factors m to extract, or let the application determine this automatically using:

- The Kaiser criterion (eigenvalues greater than 1).
- The scree plot elbow.
- Parallel analysis (recommended; see Section 8).

💡 Tip: Always review the scree plot alongside the Kaiser criterion. Parallel analysis is the gold standard for determining the number of factors.

Step 5 — Select Rotation Method

Choose a rotation method from the "Rotation" dropdown:

- Orthogonal rotations (Varimax, Quartimax, Equamax) keep factors uncorrelated.
- Oblique rotations (Direct Oblimin, Promax) allow factors to correlate.

💡 Recommendation: Start with Varimax. If factors are theoretically expected to correlate (e.g., cognitive abilities, personality traits), use Oblimin or Promax.

Step 6 — Set Communality Estimation

For PAF, the initial communality estimates placed in the diagonal of the correlation matrix are, by default, the Squared Multiple Correlations (SMC) — the R² from regressing each variable on all the others (see Section 7.1).

Step 7 — Display Options

Select which outputs and visualisations to display — for example, the scree plot, the rotated factor loading matrix, communalities, the variance explained table, and (for oblique rotations) the factor correlation matrix.

Step 8 — Run the Analysis

Click "Run EFA". The application will:

  1. Compute the correlation matrix.
  2. Assess factorability (KMO, Bartlett's test).
  3. Extract factors using the chosen method.
  4. Rotate the factor loading matrix.
  5. Compute communalities, eigenvalues, and variance explained.
  6. Display all selected outputs.

7. Factor Extraction Methods

Factor extraction is the process of computing the initial (unrotated) factor loading matrix Λ from the observed correlation matrix R.

7.1 Principal Axis Factoring (PAF)

Principal Axis Factoring is the most widely recommended extraction method in the social and behavioural sciences. It differs from PCA in one crucial way: instead of using 1s in the diagonal of the correlation matrix (which represents total variance), PAF replaces them with communality estimates (which represent shared variance only).

Algorithm:

  1. Initialise communality estimates h_j^{2(0)} (typically using Squared Multiple Correlations — the R² from regressing X_j on all other variables).

  2. Replace the diagonal of R with the current communality estimates to form the reduced matrix R*:

\mathbf{R}^* = \mathbf{R} - \boldsymbol{\Psi}^{(t)}

  3. Eigen-decompose R*:

\mathbf{R}^* = \mathbf{V} \boldsymbol{\Delta} \mathbf{V}^T

Where Δ = diag(δ_1, δ_2, …, δ_p) contains the eigenvalues and V contains the eigenvectors.

  4. Extract the m factors corresponding to the m largest eigenvalues. The loading matrix is:

\boldsymbol{\Lambda} = \mathbf{V}_m \boldsymbol{\Delta}_m^{1/2}

Where V_m contains the first m eigenvectors and Δ_m^{1/2} is the diagonal matrix of the square roots of the first m eigenvalues.

  5. Update the communality estimates: h_j^{2(t+1)} = \sum_{k=1}^{m} \lambda_{jk}^2.

  6. Iterate steps 2–5 until the communality estimates converge (change below a threshold, e.g., 10⁻⁵).
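The iteration above can be sketched in a few lines. This is a minimal illustration (not DataStatPro's internal implementation) applied to a correlation matrix generated from a known, hypothetical one-factor structure, which the algorithm should recover:

```python
import numpy as np

def paf(R, m, tol=1e-5, max_iter=500):
    """Iterated Principal Axis Factoring, following steps 1-6 above."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))       # step 1: SMC start values
    for _ in range(max_iter):
        R_star = R.copy()
        np.fill_diagonal(R_star, h2)                 # step 2: reduced matrix
        vals, vecs = np.linalg.eigh(R_star)          # step 3: eigen-decomposition
        idx = np.argsort(vals)[::-1][:m]             # step 4: top-m factors
        Lam = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
        h2_new = (Lam ** 2).sum(axis=1)              # step 5: updated communalities
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new                                  # step 6: iterate to convergence
        if converged:
            break
    return Lam, h2

# Hypothetical single-factor population structure
L_true = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(L_true, L_true)
np.fill_diagonal(R, 1.0)
Lam, h2 = paf(R, m=1)   # recovers L_true up to sign
```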

7.2 Maximum Likelihood Factoring (MLF)

Maximum Likelihood Factoring finds the loading matrix Λ and uniqueness matrix Ψ that maximise the likelihood of observing the sample correlation matrix R, assuming multivariate normality.

The log-likelihood function is:

\ell(\boldsymbol{\Lambda}, \boldsymbol{\Psi}) = -\frac{n}{2} \left[\ln|\boldsymbol{\Sigma}| + \text{tr}(\mathbf{S}\boldsymbol{\Sigma}^{-1}) - \ln|\mathbf{S}| - p\right]

Where:

- Σ = Λ Λ^T + Ψ is the model-implied covariance (correlation) matrix.
- S is the observed sample covariance (correlation) matrix.

Advantages of MLF:

- Provides a formal chi-squared test of model fit (see Section 8.5) and supports fit indices.
- Produces statistically efficient estimates when the normality assumption holds.

Disadvantages of MLF:

- Assumes multivariate normality of the observed variables.
- May fail to converge or yield improper ("Heywood") solutions with small samples or ill-conditioned data.

7.3 Comparison of Extraction Methods

| Method | Best For | Normality Required | Provides Fit Test | Iterative |
| :----- | :------- | :----------------- | :---------------- | :-------- |
| PAF | Most social science applications | No | No | Yes |
| MLF | Theory testing, formal fit evaluation | Yes | Yes | Yes |
| PC | Data reduction, preprocessing | No | No | No |

8. Determining the Number of Factors

One of the most consequential decisions in EFA is choosing how many factors to extract. Extracting too few produces an under-factored solution (some structure is missed); extracting too many produces an over-factored solution (noise is interpreted as meaningful structure).

8.1 The Kaiser Criterion (Eigenvalue > 1 Rule)

The Kaiser criterion retains factors whose eigenvalues exceed 1. The logic is that a factor should account for at least as much variance as a single observed variable (which has a standardised variance of 1).

\text{Retain factor } k \text{ if } \lambda_k > 1

Advantages: Simple, widely used, implemented in virtually all software.
Disadvantages: Known to over-extract factors with large p and under-extract with small p. Should not be used as the sole criterion.

8.2 The Scree Plot

A scree plot displays the eigenvalues (y-axis) against the factor number (x-axis), sorted in descending order. The number of factors is chosen at the "elbow" — the point where the curve changes from steep to relatively flat (like the scree, or rubble, at the base of a cliff).

Retain the factors that come before the elbow of the eigenvalue curve.

Advantages: Visual and intuitive.
Disadvantages: The location of the elbow is subjective. With real data, the elbow may not be clearly defined.

💡 Plot the scree plot and ask: "Where does the steep descent level off?" Retain the factors in the steep part.

8.3 Parallel Analysis (Horn's Method) — Gold Standard

Parallel analysis compares the eigenvalues from your actual data against eigenvalues derived from random data matrices of the same size (n × p), generated many times (e.g., 1,000 replications). Retain only factors whose eigenvalues exceed the corresponding eigenvalues from the random data.

Algorithm:

  1. Generate B random data matrices of size n × p (e.g., B = 1000).
  2. Compute the correlation matrix for each and extract its eigenvalues.
  3. For each factor position k, compute the 95th percentile of the eigenvalues across all B replications: \lambda_k^{\text{random},95}.
  4. Retain factor k if: \lambda_k^{\text{observed}} > \lambda_k^{\text{random},95}

Advantages: Most statistically rigorous criterion. Corrects for sampling error.
Disadvantages: Requires simulation; slightly more complex to implement.
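Horn's procedure above can be sketched directly. A minimal illustration on simulated data with a known two-factor structure (all names and numbers are hypothetical, not DataStatPro's API):

```python
import numpy as np

def parallel_analysis(X, B=200, percentile=95, seed=1):
    """Horn's parallel analysis (95th-percentile variant), per steps 1-4 above."""
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rng = np.random.default_rng(seed)
    rand = np.empty((B, p))
    for b in range(B):                   # eigenvalues of pure-noise data
        Z = rng.normal(size=(n, p))
        rand[b] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    thresh = np.percentile(rand, percentile, axis=0)
    return int(np.sum(obs > thresh)), obs, thresh

# Hypothetical data: two genuine factors, six indicators, plus noise
rng = np.random.default_rng(7)
n = 300
F = rng.normal(size=(n, 2))
Lam = np.array([[.8, 0], [.7, 0], [.6, 0], [0, .8], [0, .7], [0, .6]])
X = F @ Lam.T + rng.normal(size=(n, 6)) * 0.5
n_factors, obs_eigs, thresholds = parallel_analysis(X)
```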

💡 Recommendation: Always use Parallel Analysis as the primary criterion, supported by the scree plot. The Kaiser criterion can serve as a secondary reference but should not override Parallel Analysis.

8.4 Minimum Average Partial (MAP) Test

Velicer's MAP test successively partials out factors from the correlation matrix and examines the average squared partial correlation. The number of factors is chosen at the point where the average squared partial correlation is minimised:

\bar{r}^2_k = \frac{\sum_{j < j'} r_{jj' \cdot F_1 \dots F_k}^2}{p(p-1)/2}

Retain m factors where r̄²_m is minimised.

8.5 The Likelihood Ratio Chi-Squared Test (MLF only)

When using Maximum Likelihood Factoring, the fit of the m-factor model can be tested against the saturated model:

\chi^2 = \left(n - 1 - \frac{2p + 4m + 5}{6}\right) \ln\frac{|\boldsymbol{\Sigma}|}{|\mathbf{S}|}

With degrees of freedom: df = \frac{(p-m)^2 - (p+m)}{2}

A non-significant χ² (p > 0.05) suggests the m-factor model fits the data adequately. Increase m until the test is non-significant.

⚠️ The chi-squared test is very sensitive to sample size — with large n, even trivially small discrepancies from the model are flagged as significant. Use fit indices (RMSEA, SRMR) alongside the chi-squared test.

8.6 Summary Recommendation

| Criterion | Recommended Use | Weight |
| :-------- | :-------------- | :----- |
| Parallel Analysis | Primary criterion | ⭐⭐⭐⭐⭐ |
| Scree Plot | Visual support | ⭐⭐⭐⭐ |
| MAP Test | Secondary criterion | ⭐⭐⭐⭐ |
| Chi-Squared Test (ML) | When using ML extraction | ⭐⭐⭐ |
| Kaiser criterion | Quick reference only | ⭐⭐ |
| Percentage of variance explained | Contextual support | ⭐⭐ |

💡 Best practice: Use parallel analysis as the primary guide, examine the scree plot for visual confirmation, and then extract solutions with m − 1, m, and m + 1 factors, choosing the one that produces the most interpretable, theoretically coherent structure.


9. Factor Rotation

9.1 Why Rotation is Necessary

The unrotated factor solution is mathematically correct but often difficult to interpret, because each factor tends to have moderate-to-large loadings on many variables. Rotation redistributes the variance among factors to achieve simple structure — a solution where each variable loads highly on ideally one factor and near-zero on all others.

Crucially, rotation does not change the communalities, the total variance explained by the retained factors, or the fit of the model to the data.

Rotation only changes how the variance is distributed among factors, making the solution easier to interpret.

9.2 Simple Structure (Thurstone's Criteria)

The target of rotation is simple structure, defined by Thurstone's criteria:

  1. Each variable has at least one near-zero loading.
  2. Each factor has a set of variables with near-zero loadings.
  3. For every pair of factors, there are several variables that load on one but not the other.
  4. When more than 4 factors are extracted, most variables load on only one factor.
  5. For every pair of factors, only a few variables load on both.

9.3 Orthogonal Rotation — Factors Are Uncorrelated

Orthogonal rotation maintains the assumption that factors are uncorrelated (perpendicular in factor space). The rotation applies an orthogonal transformation matrix T:

\boldsymbol{\Lambda}^* = \boldsymbol{\Lambda} \mathbf{T}, \quad \text{where } \mathbf{T}^T\mathbf{T} = \mathbf{I}

Varimax (most common orthogonal rotation)

Varimax maximises the sum of variances of the squared loadings within each factor (across variables), which simplifies the columns of the loading matrix:

V = \frac{1}{p} \sum_{k=1}^{m} \left[\sum_{j=1}^{p} \lambda_{jk}^{*4} - \frac{(\sum_{j=1}^{p} \lambda_{jk}^{*2})^2}{p}\right]

This encourages each factor to have a few large loadings and many near-zero loadings.
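The Varimax criterion is usually maximised iteratively; the SVD-based variant below is one common implementation sketch. The hypothetical loadings start as a clean simple structure deliberately spun 30° away, which the rotation should undo:

```python
import numpy as np

def varimax(Lam, max_iter=100, tol=1e-8):
    """SVD-based Varimax rotation (Kaiser's criterion)."""
    p, m = Lam.shape
    T = np.eye(m)
    d = 0.0
    for _ in range(max_iter):
        L = Lam @ T
        B = Lam.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(B)
        T = u @ vt                       # best orthogonal transform this step
        if s.sum() < d * (1 + tol):      # criterion stopped improving
            break
        d = s.sum()
    return Lam @ T, T

# Simple structure obscured by a 30-degree rotation (hypothetical values)
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
theta = np.pi / 6
spin = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]])
Lrot, T = varimax(simple @ spin)
```

Because T is orthogonal, the row sums of squared loadings (the communalities) are unchanged by the rotation — only their distribution across factors moves.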

Quartimax

Quartimax maximises the variance of squared loadings within each variable (across factors):

Q = \sum_{j=1}^{p} \sum_{k=1}^{m} \lambda_{jk}^{*4}

This tends to produce one large general factor with others being more specific. Less popular than Varimax.

Equamax

Equamax is a compromise between Varimax (simplifies factors) and Quartimax (simplifies variables):

E = \frac{m}{2p} Q + \frac{p-1}{2} V

9.4 Oblique Rotation — Factors Are Allowed to Correlate

Oblique rotation allows factors to correlate with each other, which is more realistic for most psychological and social science constructs (e.g., cognitive abilities, personality traits). The rotation applies a non-orthogonal transformation.

With oblique rotation, two matrices are produced:

Pattern Matrix (Λ^P): Contains the regression weights (partial relationships between each variable and each factor, controlling for the other factors). These are analogous to standardised regression coefficients. Use this matrix to identify which variables define each factor.

Structure Matrix (Λ^S): Contains the correlations between each variable and each factor (not controlling for other factors):

\boldsymbol{\Lambda}^S = \boldsymbol{\Lambda}^P \boldsymbol{\Phi}

Where Φ is the m × m factor correlation matrix.

💡 When reporting oblique solutions, report the Pattern Matrix (for identifying factor membership) and the Factor Correlation Matrix (to show how factors relate to each other).

Direct Oblimin (δ-parameterised)

Minimises a criterion that depends on the parameter δ:

- δ = 0: the most common default setting; allows mild to moderate obliqueness.
- δ < 0: produces more orthogonal (less correlated) factors.
- δ > 0: produces more oblique (more correlated) factors (generally not recommended).

Promax

Promax first applies Varimax rotation, then raises the loadings to a power κ (typically 3 or 4) and uses the result as a target for an oblique rotation:

\lambda_{jk}^{\text{Promax}} = \text{sign}(\lambda_{jk}^{\text{Varimax}}) \times |\lambda_{jk}^{\text{Varimax}}|^\kappa

This is computationally faster than Oblimin and tends to produce similar results.

9.5 Choosing Between Orthogonal and Oblique Rotation

| Scenario | Recommended Rotation |
| :------- | :------------------- |
| Factors are theoretically independent | Orthogonal (Varimax) |
| Factors are theoretically related | Oblique (Oblimin / Promax) |
| Unsure | Run both; check the factor correlation matrix |
| Factor correlations \|r\| ≥ 0.30 | Oblique |
| Factor correlations \|r\| < 0.30 | Orthogonal is acceptable |

💡 Start with an oblique rotation. Examine the factor correlation matrix. If factors are essentially uncorrelated (|r| < 0.30), you can switch to orthogonal rotation for simplicity. If factors are correlated, oblique rotation is the better choice.


10. Interpreting EFA Results

10.1 Reading the Factor Loading Matrix

The factor loading matrix is the primary output of EFA. Each cell λ_jk is the loading of variable j on factor k.

Conventions for interpreting loadings:

| Loading Magnitude (\|λ_jk\|) | Interpretation |
| :--------------------------- | :------------- |
| ≥ 0.70 | Excellent — strong relationship |
| 0.63 – 0.69 | Very Good |
| 0.55 – 0.62 | Good |
| 0.45 – 0.54 | Fair |
| 0.32 – 0.44 | Poor — consider excluding |
| < 0.32 | Negligible — typically suppressed in output |

💡 Most researchers apply a loading cut-off of 0.30 or 0.40 — loadings below this threshold are considered negligible and suppressed in the output table for clarity.

10.2 Identifying Factor Membership

Each variable is primarily associated with the factor on which it has the highest loading (assuming a clean simple structure). Label each factor based on the common content of the variables that load most strongly on it.
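This assignment rule is a one-liner per variable: take the largest absolute loading, and treat anything under the usual 0.40 cut-off as unassigned. A minimal sketch with hypothetical loadings:

```python
import numpy as np

variables = ["Anxiety", "Worry", "Sadness", "Energy"]
Lam = np.array([[0.82, 0.12],
                [0.79, 0.18],
                [0.15, 0.81],
                [0.07, 0.14]])   # hypothetical 2-factor loadings
cutoff = 0.40

primary = {}
for name, row in zip(variables, Lam):
    k = int(np.abs(row).argmax())                 # factor with highest |loading|
    primary[name] = f"Factor {k + 1}" if abs(row[k]) >= cutoff else "unassigned"
```

Here "Energy" never reaches the cut-off on either factor, so it would be flagged as a candidate for removal rather than forced onto a factor.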

Example — 3-Factor Solution:

| Variable | Factor 1 | Factor 2 | Factor 3 | h² |
| :------- | :------- | :------- | :------- | :- |
| Anxiety | 0.82 | 0.12 | 0.08 | 0.69 |
| Worry | 0.79 | 0.18 | 0.11 | 0.67 |
| Fear | 0.74 | 0.05 | 0.14 | 0.57 |
| Sadness | 0.15 | 0.81 | 0.09 | 0.68 |
| Hopeless | 0.21 | 0.77 | 0.10 | 0.65 |
| Fatigue | 0.19 | 0.68 | 0.22 | 0.55 |
| Energy | 0.07 | 0.14 | 0.80 | 0.66 |
| Focus | 0.12 | 0.18 | 0.75 | 0.61 |
| Sleep | 0.24 | 0.28 | 0.61 | 0.50 |

Interpretation:

- Factor 1 (Anxiety, Worry, Fear) could be labelled "Anxiety".
- Factor 2 (Sadness, Hopeless, Fatigue) could be labelled "Depression".
- Factor 3 (Energy, Focus, Sleep) could be labelled something like "Vitality" or "Wellbeing".

10.3 Cross-Loadings

A cross-loading occurs when a variable loads substantially on two or more factors (e.g., λ_j1 = 0.55 and λ_j2 = 0.48). Cross-loadings are problematic because:

- The variable cannot be cleanly assigned to a single factor, which muddies interpretation.
- They work against simple structure and make factor naming ambiguous.

Options when cross-loadings are present:

  1. Accept it if it is theoretically meaningful (the variable genuinely reflects multiple constructs).
  2. Remove the variable from the analysis if it does not clearly belong to any factor.
  3. Try oblique rotation — cross-loadings in orthogonal solutions sometimes resolve with oblique rotation.
  4. Try a different number of factors — cross-loadings sometimes indicate that more (or fewer) factors should be extracted.

10.4 Naming and Interpreting Factors

Factor naming is a theoretical exercise, not a statistical one. To name a factor:

  1. Identify all variables with loadings > 0.40 on the factor.
  2. Look for a common theme or construct that unites these variables.
  3. Choose a name that is:
    • Theoretically meaningful and grounded in prior literature.
    • As specific as possible.
    • Consistent with the direction of the loadings (positive vs. negative).

⚠️ Factor names are always interpretive and provisional. Different researchers may label the same factor differently. The statistical output does not name factors — the researcher does.

10.5 Variance Explained Table

The variance explained table summarises how much of the total variance in the observed variables is accounted for by each factor:

| Factor | Eigenvalue (SSL) | % Variance | Cumulative % |
| :----- | :--------------- | :--------- | :----------- |
| F1 | SSL₁ | SSL₁ / p × 100 | … |
| F2 | SSL₂ | SSL₂ / p × 100 | … |
| ⋮ | ⋮ | ⋮ | ⋮ |
| Fm | SSLₘ | SSLₘ / p × 100 | Total |

💡 A common goal is for the retained factors to explain at least 50–60% of the total variance collectively, though this benchmark varies by field and the number of variables.

10.6 Factor Scores

Factor scores are estimates of each individual's standing on each latent factor. They can be used as variables in subsequent analyses (e.g., regression, ANOVA).

Several methods exist for computing factor scores:

Regression Method (Thomson):

$$\hat{\mathbf{F}} = \mathbf{Z}\mathbf{R}^{-1}\boldsymbol{\Lambda}$$

where $\mathbf{Z}$ is the matrix of standardised observed scores. Factor scores are standardised (mean = 0, SD = 1) but may correlate even under orthogonal rotation.
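The regression formula translates directly into numpy. A minimal sketch, assuming $\mathbf{Z}$ is the $n \times p$ standardised data matrix (the name `regression_scores` is illustrative):

```python
import numpy as np

def regression_scores(Z, R, loadings):
    """Thomson regression factor scores: F_hat = Z R^{-1} Lambda.
    Z is n x p standardised data, R is the p x p correlation matrix,
    loadings is the p x m loading matrix."""
    return Z @ np.linalg.solve(R, np.asarray(loadings, dtype=float))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardise each column
R = np.corrcoef(Z, rowvar=False)
Lam = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]])
F = regression_scores(Z, R, Lam)
print(F.shape)           # (200, 2): one row of factor scores per person
print(F.mean(axis=0))    # approximately [0, 0]
```

`np.linalg.solve(R, Lam)` is used instead of explicitly inverting $\mathbf{R}$ for numerical stability.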

Bartlett Method:

$$\hat{\mathbf{F}} = (\boldsymbol{\Lambda}^T \boldsymbol{\Psi}^{-1} \boldsymbol{\Lambda})^{-1} \boldsymbol{\Lambda}^T \boldsymbol{\Psi}^{-1} \mathbf{Z}$$

Produces unbiased factor score estimates. Preferred when unique variances differ substantially.
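The unbiasedness property can be checked numerically: if an observation were generated exactly as $\mathbf{z} = \boldsymbol{\Lambda}\mathbf{f}$ with no unique variance realised, the Bartlett estimator returns $\mathbf{f}$ exactly. A minimal numpy sketch, written for $\mathbf{Z}$ stored as $n \times p$ rows of observations (the name `bartlett_scores` is illustrative):

```python
import numpy as np

def bartlett_scores(Z, loadings, psi):
    """Bartlett (weighted least squares) factor scores.
    Z is n x p; `psi` is the vector of uniquenesses (diagonal of Psi).
    Computes (L' Psi^-1 L)^-1 L' Psi^-1 z for each row z of Z."""
    L = np.asarray(loadings, dtype=float)
    W = L.T / psi                              # L' Psi^{-1}, shape m x p
    return Z @ W.T @ np.linalg.inv(W @ L)

# Unbiasedness check: noise-free data z = Lambda f is recovered exactly
Lam = np.array([[0.8, 0.1], [0.6, 0.3], [0.1, 0.9], [0.2, 0.7]])
psi = np.array([0.3, 0.5, 0.2, 0.4])
f_true = np.array([[1.0, -0.5], [0.2, 0.8]])
Z = f_true @ Lam.T
print(np.round(bartlett_scores(Z, Lam, psi), 6))
# recovers f_true up to floating-point rounding
```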

Anderson-Rubin Method:

Produces factor scores that are uncorrelated with each other even under oblique rotation.

⚠️ Factor scores are estimates of the latent factors, not the true factors (which are unobservable). Factor score indeterminacy means multiple sets of scores are consistent with the same loadings.


11. Model Fit and Evaluation

11.1 Reproduced Correlation Matrix

The reproduced correlation matrix $\hat{\mathbf{R}}$ is the correlation matrix implied by the factor model:

$$\hat{\mathbf{R}} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^T + \boldsymbol{\Psi}$$

Comparing $\hat{\mathbf{R}}$ to the observed $\mathbf{R}$ reveals how well the model fits.
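Building $\hat{\mathbf{R}}$ from a loading matrix is direct. A minimal numpy sketch (the helper name `reproduced_corr` is illustrative); when $\boldsymbol{\Psi}$ is not supplied, uniquenesses are set to $1 - h_j^2$ so the reproduced diagonal equals 1, as for a correlation matrix:

```python
import numpy as np

def reproduced_corr(loadings, psi=None):
    """R_hat = Lambda Lambda' + Psi for a p x m loading matrix.
    If psi is None, uniquenesses default to 1 - communality."""
    L = np.asarray(loadings, dtype=float)
    common = L @ L.T                      # common-variance part
    if psi is None:
        psi = 1.0 - np.diag(common)       # uniqueness = 1 - h^2
    return common + np.diag(psi)

Lam = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]]
R_hat = reproduced_corr(Lam)
print(np.diag(R_hat))   # all ones, as expected for a correlation matrix
```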

11.2 Residual Correlations

The residual matrix contains the differences between observed and reproduced correlations:

$$e_{jj'} = r_{jj'} - \hat{r}_{jj'}$$

Key diagnostic: the proportion of absolute residuals greater than 0.05. Fewer than 25% indicates good fit; more than 50% suggests too few factors were extracted.

The Root Mean Square of Residuals (RMSR):

$$\text{RMSR} = \sqrt{\frac{2\sum_{j < j'} e_{jj'}^2}{p(p-1)}}$$

| RMSR | Interpretation |
| :--- | :--- |
| $< 0.05$ | Good fit |
| $0.05 - 0.08$ | Adequate fit |
| $> 0.08$ | Poor fit |
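Both diagnostics come straight from the off-diagonal residuals. A minimal numpy sketch (`residual_diagnostics` is a hypothetical helper, not a DataStatPro function):

```python
import numpy as np

def residual_diagnostics(R, R_hat, cutoff=0.05):
    """Proportion of off-diagonal residuals with |e| > cutoff, and the RMSR."""
    E = np.asarray(R, dtype=float) - np.asarray(R_hat, dtype=float)
    iu = np.triu_indices_from(E, k=1)     # unique off-diagonal pairs (j < j')
    e = E[iu]
    prop_large = np.mean(np.abs(e) > cutoff)
    # mean over p(p-1)/2 unique pairs equals the 2*sum/(p(p-1)) formula
    rmsr = np.sqrt(np.mean(e ** 2))
    return prop_large, rmsr

R     = [[1.0, 0.5], [0.5, 1.0]]
R_hat = [[1.0, 0.4], [0.4, 1.0]]
print(residual_diagnostics(R, R_hat))   # (1.0, 0.1)
```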

11.3 Chi-Squared Test (Maximum Likelihood Only)

When using ML extraction, the chi-squared goodness-of-fit test assesses whether the $m$-factor model fits the observed data:

$$\chi^2 = \left(n - 1 - \frac{2p + 4m + 5}{6}\right)\left(\ln|\hat{\boldsymbol{\Sigma}}| - \ln|\mathbf{S}| + \text{tr}(\mathbf{S}\hat{\boldsymbol{\Sigma}}^{-1}) - p\right)$$

$$df = \frac{(p-m)^2 - (p+m)}{2}$$

⚠️ The chi-squared test is almost always significant with large samples, even when fit is practically adequate. Always supplement with RMSEA and SRMR.
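The statistic and its degrees of freedom can be computed directly from $\mathbf{S}$ and the model-implied $\hat{\boldsymbol{\Sigma}}$. A minimal numpy sketch (the name `ml_chi_square` is illustrative); the sanity check exploits the fact that a perfectly reproduced matrix yields $\chi^2 = 0$:

```python
import numpy as np

def ml_chi_square(S, Sigma_hat, n, m):
    """Bartlett-corrected ML goodness-of-fit statistic and df
    for an m-factor model fitted to the p x p matrix S."""
    S = np.asarray(S, dtype=float)
    Sig = np.asarray(Sigma_hat, dtype=float)
    p = S.shape[0]
    _, logdet_sig = np.linalg.slogdet(Sig)
    _, logdet_s = np.linalg.slogdet(S)
    fit = logdet_sig - logdet_s + np.trace(S @ np.linalg.inv(Sig)) - p
    chi2 = (n - 1 - (2 * p + 4 * m + 5) / 6) * fit
    df = ((p - m) ** 2 - (p + m)) / 2
    return chi2, df

# Sanity check: when Sigma_hat reproduces S exactly, chi2 is (numerically) zero
Lam = np.array([[0.8], [0.7], [0.6], [0.5], [0.4], [0.3]])
Sig = Lam @ Lam.T + np.diag(1 - (Lam ** 2).ravel())
chi2, df = ml_chi_square(Sig, Sig, n=300, m=1)
print(abs(chi2) < 1e-6, df)   # True 9.0
```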

11.4 RMSEA (Root Mean Square Error of Approximation)

RMSEA measures the discrepancy between the model-implied and observed correlation matrices, per degree of freedom:

$$\text{RMSEA} = \sqrt{\max\left(\frac{\chi^2 - df}{df(n-1)}, 0\right)}$$

| RMSEA | Interpretation |
| :--- | :--- |
| $\leq 0.05$ | Close fit |
| $0.05 - 0.08$ | Adequate fit |
| $0.08 - 0.10$ | Mediocre fit |
| $> 0.10$ | Poor fit |
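RMSEA is a one-line transform of $\chi^2$, $df$, and $n$. A minimal sketch (the function name is illustrative); note the `max(…, 0)` clamp, which prevents a negative value when $\chi^2 < df$:

```python
import numpy as np

def rmsea(chi2, df, n):
    """RMSEA from the ML chi-squared statistic, df, and sample size n."""
    return np.sqrt(max((chi2 - df) / (df * (n - 1)), 0.0))

# (30 - 20) / (20 * 200) = 0.0025, sqrt -> 0.05
print(round(float(rmsea(chi2=30, df=20, n=201)), 4))   # 0.05
```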

11.5 SRMR (Standardised Root Mean Square Residual)

SRMR is the standardised average of the residual correlations:

$$\text{SRMR} = \sqrt{\frac{2 \sum_{j \leq j'} \left(\frac{s_{jj'} - \hat{\sigma}_{jj'}}{\sqrt{s_{jj}s_{j'j'}}}\right)^2}{p(p+1)}}$$

| SRMR | Interpretation |
| :--- | :--- |
| $< 0.05$ | Good fit |
| $0.05 - 0.10$ | Acceptable fit |
| $> 0.10$ | Poor fit |
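A minimal numpy sketch of the SRMR formula (the helper name `srmr` is illustrative). The sum runs over $j \leq j'$, i.e. the lower triangle including the diagonal; for correlation matrices the diagonal residuals are zero:

```python
import numpy as np

def srmr(S, Sigma_hat):
    """Standardised root mean square residual over all pairs j <= j'."""
    S = np.asarray(S, dtype=float)
    Sig = np.asarray(Sigma_hat, dtype=float)
    d = np.sqrt(np.diag(S))
    std_resid = (S - Sig) / np.outer(d, d)   # divide by sqrt(s_jj * s_j'j')
    il = np.tril_indices_from(S)             # lower triangle incl. diagonal
    return np.sqrt(np.mean(std_resid[il] ** 2))

S   = [[1.0, 0.5], [0.5, 1.0]]
Sig = [[1.0, 0.4], [0.4, 1.0]]
print(round(float(srmr(S, Sig)), 4))   # 0.0577
```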

11.6 Tucker-Lewis Index (TLI) / Non-Normed Fit Index (NNFI)

The TLI compares the fit of the target model to the null model (no factors), penalised for complexity:

$$\text{TLI} = \frac{\chi^2_{\text{null}}/df_{\text{null}} - \chi^2_{\text{model}}/df_{\text{model}}}{\chi^2_{\text{null}}/df_{\text{null}} - 1}$$

| TLI | Interpretation |
| :--- | :--- |
| $> 0.95$ | Good fit |
| $0.90 - 0.95$ | Acceptable fit |
| $< 0.90$ | Poor fit |
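The TLI needs only the two $\chi^2/df$ ratios. A minimal sketch with illustrative input values (the function name `tli` is an assumption):

```python
def tli(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis Index from model and null-model chi-squared values."""
    r_null = chi2_null / df_null     # ratio for the no-factor null model
    r_model = chi2_model / df_model  # ratio for the fitted m-factor model
    return (r_null - r_model) / (r_null - 1)

# Illustrative values: a well-fitting model against a badly fitting null
print(round(tli(1000, 36, 50, 24), 4))   # 0.9595
```

When the model's $\chi^2$ equals its $df$ (the expected value under perfect fit), the TLI is exactly 1.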

11.7 Summary of Fit Indices

| Index | Good Fit Threshold | Available With |
| :--- | :--- | :--- |
| RMSR | $< 0.05$ | PAF and MLF |
| Residuals $> 0.05$ | $< 50\%$ | PAF and MLF |
| RMSEA | $\leq 0.05$ | MLF only |
| SRMR | $< 0.05$ | MLF only |
| TLI | $> 0.95$ | MLF only |
| $\chi^2$ $p$-value | $> 0.05$ | MLF only |

12. Worked Examples

Example 1: Well-Being Survey (Simple 2-Factor Solution)

A researcher administers an 8-item well-being questionnaire to $n = 350$ participants. Items are rated 1–7 (Strongly Disagree to Strongly Agree):

| Item | Content |
| :--- | :--- |
| WB1 | I feel calm and relaxed. |
| WB2 | I feel free from anxiety. |
| WB3 | I do not feel worried. |
| WB4 | I feel positive about my life. |
| WB5 | I feel happy most of the time. |
| WB6 | I feel my life has purpose. |
| WB7 | I feel energetic and motivated. |
| WB8 | I feel physically healthy. |

Step 1 — Assess Factorability

Step 2 — Determine Number of Factors

Eigenvalues from PAF:

| Factor | Eigenvalue | % Variance | Cumulative % |
| :--- | :--- | :--- | :--- |
| 1 | 3.21 | 40.1% | 40.1% |
| 2 | 1.44 | 18.0% | 58.1% |
| 3 | 0.71 | 8.9% | 67.0% |
| 4 | 0.58 | 7.2% | 74.2% |

Decision: Extract 2 factors, since only the first two eigenvalues exceed 1.

Step 3 — Factor Extraction and Rotation (PAF, Varimax)

Final Rotated Loading Matrix:

| Item | Factor 1 | Factor 2 | $h^2$ |
| :--- | :--- | :--- | :--- |
| WB1 | 0.81 | 0.14 | 0.67 |
| WB2 | 0.78 | 0.19 | 0.65 |
| WB3 | 0.72 | 0.11 | 0.53 |
| WB4 | 0.16 | 0.79 | 0.65 |
| WB5 | 0.22 | 0.76 | 0.63 |
| WB6 | 0.09 | 0.71 | 0.51 |
| WB7 | 0.18 | 0.68 | 0.50 |
| WB8 | 0.13 | 0.62 | 0.40 |
| SSL | 2.00 | 2.25 | |
| % Var | 25.0% | 28.1% | |

Note: The full matrix is shown here; in published tables, loadings $< 0.30$ are often suppressed for clarity.

Step 4 — Interpret the Factors

Step 5 — Evaluate Fit

Conclusion: A 2-factor structure adequately describes the 8-item well-being questionnaire. Factor 1 reflects freedom from anxiety/worry (Emotional Calm) and Factor 2 reflects positive psychological and physical well-being. Both factors are well-defined with high communalities and clean simple structure.


Example 2: Cognitive Ability Battery (Oblique 3-Factor Solution)

A psychologist administers 9 cognitive tests to $n = 500$ students:

| Test | Content |
| :--- | :--- |
| T1 | Verbal Comprehension |
| T2 | Vocabulary |
| T3 | Reading Speed |
| T4 | Arithmetic |
| T5 | Number Series |
| T6 | Spatial Reasoning |
| T7 | Mental Rotation |
| T8 | Pattern Recognition |
| T9 | Abstract Reasoning |

Factorability: KMO = 0.91 (Marvellous), Bartlett's $p < 0.001$.
Number of factors: Parallel analysis suggests 3 factors.
Extraction: PAF with Promax rotation (oblique — cognitive abilities are expected to correlate).

Rotated Pattern Matrix:

| Test | Factor 1 | Factor 2 | Factor 3 | $h^2$ |
| :--- | :--- | :--- | :--- | :--- |
| T1 | 0.83 | 0.08 | 0.05 | 0.72 |
| T2 | 0.80 | 0.11 | 0.09 | 0.68 |
| T3 | 0.69 | 0.09 | 0.16 | 0.55 |
| T4 | 0.10 | 0.82 | 0.07 | 0.70 |
| T5 | 0.07 | 0.79 | 0.11 | 0.65 |
| T6 | 0.09 | 0.67 | 0.18 | 0.53 |
| T7 | 0.11 | 0.09 | 0.85 | 0.75 |
| T8 | 0.08 | 0.14 | 0.79 | 0.66 |
| T9 | 0.19 | 0.22 | 0.68 | 0.59 |

Factor Correlation Matrix ($\boldsymbol{\Phi}$):

| | F1 | F2 | F3 |
| :--- | :--- | :--- | :--- |
| F1 | 1.00 | 0.42 | 0.38 |
| F2 | 0.42 | 1.00 | 0.45 |
| F3 | 0.38 | 0.45 | 1.00 |

Interpretation:

  • Factor 1 (T1–T3, loadings 0.69–0.83): Verbal Ability.
  • Factor 2 (T4–T6, loadings 0.67–0.82): Quantitative Reasoning.
  • Factor 3 (T7–T9, loadings 0.68–0.85): Spatial/Abstract Reasoning.

Factor Correlations: All three factors are moderately correlated ($r \approx 0.38 - 0.45$), consistent with the concept of general intelligence (g) — a higher-order factor that accounts for the positive correlations among cognitive ability factors. This justifies the use of oblique rotation.

Variance Explained:

| Factor | SSL (Pattern) | % Variance |
| :--- | :--- | :--- |
| F1 | 1.95 | 21.7% |
| F2 | 1.88 | 20.9% |
| F3 | 2.00 | 22.2% |
| Total | 5.83 | 64.8% |

The 3-factor oblique solution explains 64.8% of total variance, with a clean simple structure and theoretically meaningful factor labels.


13. Common Mistakes and How to Avoid Them

Mistake 1: Using EFA With Too Few Variables Per Factor

Problem: Having only 1–2 variables per factor produces an under-determined, unstable factor that may not replicate.
Solution: Aim for at least 3–5 variables per factor. A factor defined by 3+ variables with loadings $> 0.50$ is generally considered reliable.

Mistake 2: Using the Kaiser Criterion as the Sole Decision Rule

Problem: The eigenvalue-greater-than-1 rule is known to over-extract factors with large $p$ and under-extract with small $p$. Relying on it exclusively can lead to extracting the wrong number of factors.
Solution: Use parallel analysis as the primary criterion, supported by the scree plot. Treat the Kaiser criterion as one of several inputs.
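Parallel analysis compares each observed eigenvalue against the distribution of eigenvalues obtained from random data of the same size. A minimal numpy sketch of the random-data side (function name, rep count, and seed are illustrative choices):

```python
import numpy as np

def parallel_analysis_thresholds(n, p, n_reps=200, percentile=95, seed=0):
    """95th-percentile eigenvalues of correlation matrices computed from
    random normal data with n observations and p variables."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_reps, p))
    for i in range(n_reps):
        X = rng.standard_normal((n, p))
        # eigenvalues of the random-data correlation matrix, descending
        eigs[i] = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    return np.percentile(eigs, percentile, axis=0)

thresholds = parallel_analysis_thresholds(n=300, p=6)
# Retain factor k only if the k-th observed eigenvalue exceeds thresholds[k-1]
print(thresholds[0] > 1 > thresholds[-1])   # True
```

Because sampling noise inflates the leading eigenvalues of even purely random data, the first threshold always sits above 1, which is exactly why parallel analysis outperforms the raw Kaiser criterion.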

Mistake 3: Defaulting to Orthogonal Rotation Without Justification

Problem: Assuming factors are uncorrelated is often unrealistic in psychology and social sciences. Forcing orthogonality can distort the loading structure and mask meaningful factor relationships.
Solution: Default to oblique rotation (Oblimin or Promax). If the factor correlations turn out to be near zero ($|r| < 0.30$), you can switch to orthogonal rotation for simplicity.

Mistake 4: Interpreting Factors Without Adequate Communality

Problem: Retaining variables with very low communalities ($h^2 < 0.30$) means that most of their variance is unique/error variance — they are poor indicators of any common factor.
Solution: Examine communalities before interpreting the solution. Consider removing variables with $h^2 < 0.30$ and re-running the analysis.

Mistake 5: Running EFA With an Inadequate Sample Size

Problem: With small samples ($n < 100$), factor solutions are highly unstable, and loadings are poorly estimated. The solution may not replicate in a new sample.
Solution: Collect at least $n = 200$ observations, and aim for a 10:1 subject-to-variable ratio. If you have a small sample, interpret results cautiously and seek replication.

Mistake 6: Including Variables That Are Not Conceptually Related

Problem: Including variables from completely unrelated domains forces unnatural groupings, producing factors that are statistically driven rather than theoretically meaningful.
Solution: Only include variables that are theoretically expected to reflect common constructs. Use domain knowledge to guide variable selection before running EFA.

Mistake 7: Confusing EFA With PCA

Problem: Using PCA when EFA is appropriate (or vice versa) leads to conceptually incorrect conclusions. PCA components are not latent factors.
Solution: Use EFA when you want to identify latent constructs causing the correlations. Use PCA when you simply want to reduce data dimensionality without positing latent causes.

Mistake 8: Ignoring Cross-Loadings and Item-Factor Mismatches

Problem: Accepting a factor solution without scrutinising cross-loadings or theoretically misplaced items can lead to scale construction errors.
Solution: Inspect the full loading matrix. Remove or re-examine variables that: (a) cross-load substantially on two or more factors, (b) load on a factor that does not make theoretical sense, or (c) have communalities below 0.30.

Mistake 9: Treating Factor Scores as Error-Free

Problem: Using factor scores in subsequent analyses without acknowledging their uncertainty (due to factor score indeterminacy) can lead to overconfident conclusions.
Solution: Acknowledge that factor scores are estimates of latent variables and interpret subsequent analyses with appropriate caution.

Mistake 10: Reporting an Unrotated Solution

Problem: Publishing an unrotated factor solution is almost always uninterpretable and inappropriate.
Solution: Always rotate the factor solution (Varimax as a minimum; Oblimin/Promax when correlations are expected). Report the rotated loading matrix, not the unrotated matrix.


14. Troubleshooting

| Problem | Likely Cause | Solution |
| :--- | :--- | :--- |
| KMO $< 0.50$ | Variables are too weakly correlated | Review variable selection; remove irrelevant variables |
| Bartlett's test not significant | Correlations are too small for EFA | Add more strongly related variables; reconsider analysis |
| Algorithm fails to converge | Too many factors; small sample; Heywood cases | Reduce the number of factors; increase sample size |
| Heywood case (communality $> 1$ or uniqueness $< 0$) | Over-factoring; small sample; multicollinearity | Reduce number of factors; remove highly collinear variables; increase $n$ |
| All variables load on one large factor | Variables are too similar or highly intercorrelated | Re-examine variable selection; check for redundancy |
| No clean simple structure after rotation | Wrong number of factors; inappropriate rotation | Try different $m$; switch between orthogonal/oblique |
| Very low communalities ($h^2 < 0.30$) | Variables are poor indicators of any common factor | Remove poor variables; add better indicator variables |
| Large proportion of residuals $> 0.05$ | Too few factors extracted | Increase the number of factors |
| Factors are uninterpretable | Variables are theoretically mixed; too many factors | Re-examine variable selection and factor number |
| Eigenvalues all near-equal | No clear factor structure; essentially random data | Reconsider whether EFA is appropriate for this data |
| Cross-loadings on many variables | Over-factoring or under-factoring | Try $m-1$ and $m+1$ solutions; try oblique rotation |

15. Quick Reference Cheat Sheet

Core Equations

| Formula | Description |
| :--- | :--- |
| $X_j = \sum_{k=1}^m \lambda_{jk} F_k + \epsilon_j$ | Common factor model (per variable) |
| $\mathbf{X} = \boldsymbol{\Lambda}\mathbf{F} + \boldsymbol{\epsilon}$ | Common factor model (matrix form) |
| $\mathbf{R} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^T + \boldsymbol{\Psi}$ | Fundamental theorem of factor analysis |
| $h_j^2 = \sum_{k=1}^{m} \lambda_{jk}^2$ | Communality of variable $j$ |
| $\psi_j = 1 - h_j^2$ | Uniqueness of variable $j$ |
| $\text{SSL}_k = \sum_{j=1}^{p} \lambda_{jk}^2$ | Sum of squared loadings (eigenvalue) for factor $k$ |
| $\text{Var}_k = \text{SSL}_k / p$ | Proportion of total variance explained by factor $k$ |
| $\mathbf{E} = \mathbf{R} - \hat{\mathbf{R}}$ | Residual correlation matrix |
| $\text{RMSR} = \sqrt{\frac{2\sum_{j < j'} e_{jj'}^2}{p(p-1)}}$ | Root mean square of residuals |
| $\text{RMSEA} = \sqrt{\max\left(\frac{\chi^2 - df}{df(n-1)}, 0\right)}$ | Root mean square error of approximation |

Factorability Benchmarks

| Test | Threshold for Proceeding |
| :--- | :--- |
| Bartlett's Test | $p < 0.05$ |
| KMO Overall | $\geq 0.60$ (minimum), $\geq 0.70$ (recommended) |
| Correlation matrix determinant | $> 0.00001$ |

Number of Factors Decision Guide

| Criterion | Decision Rule |
| :--- | :--- |
| Parallel Analysis | Retain factors where $\lambda_{\text{obs}} > \lambda_{\text{random}}^{95\text{th}}$ |
| Scree Plot | Retain factors in the steep part, before the elbow |
| Kaiser Criterion | Retain factors with $\lambda > 1$ (secondary only) |
| MAP Test | Retain $m$ that minimises average squared partial correlation |

Loading Interpretation Guide

| $\lvert\lambda_{jk}\rvert$ | Strength |
| :--- | :--- |
| $\geq 0.70$ | Excellent |
| $0.55 - 0.69$ | Good |
| $0.45 - 0.54$ | Fair |
| $0.32 - 0.44$ | Poor |
| $< 0.32$ | Negligible — suppress |

Rotation Decision Guide

| Scenario | Rotation |
| :--- | :--- |
| Factors assumed independent | Varimax (orthogonal) |
| Factors expected to correlate | Oblimin or Promax (oblique) |
| Unsure | Run oblique first; check $\boldsymbol{\Phi}$ |

Fit Index Benchmarks

| Index | Good Fit | Acceptable Fit | Poor Fit |
| :--- | :--- | :--- | :--- |
| RMSR | $< 0.05$ | $0.05 - 0.08$ | $> 0.08$ |
| RMSEA | $< 0.05$ | $0.05 - 0.08$ | $> 0.10$ |
| SRMR | $< 0.05$ | $0.05 - 0.10$ | $> 0.10$ |
| TLI | $> 0.95$ | $0.90 - 0.95$ | $< 0.90$ |
| Residuals $> 0.05$ | $< 25\%$ | $25 - 50\%$ | $> 50\%$ |

This tutorial provides a comprehensive foundation for understanding, applying, and interpreting Exploratory Factor Analysis using the DataStatPro application. For further reading, consult Fabrigar & Wegener's "Exploratory Factor Analysis" (2012), Costello & Osborne's "Best Practices in Exploratory Factor Analysis" (2005), or Gorsuch's "Factor Analysis" (1983). For feature requests or support, contact the DataStatPro team.