How to Interpret Effect Sizes and Clinical Significance

Learn to interpret effect sizes and assess clinical significance.

How to Interpret Effect Sizes and Clinical Significance Using DataStatPro

Learning Objectives

By the end of this tutorial, you will be able to:

Understanding Statistical vs. Clinical Significance

Statistical Significance

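Definition: A result is statistically significant when its p-value falls below a predetermined threshold (usually α = 0.05). The p-value is the probability of observing a difference at least as large as the one found, assuming there is no true difference; it is not the probability that the result occurred by chance.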

Characteristics:

Example:

Large study (n = 10,000):
Mean difference = 0.5 points on 100-point scale
p = 0.001 (statistically significant)
But clinically meaningless

Clinical Significance

Definition: The magnitude of difference that would be meaningful in clinical practice or real-world applications.

Characteristics:

Example:

Small study (n = 50):
Mean difference = 15 points on 100-point scale
p = 0.08 (not statistically significant)
But potentially clinically important

The Relationship

Ideal scenario: Both statistically AND clinically significant
Concern 1: Statistically significant but clinically trivial
Concern 2: Clinically important but not statistically significant
Clear result: Neither statistically nor clinically significant

Types of Effect Size Measures

Standardized Mean Differences

Cohen's d

Formula: d = (M₁ - M₂) / SDpooled

Interpretation Guidelines:

Small effect:    d = 0.2
Medium effect:   d = 0.5
Large effect:    d = 0.8
Very large:      d = 1.2+
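
As a minimal sketch of the formula above, the Python below computes Cohen's d from summary statistics and maps its magnitude onto the generic benchmarks. It assumes SDpooled is the degrees-of-freedom-weighted pooled standard deviation, which the text does not spell out. The function names and the input numbers are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD, weighting each variance by its degrees of freedom (assumed definition)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference: (M1 - M2) / SDpooled."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def label_effect(d):
    """Map |d| onto the generic benchmarks (0.2, 0.5, 0.8, 1.2)."""
    size = abs(d)
    if size >= 1.2:
        return "very large"
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical summary statistics: means 10 vs 9, SD 2 and n = 30 in both groups.
d = cohens_d(10, 2, 30, 9, 2, 30)
print(f"d = {d:.2f} ({label_effect(d)})")  # prints: d = 0.50 (medium)
```

The label uses the absolute value, so the sign of d still carries the direction of the difference.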

Clinical Context Examples:

Psychotherapy outcomes:
- d = 0.3: Minimal improvement
- d = 0.5: Moderate improvement
- d = 0.8: Substantial improvement

Educational interventions:
- d = 0.2: Small educational gain
- d = 0.4: Educationally significant
- d = 0.6: Large educational impact

Hedges' g

Purpose: Bias-corrected version of Cohen's d for small samples

Formula: g = d × (1 - 3/(4(n₁ + n₂) - 9))
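
The correction factor can be sketched in a few lines of Python. The function name and the example values are hypothetical; the formula is the one given above.

```python
def hedges_g(d, n1, n2):
    """Apply the small-sample correction factor to Cohen's d."""
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# Hypothetical d = 0.8: the correction matters with 10 per group
# and is almost invisible with 500 per group.
print(round(hedges_g(0.8, 10, 10), 3))
print(round(hedges_g(0.8, 500, 500), 3))
```

Because the factor is always below 1, g is slightly smaller in magnitude than d, and the gap closes as the total sample size grows.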

When to Use:

Glass's Δ (Delta)

Formula: Δ = (M₁ - M₂) / SD_control

Use Case: When groups have different variances or in experimental vs. control comparisons
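
A short sketch of Glass's Δ in Python, assuming M₁ is the treatment mean and M₂ the control mean. The function name and the numbers are hypothetical.

```python
def glass_delta(mean_treatment, mean_control, sd_control):
    """Standardize the mean difference by the control group's SD only."""
    if sd_control <= 0:
        raise ValueError("sd_control must be positive")
    return (mean_treatment - mean_control) / sd_control


# Hypothetical case where treatment changed the spread of scores:
# only the control SD (8) enters the calculation.
print(glass_delta(60, 50, 8))  # prints: 1.25
```

Using only the control SD keeps the standardizer unaffected by any change in variability that the treatment itself causes.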

Correlation-Based Effect Sizes

Pearson's r

Interpretation:

Small effect:    r = 0.10 (1% variance explained)
Medium effect:   r = 0.30 (9% variance explained)
Large effect:    r = 0.50 (25% variance explained)

Clinical Examples:

Biomarker correlations:
- r = 0.20: Weak but potentially useful
- r = 0.40: Moderate clinical utility
- r = 0.60: Strong clinical relationship

Coefficient of Determination (r²)

Interpretation: Proportion of variance explained

r = 0.30 → r² = 0.09 (9% variance explained)
r = 0.50 → r² = 0.25 (25% variance explained)
r = 0.70 → r² = 0.49 (49% variance explained)
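
To make the link between r and r² concrete, here is a small dependency-free sketch. The helper names and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(r):
    """Coefficient of determination: the square of r."""
    return r ** 2


# Hypothetical paired measurements.
r = pearson_r([1, 2, 3], [1, 3, 2])
print(f"r = {r:.2f}, r^2 = {variance_explained(r):.2f}")  # prints: r = 0.50, r^2 = 0.25
```

Squaring shrinks moderate correlations quickly, which is why r = 0.30 corresponds to only 9% of variance explained.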

Categorical Data Effect Sizes

Odds Ratio (OR)

Interpretation:

OR = 1.0: No association
OR = 1.5: Small to moderate effect
OR = 2.0: Moderate effect
OR = 3.0: Large effect
OR = 5.0: Very large effect

Clinical Context:

Risk factors:
- OR = 1.2: Minimally increased odds
- OR = 2.0: Doubled odds (clinically important)
- OR = 5.0: Five-fold increased odds (major concern)
Note: an OR approximates the change in risk only when the outcome is rare.

Risk Ratio (RR)

Interpretation: Similar to OR but more intuitive

RR = 1.0: No difference in risk
RR = 1.5: 50% increased risk
RR = 2.0: Risk doubled
RR = 0.5: Risk halved
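
The difference between OR and RR is easiest to see on a 2×2 table. The sketch below uses hypothetical counts and the usual cell layout (a, b for the exposed row; c, d for the unexposed row), which is an assumption rather than something the text defines.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table.

    a = exposed with event,   b = exposed without event
    c = unexposed with event, d = unexposed without event
    """
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """RR: risk in the exposed group divided by risk in the unexposed group."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 20/100 events when exposed, 10/100 when unexposed.
print(f"OR = {odds_ratio(20, 80, 10, 90):.2f}")  # prints: OR = 2.25
print(f"RR = {risk_ratio(20, 80, 10, 90):.2f}")  # prints: RR = 2.00
```

With a fairly common outcome the OR (2.25) overstates the RR (2.00); the two converge only as the outcome becomes rare.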

Number Needed to Treat (NNT)

Formula: NNT = 1 / |Risk Difference|

Clinical Interpretation:

NNT = 2: Very effective (treat 2 to benefit 1)
NNT = 5: Effective (treat 5 to benefit 1)
NNT = 10: Moderately effective
NNT = 25: Minimally effective
NNT = 100: Questionable clinical value
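
A minimal sketch of the NNT calculation, assuming the inputs are event risks expressed as proportions. The function name and example risks are hypothetical. The result is rounded up to a whole patient, and a small rounding guard is applied first so that floating-point noise (for example 1/0.2 evaluating to slightly more than 5) does not inflate the answer.

```python
import math


def number_needed_to_treat(risk_control, risk_treatment):
    """NNT = 1 / |risk difference|, rounded up to a whole patient."""
    risk_difference = abs(risk_control - risk_treatment)
    if risk_difference == 0:
        raise ValueError("NNT is undefined when the risks are equal")
    # round() removes floating-point noise before ceil() rounds up.
    return math.ceil(round(1 / risk_difference, 9))


# Hypothetical risks: 30% of controls vs 10% of treated patients have the event.
print(number_needed_to_treat(0.30, 0.10))  # prints: 5
print(number_needed_to_treat(0.10, 0.08))  # prints: 50
```

The same arithmetic applied to the risk of an adverse event gives the NNH.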

Real-World Examples:

Aspirin for heart attack prevention: NNT ≈ 67
Statins for cardiovascular events: NNT ≈ 60
Antibiotics for pneumonia: NNT ≈ 1.4

Number Needed to Harm (NNH)

Purpose: Average number of patients who must be treated for one additional patient to experience harm

Interpretation: Higher NNH values indicate safer treatments

NNH = 10: 1 in 10 patients harmed (concerning)
NNH = 100: 1 in 100 patients harmed (acceptable for serious conditions)
NNH = 1000: 1 in 1000 patients harmed (very safe)

ANOVA Effect Sizes

Eta-squared (η²)

Formula: η² = SSbetween / SStotal

Interpretation:

Small effect:    η² = 0.01 (1% variance explained)
Medium effect:   η² = 0.06 (6% variance explained)
Large effect:    η² = 0.14 (14% variance explained)

Partial Eta-squared (ηp²)

Formula: ηp² = SSeffect / (SSeffect + SSerror)

Use: More common in factorial designs with multiple factors

Omega-squared (ω²)

Purpose: Less biased estimate than η²

Formula: ω² = (SSbetween - (k-1)MSerror) / (SStotal + MSerror)
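
The sums of squares behind η² and ω² can be sketched for a one-way design as follows. The function name and the three groups of scores are hypothetical; k is the number of groups and MSerror is SSerror divided by (N - k).

```python
def anova_effect_sizes(groups):
    """Return (eta_squared, omega_squared) for a one-way design."""
    k = len(groups)
    values = [v for group in groups for v in group]
    n_total = len(values)
    grand_mean = sum(values) / n_total

    # Between-groups and total sums of squares.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_error = ss_total - ss_between
    ms_error = ss_error / (n_total - k)

    eta_squared = ss_between / ss_total
    omega_squared = (ss_between - (k - 1) * ms_error) / (ss_total + ms_error)
    return eta_squared, omega_squared


# Hypothetical scores for three groups.
eta_sq, omega_sq = anova_effect_sizes([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}")  # prints: eta^2 = 0.500, omega^2 = 0.308
```

In a one-way design SSeffect + SSerror equals SStotal, so partial η² coincides with η²; the two differ only when other factors are in the model.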

Clinical Significance Thresholds by Domain

Psychology and Mental Health

Depression Scales

Beck Depression Inventory (BDI-II):
- Minimal change: 3-5 points
- Clinically significant: 8-9 points
- Reliable change: 8.46 points

Hamilton Depression Rating Scale (HAM-D):
- Response: ≥50% reduction
- Remission: Score ≤7
- Clinically significant: 3-point change

Anxiety Measures

Generalized Anxiety Disorder-7 (GAD-7):
- Minimal change: 1-2 points
- Clinically significant: 4-5 points
- Reliable change: 4.15 points

State-Trait Anxiety Inventory:
- Small change: 4-6 points
- Moderate change: 10-15 points
- Large change: >20 points

Quality of Life

SF-36 Health Survey:
- Physical Component: 3-5 points
- Mental Component: 3-5 points
- Domain scores: 5-10 points

WHO Quality of Life (WHOQOL):
- Minimal important difference: 4-6 points
- Moderate change: 10-15 points

Medicine and Healthcare

Cardiovascular Outcomes

Blood Pressure:
- Clinically meaningful: 5 mmHg systolic, 3 mmHg diastolic
- Substantial benefit: 10 mmHg systolic reduction
- Population impact: 2 mmHg systolic reduction

Cholesterol:
- LDL reduction: 30-40 mg/dL clinically significant
- HDL increase: 5-10 mg/dL meaningful
- Total cholesterol: 20-30 mg/dL reduction

Pain Assessment

Visual Analog Scale (0-100):
- Minimal change: 10-13 points
- Moderate change: 20-30 points
- Substantial change: >30 points

Numeric Rating Scale (0-10):
- Minimal change: 1 point
- Moderate change: 2 points
- Substantial change: 3+ points

Functional Outcomes

6-Minute Walk Test:
- Minimal important difference: 25-35 meters
- Clinically meaningful: 50+ meters

Activities of Daily Living:
- Barthel Index: 1.85-point change
- Functional Independence Measure: 22-point change

Education

Academic Achievement

Standardized Test Scores:
- Small effect: 0.1-0.2 SD improvement
- Educationally significant: 0.25 SD
- Large educational impact: 0.4+ SD

Grade Point Average:
- Minimal change: 0.1-0.2 points
- Meaningful change: 0.3-0.5 points
- Substantial change: 0.5+ points

Business and Economics

Customer Satisfaction

5-point Likert Scale:
- Minimal change: 0.2-0.3 points
- Meaningful change: 0.5 points
- Substantial change: 1.0+ points

10-point Scale:
- Minimal change: 0.5 points
- Meaningful change: 1.0 points
- Substantial change: 2.0+ points

Using DataStatPro for Effect Size Analysis

Accessing Effect Size Tools

  1. Navigate to Effect Size Calculator

    • Go to Calculators → Effect Sizes
    • Select appropriate effect size measure
    • Input your data or summary statistics
  2. Available Calculators

    - Cohen's d and Hedges' g
    - Correlation effect sizes (r, r²)
    - Odds ratios and risk ratios
    - NNT and NNH calculators
    - ANOVA effect sizes (η², ω²)
    - Confidence intervals for all measures
    

Step-by-Step: Cohen's d Calculation

  1. Input Data

    Group 1 (Treatment):
    - Mean: 85.2
    - Standard Deviation: 12.4
    - Sample Size: 45
    
    Group 2 (Control):
    - Mean: 78.6
    - Standard Deviation: 11.8
    - Sample Size: 42
    
  2. DataStatPro Calculation

    Results:
    - Cohen's d = 0.55
    - 95% CI: [0.12, 0.98]
    - Hedges' g = 0.54
    - Interpretation: Medium effect size
    
  3. Clinical Interpretation

    The treatment group scored 0.55 standard deviations higher 
    than the control group, a medium effect by Cohen's benchmarks. 
    Whether this is clinically meaningful depends on the minimal 
    important difference for the outcome scale.
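
For readers who want to check the numbers by hand, the sketch below recomputes the worked example from its summary statistics. Two points are assumptions rather than documented DataStatPro behavior: which standardizer the tool uses (both the degrees-of-freedom-weighted pooled SD and the simple root-mean-square of the two SDs are shown), and the large-sample normal approximation used for the confidence interval, which can differ from the tool's interval in the second decimal.

```python
import math

# Summary statistics from the worked example.
m1, sd1, n1 = 85.2, 12.4, 45  # treatment
m2, sd2, n2 = 78.6, 11.8, 42  # control
diff = m1 - m2

# Two common standardizers; which one the tool uses is not stated.
sd_weighted = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
sd_simple = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
d_weighted = diff / sd_weighted
d_simple = diff / sd_simple

# Hedges' small-sample correction applied to d.
g = d_simple * (1 - 3 / (4 * (n1 + n2) - 9))

# Large-sample approximation to the standard error of d (an assumption).
se = math.sqrt((n1 + n2) / (n1 * n2) + d_simple ** 2 / (2 * (n1 + n2)))
ci_low, ci_high = d_simple - 1.96 * se, d_simple + 1.96 * se

print(f"d (weighted pooled SD) = {d_weighted:.2f}")
print(f"d (simple average SD)  = {d_simple:.2f}")
print(f"Hedges' g              = {g:.2f}")
print(f"approx. 95% CI         = [{ci_low:.2f}, {ci_high:.2f}]")
```

The simple-average standardizer reproduces the reported d = 0.55 and g = 0.54, while the weighted pooled SD gives 0.54 for d; the difference is a rounding-level effect of nearly equal group sizes and SDs.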
    

Confidence Intervals for Effect Sizes

Importance of CIs

Confidence intervals provide:
- Precision of effect size estimate
- Range of plausible values
- Statistical significance information
- Clinical significance assessment

Interpretation Examples

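Cohen's d = 0.45, 95% CI [0.15, 0.75]:
- Point estimate suggests a small-to-medium effect
- Lower bound (0.15) sits just below the small-effect benchmark
- Upper bound (0.75) approaches the large-effect benchmark
- CI excludes zero (statistically significant)
- Clinically meaningful if the minimal important difference is at or below the lower bound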

Cohen's d = 0.25, 95% CI [-0.05, 0.55]:
- Point estimate suggests small effect
- CI includes zero (not statistically significant)
- Upper bound suggests potential medium effect
- Clinical significance uncertain
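
The reasoning in these two examples can be written as a small rule: check whether the interval excludes zero, then compare its bounds with a minimal important difference (MID). The sketch below assumes larger positive values are better; the function name and the MID of 0.10 are hypothetical and would need to come from the relevant domain.

```python
def interpret_interval(lower, upper, mid):
    """Classify a confidence interval against zero and a caller-supplied MID.

    Returns (statistically_significant, clinical_verdict).
    """
    significant = lower > 0 or upper < 0
    if lower >= mid:
        clinical = "meaningful"      # whole interval at or above the MID
    elif upper < mid:
        clinical = "not meaningful"  # whole interval below the MID
    else:
        clinical = "uncertain"       # interval straddles the MID
    return significant, clinical


MID = 0.10  # hypothetical threshold in d units
print(interpret_interval(0.15, 0.75, MID))   # prints: (True, 'meaningful')
print(interpret_interval(-0.05, 0.55, MID))  # prints: (False, 'uncertain')
```

The verdict depends entirely on the chosen MID, which is why a domain-specific threshold matters more than the generic benchmarks.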

Practical Decision-Making Framework

Step 1: Calculate Effect Size

Use appropriate measure:
- Continuous outcomes: Cohen's d, Hedges' g
- Categorical outcomes: OR, RR, NNT
- Correlational: r, r²
- Multiple groups: η², ω²

Step 2: Assess Statistical Significance

Consider:
- P-value and confidence intervals
- Sample size adequacy
- Power analysis results
- Multiple comparison adjustments

Step 3: Evaluate Clinical Significance

Compare to:
- Established minimal important differences
- Clinical practice guidelines
- Previous research benchmarks
- Expert consensus thresholds

Step 4: Consider Context

Factors to consider:
- Cost of intervention
- Risk-benefit profile
- Patient preferences
- Alternative treatments
- Population characteristics

Step 5: Make Recommendation

Decision matrix:
- High effect + Low cost = Strong recommendation
- Moderate effect + Moderate cost = Conditional recommendation
- Small effect + High cost = Recommend against
- Uncertain effect = More research needed

Real-World Example: Antidepressant Trial

Study Context

Study: New antidepressant vs. placebo
Outcome: Hamilton Depression Rating Scale (HAM-D)
Sample: 200 participants (100 per group)
Study duration: 8 weeks

Results

Treatment group:
- Baseline HAM-D: 22.4 ± 4.2
- 8-week HAM-D: 12.8 ± 6.1
- Change: -9.6 ± 5.8

Placebo group:
- Baseline HAM-D: 22.1 ± 4.0
- 8-week HAM-D: 17.2 ± 5.9
- Change: -4.9 ± 4.7

Effect Size Calculations

  1. Cohen's d for Change Scores

    d = (-9.6 - (-4.9)) / √((5.8² + 4.7²)/2)
    d = -4.7 / 5.28
    d = -0.89 (|d| = 0.89, a large effect favoring treatment; negative because lower HAM-D scores are better)
    95% CI for |d|: [0.60, 1.18]
    
  2. Clinical Significance Assessment

    Between-group difference of 4.7 points:
    - Exceeds minimal important difference (3 points)
    - Treatment-group mean reduction (9.6 of 22.4 points, about 43%) approaches the 50% response criterion
    - Clinically meaningful improvement
    
  3. Response Rates

    Response (≥50% reduction):
    - Treatment: 65/100 (65%)
    - Placebo: 28/100 (28%)
    - Risk difference: 37%
    - NNT = 1/0.37 = 2.7 ≈ 3
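
The calculations in this example can be reproduced with a short script. The confidence interval uses a large-sample normal approximation, which is an assumption; the trial figures are those given above and the variable names are illustrative.

```python
import math

# Change scores (mean, SD) and group sizes from the trial.
change_treatment, sd_treatment, n_treatment = -9.6, 5.8, 100
change_placebo, sd_placebo, n_placebo = -4.9, 4.7, 100

# Cohen's d on change scores, standardized by the root-mean-square of the two SDs.
sd_avg = math.sqrt((sd_treatment ** 2 + sd_placebo ** 2) / 2)
d = (change_treatment - change_placebo) / sd_avg

# Approximate 95% CI for d (large-sample standard error, an assumption).
n_sum = n_treatment + n_placebo
se = math.sqrt(n_sum / (n_treatment * n_placebo) + d ** 2 / (2 * n_sum))
ci = (d - 1.96 * se, d + 1.96 * se)

# Response rates, risk difference, and NNT rounded up to a whole patient.
risk_difference = 65 / n_treatment - 28 / n_placebo
nnt = math.ceil(round(1 / abs(risk_difference), 9))

print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f"risk difference = {risk_difference:.2f}, NNT = {nnt}")
```

The sign of d is negative because the treatment group's HAM-D scores fell further than the placebo group's; its magnitude matches the 0.89 reported for the trial.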
    

Interpretation

"The new antidepressant demonstrated a large effect size 
(Cohen's d = 0.89, 95% CI: 0.60-1.18) compared to placebo. 
The 4.7-point greater improvement in HAM-D scores exceeds 
established thresholds for clinical significance. With an 
NNT of 3, approximately one additional patient responds 
for every 3 patients treated compared to placebo, indicating 
strong clinical utility."

Advanced Considerations

Effect Size in Meta-Analysis

Random Effects Models

Considerations:
- Between-study heterogeneity
- Prediction intervals
- Subgroup analyses
- Publication bias assessment

Forest Plot Interpretation

Key elements:
- Individual study effect sizes
- Confidence intervals
- Overall pooled estimate
- Heterogeneity statistics (I², τ²)
- Prediction interval
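
To show where the pooled estimate and the heterogeneity statistics come from, here is a hedged sketch of random-effects pooling. It uses the DerSimonian-Laird estimator of τ², which is one common choice and not necessarily the one any particular tool applies. The function name and the three study effects and variances are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(effects)
    weights = [1 / v for v in variances]

    # Fixed-effect estimate and Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1

    # Between-study variance (truncated at zero) and I^2 as a percentage.
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2, "q": q}


# Hypothetical study effects (d) and their sampling variances.
result = dersimonian_laird([0.2, 0.5, 0.8], [0.01, 0.01, 0.01])
print({name: round(value, 3) for name, value in result.items()})
```

A large I² signals that most of the observed spread reflects real between-study differences, in which case the pooled estimate describes an average effect rather than a single common one.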

Bayesian Effect Sizes

Credible Intervals

Bayesian approach:
- Prior information incorporation
- Posterior distributions
- Credible intervals vs. confidence intervals
- Probability statements about effect sizes

Machine Learning Context

Performance Metrics as Effect Sizes

Classification:
- Area under ROC curve (AUC)
- Cohen's kappa for agreement
- Sensitivity and specificity differences

Regression:
- R² and adjusted R²
- Root mean square error (RMSE)
- Mean absolute error (MAE)

Common Mistakes and How to Avoid Them

Mistake 1: Ignoring Effect Size

Problem: Focusing only on p-values

Solution: Always report effect sizes with confidence intervals

Mistake 2: Misinterpreting Cohen's Guidelines

Problem: Applying generic thresholds to all contexts

Solution: Use domain-specific clinical significance thresholds

Mistake 3: Confusing Correlation and Effect Size

Problem: Using r² as a measure of treatment effect

Solution: Use appropriate standardized mean differences for interventions

Mistake 4: Not Considering Confidence Intervals

Problem: Reporting point estimates only

Solution: Always include confidence intervals for precision

Mistake 5: Inappropriate Effect Size Measure

Problem: Using Cohen's d for non-normal data

Solution: Consider non-parametric alternatives or data transformation

Communicating Effect Sizes to Different Audiences

For Clinicians

"The treatment reduced symptoms by an average of 15 points 
on the 100-point scale (Cohen's d = 0.75), which exceeds 
the 10-point threshold considered clinically meaningful. 
For every 4 patients treated, one additional patient will 
show significant improvement compared to standard care."

For Patients

"This treatment helps about 3 out of 4 people who try it. 
On average, people feel about 30% better compared to those 
who don't receive the treatment. The improvement is 
noticeable and meaningful for daily activities."

For Policymakers

"The intervention demonstrates a large effect size (d = 0.8) 
with an NNT of 3, indicating high clinical efficiency. 
Cost-effectiveness analysis suggests $X per quality-adjusted 
life year, falling within acceptable thresholds for 
public health implementation."

For Researchers

"The standardized mean difference was 0.65 (95% CI: 0.42-0.88), 
indicating a medium to large effect. The effect size is 
consistent with previous meta-analyses (pooled d = 0.58, 
95% CI: 0.45-0.71) and exceeds the minimal important 
difference established in validation studies."

Troubleshooting Common Issues

Problem: Very Large Sample, Small Effect

Situation: n = 10,000, p < 0.001, but d = 0.1

Interpretation: Statistically significant but clinically trivial

Action: Focus on confidence intervals and clinical significance

Problem: Small Sample, Large Effect

Situation: n = 20, p = 0.08, but d = 0.8

Interpretation: Potentially important but underpowered

Action: Report effect size with wide confidence intervals, suggest replication

Problem: Inconsistent Effect Sizes

Situation: Multiple outcomes show different effect magnitudes

Interpretation: Treatment may have selective effects

Action: Report all effects, discuss pattern of results

Problem: Negative Results

Situation: No statistically significant differences found

Interpretation: May still have clinical implications

Action: Report effect sizes and confidence intervals, discuss equivalence

Frequently Asked Questions

Q: Can effect sizes be negative?

A: Yes, negative effect sizes indicate the direction of the effect. For Cohen's d, negative values mean the first group scored lower than the second group.

Q: Should I use Cohen's d or Hedges' g?

A: Use Hedges' g for small samples (n < 20 per group) or meta-analyses. Cohen's d is fine for larger samples.

Q: How do I interpret overlapping confidence intervals?

A: Overlapping CIs don't necessarily mean no significant difference. The difference between groups has its own CI that should be examined.

Q: What if my effect size is between Cohen's thresholds?

A: Cohen's guidelines are rough benchmarks. Focus on clinical significance thresholds specific to your domain.

Q: Can I calculate effect sizes for non-significant results?

A: Yes, and you should. Effect sizes provide valuable information about magnitude regardless of statistical significance.
