
Meta-Analysis

Comprehensive reference guide for meta-analysis and systematic review methods.

Meta-Analysis: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of meta-analysis all the way through advanced effect size computation, heterogeneity analysis, publication bias assessment, and practical usage within the DataStatPro application. Whether you are a complete beginner or an experienced researcher, this guide is structured to build your understanding step by step.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is Meta-Analysis?
  3. The Mathematics Behind Meta-Analysis
  4. Effect Size Measures
  5. Meta-Analysis Based on Original Measures
  6. Fixed-Effect vs. Random-Effects Models
  7. Assumptions of Meta-Analysis
  8. Heterogeneity
  9. Forest Plots
  10. Publication Bias
  11. Moderator Analysis: Subgroup Analysis and Meta-Regression
  12. Sensitivity Analysis
  13. Using the Meta-Analysis Component
  14. Computational and Formula Details
  15. Worked Examples
  16. Common Mistakes and How to Avoid Them
  17. Troubleshooting
  18. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into meta-analysis, it is helpful to be familiar with the following foundational concepts. Do not worry if you are not — each concept is briefly explained here.

1.1 The Concept of a Study Effect

In any individual research study, the effect is the quantitative result of interest — for example, the difference in mean blood pressure between a treatment group and a control group, or the correlation between study hours and exam scores. Meta-analysis combines these effects across multiple studies.

1.2 Variance and Standard Error

Variance ($\sigma^2$) measures the spread of a distribution. The standard error (SE) is the standard deviation of a sampling distribution (i.e., how much an estimate, such as a mean, is expected to vary across samples):

$$SE = \frac{\sigma}{\sqrt{n}}$$

In meta-analysis, each study has its own effect estimate with its own standard error. Studies with larger sample sizes produce smaller standard errors (more precise estimates).

1.3 Confidence Intervals

A 95% confidence interval (CI) provides a range of plausible values for the true population parameter. For a normally distributed estimator:

$$\text{CI} = \hat{\theta} \pm z_{\alpha/2} \times SE(\hat{\theta})$$

Where $z_{0.025} = 1.96$ for a 95% CI. In meta-analysis, every study's effect estimate is paired with a confidence interval.

1.4 Weighted Averages

A weighted average gives more influence to certain values based on their importance (weight):

$$\bar{X}_w = \frac{\sum_{i=1}^k w_i X_i}{\sum_{i=1}^k w_i}$$

In meta-analysis, studies with greater precision (smaller variance) receive larger weights, so that more reliable studies contribute more to the pooled estimate.
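The inverse-variance weighting idea can be sketched in a few lines of Python (a minimal illustration; the function name and numbers are ours, not part of DataStatPro):

```python
def weighted_average(effects, variances):
    """Precision-weighted average: each value is weighted by 1/variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, effects)) / sum(weights)

# The precise study (variance 0.01) dominates the imprecise one (variance 0.25),
# so the result lands much closer to 0.5 than to 0.2.
pooled = weighted_average([0.5, 0.2], [0.01, 0.25])
```

Here the first study receives weight 100 and the second weight 4, so the pooled value is $(100 \cdot 0.5 + 4 \cdot 0.2)/104 \approx 0.488$.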

1.5 The Normal (Z) and Chi-Squared Distributions

The standard normal distribution $\mathcal{N}(0,1)$ underpins significance testing of pooled effects. The chi-squared distribution $\chi^2$ is used for testing heterogeneity across studies. Both are used extensively in meta-analytic calculations.


2. What is Meta-Analysis?

Meta-analysis is a quantitative statistical technique that combines the results of multiple independent studies addressing the same research question to produce a single, more precise and reliable estimate of the true effect.

2.1 The Systematic Review Context

Meta-analysis is almost always conducted as part of a systematic review — a rigorous, structured process of:

  1. Formulating a precise research question.
  2. Systematically searching the literature.
  3. Screening studies against inclusion/exclusion criteria.
  4. Extracting data from eligible studies.
  5. Assessing study quality and risk of bias.
  6. Synthesising results statistically (the meta-analysis).

Meta-analysis is the statistical step within a systematic review. Not every systematic review includes a meta-analysis (if studies are too heterogeneous, a narrative synthesis may be preferred), but every meta-analysis should be embedded within a systematic review.

2.2 Why Conduct a Meta-Analysis?

Individual studies are often limited by:

  - Small sample sizes and low statistical power
  - Sampling variability, which produces apparently conflicting results across studies
  - Narrow populations and settings, limiting generalisability

Meta-analysis addresses these issues by:

  - Pooling information across studies to increase power and precision
  - Providing an objective, quantitative way to reconcile conflicting findings
  - Quantifying how consistent (or heterogeneous) the effect is across settings

2.3 Real-World Applications

Meta-analysis is one of the most influential tools in evidence-based practice. Common applications include:

  - Clinical medicine: synthesising randomised controlled trials to inform treatment guidelines (e.g., Cochrane systematic reviews)
  - Psychology and education: summarising intervention and correlational research
  - Epidemiology and public health: pooling estimates of risk factors, prevalence, and incidence

2.4 The Two Broad Approaches

Meta-analysis can be conducted using two broad data inputs:

| Approach | Description | When Used |
|---|---|---|
| Effect-size-based meta-analysis | Studies are combined using standardised or unstandardised effect sizes computed from summary statistics | Most common; used when raw data are unavailable |
| Original-measures-based meta-analysis | Studies are combined using the raw or summary measures directly (e.g., raw mean differences, proportions, counts) without standardisation | Used when studies share the same original measurement scale |

Both approaches are covered in depth in this tutorial.


3. The Mathematics Behind Meta-Analysis

3.1 The Basic Meta-Analytic Model

Let $k$ be the number of studies included in the meta-analysis. Each study $i$ ($i = 1, 2, \dots, k$) provides:

  - An observed effect estimate $\hat{\theta}_i$
  - Its within-study variance $v_i$ (equivalently, its standard error $SE_i = \sqrt{v_i}$)

The fundamental assumption is that each observed effect estimate is drawn from a distribution centred on a true effect:

$$\hat{\theta}_i = \theta_i + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, v_i)$$

Where $\epsilon_i$ is the sampling error for study $i$. What $\theta_i$ represents depends on the model chosen (fixed-effect or random-effects; see Section 6).

3.2 The Pooled Effect Estimate

The pooled (combined) effect estimate is a weighted average of the individual study effects:

$$\hat{\theta}_{\text{pooled}} = \frac{\sum_{i=1}^k w_i \hat{\theta}_i}{\sum_{i=1}^k w_i}$$

Where $w_i$ is the weight assigned to study $i$. The exact form of $w_i$ differs between the fixed-effect and random-effects models (see Section 6).

3.3 Variance and Standard Error of the Pooled Estimate

The variance of the pooled estimate is:

$$\text{Var}(\hat{\theta}_{\text{pooled}}) = \frac{1}{\sum_{i=1}^k w_i}$$

The standard error of the pooled estimate is:

$$SE(\hat{\theta}_{\text{pooled}}) = \sqrt{\frac{1}{\sum_{i=1}^k w_i}}$$

3.4 Confidence Interval for the Pooled Effect

A $(1-\alpha) \times 100\%$ confidence interval for the pooled effect is:

$$\left[\hat{\theta}_{\text{pooled}} - z_{\alpha/2} \times SE(\hat{\theta}_{\text{pooled}}), \quad \hat{\theta}_{\text{pooled}} + z_{\alpha/2} \times SE(\hat{\theta}_{\text{pooled}})\right]$$

For a 95% CI, $z_{0.025} = 1.96$.

3.5 Z-Test for the Pooled Effect

To test the null hypothesis $H_0: \theta = 0$ (no overall effect), the test statistic is:

$$Z = \frac{\hat{\theta}_{\text{pooled}}}{SE(\hat{\theta}_{\text{pooled}})}$$

Under $H_0$, $Z$ follows a standard normal distribution $\mathcal{N}(0,1)$. The two-sided p-value is:

$$p\text{-value} = 2 \times (1 - \Phi(|Z|))$$
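The quantities in Sections 3.2 to 3.5 fit into one small routine. The sketch below (standard library only; names are illustrative, not DataStatPro's API) takes effect estimates and their variances and returns the pooled effect, its standard error, a 95% CI, and the z-test:

```python
from math import sqrt
from statistics import NormalDist

def inverse_variance_pool(effects, variances, alpha=0.05):
    """Generic inverse-variance pooling (Sections 3.2-3.5).
    With w_i = 1/v_i this is the fixed-effect model; random-effects
    pooling uses w_i = 1/(v_i + tau^2) instead (Section 6)."""
    w = [1.0 / v for v in variances]
    theta = sum(wi * ti for wi, ti in zip(w, effects)) / sum(w)
    se = sqrt(1.0 / sum(w))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z = theta / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"theta": theta, "se": se, "z": z, "p": p,
            "ci": (theta - z_crit * se, theta + z_crit * se)}

result = inverse_variance_pool([0.30, 0.10, 0.25], [0.04, 0.09, 0.05])
```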


4. Effect Size Measures

An effect size is a standardised, scale-free numerical index of the magnitude of a phenomenon. Meta-analysis based on effect sizes is the most common approach because it allows combining studies that may use different scales to measure the same construct.

4.1 Effect Sizes for Comparing Two Groups (Continuous Outcomes)

4.1.1 Raw (Unstandardised) Mean Difference (MD)

The simplest effect size for two-group comparisons on a continuous outcome measured on the same scale:

$$MD = \bar{X}_T - \bar{X}_C$$

Where:

  - $\bar{X}_T$ is the mean of the treatment group
  - $\bar{X}_C$ is the mean of the control group

Variance of MD:

$$v_{MD} = \frac{SD_T^2}{n_T} + \frac{SD_C^2}{n_C}$$

Where $SD_T$, $SD_C$ are the standard deviations and $n_T$, $n_C$ are the sample sizes of the treatment and control groups.

💡 Use MD when all studies measure the outcome on the same scale (e.g., all report blood pressure in mmHg). If studies use different scales, use a standardised effect size instead.

4.1.2 Cohen's d (Standardised Mean Difference)

Cohen's d standardises the mean difference by dividing by a pooled standard deviation, making it scale-free and comparable across studies using different measurement instruments:

$$d = \frac{\bar{X}_T - \bar{X}_C}{SD_{\text{pooled}}}$$

Where the pooled standard deviation is:

$$SD_{\text{pooled}} = \sqrt{\frac{(n_T - 1)SD_T^2 + (n_C - 1)SD_C^2}{n_T + n_C - 2}}$$

Variance of Cohen's d:

$$v_d = \frac{n_T + n_C}{n_T \times n_C} + \frac{d^2}{2(n_T + n_C - 2)}$$

Interpretation of Cohen's d:

| $\lvert d \rvert$ Value | Conventional Interpretation |
|---|---|
| 0.00–0.20 | Negligible effect |
| 0.20–0.50 | Small effect |
| 0.50–0.80 | Medium effect |
| > 0.80 | Large effect |

⚠️ Cohen's conventional benchmarks are rough guidelines. "Small," "medium," and "large" are context-dependent — always interpret effect sizes in the context of the domain.

4.1.3 Hedges' g (Bias-Corrected Standardised Mean Difference)

Cohen's $d$ is slightly positively biased (it overestimates the true effect) in small samples. Hedges' g applies a correction factor $J$ to remove this bias:

$$g = J \times d$$

Where the correction factor is:

$$J = 1 - \frac{3}{4(n_T + n_C - 2) - 1}$$

For large samples, $J \approx 1$ and $g \approx d$. For small samples, the correction matters.

Variance of Hedges' g:

$$v_g = J^2 \times v_d$$

💡 Hedges' g is generally preferred over Cohen's d in meta-analysis due to its reduced bias in small samples.
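Putting Sections 4.1.2 and 4.1.3 together, the path from group summaries to a bias-corrected effect size is short. A sketch using the formulas above (the function name and example numbers are ours):

```python
from math import sqrt

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d from group summaries, then Hedges' correction J."""
    df = n_t + n_c - 2
    sd_pooled = sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    v_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * df)  # variance of d
    j = 1 - 3 / (4 * df - 1)                           # small-sample correction
    return j * d, j**2 * v_d                           # g and its variance

# Two groups of 25, a 5-point mean difference, SDs of 10 and 9
g, v_g = hedges_g(52.0, 10.0, 25, 47.0, 9.0, 25)
```

With 50 participants in total the correction is small but visible: g is slightly below the uncorrected d.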

4.1.4 Glass's Delta (Δ)

Glass's Delta standardises the mean difference using only the control group standard deviation, rather than the pooled SD:

$$\Delta = \frac{\bar{X}_T - \bar{X}_C}{SD_C}$$

💡 Use Glass's Delta when the treatment may have affected the variability of scores (i.e., the SD of the treatment group may be inflated or deflated by the intervention). Using the control group SD provides a cleaner baseline.

4.2 Effect Sizes for Binary Outcomes

When outcomes are binary (event/non-event), effect sizes are based on $2 \times 2$ contingency tables:

| | Event ($Y=1$) | No Event ($Y=0$) | Total |
|---|---|---|---|
| Treated | $a$ | $b$ | $n_T = a + b$ |
| Control | $c$ | $d$ | $n_C = c + d$ |

4.2.1 Odds Ratio (OR)

The odds ratio compares the odds of the event in the treatment group to the odds in the control group:

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$

Because the OR is positively skewed, meta-analysis is performed on the natural logarithm of the OR:

$$\ln(OR) = \ln\left(\frac{ad}{bc}\right) = \ln(a) + \ln(d) - \ln(b) - \ln(c)$$

Variance of $\ln(OR)$:

$$v_{\ln(OR)} = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$$

The pooled $\ln(OR)$ is back-transformed to the OR scale for reporting: $OR_{\text{pooled}} = e^{\ln(OR)_{\text{pooled}}}$.

Interpretation:

| OR Value | Interpretation |
|---|---|
| $OR > 1$ | Treatment increases the odds of the event |
| $OR = 1$ | No difference in odds |
| $OR < 1$ | Treatment decreases the odds of the event |

⚠️ The OR can be a misleading measure when the event is common (prevalence > 10%), because it exaggerates the relative risk. In such cases, the Risk Ratio may be preferred.

4.2.2 Risk Ratio (Relative Risk, RR)

The risk ratio compares the probability (risk) of the event in the treatment group to the control group:

$$RR = \frac{a/(a+b)}{c/(c+d)} = \frac{a \cdot (c+d)}{c \cdot (a+b)}$$

Meta-analysis is performed on the log RR:

$$\ln(RR) = \ln\left(\frac{a/(a+b)}{c/(c+d)}\right)$$

Variance of $\ln(RR)$:

$$v_{\ln(RR)} = \frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}$$

Interpretation:

| RR Value | Interpretation |
|---|---|
| $RR > 1$ | Treatment increases the risk of the event |
| $RR = 1$ | No difference in risk |
| $RR < 1$ | Treatment decreases the risk of the event |

4.2.3 Risk Difference (RD)

The risk difference (also called the absolute risk reduction or attributable risk) is the difference in event probabilities between groups:

$$RD = \frac{a}{a+b} - \frac{c}{c+d}$$

Variance of RD:

$$v_{RD} = \frac{a \cdot b}{(a+b)^3} + \frac{c \cdot d}{(c+d)^3}$$

💡 The RD is on an absolute scale (e.g., 0.05 = 5 percentage points), making it clinically interpretable. The number needed to treat (NNT) = 1/|RD|.
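All three binary effect measures come from the same four cell counts, so they can be computed together. A minimal sketch of Section 4.2 (the 0.5 continuity correction mentioned in Section 5.5 is applied when a cell is zero; names are illustrative):

```python
from math import log

def binary_effects(a, b, c, d):
    """log(OR), log(RR), and RD with their variances from a 2x2 table.
    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe)."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    n_t, n_c = a + b, c + d
    log_or = log((a * d) / (b * c))
    v_log_or = 1/a + 1/b + 1/c + 1/d
    log_rr = log((a / n_t) / (c / n_c))
    v_log_rr = 1/a - 1/n_t + 1/c - 1/n_c
    rd = a / n_t - c / n_c
    v_rd = (a * b) / n_t**3 + (c * d) / n_c**3
    return {"logOR": (log_or, v_log_or),
            "logRR": (log_rr, v_log_rr),
            "RD": (rd, v_rd)}

# 15/100 events under treatment vs 25/100 under control
eff = binary_effects(15, 85, 25, 75)
```

Back-transform the pooled log-scale results with `exp()` for reporting, as described above.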

4.3 Effect Sizes for Correlation

4.3.1 Pearson's r

When studies report the correlation coefficient rr between two continuous variables, it can be used directly as an effect size.

Interpretation of $|r|$:

| $\lvert r \rvert$ Value | Conventional Interpretation |
|---|---|
| 0.00–0.10 | Negligible |
| 0.10–0.30 | Small |
| 0.30–0.50 | Moderate |
| > 0.50 | Large |

4.3.2 Fisher's Z Transformation

Because $r$ has a bounded range $[-1, 1]$ and a non-normal sampling distribution (especially when $r$ is far from zero), meta-analysis is performed on the Fisher's Z-transformed correlation:

$$z_r = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right) = \text{arctanh}(r)$$

Variance of $z_r$ (remarkably simple):

$$v_{z_r} = \frac{1}{n - 3}$$

Where $n$ is the sample size of the study. The pooled $z_r$ is back-transformed to $r$ for reporting:

$$r_{\text{pooled}} = \frac{e^{2\hat{z}_r} - 1}{e^{2\hat{z}_r} + 1} = \tanh(\hat{z}_r)$$
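A small sketch of the round trip (transform each r, pool on the z scale, back-transform), using the fixed-effect weights implied by $v_{z_r} = 1/(n-3)$; the function name and data are illustrative:

```python
from math import atanh, tanh, sqrt

def pool_correlations(rs, ns, z_crit=1.96):
    """Fixed-effect pooling of Pearson r values on the Fisher z scale."""
    zs = [atanh(r) for r in rs]          # Fisher's z transform
    ws = [n - 3 for n in ns]             # weight = 1/v = n - 3
    z_pooled = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = sqrt(1.0 / sum(ws))
    # Back-transform the estimate and the CI bounds to the r scale
    return tanh(z_pooled), (tanh(z_pooled - z_crit * se),
                            tanh(z_pooled + z_crit * se))

r_pooled, ci = pool_correlations([0.30, 0.45, 0.25], [50, 80, 40])
```

A convenient property of back-transforming the CI bounds with `tanh` is that the interval always stays inside $[-1, 1]$.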

4.4 Effect Sizes for Single Proportions

When studies report a single event proportion $p = a/n$ (e.g., prevalence of a disease):

$$p = \frac{a}{n}$$

Variance of $p$:

$$v_p = \frac{p(1-p)}{n}$$

Because proportions are bounded on $[0, 1]$ and can have skewed distributions, transformations are commonly applied:

Logit Transformation:

$$\text{logit}(p) = \ln\left(\frac{p}{1-p}\right), \quad v_{\text{logit}(p)} = \frac{1}{np(1-p)}$$

Double Arcsine (Freeman-Tukey) Transformation:

$$\phi = \arcsin\left(\sqrt{\frac{a}{n+1}}\right) + \arcsin\left(\sqrt{\frac{a+1}{n+1}}\right)$$

$$v_\phi = \frac{1}{n + 0.5}$$

💡 The double arcsine transformation is particularly useful when proportions are close to 0 or 1 (rare or very common events), as it stabilises the variance effectively.
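Both transformations are one-liners. A sketch of Section 4.4 (illustrative names and numbers):

```python
from math import asin, log, sqrt

def logit_effect(a, n):
    """Logit-transformed proportion and its variance."""
    p = a / n
    return log(p / (1 - p)), 1.0 / (n * p * (1 - p))

def freeman_tukey_effect(a, n):
    """Double arcsine (Freeman-Tukey) transform and its variance."""
    phi = asin(sqrt(a / (n + 1))) + asin(sqrt((a + 1) / (n + 1)))
    return phi, 1.0 / (n + 0.5)

# 30 cases out of 200 (prevalence 15%)
logit_y, logit_v = logit_effect(30, 200)
ft_y, ft_v = freeman_tukey_effect(30, 200)
```

Note that `logit_effect` breaks down when a = 0 or a = n (division by zero), which is exactly the situation the Freeman-Tukey transform handles gracefully.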

4.5 Comparison of Effect Size Measures

| Effect Size | Type | Scale | Use Case |
|---|---|---|---|
| Raw MD | Continuous, 2 groups | Original units | Same measurement scale across studies |
| Cohen's $d$ | Continuous, 2 groups | Standardised | Different scales, not bias-corrected |
| Hedges' $g$ | Continuous, 2 groups | Standardised | Different scales, bias-corrected (preferred) |
| Glass's $\Delta$ | Continuous, 2 groups | Standardised | Treatment may affect variance |
| Odds Ratio | Binary, 2 groups | Multiplicative (odds) | Case-control or cohort studies |
| Risk Ratio | Binary, 2 groups | Multiplicative (probability) | Cohort/RCT studies, common outcomes |
| Risk Difference | Binary, 2 groups | Absolute | Absolute risk; NNT calculation |
| Pearson's $r$ / Fisher's $z_r$ | Correlation | $[-1, 1]$ | Relationship between two continuous variables |
| Proportion | Single group | $[0, 1]$ | Prevalence, incidence |

5. Meta-Analysis Based on Original Measures

When all studies in a meta-analysis measure the same outcome on the same original scale, it is valid — and often preferable — to pool the studies using the original (unstandardised) measures directly, without standardisation.

5.1 Meta-Analysis of Means (Single Group)

When studies report a single-group mean (e.g., mean age, mean biomarker level in a specific population), the effect of interest is the mean $\mu$ itself.

Input data per study $i$:

  - Sample mean $\bar{X}_i$
  - Standard deviation $SD_i$
  - Sample size $n_i$

Within-study variance:

$$v_i = \frac{SD_i^2}{n_i}$$

The pooled mean is estimated as the weighted average described in Section 3.2, using weights $w_i = 1/v_i$.

5.2 Meta-Analysis of Raw Mean Differences (Two Groups)

When studies compare a treatment group to a control group using the same outcome measure, the raw mean difference is directly pooled.

Input data per study $i$:

  - Treatment group: mean $\bar{X}_{T,i}$, standard deviation $SD_{T,i}$, sample size $n_{T,i}$
  - Control group: mean $\bar{X}_{C,i}$, standard deviation $SD_{C,i}$, sample size $n_{C,i}$

Effect estimate:

$$MD_i = \bar{X}_{T,i} - \bar{X}_{C,i}$$

Within-study variance:

$$v_i = \frac{SD_{T,i}^2}{n_{T,i}} + \frac{SD_{C,i}^2}{n_{C,i}}$$

5.3 Meta-Analysis of Proportions

When studies report a single proportion (e.g., the prevalence of hypertension in a population), the goal is to pool the proportions across studies.

Input data per study $i$:

  - Number of events $a_i$
  - Sample size $n_i$

Observed proportion:

$$p_i = \frac{a_i}{n_i}$$

The pooling is typically performed on the logit or double arcsine transformed proportions (see Section 4.4) to improve normality, and the result is back-transformed to the proportion scale.

5.4 Meta-Analysis of Incidence Rates

When studies report event counts and person-time (incidence rate data), the effect measure is the incidence rate:

$$\lambda_i = \frac{a_i}{T_i}$$

Where $a_i$ is the number of events and $T_i$ is the total person-time. Meta-analysis is typically performed on the log incidence rate:

$$\ln(\lambda_i) = \ln(a_i) - \ln(T_i)$$

Variance of $\ln(\lambda_i)$:

$$v_{\ln(\lambda_i)} = \frac{1}{a_i}$$

5.5 Meta-Analysis of 2×2 Table Data (Raw Counts)

When studies report raw event counts from a $2 \times 2$ contingency table (events and non-events in two groups), several effect measures can be computed:

| Input | From Table |
|---|---|
| $a_i$ | Events in treated group |
| $b_i$ | Non-events in treated group |
| $c_i$ | Events in control group |
| $d_i$ | Non-events in control group |

From these, the OR, RR, or RD are computed (see Section 4.2) and pooled. This is the most common input format for meta-analyses of clinical trials with binary outcomes.

💡 When cell counts are zero (e.g., no events in one group), a small continuity correction (typically adding 0.5 to all cells) is applied to allow computation of log-scale measures. This is known as the Haldane-Anscombe correction.


6. Fixed-Effect vs. Random-Effects Models

The choice between the fixed-effect model and the random-effects model is one of the most important decisions in meta-analysis. They differ fundamentally in their assumptions about the nature of the true effect across studies.

6.1 The Fixed-Effect Model

Core Assumption: All studies in the meta-analysis share the same single true effect size $\theta$. Differences between observed study results are due only to sampling error (within-study variability).

$$\hat{\theta}_i = \theta + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, v_i)$$

Fixed-Effect Weight:

$$w_i^{FE} = \frac{1}{v_i}$$

Each study is weighted by the inverse of its within-study variance. Larger, more precise studies get more weight.

Pooled Effect (Fixed-Effect):

$$\hat{\theta}_{FE} = \frac{\sum_{i=1}^k w_i^{FE} \hat{\theta}_i}{\sum_{i=1}^k w_i^{FE}}$$

When to use:

  - All studies can be considered functionally identical (same population, intervention, and outcome definitions).
  - The goal is to estimate the effect in these specific studies only, not to generalise to other settings.

⚠️ The fixed-effect model is theoretically appropriate only when all studies are functionally identical. In most real-world meta-analyses, this assumption is untenable. The random-effects model is almost always more appropriate.

6.2 The Random-Effects Model

Core Assumption: The studies in the meta-analysis are a random sample from a population of studies with varying true effects. Each study has its own true effect $\theta_i$, which is drawn from a distribution of true effects:

$$\theta_i \sim \mathcal{N}(\mu, \tau^2)$$

Where:

  - $\mu$ is the mean of the distribution of true effects (the quantity estimated by the random-effects pooled effect)
  - $\tau^2$ is the between-study variance of the true effects

The observed effect in study $i$ is then:

$$\hat{\theta}_i = \mu + u_i + \epsilon_i$$

Where:

  - $u_i \sim \mathcal{N}(0, \tau^2)$ is study $i$'s deviation from the mean true effect
  - $\epsilon_i \sim \mathcal{N}(0, v_i)$ is the sampling error

Random-Effects Weight (DerSimonian-Laird):

$$w_i^{RE} = \frac{1}{v_i + \hat{\tau}^2}$$

The weights now include both the within-study variance $v_i$ and the estimated between-study variance $\hat{\tau}^2$. This shrinks the weight differences between large and small studies compared to the fixed-effect model.

Pooled Effect (Random-Effects):

$$\hat{\mu}_{RE} = \frac{\sum_{i=1}^k w_i^{RE} \hat{\theta}_i}{\sum_{i=1}^k w_i^{RE}}$$

When to use:

  - The included studies differ in populations, interventions, designs, or settings (the usual case).
  - The goal is to generalise to a population of comparable studies.
  - Heterogeneity is present or plausible a priori.

6.3 Methods for Estimating τ²

Several estimators for the between-study variance $\tau^2$ exist:

| Estimator | Description | Notes |
|---|---|---|
| DerSimonian-Laird (DL) | Moment-based estimator; the classical and most widely used method | Can underestimate $\tau^2$; computationally simple |
| Restricted Maximum Likelihood (REML) | Likelihood-based; generally preferred for accuracy | Iterative; more precise, especially with few studies |
| Maximum Likelihood (ML) | Full likelihood-based; biased downward for $\tau^2$ | Less preferred than REML |
| Hedges (HE) | Another moment-based estimator | Less commonly used |
| Sidik-Jonkman (SJ) | Robust estimator | Better when $\tau^2$ is large |
| Paule-Mandel (PM) | Iterative moment-based | Recommended by some guidelines |

💡 The DataStatPro application implements the DerSimonian-Laird and REML estimators. REML is generally recommended, especially when the number of studies $k \geq 5$.

6.4 The Prediction Interval

In the random-effects model, a prediction interval captures the expected range of the true effect in a new (future) study, accounting for between-study heterogeneity:

$$\hat{\mu}_{RE} \pm t_{\alpha/2,\,k-2} \times \sqrt{\hat{\tau}^2 + \text{Var}(\hat{\mu}_{RE})}$$

Where $t_{\alpha/2, k-2}$ is the critical value from the $t$-distribution with $k-2$ degrees of freedom.

The prediction interval is wider than the confidence interval and reflects the true variability of effects across settings. It is arguably more informative than the confidence interval for clinical or practical decision-making.

💡 If the 95% prediction interval crosses zero (for a mean difference) or 1 (for an OR/RR), this indicates that in some settings, the true effect may be negligible or even reversed — even if the pooled effect is statistically significant.
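Given the random-effects outputs, the prediction interval is a one-line computation. A standard-library sketch (the caller supplies the t critical value, e.g. from tables, since the standard library has no t quantile function; numbers are illustrative):

```python
from math import sqrt

def prediction_interval(mu_re, se_mu, tau2, t_crit):
    """Prediction interval for the true effect in a new study (Section 6.4).
    t_crit is t_{alpha/2, k-2}; for k = 12 studies at 95%, t_{0.025,10} ~ 2.228."""
    half_width = t_crit * sqrt(tau2 + se_mu**2)
    return mu_re - half_width, mu_re + half_width

# Pooled effect 0.35 (SE 0.08), tau^2 = 0.04, k = 12 studies
lo, hi = prediction_interval(0.35, 0.08, 0.04, t_crit=2.228)
# The 95% CI (0.35 +/- 1.96*0.08) excludes zero, but the prediction
# interval does not: in some settings the true effect may be null or reversed.
```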

6.5 Comparison: Fixed-Effect vs. Random-Effects

| Feature | Fixed-Effect | Random-Effects |
|---|---|---|
| Assumed true effects | One common $\theta$ | Distribution $\mathcal{N}(\mu, \tau^2)$ |
| Source of variability | Sampling error only | Sampling error + between-study variance |
| Weights | $1/v_i$ | $1/(v_i + \hat{\tau}^2)$ |
| Weight differences | Large (small studies discounted heavily) | Smaller (more balanced weights) |
| Pooled result represents | Effect in these studies | Average effect across a population of studies |
| Confidence interval | Narrower | Wider (more honest) |
| Prediction interval | Not applicable | Available and recommended |
| Sensitivity to heterogeneity | High (ignores it) | Accounts for it |
| Typical recommendation | Rarely appropriate | Almost always preferred |

7. Assumptions of Meta-Analysis

7.1 Independence of Studies

Each study in the meta-analysis must represent an independent sample. Including multiple effect sizes from the same participants (without accounting for dependence) inflates precision.

⚠️ If a study reports multiple relevant outcomes or time points, use multilevel meta-analysis or select one primary outcome to avoid violating independence.

7.2 Consistent Effect Size Metric

All included studies must report (or allow computation of) the same effect size metric (e.g., all report Hedges' $g$, not a mix of $g$ and $r$ without conversion).

7.3 Unbiased Study Results (No Selective Reporting)

Meta-analysis assumes that the available study results are representative of all studies conducted. Systematic publication bias or outcome reporting bias violates this assumption and can lead to inflated pooled effects (see Section 10).

7.4 Accurate Extraction of Study Data

Effect sizes and their variances must be correctly extracted or computed from primary studies. Errors in data extraction propagate directly into the pooled estimate.

7.5 Sufficient Overlap in Study Characteristics (Conceptual Homogeneity)

While statistical heterogeneity is expected and modelled in the random-effects model, studies should be conceptually similar enough that combining them is meaningful. Pooling studies of entirely different populations or interventions under one estimate is referred to as the "apples and oranges" problem.

7.6 Normality of Effect Size Distribution

The statistical methods assume that the distribution of effect sizes (after any required transformation) is approximately normal. This is generally satisfied when within-study sample sizes are reasonable. For very small within-study samples, transformations (log, logit, Fisher's Z) are used to improve normality.


8. Heterogeneity

Heterogeneity refers to variability in the true effects across studies, beyond what would be expected from sampling error alone. Assessing and understanding heterogeneity is central to any meta-analysis.

8.1 Cochran's Q Test

Cochran's Q tests the null hypothesis $H_0: \theta_1 = \theta_2 = \dots = \theta_k$ (all true effects are identical):

$$Q = \sum_{i=1}^k w_i^{FE}(\hat{\theta}_i - \hat{\theta}_{FE})^2$$

Under $H_0$, $Q$ follows a chi-squared distribution with $k-1$ degrees of freedom:

$$Q \sim \chi^2_{k-1}$$

A statistically significant $Q$ ($p < 0.05$) indicates the presence of heterogeneity. However, the Q test has low power when studies are few and excessive power (flagging trivial heterogeneity) when studies are many, so it should not be used in isolation.

8.2 I² Statistic

$I^2$ quantifies the proportion of total variability in effect sizes that is due to between-study heterogeneity (as opposed to sampling error):

$$I^2 = \max\left(0, \frac{Q - (k-1)}{Q}\right) \times 100\%$$

$I^2$ ranges from 0% to 100%:

| $I^2$ Value | Conventional Interpretation |
|---|---|
| 0%–25% | Low heterogeneity |
| 25%–50% | Moderate heterogeneity |
| 50%–75% | Substantial heterogeneity |
| 75%–100% | Considerable heterogeneity |

⚠️ $I^2$ is a relative measure and should be interpreted alongside $\tau^2$ and the prediction interval. A large $I^2$ with a small $\tau^2$ in absolute terms may still imply a practically negligible spread of true effects.

8.3 τ² (Between-Study Variance)

$\tau^2$ is the estimated variance of the distribution of true effects in the random-effects model. Unlike $I^2$, it is on the same scale as the squared effect size and thus conveys the absolute magnitude of heterogeneity.

The DerSimonian-Laird estimator of $\tau^2$ is:

$$\hat{\tau}^2_{DL} = \max\left(0, \frac{Q - (k-1)}{C}\right)$$

Where:

$$C = \sum_{i=1}^k w_i^{FE} - \frac{\sum_{i=1}^k (w_i^{FE})^2}{\sum_{i=1}^k w_i^{FE}}$$

$\tau$ (tau, the square root of $\tau^2$) is the standard deviation of the distribution of true effects and is reported in the same units as the effect size.
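The chain from Q to I² to the DerSimonian-Laird τ² uses only the fixed-effect weights, so all three statistics can be computed in one pass. A sketch of Sections 8.1 to 8.3 (illustrative names and data):

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I^2 (as a percentage), and DerSimonian-Laird tau^2."""
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    theta_fe = sum(wi * ti for wi, ti in zip(w, effects)) / sum(w)
    q = sum(wi * (ti - theta_fe) ** 2 for wi, ti in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2

q, i2, tau2 = heterogeneity([0.10, 0.30, 0.35, 0.60],
                            [0.03, 0.02, 0.05, 0.04])
```

Compare `q` against the chi-squared distribution with k − 1 degrees of freedom; the random-effects weights of Section 6.2 are then $1/(v_i + \hat{\tau}^2)$.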

8.4 H² Statistic

$$H^2 = \frac{Q}{k-1}$$

$H^2 = 1$ means no heterogeneity (all variability is sampling error). $H^2 > 1$ indicates excess heterogeneity. It is related to $I^2$ by $I^2 = (H^2 - 1)/H^2$.

8.5 Confidence Intervals for τ² and I²

$\tau^2$ and $I^2$ are estimates with their own uncertainty. Confidence intervals for these quantities (e.g., using the Q-profile method or the Biggerstaff-Jackson method) should always be reported alongside the point estimates, particularly with few studies.

8.6 Interpreting Heterogeneity: A Framework

When heterogeneity is detected, the analyst should:

  1. Report the $Q$ statistic and its p-value, $I^2$, and $\tau^2$ with its 95% CI.
  2. Report the prediction interval to show the plausible range of true effects.
  3. Explore sources of heterogeneity via subgroup analysis or meta-regression (Section 11).
  4. Consider whether pooling is still appropriate or whether separate analyses by subgroup are needed.

9. Forest Plots

The forest plot is the canonical visual summary of a meta-analysis. It displays the effect size and confidence interval from each individual study, along with the pooled estimate, in a single graphic.

9.1 Anatomy of a Forest Plot

A standard forest plot contains the following elements:

| Element | Description |
|---|---|
| Study labels | Names or identifiers of individual studies (leftmost column) |
| Data summary | Effect size and/or sample size for each study (may be tabulated) |
| Horizontal lines | 95% confidence interval for each study's effect estimate |
| Square/box | Point estimate for each study; the area of the square is proportional to the study's weight |
| Diamond | The pooled effect estimate; the width of the diamond represents its 95% CI |
| Vertical line at null | The line of no effect ($\theta = 0$ for MD/RD; 1 for OR/RR plotted on the ratio scale, i.e. 0 on the log scale) |
| Heterogeneity statistics | $Q$, $p$, $I^2$, $\tau^2$ reported below the plot |
| Overall test | $Z$-statistic and $p$-value for the pooled effect |

9.2 Reading a Forest Plot

  - A study whose horizontal line crosses the null line is not statistically significant on its own.
  - Larger squares mark studies that carry more weight in the pooled estimate.
  - If the diamond does not overlap the null line, the pooled effect is statistically significant.
  - Wide scatter of the point estimates relative to their confidence intervals is a visual sign of heterogeneity.

9.3 Separate Forest Plots for Fixed-Effect and Random-Effects

The DataStatPro application generates two forest plots: one for the fixed-effect model and one for the random-effects model. Comparing them is instructive:

  - When heterogeneity is low, the two pooled estimates and the study weights are nearly identical.
  - When heterogeneity is high, the random-effects model assigns more balanced weights to small studies and its confidence interval is noticeably wider.
  - A large discrepancy between the two diamonds is itself a signal that between-study variance matters.


10. Publication Bias

Publication bias is the tendency for studies with statistically significant or large effects to be published more readily than studies with non-significant or small effects. If the meta-analyst only has access to published studies, the pooled effect will be overestimated.

10.1 The Funnel Plot

The funnel plot is the primary graphical tool for assessing publication bias. It plots each study's effect size (x-axis) against a measure of its precision (y-axis, typically the standard error — note: inverted so that more precise studies appear at the top).

Expected appearance under no publication bias:

  - A symmetric, inverted funnel: large, precise studies cluster near the pooled effect at the top, while smaller studies scatter symmetrically and more widely towards the bottom.

Signs of publication bias:

  - Asymmetry of the funnel, typically a sparse or empty bottom corner where small studies with null or unfavourable results would be expected.

⚠️ Funnel plot asymmetry can result from publication bias but also from other causes: small-study effects (small studies genuinely showing larger effects), heterogeneity, artefacts of particular effect size measures, or chance. Interpret with caution and use formal tests.

10.2 Egger's Test

Egger's test is a formal statistical test for funnel plot asymmetry, based on a weighted linear regression of the standardised effect ($\hat{\theta}_i / SE_i$) on precision ($1/SE_i$):

$$\frac{\hat{\theta}_i}{SE_i} = a + b \times \frac{1}{SE_i} + \epsilon_i$$

Asymmetry is indicated when the intercept $a$ differs significantly from zero (assessed with a $t$-test on $k - 2$ degrees of freedom); the slope $b$ estimates the pooled effect.
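A minimal, unweighted version of this regression can be written with the standard library; the test statistic is the intercept divided by its standard error, compared against a t-distribution with k − 2 degrees of freedom. Names are illustrative, and production code would normally use a statistics package:

```python
from math import sqrt

def egger_regression(effects, std_errors):
    """Simple linear regression of standardised effect on precision.
    Returns (intercept, SE of intercept); t = intercept / SE tests asymmetry."""
    y = [e / s for e, s in zip(effects, std_errors)]  # standardised effects
    x = [1.0 / s for s in std_errors]                 # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid_var = sum((yi - intercept - slope * xi) ** 2
                    for xi, yi in zip(x, y)) / (n - 2)
    se_intercept = sqrt(resid_var * (1.0 / n + mx**2 / sxx))
    return intercept, se_intercept
```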

10.3 Begg's Test (Rank Correlation Test)

Begg's test examines whether there is a rank correlation between the standardised effect sizes and their variances. It uses Kendall's $\tau$ to test for correlation:

$$H_0: \text{rank}(\hat{\theta}_i) \text{ is uncorrelated with } \text{rank}(v_i)$$

A significant Kendall's $\tau$ suggests asymmetry. Begg's test generally has lower power than Egger's test.

10.4 Trim and Fill Method

The trim and fill method (Duval & Tweedie) is a non-parametric procedure that:

  1. Trims the asymmetric studies (assumed to be the more extreme studies on one side).
  2. Re-estimates the centre of the funnel.
  3. Fills in the missing (unpublished) mirror-image studies.
  4. Produces an adjusted pooled estimate that accounts for publication bias.

The adjusted estimate represents what the pooled effect would be if the funnel plot were symmetric.

⚠️ The trim and fill method assumes that asymmetry is caused solely by publication bias. If heterogeneity or other factors cause asymmetry, the adjusted estimate may be misleading. It should be treated as a sensitivity analysis, not the primary result.

10.5 Fail-Safe N (Rosenthal's Method)

Fail-safe N ($N_{fs}$) estimates the number of unpublished null-result studies that would be needed to reduce the pooled effect to non-significance:

$$N_{fs} = \left(\frac{\sum z_i}{z_{\alpha}}\right)^2 - k$$

Where $\sum z_i$ is the sum of the $z$-statistics from all included studies, $z_{\alpha} = 1.645$ (one-tailed at 5%), and $k$ is the number of included studies.

A large $N_{fs}$ relative to the number of included studies suggests the results are robust to publication bias.

⚠️ Fail-safe N has been criticised as it does not consider the quality or magnitude of missing studies — only their number. It should be supplemented with other methods.
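The computation itself is trivial, which is part of why fail-safe N remains popular despite its limitations. A sketch (the z-values are illustrative):

```python
def fail_safe_n(z_values, z_alpha=1.645):
    """Rosenthal's fail-safe N: number of hidden null studies needed to
    drag the combined result below one-tailed significance."""
    k = len(z_values)
    return (sum(z_values) / z_alpha) ** 2 - k

# Five studies with individual z-statistics
n_fs = fail_safe_n([2.1, 1.8, 2.5, 1.2, 2.9])
```

Here about 36 hidden null studies would be needed, just above Rosenthal's informal robustness benchmark of 5k + 10 = 35.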


11. Moderator Analysis: Subgroup Analysis and Meta-Regression

When heterogeneity is detected, the natural next step is to explain it by identifying study-level variables (moderators) that systematically relate to the effect size.

11.1 Subgroup Analysis

Subgroup analysis divides studies into groups based on a categorical moderator (e.g., type of intervention, country income level, participant age group) and estimates a separate pooled effect for each subgroup.

Procedure:

  1. Define subgroups based on a theoretically motivated moderator.
  2. Run separate meta-analyses within each subgroup.
  3. Test for between-subgroup heterogeneity using a Q-test for subgroup differences:

QB=g=1Gwg(θ^gθ^all)2Q_B = \sum_{g=1}^G w_g^* \left(\hat{\theta}_g - \hat{\theta}_{\text{all}}\right)^2

Where θ^g\hat{\theta}_g is the pooled effect for subgroup gg and wg=1/Var(θ^g)w_g^* = 1/\text{Var}(\hat{\theta}_g).

Under H0H_0 (no subgroup differences), QBχG12Q_B \sim \chi^2_{G-1} where GG is the number of subgroups.

⚠️ Subgroup analyses are subject to multiple testing inflation and should be pre-specified (not data-driven post hoc). Treat unexpected subgroup findings as exploratory and hypothesis-generating, not confirmatory.
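The Q-test for subgroup differences above can be sketched in a few lines. The subgroup pooled estimates and variances are hypothetical; the closed-form p-value shown is valid only for two degrees of freedom (i.e., three subgroups):

```python
import math

def q_between(group_estimates, group_variances):
    """Q_B: weighted squared deviations of subgroup pooled effects
    around the overall weighted mean, with weights 1/Var(theta_g)."""
    w = [1.0 / v for v in group_variances]
    overall = sum(wi * g for wi, g in zip(w, group_estimates)) / sum(w)
    return sum(wi * (g - overall) ** 2 for wi, g in zip(w, group_estimates))

# hypothetical pooled effects from three subgroup meta-analyses
est = [0.45, 0.20, 0.60]
var = [0.010, 0.008, 0.015]
qb = q_between(est, var)
df = len(est) - 1
p = math.exp(-qb / 2)  # chi-square survival function, exact for df = 2
```

A Q_B of about 7.76 on 2 df (p ≈ 0.02) would indicate that the subgroups differ beyond sampling error.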

11.2 Meta-Regression

Meta-regression models the effect size as a function of one or more continuous or categorical study-level covariates (moderators):

θ^i=β0+β1Z1i+β2Z2i++βpZpi+ui+ϵi\hat{\theta}_i = \beta_0 + \beta_1 Z_{1i} + \beta_2 Z_{2i} + \dots + \beta_p Z_{pi} + u_i + \epsilon_i

Where:

  - \hat{\theta}_i is the observed effect size of study i,
  - Z_{ji} is the value of moderator j for study i,
  - \beta_0, \dots, \beta_p are the regression coefficients,
  - u_i \sim N(0, \tau^2_{\text{residual}}) is the residual between-study random effect, and
  - \epsilon_i \sim N(0, v_i) is the within-study sampling error.

Weighted Least Squares Estimation: Meta-regression is estimated using weighted least squares with weights wi=1/(vi+τ^residual2)w_i = 1/(v_i + \hat{\tau}^2_{\text{residual}}).

Testing a Moderator: The significance of moderator ZjZ_j is tested using the Wald statistic:

zj=β^jSE(β^j)z_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}

A significant zjz_j indicates that ZjZ_j explains a portion of the heterogeneity.

R2R^2 Analogue (Proportion of Heterogeneity Explained):

Rmeta2=τ^null2τ^model2τ^null2R^2_{\text{meta}} = \frac{\hat{\tau}^2_{\text{null}} - \hat{\tau}^2_{\text{model}}}{\hat{\tau}^2_{\text{null}}}

This estimates the proportion of between-study variance (τ2\tau^2) explained by the moderator(s).

⚠️ Meta-regression requires a sufficient number of studies (a common guideline is at least 10 studies per moderator). With few studies, the regression will be underpowered and potentially spurious.
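A minimal sketch of weighted least squares meta-regression with a single continuous moderator. The data are hypothetical, and a full implementation would also estimate the residual \tau^2 and handle multiple moderators:

```python
def wls_meta_regression(effects, variances, moderator, tau2=0.0):
    """One-moderator WLS meta-regression with weights 1/(v_i + tau2).
    Returns intercept, slope, and the slope's standard error."""
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, moderator)) / sw
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxy = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, moderator, effects))
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, moderator))
    b1 = sxy / sxx                  # moderator coefficient
    b0 = ybar - b1 * xbar           # intercept
    se_b1 = (1.0 / sxx) ** 0.5      # SE of the slope under the weighted model
    return b0, b1, se_b1

# hypothetical data: effect sizes increasing with dose (the moderator)
b0, b1, se_b1 = wls_meta_regression(
    effects=[0.20, 0.35, 0.50, 0.62],
    variances=[0.01, 0.01, 0.01, 0.01],
    moderator=[10, 20, 30, 40],
)
z = b1 / se_b1  # Wald test for the moderator
```

With these values the slope is about 0.0141 per dose unit and the Wald z exceeds 1.96, so the moderator would be declared significant.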


12. Sensitivity Analysis

Sensitivity analysis examines the robustness of the pooled results to the methodological choices made in the meta-analysis.

12.1 Leave-One-Out Analysis

The leave-one-out (also called "one-study-removed") analysis re-runs the meta-analysis kk times, each time excluding one study. If removing any single study dramatically changes the pooled estimate, that study is influential and warrants investigation.

Interpretation: if the pooled estimate, its confidence interval, and the heterogeneity statistics remain essentially unchanged regardless of which study is removed, the results are robust. If excluding a single study materially shifts the estimate or changes its statistical significance, that study is influential and should be examined closely, and the sensitivity should be reported.
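The leave-one-out procedure can be sketched as follows (hypothetical effect sizes; fixed-effect pooling is used for simplicity, though the same loop applies to random-effects pooling):

```python
def pooled_fixed(effects, variances):
    w = [1.0 / v for v in variances]
    return sum(wi * e for wi, e in zip(w, effects)) / sum(w)

def leave_one_out(effects, variances):
    """Re-pool k times, omitting one study each time."""
    results = []
    for i in range(len(effects)):
        e = effects[:i] + effects[i + 1:]
        v = variances[:i] + variances[i + 1:]
        results.append(pooled_fixed(e, v))
    return results

# hypothetical data: the third study is an outlier with a large weight
effects = [0.30, 0.25, 0.90, 0.28, 0.33]
variances = [0.02, 0.03, 0.02, 0.04, 0.05]
full = pooled_fixed(effects, variances)
loo = leave_one_out(effects, variances)
```

Removing the outlying third study shifts the pooled estimate from roughly 0.46 to roughly 0.29, flagging it as influential.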

12.2 Influence Statistics

For each study ii, the following influence diagnostics can be computed:

| Statistic | Description |
| --- | --- |
| Standardised residual | How far the study's effect is from the pooled effect, in SE units |
| Cook's distance | Overall influence on the vector of pooled estimates |
| DFFITS | Change in the fitted value when study i is excluded |
| Covariance ratio | Change in the precision of the pooled estimate |
| Hat value | Leverage of the study on the pooled estimate |

💡 Influential studies should not be automatically excluded — they should be examined for data quality, unique population characteristics, or methodological anomalies. Exclusion decisions should be pre-specified or transparently justified.

12.3 Sensitivity to Model Choice

It is good practice to compare the pooled effect and heterogeneity estimates under:

  - the fixed-effect vs. the random-effects model,
  - alternative \tau^2 estimators (e.g., DerSimonian-Laird vs. REML),
  - inclusion vs. exclusion of outlying or methodologically weak studies, and
  - alternative effect size measures or analytic choices (e.g., continuity corrections).

Consistent results across these analyses strengthen confidence in the conclusions.


13. Using the Meta-Analysis Component

The Meta-Analysis component in the DataStatPro application provides a full end-to-end workflow for performing meta-analysis on your datasets.

Step-by-Step Guide

Step 1 — Select Dataset Choose the dataset you want to analyse from the "Dataset" dropdown. The dataset should contain one row per study, with columns for the relevant study-level statistics.

Step 2 — Select Analysis Type Choose the type of meta-analysis:

  - Two-group continuous outcome (means and SDs)
  - Binary outcome (2×2 tables or events/totals)
  - Correlation
  - Single proportion
  - Single mean
  - Pre-computed effect sizes

Step 3 — Select Input Variables Depending on the analysis type, map the relevant dataset columns:

| Analysis Type | Required Columns |
| --- | --- |
| Two-group continuous | \bar{X}_T, SD_T, n_T, \bar{X}_C, SD_C, n_C |
| Binary (2×2 table) | a, b, c, d (or events and totals per group) |
| Correlation | r, n |
| Single proportion | Events a, total n |
| Single mean | Mean, SD, n |
| Pre-computed | Effect size, variance (or SE) |

Step 4 — Select Effect Size Measure For continuous two-group outcomes, select the desired effect size:

  - Raw mean difference (MD)
  - Cohen's d
  - Hedges' g (small-sample bias-corrected)

For binary outcomes, select OR, RR, or RD.

Step 5 — Select Statistical Model Choose between:

  - Fixed-Effect: assumes all studies share a single true effect.
  - Random-Effects: allows the true effect to vary across studies (the usual default).

If Random-Effects is selected, choose the τ2\tau^2 estimation method (DerSimonian-Laird, REML, etc.).

Step 6 — Select Confidence Level Choose the confidence level for confidence intervals (default: 95%).

Step 7 — Select Display Options Choose which outputs to display:

  - Forest plot
  - Funnel plot
  - Heterogeneity statistics (Q, I^2, \tau^2) and prediction interval
  - Publication bias tests
  - Leave-one-out sensitivity analysis

Step 8 — Configure Moderators (Optional) If performing subgroup analysis, specify the column containing the categorical moderator variable. If performing meta-regression, specify the continuous or categorical covariate columns.

Step 9 — Run the Analysis Click "Run Meta-Analysis". The application will:

  1. Compute per-study effect sizes and variances (or use pre-computed values).
  2. Apply any required transformations (log, logit, Fisher's Z).
  3. Estimate the fixed-effect and/or random-effects pooled estimate.
  4. Estimate τ2\tau^2, I2I^2, QQ, and the prediction interval.
  5. Generate the forest plot and funnel plot.
  6. Run publication bias tests.
  7. Run leave-one-out sensitivity analysis.
  8. Run subgroup analysis or meta-regression if specified.

14. Computational and Formula Details

14.1 Full Step-by-Step Calculation Workflow

For any meta-analysis, the computation proceeds as follows:

Step A: Compute per-study effect sizes θ^i\hat{\theta}_i and variances viv_i Using the appropriate formula from Section 4 or 5.

Step B: Apply variance-stabilising transformations if needed (e.g., ln(OR)\ln(OR), Fisher's zrz_r, logit(pp))

Step C: Compute fixed-effect weights

wiFE=1viw_i^{FE} = \frac{1}{v_i}

Step D: Compute the fixed-effect pooled estimate

θ^FE=wiFEθ^iwiFE,SE(θ^FE)=1wiFE\hat{\theta}_{FE} = \frac{\sum w_i^{FE} \hat{\theta}_i}{\sum w_i^{FE}}, \quad SE(\hat{\theta}_{FE}) = \frac{1}{\sqrt{\sum w_i^{FE}}}

Step E: Compute Cochran's Q and estimate τ2\tau^2

Q=wiFE(θ^iθ^FE)2Q = \sum w_i^{FE}(\hat{\theta}_i - \hat{\theta}_{FE})^2

τ^DL2=max(0,Q(k1)C),C=wiFE(wiFE)2wiFE\hat{\tau}^2_{DL} = \max\left(0, \frac{Q-(k-1)}{C}\right), \quad C = \sum w_i^{FE} - \frac{\sum(w_i^{FE})^2}{\sum w_i^{FE}}

Step F: Compute random-effects weights

wiRE=1vi+τ^2w_i^{RE} = \frac{1}{v_i + \hat{\tau}^2}

Step G: Compute the random-effects pooled estimate

μ^RE=wiREθ^iwiRE,SE(μ^RE)=1wiRE\hat{\mu}_{RE} = \frac{\sum w_i^{RE} \hat{\theta}_i}{\sum w_i^{RE}}, \quad SE(\hat{\mu}_{RE}) = \frac{1}{\sqrt{\sum w_i^{RE}}}

Step H: Compute CIs, Z-test, and prediction interval

CI95%:μ^RE±1.96×SE(μ^RE)\text{CI}_{95\%}: \hat{\mu}_{RE} \pm 1.96 \times SE(\hat{\mu}_{RE})

Z=μ^RESE(μ^RE),p=2(1Φ(Z))Z = \frac{\hat{\mu}_{RE}}{SE(\hat{\mu}_{RE})}, \quad p = 2(1-\Phi(|Z|))

PI95%:μ^RE±t0.025,k2τ^2+SE2(μ^RE)\text{PI}_{95\%}: \hat{\mu}_{RE} \pm t_{0.025, k-2} \sqrt{\hat{\tau}^2 + SE^2(\hat{\mu}_{RE})}

Step I: Back-transform to original scale if needed (e.g., eln(OR)poolede^{\ln(OR)_{\text{pooled}}} for OR; tanh(zr,pooled)\tanh(z_{r,\text{pooled}}) for rr)

Step J: Compute I2I^2 and H2H^2

I2=max(0,Q(k1)Q)×100%,H2=Qk1I^2 = \max\left(0, \frac{Q-(k-1)}{Q}\right) \times 100\%, \quad H^2 = \frac{Q}{k-1}
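Steps C through J can be sketched end-to-end as follows. The per-study effects and variances are hypothetical stand-ins for the Step A/B output; this is an illustrative sketch, not DataStatPro's internal code:

```python
import math

def dersimonian_laird(effects, variances):
    """Fixed-effect pooling, Cochran's Q, DL tau^2, and random-effects
    pooling (Steps C through J of the workflow above)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                                   # Step C
    sw = sum(w)
    theta_fe = sum(wi * t for wi, t in zip(w, effects)) / sw           # Step D
    q = sum(wi * (t - theta_fe) ** 2 for wi, t in zip(w, effects))     # Step E
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]                       # Step F
    mu_re = sum(wi * t for wi, t in zip(w_re, effects)) / sum(w_re)    # Step G
    se_re = 1.0 / math.sqrt(sum(w_re))
    ci = (mu_re - 1.96 * se_re, mu_re + 1.96 * se_re)                  # Step H
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0           # Step J
    return {"theta_fe": theta_fe, "Q": q, "tau2": tau2,
            "mu_re": mu_re, "se_re": se_re, "ci": ci, "I2": i2}

# hypothetical per-study effects and variances
res = dersimonian_laird([0.10, 0.30, 0.35, 0.65, 0.45],
                        [0.015, 0.020, 0.010, 0.025, 0.030])
```

The prediction interval additionally requires a t quantile with k - 2 degrees of freedom, which is omitted here to keep the sketch dependency-free.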

14.2 Handling Zero Events (Continuity Correction)

When a=0a = 0, b=0b = 0, c=0c = 0, or d=0d = 0 in a 2×22 \times 2 table, log-scale measures (OR, RR) are undefined. The Haldane-Anscombe correction adds 0.5 to all four cells:

a=a+0.5,b=b+0.5,c=c+0.5,d=d+0.5a' = a + 0.5, \quad b' = b + 0.5, \quad c' = c + 0.5, \quad d' = d + 0.5

This is applied only to studies with zero cells. Double-zero studies (where both a=0a = 0 and c=0c = 0, meaning no events in either group) are typically excluded from OR/RR meta-analyses as they carry no information about the relative effect.
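A minimal sketch of the per-study correction (the cell counts are hypothetical):

```python
import math

def log_or_with_correction(a, b, c, d):
    """ln(OR) and its variance, adding 0.5 to every cell when any
    cell is zero (Haldane-Anscombe correction)."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, var

# hypothetical study with zero events in the treatment arm
lo, v = log_or_with_correction(0, 100, 5, 95)
```

Note the very large variance: a zero-cell study contributes almost no weight even after correction.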

14.3 Computing Effect Sizes from Alternative Inputs

Not all studies report means and SDs directly. The following conversions are commonly needed:

From SE to SD: SD=SE×nSD = SE \times \sqrt{n}

From 95% CI to SE: SE=Upper CILower CI2×1.96SE = \frac{\text{Upper CI} - \text{Lower CI}}{2 \times 1.96}

From median and IQR (Wan et al. method, for non-normal distributions):

\hat{\bar{X}} \approx \frac{Q_1 + \text{Median} + Q_3}{3}, \quad \hat{SD} \approx \frac{Q_3 - Q_1}{1.35}

From t-statistic (two groups): d=tnT+nCnT×nCd = t \sqrt{\frac{n_T + n_C}{n_T \times n_C}}
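These conversions translate directly into code. A minimal sketch (the function names are illustrative, not a DataStatPro API):

```python
import math

def sd_from_se(se, n):
    """SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

def se_from_ci(lower, upper, level_z=1.96):
    """SE from a symmetric 95% CI: (upper - lower) / (2 * 1.96)."""
    return (upper - lower) / (2 * level_z)

def mean_sd_from_median_iqr(q1, median, q3):
    """Wan et al. approximations from median and interquartile range."""
    return (q1 + median + q3) / 3, (q3 - q1) / 1.35

def d_from_t(t, n_t, n_c):
    """Cohen's d from an independent-samples t-statistic."""
    return t * math.sqrt((n_t + n_c) / (n_t * n_c))

sd = sd_from_se(0.5, 100)                   # 5.0
se = se_from_ci(1.2, 3.2)
m, s = mean_sd_from_median_iqr(10, 14, 18)  # mean 14.0
d = d_from_t(2.5, 40, 40)
```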


15. Worked Examples

Example 1: Meta-Analysis of Standardised Mean Differences (Hedges' g)

Research Question: Does mindfulness-based stress reduction (MBSR) reduce anxiety (standardised mean difference) compared to a control condition?

Included Studies (hypothetical):

| Study | n_T | \bar{X}_T | SD_T | n_C | \bar{X}_C | SD_C |
| --- | --- | --- | --- | --- | --- | --- |
| Adams (2018) | 45 | 12.3 | 4.2 | 44 | 15.7 | 4.8 |
| Brown (2019) | 30 | 11.0 | 3.9 | 31 | 14.5 | 4.1 |
| Chen (2020) | 120 | 13.1 | 5.0 | 118 | 16.2 | 5.3 |
| Davis (2021) | 22 | 10.5 | 3.5 | 20 | 13.8 | 3.7 |
| Evans (2022) | 75 | 12.8 | 4.5 | 74 | 15.9 | 4.6 |

Step 1: Compute SDpooledSD_{\text{pooled}}, dd, and JJ (for Hedges' gg) for each study.

For Adams (2018):

SDpooled=(451)(4.2)2+(441)(4.8)245+442=44(17.64)+43(23.04)87=776.16+990.7287=1766.888720.314.507SD_{\text{pooled}} = \sqrt{\frac{(45-1)(4.2)^2 + (44-1)(4.8)^2}{45+44-2}} = \sqrt{\frac{44(17.64) + 43(23.04)}{87}} = \sqrt{\frac{776.16 + 990.72}{87}} = \sqrt{\frac{1766.88}{87}} \approx \sqrt{20.31} \approx 4.507

d=12.315.74.507=3.44.5070.755d = \frac{12.3 - 15.7}{4.507} = \frac{-3.4}{4.507} \approx -0.755

J=134(45+442)1=1334710.008650.991J = 1 - \frac{3}{4(45+44-2)-1} = 1 - \frac{3}{347} \approx 1 - 0.00865 \approx 0.991

g=0.991×(0.755)0.748g = 0.991 \times (-0.755) \approx -0.748

vg=J2×[nT+nCnT×nC+d22(nT+nC2)]=0.9912×[8945×44+0.570174]=0.982×[0.04499+0.00328]=0.982×0.048270.04740v_g = J^2 \times \left[\frac{n_T+n_C}{n_T \times n_C} + \frac{d^2}{2(n_T+n_C-2)}\right] = 0.991^2 \times \left[\frac{89}{45 \times 44} + \frac{0.570}{174}\right] = 0.982 \times [0.04499 + 0.00328] = 0.982 \times 0.04827 \approx 0.04740

(Calculations for Brown, Chen, Davis, and Evans proceed identically.)

Summary of computed effect sizes (illustrative):

| Study | g | v_g | w_i^{FE} = 1/v_g |
| --- | --- | --- | --- |
| Adams (2018) | -0.748 | 0.0474 | 21.10 |
| Brown (2019) | -0.876 | 0.0722 | 13.85 |
| Chen (2020) | -0.604 | 0.0178 | 56.18 |
| Davis (2021) | -0.913 | 0.1008 | 9.92 |
| Evans (2022) | -0.681 | 0.0283 | 35.34 |
| Total | | | 136.39 |

Step 2: Fixed-Effect Pooled Estimate

θ^FE=21.10(0.748)+13.85(0.876)+56.18(0.604)+9.92(0.913)+35.34(0.681)136.39=15.7812.1333.939.0524.07136.39=94.96136.390.696\hat{\theta}_{FE} = \frac{21.10(-0.748) + 13.85(-0.876) + 56.18(-0.604) + 9.92(-0.913) + 35.34(-0.681)}{136.39} = \frac{-15.78 - 12.13 - 33.93 - 9.05 - 24.07}{136.39} = \frac{-94.96}{136.39} \approx -0.696

SEFE=1136.390.0856,95% CI: [0.864,0.528]SE_{FE} = \frac{1}{\sqrt{136.39}} \approx 0.0856, \quad 95\%\text{ CI: } [-0.864, -0.528]

Step 3: Cochran's Q and τ2\tau^2

Q=21.10(0.748(0.696))2+13.85(0.876(0.696))2+56.18(0.604(0.696))2+9.92(0.913(0.696))2+35.34(0.681(0.696))2Q = 21.10(-0.748-(-0.696))^2 + 13.85(-0.876-(-0.696))^2 + 56.18(-0.604-(-0.696))^2 + 9.92(-0.913-(-0.696))^2 + 35.34(-0.681-(-0.696))^2

=21.10(0.002704)+13.85(0.032400)+56.18(0.008464)+9.92(0.047089)+35.34(0.000225)= 21.10(0.002704) + 13.85(0.032400) + 56.18(0.008464) + 9.92(0.047089) + 35.34(0.000225)

0.057+0.449+0.476+0.467+0.008=1.457\approx 0.057 + 0.449 + 0.476 + 0.467 + 0.008 = 1.457

df=k1=4df = k - 1 = 4, p0.835p \approx 0.835No significant heterogeneity (p>0.05p > 0.05).

I2=max(0,(1.4574)/1.457)×100%=max(0,1.744)=0%I^2 = \max(0, (1.457-4)/1.457) \times 100\% = \max(0, -1.744) = 0\%

τ^DL2=max(0,(1.4574)/C)=0(since Q<k1)\hat{\tau}^2_{DL} = \max(0, (1.457-4)/C) = 0 \quad \text{(since } Q < k-1\text{)}

Step 4: Random-Effects Pooled Estimate

Since τ^2=0\hat{\tau}^2 = 0, the random-effects model is identical to the fixed-effect model here:

μ^RE0.696,95% CI: [0.864,0.528]\hat{\mu}_{RE} \approx -0.696, \quad 95\%\text{ CI: } [-0.864, -0.528]

Z=0.6960.08568.13,p<0.001Z = \frac{-0.696}{0.0856} \approx -8.13, \quad p < 0.001

Conclusion: The pooled Hedges' g0.696g \approx -0.696 (95% CI: [0.864,0.528][-0.864, -0.528]), indicating a medium-to-large reduction in anxiety following MBSR compared to control. The effect is highly statistically significant (p<0.001p < 0.001). No significant heterogeneity was detected (I2=0%I^2 = 0\%, Q(4)=1.46Q(4) = 1.46, p=0.83p = 0.83).
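The whole of Example 1 can be verified in a few lines. Because the tables above hold rounded illustrative values, the code reproduces the pooled estimate to within rounding error (about -0.691 vs -0.696):

```python
import math

studies = [  # (n_T, mean_T, sd_T, n_C, mean_C, sd_C)
    (45, 12.3, 4.2, 44, 15.7, 4.8),    # Adams (2018)
    (30, 11.0, 3.9, 31, 14.5, 4.1),    # Brown (2019)
    (120, 13.1, 5.0, 118, 16.2, 5.3),  # Chen (2020)
    (22, 10.5, 3.5, 20, 13.8, 3.7),    # Davis (2021)
    (75, 12.8, 4.5, 74, 15.9, 4.6),    # Evans (2022)
]

effects, variances = [], []
for nt, mt, st, nc, mc, sc in studies:
    df = nt + nc - 2
    sd_pooled = math.sqrt(((nt - 1) * st**2 + (nc - 1) * sc**2) / df)
    d = (mt - mc) / sd_pooled
    j = 1 - 3 / (4 * df - 1)           # small-sample correction factor J
    g = j * d
    v = j**2 * ((nt + nc) / (nt * nc) + d**2 / (2 * df))
    effects.append(g)
    variances.append(v)

w = [1 / v for v in variances]
pooled = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
```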


Example 2: Meta-Analysis of Odds Ratios (Binary Outcome)

Research Question: Does low-dose aspirin reduce the risk of myocardial infarction (MI)?

Input Data (2×2 tables, hypothetical):

| Study | a (MI, Aspirin) | b (No MI, Aspirin) | c (MI, Control) | d (No MI, Control) |
| --- | --- | --- | --- | --- |
| Study 1 | 28 | 972 | 45 | 955 |
| Study 2 | 12 | 388 | 22 | 378 |
| Study 3 | 55 | 1945 | 84 | 1916 |
| Study 4 | 8 | 192 | 15 | 185 |

Step 1: Compute ln(OR)\ln(OR) and vln(OR)v_{\ln(OR)} for each study.

Study 1:

\ln(OR) = \ln\left(\frac{28 \times 955}{972 \times 45}\right) = \ln\left(\frac{26740}{43740}\right) = \ln(0.6113) \approx -0.4921

vln(OR)=128+1972+145+1955=0.03571+0.00103+0.02222+0.00105=0.06001v_{\ln(OR)} = \frac{1}{28} + \frac{1}{972} + \frac{1}{45} + \frac{1}{955} = 0.03571 + 0.00103 + 0.02222 + 0.00105 = 0.06001

w1FE=1/0.0600116.66w_1^{FE} = 1/0.06001 \approx 16.66

(Calculations for Studies 2–4 proceed identically.)

Summary (illustrative):

| Study | \ln(OR) | v | w^{FE} | OR |
| --- | --- | --- | --- | --- |
| Study 1 | -0.491 | 0.0600 | 16.66 | 0.612 |
| Study 2 | -0.612 | 0.1154 | 8.67 | 0.542 |
| Study 3 | -0.425 | 0.0302 | 33.11 | 0.654 |
| Study 4 | -0.634 | 0.1987 | 5.03 | 0.530 |
| Total | | | 63.47 | |

Step 2: Pooled ln(OR)\ln(OR)

ln(OR)^FE=16.66(0.491)+8.67(0.612)+33.11(0.425)+5.03(0.634)63.47=8.1805.30614.0723.18963.47=30.74763.470.4845\widehat{\ln(OR)}_{FE} = \frac{16.66(-0.491)+8.67(-0.612)+33.11(-0.425)+5.03(-0.634)}{63.47} = \frac{-8.180-5.306-14.072-3.189}{63.47} = \frac{-30.747}{63.47} \approx -0.4845

ORpooled=e0.48450.616OR_{\text{pooled}} = e^{-0.4845} \approx 0.616

SE=1/63.470.1255,95% CI for ln(OR):[0.731,0.238]SE = 1/\sqrt{63.47} \approx 0.1255, \quad 95\%\text{ CI for }\ln(OR): [-0.731, -0.238]

95% CI for OR: [e0.731,e0.238]=[0.481,0.788]\Rightarrow 95\%\text{ CI for OR: } \left[e^{-0.731}, e^{-0.238}\right] = [0.481, 0.788]

Step 3: Test and Heterogeneity

Z=0.48450.1255=3.86,p<0.001Z = \frac{-0.4845}{0.1255} = -3.86, \quad p < 0.001

QQ (not calculated in detail here) is non-significant → low heterogeneity.

Conclusion: The pooled OR 0.616\approx 0.616 (95% CI: [0.481,0.788][0.481, 0.788]), indicating that aspirin reduces the odds of MI by approximately 38% compared to control. The effect is highly statistically significant (p<0.001p < 0.001).
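Example 2 can be checked the same way. Pooling on the log scale and back-transforming reproduces the illustrative results to within rounding (pooled OR of roughly 0.61):

```python
import math

tables = [  # (a, b, c, d): MI / no MI under aspirin, MI / no MI under control
    (28, 972, 45, 955),
    (12, 388, 22, 378),
    (55, 1945, 84, 1916),
    (8, 192, 15, 185),
]

log_ors, ws = [], []
for a, b, c, d in tables:
    log_ors.append(math.log((a * d) / (b * c)))
    ws.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # weight = 1 / variance

pooled_log_or = sum(w * lo for w, lo in zip(ws, log_ors)) / sum(ws)
or_pooled = math.exp(pooled_log_or)                 # back-transform (Step I)
se = 1 / math.sqrt(sum(ws))
ci = (math.exp(pooled_log_or - 1.96 * se),
      math.exp(pooled_log_or + 1.96 * se))
```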


Example 3: Meta-Analysis of Proportions (Single Group)

Research Question: What is the pooled prevalence of depression in university students?

Input Data:

| Study | Events (a) | Total (n) | Proportion (p) |
| --- | --- | --- | --- |
| Study A | 85 | 500 | 0.170 |
| Study B | 120 | 800 | 0.150 |
| Study C | 40 | 200 | 0.200 |
| Study D | 60 | 350 | 0.171 |
| Study E | 200 | 1200 | 0.167 |

Step 1: Logit transform each proportion

Study A:

logit(0.170)=ln(0.170/0.830)=ln(0.2048)1.585\text{logit}(0.170) = \ln(0.170/0.830) = \ln(0.2048) \approx -1.585

vlogit=1500×0.170×0.830=170.550.01418,w=70.55v_{\text{logit}} = \frac{1}{500 \times 0.170 \times 0.830} = \frac{1}{70.55} \approx 0.01418, \quad w = 70.55

(Steps for B–E proceed identically.)

Step 2: Pool on logit scale → back-transform

logit(p)^=wilogit(pi)wi1.626\widehat{\text{logit}(p)} = \frac{\sum w_i \cdot \text{logit}(p_i)}{\sum w_i} \approx -1.626

p^=e1.6261+e1.6260.19681.19680.164\hat{p} = \frac{e^{-1.626}}{1 + e^{-1.626}} \approx \frac{0.1968}{1.1968} \approx 0.164

Conclusion: The pooled prevalence of depression is approximately 16.4% (95% CI to be computed from the SE of the logit-scale pooled estimate, back-transformed).
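Example 3 can also be verified directly; the logit-scale pooling below reproduces the pooled prevalence to within rounding of the illustrative values:

```python
import math

studies = [(85, 500), (120, 800), (40, 200), (60, 350), (200, 1200)]  # (events, n)

logits, ws = [], []
for a, n in studies:
    p = a / n
    logits.append(math.log(p / (1 - p)))  # logit transform
    ws.append(n * p * (1 - p))            # weight = 1 / variance of the logit

pooled_logit = sum(w * l for w, l in zip(ws, logits)) / sum(ws)
p_hat = math.exp(pooled_logit) / (1 + math.exp(pooled_logit))  # back-transform
```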


16. Common Mistakes and How to Avoid Them

Mistake 1: Combining Apples and Oranges

Problem: Pooling studies that are conceptually too different (different populations, interventions, or outcomes) into a single meta-analysis.
Solution: Apply strict inclusion/exclusion criteria. Consider whether the research question is specific enough. Use subgroup analysis for conceptually different study types rather than pooling indiscriminately.

Mistake 2: Ignoring Heterogeneity

Problem: Reporting only the pooled effect and ignoring significant heterogeneity (I2>50%I^2 > 50\%), implying a single universal effect when the true effects vary widely.
Solution: Always report QQ, I2I^2, τ2\tau^2, and the prediction interval. Explore heterogeneity via subgroup analysis and meta-regression. Be transparent about what the pooled estimate represents when heterogeneity is high.

Mistake 3: Using Fixed-Effect When Random-Effects Is Appropriate

Problem: Applying the fixed-effect model (which assumes all studies share a single true effect) to heterogeneous studies, producing an overconfident (too-narrow) confidence interval.
Solution: Use the random-effects model as the default unless there is a compelling theoretical reason all studies share exactly the same true effect. Compare the two models as a sensitivity check.

Mistake 4: Misinterpreting the Confidence Interval vs. Prediction Interval

Problem: Interpreting the narrow 95% confidence interval of the pooled effect as the range of effects across all possible settings, ignoring that the true effect varies across contexts.
Solution: Report and interpret the prediction interval. Communicate that the CI describes uncertainty about the average effect, while the PI describes the range of true effects across settings.

Mistake 5: Data Extraction Errors

Problem: Incorrectly recording means, SDs, or cell counts from primary studies — a common and serious source of error in meta-analyses.
Solution: Use dual independent extraction by two reviewers, with discrepancies resolved by consensus. Double-check all calculated effect sizes against reported statistics where possible.

Mistake 6: Misinterpreting the Odds Ratio as a Risk Ratio

Problem: Describing an OR of 2.0 as "the treatment doubles the probability of the event" when in fact it doubles the odds, which only approximates doubling the probability when events are rare.
Solution: Always specify clearly whether the effect measure is an OR or RR. If the OR is used with common outcomes (>10%), note its potential to overestimate the RR and consider reporting both.

Mistake 7: Over-Reliance on Statistical Significance of the Pooled Effect

Problem: Concluding that an intervention is "effective" solely because p<0.05p < 0.05 for the pooled estimate, without considering the magnitude and clinical relevance of the effect.
Solution: Interpret the pooled effect size in context. A statistically significant but tiny effect (e.g., Hedges' g=0.05g = 0.05) may have no practical importance. Always report effect sizes with confidence intervals and discuss clinical/practical significance.

Mistake 8: Ignoring Publication Bias

Problem: Assuming that all relevant studies are captured and that the literature is a representative sample of all research conducted.
Solution: Conduct a comprehensive search (including grey literature, trial registries, and non-English publications). Assess publication bias formally using funnel plot inspection, Egger's test, and Trim and Fill. Report the adjusted estimate from Trim and Fill as a sensitivity analysis.

Mistake 9: Post Hoc Subgroup Analysis Without Correction

Problem: Conducting many unplanned subgroup analyses and reporting only those that are statistically significant, leading to false positives.
Solution: Pre-specify all planned subgroup and moderator analyses before running the meta-analysis. Treat unplanned analyses as exploratory. Apply appropriate corrections for multiple testing if many comparisons are made.

Mistake 10: Confusing Within-Study and Between-Study Variance

Problem: Using viv_i (within-study variance) as the measure of heterogeneity, or conflating SE(μ^)SE(\hat{\mu}) with τ\tau.
Solution: Clearly distinguish viv_i (uncertainty within each study, reduced with larger nn) from τ2\tau^2 (variance of true effects across studies, a property of the study population). The SE of the pooled estimate reflects both.


17. Troubleshooting

| Issue | Likely Cause | Solution |
| --- | --- | --- |
| \hat{\tau}^2 = 0 despite apparent scatter in forest plot | DL estimator truncated at 0 when Q < k-1; or sample sizes are small | Use REML estimator; report 95% CI for \tau^2 using Q-profile method |
| Very wide prediction interval | High \tau^2 (substantial heterogeneity) | Report and interpret PI honestly; explore moderators to explain heterogeneity |
| I^2 = 100\% | Extreme outlier study; possible data error; genuine massive heterogeneity | Check data extraction for the outlier; run leave-one-out; consider removing the study with justification |
| Undefined \ln(OR) or \ln(RR) | Zero cell counts in 2×2 table | Apply Haldane-Anscombe correction (+0.5 to all cells); exclude double-zero studies |
| Pooled OR very extreme (e.g., >50) | Small cells after zero correction; separation | Check for double-zero studies; verify data extraction; consider RD instead of OR |
| Egger's test significant but funnel plot looks symmetric | Low power of Egger's test; chance | Examine funnel plot critically; run Trim and Fill as sensitivity analysis; search for unpublished studies |
| Trim and Fill adds no studies | No asymmetry detected (from that direction) | Does not rule out publication bias — could exist in other forms (selective outcome reporting) |
| Only 2–3 studies available | Underpowered meta-analysis; unreliable estimates | Report results with extreme caution; CIs will be very wide; note the limitation explicitly; do not force a meta-analysis |
| Meta-regression coefficient is significant but R^2 = 0\% | Sampling variation in \tau^2 estimation; DL underestimation | Use REML; report results cautiously; confirm with sensitivity analysis |
| Negative \tau^2 estimate | Q < k-1 (sampling variability); DL estimator | Truncate at 0 (standard practice); report \hat{\tau}^2 = 0 |

18. Quick Reference Cheat Sheet

Core Formulas

| Formula | Description |
| --- | --- |
| \hat{\theta}_{\text{pooled}} = \frac{\sum w_i \hat{\theta}_i}{\sum w_i} | Weighted pooled effect |
| SE(\hat{\theta}_{\text{pooled}}) = 1/\sqrt{\sum w_i} | SE of pooled effect |
| w_i^{FE} = 1/v_i | Fixed-effect weight |
| w_i^{RE} = 1/(v_i + \hat{\tau}^2) | Random-effects weight |
| Q = \sum w_i^{FE}(\hat{\theta}_i - \hat{\theta}_{FE})^2 | Cochran's Q |
| I^2 = \max(0, (Q-(k-1))/Q) \times 100\% | I^2 heterogeneity |
| \hat{\tau}^2_{DL} = \max(0, (Q-(k-1))/C) | DL between-study variance |
| Z = \hat{\mu}_{RE} / SE(\hat{\mu}_{RE}) | Pooled effect z-test |
| \text{PI}_{95\%} = \hat{\mu} \pm t_{0.025,k-2}\sqrt{\hat{\tau}^2+SE^2(\hat{\mu})} | Prediction interval |
| g = J \times d, \quad J = 1 - 3/[4(n_T+n_C-2)-1] | Hedges' g |
| \ln(OR) = \ln(ad/bc), \quad v = 1/a+1/b+1/c+1/d | Log OR and variance |
| \ln(RR) = \ln\frac{a/(a+b)}{c/(c+d)} | Log risk ratio |
| z_r = 0.5\ln((1+r)/(1-r)), \quad v = 1/(n-3) | Fisher's Z transformation |

Effect Size Benchmarks

| Effect Size | Negligible | Small | Medium | Large |
| --- | --- | --- | --- | --- |
| Hedges' g / Cohen's d | < 0.20 | 0.20 | 0.50 | 0.80 |
| Pearson's r | < 0.10 | 0.10 | 0.30 | 0.50 |
| OR / RR | \approx 1 | 1.5 | 2.5 | 4.0+ |

Heterogeneity Benchmarks

| Statistic | Low | Moderate | Substantial | Considerable |
| --- | --- | --- | --- | --- |
| I^2 | 0-25% | 25-50% | 50-75% | 75-100% |
| Q p-value | > 0.05 (non-sig) | < 0.05 (sig) | | |

Model Selection Guide

| Scenario | Recommended Approach |
| --- | --- |
| Studies are functionally identical | Fixed-effect model |
| Studies differ in any way | Random-effects model |
| High heterogeneity detected | Random-effects + explore moderators |
| Continuous outcome, same scale | Raw MD |
| Continuous outcome, different scales | Hedges' g |
| Binary outcome (rare event, < 10%) | Odds Ratio |
| Binary outcome (common event, > 10%) | Risk Ratio or Risk Difference |
| Correlation studies | Fisher's z_r → back-transform to r |
| Prevalence studies | Logit or double-arcsine transformation |

Publication Bias Assessment

| Method | Type | H_0 | When to Use |
| --- | --- | --- | --- |
| Funnel plot | Visual | Symmetry | Always (visual inspection) |
| Egger's test | Formal | Intercept = 0 | k \geq 10, continuous/OR effects |
| Begg's test | Formal | No rank correlation | k \geq 10; lower power than Egger's |
| Trim and Fill | Adjustment | Symmetry | Sensitivity analysis for adjusted estimate |
| Fail-Safe N | Robustness | Pooled effect = 0 | Supplementary robustness check |

Summary of Effect Size Formulas

| Measure | Point Estimate | Variance |
| --- | --- | --- |
| Raw MD | \bar{X}_T - \bar{X}_C | SD_T^2/n_T + SD_C^2/n_C |
| Cohen's d | MD/SD_\text{pooled} | \frac{n_T+n_C}{n_T n_C}+\frac{d^2}{2(n_T+n_C-2)} |
| Hedges' g | J \times d | J^2 \times v_d |
| \ln(OR) | \ln(ad/bc) | 1/a+1/b+1/c+1/d |
| \ln(RR) | \ln\frac{a(c+d)}{c(a+b)} | 1/a-1/(a+b)+1/c-1/(c+d) |
| RD | a/(a+b) - c/(c+d) | ab/(a+b)^3 + cd/(c+d)^3 |
| Fisher's z_r | \text{arctanh}(r) | 1/(n-3) |
| logit(p) | \ln(p/(1-p)) | 1/(np(1-p)) |

This tutorial provides a comprehensive foundation for understanding, applying, and interpreting Meta-Analysis using the DataStatPro application. For further reading, consult Borenstein et al.'s "Introduction to Meta-Analysis", Hedges & Olkin's "Statistical Methods for Meta-Analysis", or Higgins & Thomas's "Cochrane Handbook for Systematic Reviews of Interventions". For feature requests or support, contact the DataStatPro team.