
Difference-in-Differences

Comprehensive reference guide for Difference-in-Differences (DiD) causal inference models.

Difference-in-Differences Models: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of Difference-in-Differences (DiD) estimation all the way through advanced extensions, assumption testing, heterogeneity analysis, and practical usage within the DataStatPro application. Whether you are a complete beginner or an experienced analyst, this guide is structured to build your understanding step by step.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is the Difference-in-Differences Design?
  3. The Mathematical Framework
  4. The Parallel Trends Assumption
  5. Identification and Causal Inference
  6. Standard DiD Estimation
  7. Hypothesis Testing and Inference
  8. Effect Size Measures
  9. Model Fit and Evaluation
  10. Diagnostics and Assumption Testing
  11. Extensions: Staggered DiD and Multiple Time Periods
  12. Extensions: Heterogeneous Treatment Effects
  13. Extensions: Continuous and Fuzzy Treatment
  14. Covariates and Controls in DiD
  15. Using the DiD Component
  16. Computational and Formula Details
  17. Worked Examples
  18. Common Mistakes and How to Avoid Them
  19. Troubleshooting
  20. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into Difference-in-Differences, it is helpful to be familiar with the following foundational concepts. Do not worry if you are not — each concept is briefly explained here.

1.1 Counterfactuals and the Potential Outcomes Framework

The potential outcomes framework (Rubin Causal Model) is the conceptual foundation of causal inference. For each unit $i$ and time period $t$, define $Y_{it}(1)$ as the outcome the unit would have under treatment and $Y_{it}(0)$ as the outcome it would have without treatment.

The individual treatment effect for unit $i$ at time $t$ is:

$$\tau_{it} = Y_{it}(1) - Y_{it}(0)$$

The fundamental problem of causal inference: We can never observe both $Y_{it}(1)$ and $Y_{it}(0)$ for the same unit at the same time. We observe only one, the realised outcome. The unobserved outcome is called the counterfactual.

The Average Treatment Effect on the Treated (ATT), where $D_i = 1$ marks a treated unit, is:

$$ATT = E[Y_{it}(1) - Y_{it}(0) \mid D_i = 1]$$

DiD is one of the most widely used methods for estimating the ATT using a comparison group to approximate the unobserved counterfactual.

1.2 Selection Bias

Selection bias arises when the assignment to treatment is not random: treated and control units differ systematically in ways that also affect the outcome. A naïve comparison of treated vs. untreated units confounds the treatment effect with pre-existing differences:

$$E[Y \mid D=1] - E[Y \mid D=0] = \underbrace{ATT}_{\text{causal effect}} + \underbrace{E[Y(0) \mid D=1] - E[Y(0) \mid D=0]}_{\text{selection bias}}$$

DiD removes selection bias due to time-invariant unobserved differences between groups.

1.3 Panel Data

Panel data (also called longitudinal data) consists of observations on the same units (individuals, firms, regions, countries) over multiple time periods. It has two dimensions: a cross-sectional dimension ($n$ units) and a time dimension ($T$ periods).

Panel data are written as $\{Y_{it}, D_{it}, \mathbf{X}_{it}\}$ for unit $i = 1, \dots, n$ and period $t = 1, \dots, T$.

DiD most naturally arises in a panel data context, though it can also be implemented with repeated cross-sections.

1.4 Fixed Effects

A unit fixed effect $\alpha_i$ captures all time-invariant characteristics of unit $i$, both observed and unobserved, that affect the outcome. By including fixed effects in a regression, we effectively compare each unit to itself over time, removing all time-invariant confounders.

A time fixed effect $\lambda_t$ captures factors that affect all units equally at time $t$, such as common macroeconomic conditions, seasonal patterns, or universal policy changes.

1.5 Ordinary Least Squares (OLS) Regression

OLS regression finds the linear relationship between predictors $\mathbf{X}$ and outcome $Y$ by minimising the sum of squared residuals:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

DiD is typically implemented as an OLS regression with specific interaction terms and fixed effects, so familiarity with OLS is essential.

1.6 Treatment Assignment and Natural Experiments

A natural experiment is a situation in which the assignment of units to treatment and control conditions is determined by some external, exogenous factor — rather than by the researcher or by the units themselves. Natural experiments approximate the conditions of a randomised controlled trial. Common examples:

DiD is the workhorse estimator for natural experiments with a pre-period and a post-period.


2. What is the Difference-in-Differences Design?

2.1 The Core Idea

Difference-in-Differences (DiD) is a quasi-experimental research design that estimates the causal effect of a treatment or policy by comparing the change over time in the outcome for a treated group to the change over time in the outcome for an untreated (control) group.

The intuition is straightforward:

$$\hat{\tau}_{DiD} = \underbrace{(\bar{Y}_{treated,post} - \bar{Y}_{treated,pre})}_{\text{before-after change in treated group}} - \underbrace{(\bar{Y}_{control,post} - \bar{Y}_{control,pre})}_{\text{before-after change in control group}}$$

By taking the difference of these two differences, DiD isolates the treatment effect from:

  1. Pre-existing level differences between treated and control groups.
  2. Common time trends affecting both groups equally.

2.2 The 2×2 DiD Setup

The simplest (canonical) DiD design has:

The canonical 2×2 DiD table of group-period means:

| | Pre-Period ($t = 0$) | Post-Period ($t = 1$) | Difference (Post − Pre) |
| :---------- | :---------- | :---------- | :---------- |
| Treated ($D=1$) | $\bar{Y}_{1,0}$ | $\bar{Y}_{1,1}$ | $\bar{Y}_{1,1} - \bar{Y}_{1,0}$ |
| Control ($D=0$) | $\bar{Y}_{0,0}$ | $\bar{Y}_{0,1}$ | $\bar{Y}_{0,1} - \bar{Y}_{0,0}$ |
| Difference (Treated − Control) | $\bar{Y}_{1,0} - \bar{Y}_{0,0}$ | $\bar{Y}_{1,1} - \bar{Y}_{0,1}$ | $\hat{\tau}_{DiD}$ |

The DiD estimate:

$$\hat{\tau}_{DiD} = (\bar{Y}_{1,1} - \bar{Y}_{1,0}) - (\bar{Y}_{0,1} - \bar{Y}_{0,0})$$
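As a quick arithmetic check of the 2×2 formula, the sketch below computes the estimate from four group-period means. The function name `did_2x2` and the numbers are hypothetical and chosen only for illustration.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences from four group-period means."""
    treated_change = treated_post - treated_pre   # before-after change, treated group
    control_change = control_post - control_pre   # before-after change, control group
    return treated_change - control_change


# Hypothetical cell means (illustrative numbers only)
estimate = did_2x2(treated_pre=10.0, treated_post=15.0,
                   control_pre=8.0, control_post=10.0)
print(estimate)  # (15 - 10) - (10 - 8) = 3.0
```

The treated group rose by 5 and the control group by 2, so 3 units of the treated group's change are attributed to treatment under parallel trends.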

2.3 Real-World Applications

DiD is one of the most widely applied methods in empirical social science, economics, public health, and policy evaluation:

2.4 DiD vs. Other Quasi-Experimental Methods

| Method | Key Assumption | When to Use |
| :---------- | :---------- | :---------- |
| DiD | Parallel trends in absence of treatment | Panel data or repeated cross-sections; policy timing varies |
| Regression Discontinuity (RD) | No manipulation around the cutoff | Assignment determined by a continuous threshold |
| Instrumental Variables (IV) | Instrument relevance and exclusion | Endogenous treatment with a valid instrument |
| Synthetic Control | Weighted average of controls matches treated pre-trend | Single treated unit; many potential controls |
| Event Study | No pre-trends; clean identification window | Multiple time periods around treatment timing |
| Propensity Score Matching | Conditional independence (selection on observables) | Rich covariate data; no unobservable confounders |

3. The Mathematical Framework

3.1 The Canonical 2×2 DiD Model

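The standard regression formulation of the 2×2 DiD model is:

$$Y_{it} = \alpha + \beta \cdot \text{Treated}_i + \gamma \cdot \text{Post}_t + \delta \cdot (\text{Treated}_i \times \text{Post}_t) + \epsilon_{it}$$

Where $\text{Treated}_i$ equals 1 for units in the treated group, $\text{Post}_t$ equals 1 in the post-treatment period, $\alpha$ is the control group's pre-period mean, $\beta$ is the pre-period difference between the groups, $\gamma$ is the change over time common to both groups, $\delta$ is the DiD effect, and $\epsilon_{it}$ is the error term.

Predicted cell means from the regression:

| | Pre ($\text{Post} = 0$) | Post ($\text{Post} = 1$) | Difference |
| :---------- | :---------- | :---------- | :---------- |
| Control ($\text{Treated} = 0$) | $\alpha$ | $\alpha + \gamma$ | $\gamma$ |
| Treated ($\text{Treated} = 1$) | $\alpha + \beta$ | $\alpha + \beta + \gamma + \delta$ | $\gamma + \delta$ |
| Difference (T − C) | $\beta$ | $\beta + \delta$ | $\boldsymbol{\delta}$ |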
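The link between the regression coefficients and the cell means can be checked numerically. The sketch below fits the canonical regression with NumPy least squares on a small made-up data set (two observations per cell, values hypothetical) and compares the interaction coefficient with the difference of cell-mean differences.

```python
import numpy as np

# Hypothetical data: two observations in each group-period cell
treated = np.array([0, 0, 0, 0, 1, 1, 1, 1])
post = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([7.0, 9.0, 9.0, 11.0, 9.0, 11.0, 14.0, 16.0])

# Design matrix: constant, Treated, Post, Treated x Post
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, beta, gamma, delta = coef


def cell_mean(d, p):
    """Mean outcome for group d (0/1) in period p (0/1)."""
    return y[(treated == d) & (post == p)].mean()


did_from_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(delta, did_from_means)  # both equal 3 up to floating-point error
```

Because the model is saturated (four parameters, four cells), the fitted values reproduce the cell means exactly, which is why the two numbers agree.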

3.2 The Two-Way Fixed Effects (TWFE) Model

The most general and widely used DiD regression extends the canonical model to panel data with unit fixed effects and time fixed effects:

$$Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \mathbf{X}_{it}^T\boldsymbol{\beta} + \epsilon_{it}$$

Where:

Key insight: The TWFE model is the natural extension of the 2×2 DiD to multiple units and multiple time periods. The coefficient $\delta$ on the treatment indicator $D_{it}$ is the DiD estimate once unit and time fixed effects are included.

3.3 The Within-Estimator Interpretation

The TWFE estimator is equivalent to the within estimator (demeaning). For a balanced panel, define:

$$\ddot{Y}_{it} = Y_{it} - \bar{Y}_i - \bar{Y}_t + \bar{Y}$$

Where $\bar{Y}_i = T^{-1}\sum_t Y_{it}$ (unit mean), $\bar{Y}_t = n^{-1}\sum_i Y_{it}$ (time mean), and $\bar{Y} = (nT)^{-1}\sum_{it} Y_{it}$ (grand mean). Similarly define $\ddot{D}_{it}$ and $\ddot{\mathbf{X}}_{it}$.

Without covariates, the TWFE estimator is:

$$\hat{\delta}_{TWFE} = \frac{\sum_{it} \ddot{D}_{it} \ddot{Y}_{it}}{\sum_{it} \ddot{D}_{it}^2}$$

This identifies $\delta$ from within-unit, within-time variation in treatment status.
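The double-demeaning formula can be illustrated on a small balanced panel. The sketch below is a minimal example under stated assumptions: the panel is balanced, there are no covariates, and the data are noise-free and hypothetical (true effect 2.0), so the estimator recovers the effect exactly. The helper name `two_way_demean` is illustrative.

```python
import numpy as np


def two_way_demean(a):
    """Double-demean an (n_units, n_periods) array from a balanced panel."""
    unit_mean = a.mean(axis=1, keepdims=True)
    time_mean = a.mean(axis=0, keepdims=True)
    return a - unit_mean - time_mean + a.mean()


# Hypothetical panel: 4 units, 4 periods; units 0 and 1 are treated from period 2
n, T = 4, 4
alpha = np.array([1.0, 2.0, 3.0, 4.0])[:, None]   # unit fixed effects
lam = np.array([0.0, 0.5, 1.0, 1.5])[None, :]     # time fixed effects
D = np.zeros((n, T))
D[:2, 2:] = 1.0
Y = alpha + lam + 2.0 * D                          # true effect is 2.0, no noise

D_dd, Y_dd = two_way_demean(D), two_way_demean(Y)
delta_hat = (D_dd * Y_dd).sum() / (D_dd ** 2).sum()
print(delta_hat)  # 2.0 up to floating-point error
```

The additive unit and time effects vanish under double demeaning, leaving only the demeaned treatment indicator scaled by the effect.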

3.4 Potential Outcomes Representation

In the potential outcomes framework, the DiD estimand is:

$$\delta_{DiD} = E[Y_{it}(1) - Y_{it}(0) \mid D_i = 1, t = 1]$$

This is the ATT in the post-treatment period: the average effect on the treated units of receiving treatment.

The DiD identification strategy replaces the unobserved counterfactual $E[Y_{it}(0) \mid D_i = 1, t = 1]$ with the observable:

$$E[Y_{it}(0) \mid D_i = 1, t = 1] = E[Y_{it}(0) \mid D_i = 1, t = 0] + \underbrace{\left(E[Y_{it}(0) \mid D_i = 0, t = 1] - E[Y_{it}(0) \mid D_i = 0, t = 0]\right)}_{\text{time trend from control group}}$$

This is exactly the parallel trends assumption: the counterfactual trend for the treated group equals the observed trend for the control group.


4. The Parallel Trends Assumption

4.1 The Assumption Stated

The parallel trends assumption (also called the common trends or parallel paths assumption) is the key identifying assumption of DiD:

In the absence of treatment, the average outcome for the treated group would have followed the same trend as the average outcome for the control group.

Formally:

$$E[Y_{it}(0) \mid D_i = 1, t = 1] - E[Y_{it}(0) \mid D_i = 1, t = 0] = E[Y_{it}(0) \mid D_i = 0, t = 1] - E[Y_{it}(0) \mid D_i = 0, t = 0]$$

Crucially: This assumption is about the counterfactual — what would have happened to the treated group had it not been treated. It is fundamentally untestable with post-treatment data, but can be supported with:

4.2 Visualising Parallel Trends

The canonical DiD diagram plots the outcome over time for both groups:

Outcome
  |
  |             ● Treated (actual)
  |           /
  |         /   ↑ δ = Treatment Effect
  |       /   /
  |     / ---/ ← Treated counterfactual (unobserved)
  |   /   /
  |  ●   /
  | / \ /
  |/   ● Control (observed, serves as counterfactual trend)
  |
  +--------+--------→ Time
        Pre       Post
           ↑
        Treatment
        begins

The treatment effect $\delta$ is the vertical distance between the actual treated outcome and the counterfactual treated outcome in the post-period. The control group's observed trajectory supplies the counterfactual trend.

4.3 When is Parallel Trends Plausible?

Parallel trends is more plausible when:

Parallel trends is less plausible when:

4.4 Parallel Trends in Different Functional Forms

The parallel trends assumption is not scale-invariant. It may hold on the levels scale but not on the logarithmic scale (or vice versa):

The choice of outcome transformation (levels, logs, rates) should be guided by theory about the nature of the treatment effect and the plausibility of parallel trends.
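A small numerical example shows why the scale matters. In the sketch below (all numbers hypothetical), the untreated outcomes of both groups rise by the same amount in levels, so parallel trends holds in levels, yet the same paths are not parallel in logs because the groups start from different baselines.

```python
import math

# Hypothetical untreated outcomes: both groups rise by 10 units
control_pre, control_post = 100.0, 110.0
treated_pre, treated_cf_post = 200.0, 210.0   # treated group's untreated counterfactual

# Difference in trends on each scale (zero means parallel trends holds)
gap_levels = (treated_cf_post - treated_pre) - (control_post - control_pre)
gap_logs = (math.log(treated_cf_post) - math.log(treated_pre)) - (
    math.log(control_post) - math.log(control_pre)
)

print(gap_levels)          # 0.0: parallel in levels
print(round(gap_logs, 4))  # about -0.0465: not parallel in logs
```

A DiD in logs on these data would report a spurious effect of roughly minus 4.7 log points even though no treatment occurred.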

4.5 Conditional Parallel Trends

The parallel trends assumption may only hold conditional on observable covariates $\mathbf{X}_i$:

$$E[Y_{it}(0) \mid D_i = 1, \mathbf{X}_i, t = 1] - E[Y_{it}(0) \mid D_i = 1, \mathbf{X}_i, t = 0] = E[Y_{it}(0) \mid D_i = 0, \mathbf{X}_i, t = 1] - E[Y_{it}(0) \mid D_i = 0, \mathbf{X}_i, t = 0]$$

When unconditional parallel trends is implausible, including covariates (Section 14) can restore the assumption by controlling for observable differences in time trends between groups.


5. Identification and Causal Inference

5.1 What DiD Identifies

Under the parallel trends assumption, the DiD regression coefficient $\delta$ identifies the Average Treatment Effect on the Treated (ATT) in the post-treatment period:

$$\hat{\delta}_{DiD} \xrightarrow{p} ATT = E[Y_{it}(1) - Y_{it}(0) \mid D_i = 1, t \geq t_0]$$

Where $t_0$ is the treatment onset period.

Not identified by DiD:

5.2 The No Anticipation Assumption

A supplementary assumption is no anticipation: treated units do not change their behaviour in the pre-treatment period in anticipation of receiving treatment.

Formally, for all $t < t_0$:

$$Y_{it}(1) = Y_{it}(0) \quad \text{for all } i \text{ with } D_i = 1$$

Why it matters: If treated units begin changing before the treatment officially starts (e.g., firms start investing as soon as a subsidy is announced), the pre-period outcome already reflects anticipatory responses. This violates the parallel trends assumption in the pre-period and biases the DiD estimator.

How to check: Pre-treatment placebo tests (event study coefficients for pre-period leads should be near zero).

5.3 The Stable Unit Treatment Value Assumption (SUTVA)

SUTVA has two components:

  1. No interference: The treatment status of unit $i$ does not affect the potential outcomes of unit $j$ (no spillovers, general equilibrium effects, or cross-unit contamination).

  2. No hidden versions of treatment: There is only one version of the treatment; all treated units receive the same treatment.

Violations: Spillovers arise when treatment of some units affects control units (e.g., a local employment policy in one area displaces workers to other areas, affecting those areas' outcomes). SUTVA violations bias the DiD estimator.

5.4 Exogeneity of Treatment Timing

In staggered DiD designs (Section 11), a key requirement is that the timing of treatment adoption is exogenous — not determined by pre-existing trends or anticipation of future outcomes. If units that were doing well adopt treatment earlier, the DiD estimator is biased.

5.5 DiD as a Special Case of the Fixed Effects Estimator

The 2×2 DiD estimator is numerically equivalent to the first-differences estimator in a two-period panel:

$$\Delta Y_i = Y_{i,1} - Y_{i,0} = \gamma + \delta \cdot \Delta D_i + \Delta \epsilon_i$$

Where $\gamma$ is the change over time common to both groups (the coefficient on $\text{Post}_t$ in the canonical model), and $\Delta D_i = D_{i,1} - D_{i,0} = 1$ for treated units and $0$ for control units. OLS on this first-differenced equation, with the intercept included, produces $\hat{\delta} = \overline{\Delta Y}_{treated} - \overline{\Delta Y}_{control}$, which is exactly the DiD formula.
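The equivalence can be verified on a toy two-period panel. In the sketch below (data hypothetical), the outcome change is regressed on the treatment indicator with an intercept, which absorbs the change over time shared by both groups; the slope then matches the difference in mean changes.

```python
import numpy as np

# Hypothetical two-period panel: first two units treated, last two control
y_pre = np.array([9.0, 11.0, 7.0, 9.0])
y_post = np.array([14.0, 16.0, 9.0, 11.0])
delta_d = np.array([1.0, 1.0, 0.0, 0.0])   # change in treatment status

dy = y_post - y_pre
X = np.column_stack([np.ones_like(dy), delta_d])   # intercept captures the common change
coef = np.linalg.lstsq(X, dy, rcond=None)[0]
common_change, delta_fd = coef

did_from_means = dy[delta_d == 1].mean() - dy[delta_d == 0].mean()
print(common_change, delta_fd, did_from_means)  # 2, 3, 3 up to floating-point error
```

Dropping the intercept would instead return the treated group's raw before-after change (5 here), which mixes the treatment effect with the common trend.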


6. Standard DiD Estimation

6.1 OLS Estimation of the Canonical DiD

The 2×2 DiD regression:

$$Y_{it} = \alpha + \beta \cdot \text{Treated}_i + \gamma \cdot \text{Post}_t + \delta \cdot (\text{Treated}_i \times \text{Post}_t) + \epsilon_{it}$$

is estimated by OLS. The DiD coefficient:

$$\hat{\delta} = (\bar{Y}_{1,1} - \bar{Y}_{0,1}) - (\bar{Y}_{1,0} - \bar{Y}_{0,0})$$

In matrix form, $\hat{\delta}$ is the element of the OLS coefficient vector that corresponds to the interaction term:

$$\hat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

Where $\mathbf{X}$ contains the constant, $\text{Treated}_i$, $\text{Post}_t$, and their interaction $\text{Treated}_i \times \text{Post}_t$.

6.2 TWFE Estimation with Panel Data

The TWFE estimator is obtained by including unit and time dummies (or using the within-transformation):

Using dummy variables:

$$Y_{it} = \sum_{i=1}^n \alpha_i \mathbf{1}[unit = i] + \sum_{t=1}^T \lambda_t \mathbf{1}[period = t] + \delta \cdot D_{it} + \mathbf{X}_{it}^T\boldsymbol{\beta} + \epsilon_{it}$$

Using the within (demeaning) transformation:

$$\ddot{Y}_{it} = \delta \cdot \ddot{D}_{it} + \ddot{\mathbf{X}}_{it}^T\boldsymbol{\beta} + \ddot{\epsilon}_{it}$$

Where $\ddot{Y}_{it} = Y_{it} - \bar{Y}_i - \bar{Y}_t + \bar{\bar{Y}}$ and similarly for other variables.

The TWFE estimator $\hat{\delta}$ is consistent under the parallel trends assumption and strict exogeneity of treatment given fixed effects.

6.3 First Differences Estimator

An alternative to demeaning is first differencing, which subtracts the previous period's observation:

$$\Delta Y_{it} = Y_{it} - Y_{i,t-1} = (\lambda_t - \lambda_{t-1}) + \delta (D_{it} - D_{i,t-1}) + \Delta\mathbf{X}_{it}^T\boldsymbol{\beta} + \Delta\epsilon_{it}$$

Where $\Delta\mathbf{X}_{it} = \mathbf{X}_{it} - \mathbf{X}_{i,t-1}$. In a two-period model, first differences and within estimation are identical. For $T > 2$, they differ in efficiency: first differences is more efficient when $\epsilon_{it}$ follows a random walk; within estimation is more efficient when $\epsilon_{it}$ is serially uncorrelated.

6.4 Weighted DiD

When the groups have unequal sizes or when reweighting is needed to improve comparability, a weighted DiD uses weights $w_i$:

$$\hat{\delta}_{WDiD} = \frac{\sum_i w_i D_i \Delta Y_i}{\sum_i w_i D_i} - \frac{\sum_i w_i (1-D_i) \Delta Y_i}{\sum_i w_i (1-D_i)}$$
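The weighted formula translates directly into code. The sketch below is a minimal implementation for a two-period panel; the function name `weighted_did`, the outcome changes, and the weights are all hypothetical.

```python
import numpy as np


def weighted_did(delta_y, treated, weights):
    """Weighted DiD: weighted mean outcome change of treated minus that of controls."""
    delta_y = np.asarray(delta_y, dtype=float)
    treated = np.asarray(treated, dtype=float)
    weights = np.asarray(weights, dtype=float)
    treated_mean = np.sum(weights * treated * delta_y) / np.sum(weights * treated)
    control_mean = np.sum(weights * (1 - treated) * delta_y) / np.sum(weights * (1 - treated))
    return treated_mean - control_mean


# Hypothetical outcome changes for two treated and two control units
dy = [5.0, 7.0, 2.0, 4.0]
d = [1, 1, 0, 0]

print(weighted_did(dy, d, [1.0, 1.0, 1.0, 1.0]))  # 3.0 (unweighted DiD)
print(weighted_did(dy, d, [1.0, 3.0, 1.0, 1.0]))  # 3.5 (second treated unit upweighted)
```

With equal weights the function reduces to the ordinary difference in mean changes, so it can be used as a drop-in generalisation.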

Common weighting schemes:

6.5 DiD with Repeated Cross-Sections

When panel data (the same units followed over time) are unavailable, DiD can be implemented with repeated cross-sections, that is, independent samples drawn from the same population at each time period. The DiD regression for individual $i$ in group $g$ observed in period $t$:

$$Y_{igt} = \alpha + \beta \cdot G_g + \gamma \cdot T_t + \delta \cdot (G_g \times T_t) + \epsilon_{igt}$$

Where:

The DiD estimator is valid under the assumption that the cross-sectional samples are representative of the same underlying population in each period, even though different individuals are observed.


7. Hypothesis Testing and Inference

7.1 Standard Error Choices

The choice of standard errors is critical in DiD analyses. Several options are available, with different assumptions:

7.1.1 OLS Standard Errors

$$SE_{OLS}(\hat{\delta}) = \sqrt{\hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{\delta\delta}}$$

Valid only under homoscedasticity and no serial correlation. Almost never appropriate for DiD, because observations on the same unit are typically serially correlated.

7.1.2 Heteroscedasticity-Robust (HC) Standard Errors

$$SE_{HC}(\hat{\delta}) = \sqrt{[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\hat{\Omega}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}]_{\delta\delta}}$$

Where $\hat{\Omega} = \text{diag}(\hat{\epsilon}_{it}^2)$. Accounts for heteroscedasticity but not serial correlation.

7.1.3 Cluster-Robust Standard Errors

The most commonly recommended standard errors for DiD. Clustering at the group level (e.g., state, firm, country) allows for arbitrary heteroscedasticity and serial correlation within clusters:

$$SE_{cluster}(\hat{\delta}) = \sqrt{\left[(\mathbf{X}^T\mathbf{X})^{-1}\left(\sum_{g=1}^G \mathbf{X}_g^T\hat{\epsilon}_g\hat{\epsilon}_g^T\mathbf{X}_g\right)(\mathbf{X}^T\mathbf{X})^{-1}\right]_{\delta\delta}}$$

Where $g$ indexes clusters, and $\mathbf{X}_g$ and $\hat{\epsilon}_g$ are the design matrix and residuals for cluster $g$.

Critical recommendation (Bertrand, Duflo & Mullainathan, 2004): Always cluster at the level of treatment assignment (e.g., state-level policy → cluster at state level). Failure to do so leads to severely underestimated standard errors and spurious significance.
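The cluster-robust sandwich formula from this section can be sketched in a few lines of NumPy. This is an illustrative implementation only: it applies no small-sample correction (statistical packages usually do), and the function name `cluster_robust_se` and the simulated data are hypothetical.

```python
import numpy as np


def cluster_robust_se(X, y, clusters):
    """OLS coefficients and cluster-robust (sandwich) standard errors.

    No small-sample correction is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]        # X_g' e_g for cluster g
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Simulated data (hypothetical): 20 clusters, 6 periods, treatment assigned by cluster
rng = np.random.default_rng(0)
n_clusters, n_periods = 20, 6
cluster_id = np.repeat(np.arange(n_clusters), n_periods)
period = np.tile(np.arange(n_periods), n_clusters)
treated = (cluster_id < 10).astype(float)
post = (period >= 3).astype(float)
cluster_shock = rng.normal(size=n_clusters)[cluster_id]   # within-cluster correlation
y = (1.0 + 0.5 * treated + 0.3 * post + 2.0 * treated * post
     + cluster_shock + rng.normal(size=cluster_id.size))
X = np.column_stack([np.ones_like(y), treated, post, treated * post])

beta, se = cluster_robust_se(X, y, cluster_id)
print("delta_hat:", beta[3], "cluster-robust SE:", se[3])
```

Passing a distinct cluster label for every observation reduces the same function to the heteroscedasticity-robust (HC0) estimator of Section 7.1.2, which is a convenient sanity check.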

7.1.4 Wild Cluster Bootstrap

When the number of clusters is small ($G < 30$), cluster-robust standard errors based on asymptotic approximations can be unreliable. The wild cluster bootstrap (Cameron, Gelbach & Miller) provides more reliable inference:

  1. Estimate the model and obtain the residuals $\hat{\epsilon}_{it}$, grouped by cluster $g$.
  2. For each bootstrap replication $b = 1, \dots, B$:
    • Draw $v_g^{(b)} \in \{-1, +1\}$ with equal probability for each cluster $g$.
    • Construct bootstrap residuals $\tilde{\epsilon}_{it}^{(b)} = v_g^{(b)} \hat{\epsilon}_{it}$.
    • Form the bootstrap outcome: $Y_{it}^{*(b)} = \hat{Y}_{it} + \tilde{\epsilon}_{it}^{(b)}$.
    • Re-estimate $\hat{\delta}^{(b)}$.
  3. Use the distribution of $\hat{\delta}^{(b)}$ to compute p-values and confidence intervals.

| Standard Error Type | When to Use | Key Assumption |
| :---------- | :---------- | :---------- |
| OLS | Never (DiD context) | IID errors |
| HC (Robust) | Small $n$, homogeneous clusters | Heteroscedastic, no serial corr. |
| Cluster-Robust | Standard recommendation | Within-cluster correlation allowed |
| Wild Cluster Bootstrap | Few clusters ($G < 30$) | More reliable with few clusters |
| Block Bootstrap | Panel data, spatial correlation | Resamples entire clusters |
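The wild cluster bootstrap steps listed in Section 7.1.4 can be sketched as follows. This is a minimal illustration of the procedure as described there (Rademacher weights, residuals from the unrestricted model); applied work often uses a variant that imposes the null hypothesis when computing p-values, which this sketch does not do. The function name, simulated data, and the percentile interval are assumptions made for illustration.

```python
import numpy as np


def wild_cluster_bootstrap(X, y, clusters, coef_index, n_boot=999, seed=0):
    """Bootstrap draws of one OLS coefficient using cluster-level sign flips."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids, cluster_pos = np.unique(np.asarray(clusters).ravel(), return_inverse=True)
    projector = np.linalg.solve(X.T @ X, X.T)      # (X'X)^{-1} X'
    beta_hat = projector @ y
    fitted = X @ beta_hat
    resid = y - fitted
    draws = np.empty(n_boot)
    for b in range(n_boot):
        v = rng.choice([-1.0, 1.0], size=cluster_ids.size)   # one sign per cluster
        y_star = fitted + v[cluster_pos] * resid             # bootstrap outcome
        draws[b] = (projector @ y_star)[coef_index]          # re-estimated coefficient
    return beta_hat[coef_index], draws


# Simulated data (hypothetical): 12 clusters, 4 periods
rng = np.random.default_rng(1)
n_clusters, n_periods = 12, 4
cluster_id = np.repeat(np.arange(n_clusters), n_periods)
period = np.tile(np.arange(n_periods), n_clusters)
treated = (cluster_id < 6).astype(float)
post = (period >= 2).astype(float)
y = (1.0 + 0.5 * treated + 0.3 * post + 2.0 * treated * post
     + rng.normal(size=n_clusters)[cluster_id] + rng.normal(size=cluster_id.size))
X = np.column_stack([np.ones_like(y), treated, post, treated * post])

delta_hat, draws = wild_cluster_bootstrap(X, y, cluster_id, coef_index=3)
ci_low, ci_high = np.percentile(draws, [2.5, 97.5])
print(delta_hat, (ci_low, ci_high))
```

Flipping the sign of all residuals within a cluster together preserves the within-cluster correlation structure, which is the reason the weights are drawn per cluster rather than per observation.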

7.2 The Wald Test for the DiD Coefficient

The Wald test for the DiD effect tests $H_0: \delta = 0$:

$$t = \frac{\hat{\delta}}{SE(\hat{\delta})} \sim t_{n-k}$$

Where $n - k$ is the residual degrees of freedom. With cluster-robust standard errors, use the $t$-distribution with $G - 1$ degrees of freedom (where $G$ is the number of clusters):

$$t = \frac{\hat{\delta}}{SE_{cluster}(\hat{\delta})} \sim t_{G-1}$$

A $(1-\alpha) \times 100\%$ confidence interval for $\delta$:

$$\hat{\delta} \pm t_{\alpha/2,\, G-1} \times SE_{cluster}(\hat{\delta})$$

7.3 F-Test for Joint Significance

To jointly test whether a vector of DiD coefficients is zero (e.g., in a model with multiple treatment indicators):

$$H_0: \mathbf{R}\boldsymbol{\beta} = \mathbf{0}$$

$$F = \frac{(\mathbf{R}\hat{\boldsymbol{\beta}})^T[\mathbf{R}\hat{\mathbf{V}}\mathbf{R}^T]^{-1}(\mathbf{R}\hat{\boldsymbol{\beta}})}{q} \sim F_{q, G-q}$$

Where $q$ is the number of restrictions and $\hat{\mathbf{V}} = (\mathbf{X}^T\mathbf{X})^{-1}\left(\sum_{g=1}^G \mathbf{X}_g^T\hat{\epsilon}_g\hat{\epsilon}_g^T\mathbf{X}_g\right)(\mathbf{X}^T\mathbf{X})^{-1}$ is the cluster-robust covariance matrix of $\hat{\boldsymbol{\beta}}$ from Section 7.1.3.

7.4 Inference with Few Treated Units

A common challenge is when only a few units receive treatment (e.g., one state, two firms). In such cases:


8. Effect Size Measures

8.1 The DiD Coefficient as Effect Size

The primary effect size in DiD is the DiD coefficient $\hat{\delta}$ itself. Its interpretation depends on the model specification:

8.2 Percent Change Effect

When the outcome is in levels, the percent change caused by treatment is:

$$\%\Delta = \frac{\hat{\delta}}{\bar{Y}_{treated,pre}} \times 100\%$$

Where $\bar{Y}_{treated,pre}$ is the pre-treatment mean of the treated group. This contextualises the absolute effect size relative to the pre-treatment baseline.

8.3 Standardised Effect Size (Cohen's d Analogue)

Standardise the DiD estimate by the pre-treatment standard deviation of the outcome:

$$d_{DiD} = \frac{\hat{\delta}}{s_{pre}}$$

Where $s_{pre}$ is the pooled pre-treatment standard deviation across treated and control groups. Benchmarks follow Cohen (1988):

| $\lvert d_{DiD} \rvert$ | Effect Size |
| :---------- | :---------- |
| $0.20$ | Small |
| $0.50$ | Medium |
| $0.80$ | Large |

8.4 Relative Reduction/Increase

For outcomes where the baseline level matters (e.g., crime rates, disease incidence), report the relative effect:

$$RR = \frac{\bar{Y}_{treated,post}^{observed}}{\bar{Y}_{treated,post}^{counterfactual}} = \frac{\bar{Y}_{1,1}}{\bar{Y}_{1,0} + (\bar{Y}_{0,1} - \bar{Y}_{0,0})}$$

Or the relative DiD:

$$\hat{\delta}_{rel} = \frac{\hat{\delta}}{\bar{Y}_{treated,pre}} \quad \text{(fractional change)}$$
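The relative measures follow directly from the four cell means. The sketch below uses the subscript convention of Section 2.2 (first index is the group, second is the period); the function name `relative_effects` and the numbers are hypothetical.

```python
def relative_effects(y11, y10, y01, y00):
    """Absolute DiD, ratio to the counterfactual, and fractional change.

    y11: treated post, y10: treated pre, y01: control post, y00: control pre.
    """
    delta = (y11 - y10) - (y01 - y00)
    counterfactual = y10 + (y01 - y00)   # treated post-period mean absent treatment
    rr = y11 / counterfactual
    fractional = delta / y10
    return delta, rr, fractional


delta, rr, fractional = relative_effects(y11=15.0, y10=10.0, y01=10.0, y00=8.0)
print(delta, rr, fractional)  # 3.0 1.25 0.3
```

Here the treated group's post-period mean is 25% above its counterfactual level of 12, while the effect equals 30% of the treated group's pre-period mean; the two measures differ because they use different baselines.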

8.5 Number Needed to Treat (NNT)

For binary outcomes (e.g., employed/unemployed, insured/uninsured):

$$NNT = \frac{1}{|\hat{\delta}|}$$

The NNT represents the number of units that need to be treated to produce one additional success (or prevented failure), contextualising the policy significance of the effect.

8.6 $R^2$ and Explained Variance

While not a primary effect size for DiD, report the within-$R^2$ (after partialling out fixed effects) to convey how much treatment variation explains the residual variation in the outcome:

$$R^2_{within} = 1 - \frac{SS_{residual}}{SS_{\text{within-unit, within-time}}}$$

Report both the overall $R^2$ and the within $R^2$ for TWFE models.


9. Model Fit and Evaluation

9.1 Goodness-of-Fit Statistics

Standard regression fit statistics apply to the DiD regression:

| Statistic | Formula | Description |
| :---------- | :---------- | :---------- |
| $R^2$ | $1 - SS_{res}/SS_{tot}$ | Overall variance explained |
| Within $R^2$ | $1 - SS_{res}/SS_{within}$ | Variance explained within unit × time cells |
| Between $R^2$ | Based on group-time means | Variance explained between group-time cells |
| Adjusted $R^2$ | $1 - (1-R^2)(n-1)/(n-k-1)$ | $R^2$ penalised for parameters |
| RMSE | $\sqrt{SS_{res}/(n-k)}$ | Root mean squared error |
| AIC | $n\ln(SS_{res}/n) + 2k$ | Penalised fit (lower is better) |
| BIC | $n\ln(SS_{res}/n) + k\ln(n)$ | Strongly penalised fit (lower is better) |
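The formulas in the table can be evaluated from the residual and total sums of squares. The sketch below applies them exactly as written above; the function name `fit_statistics` and the input values are hypothetical.

```python
import math


def fit_statistics(ss_res, ss_tot, n, k):
    """Fit statistics from sums of squares, n observations and k parameters."""
    r2 = 1 - ss_res / ss_tot
    return {
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "rmse": math.sqrt(ss_res / (n - k)),
        "aic": n * math.log(ss_res / n) + 2 * k,
        "bic": n * math.log(ss_res / n) + k * math.log(n),
    }


# Hypothetical regression output
stats = fit_statistics(ss_res=50.0, ss_tot=200.0, n=100, k=4)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With $n = 100$ the BIC penalty per parameter, $\ln(100) \approx 4.6$, exceeds the AIC penalty of 2, which is why BIC favours smaller models.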

9.2 Fit of the Counterfactual

A key model evaluation step is assessing how well the control group serves as a counterfactual for the treated group's pre-treatment trajectory. Visually:

Quantitatively: Compute the pre-treatment DiD — the difference in trends during the pre-period. If this is near zero and statistically insignificant, the parallel trends assumption is supported.

9.3 Information Criteria for Model Comparison

When comparing DiD specifications (e.g., different control sets or different functional forms for the covariates), use AIC and BIC:

$$AIC = 2k - 2\ln(\hat{L})$$

$$BIC = k\ln(n) - 2\ln(\hat{L})$$

Where $k$ is the number of parameters and $\hat{L}$ is the maximised likelihood.

Note: AIC and BIC comparisons are only valid for models fitted to the same sample using the same outcome variable (e.g., levels vs. logs are not comparable on these criteria). The clustering level changes only the standard errors, not the fitted values, so it cannot be chosen with these criteria.

9.4 Assessing Balance on Pre-Treatment Covariates

A critical validity check is whether treated and control groups have similar pre-treatment characteristics:


10. Diagnostics and Assumption Testing

10.1 Pre-Trends Test (Event Study)

The event study (dynamic DiD) is the primary tool for testing the parallel trends assumption using pre-treatment data. It estimates a separate DiD coefficient for each time period:

$$Y_{it} = \alpha_i + \lambda_t + \sum_{k \neq -1} \delta_k \cdot \mathbf{1}[t - T_i^* = k] \cdot D_i + \mathbf{X}_{it}^T\boldsymbol{\beta} + \epsilon_{it}$$

Where:

Interpretation:

Formal pre-trends test: Joint $F$-test (or $\chi^2$ test) that all pre-treatment coefficients $\delta_k$ ($k < 0$) are jointly zero:

$$H_0: \delta_{-K} = \delta_{-K+1} = \dots = \delta_{-2} = 0$$

A non-significant result supports the parallel trends assumption; a significant result casts doubt.
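The event-study regression can be sketched with explicit dummy variables. The example below is a minimal illustration under strong simplifying assumptions: a balanced panel, a single common treatment date, no covariates, and noise-free hypothetical data with a constant effect of 1.5. Under those assumptions the lead coefficients are exactly zero and the post-treatment coefficients equal the effect. It reports point estimates only and does not carry out the joint test.

```python
import numpy as np

n_units, n_periods, t0 = 6, 6, 3              # treatment starts in period 3
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
treated = (unit < 3).astype(float)            # D_i: first three units are treated
rel_time = period - t0                        # k = t - T*

# Hypothetical outcome: unit effect + time effect + constant effect of 1.5 after t0
y = 0.5 * unit + 0.2 * period + 1.5 * treated * (rel_time >= 0)

# Relative-time indicators for treated units, omitting the reference period k = -1
event_times = [k for k in range(-t0, n_periods - t0) if k != -1]
event_dummies = np.column_stack([treated * (rel_time == k) for k in event_times])

# Unit dummies and period dummies (period 0 dropped to avoid collinearity)
unit_dummies = (unit[:, None] == np.arange(n_units)).astype(float)
period_dummies = (period[:, None] == np.arange(1, n_periods)).astype(float)

X = np.column_stack([unit_dummies, period_dummies, event_dummies])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_k = dict(zip(event_times, coef[-len(event_times):]))

for k in event_times:
    print(f"k = {k:+d}: delta_k = {delta_k[k]:.3f}")
```

With real data the lead coefficients would be noisy, and their joint significance would be assessed with cluster-robust inference as described in Section 7.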

⚠️ Passing the pre-trends test does not prove parallel trends hold in the post-period — trends may diverge after treatment for reasons unrelated to treatment. The pre-trends test is a necessary but not sufficient condition for identification.

10.2 Placebo Tests

Placebo tests assess whether the estimated DiD effect could have arisen by chance or due to confounding:

10.2.1 Placebo Time Periods

Estimate the DiD using only pre-treatment data, using a false treatment date (e.g., 2 years before the true treatment):

$$Y_{it} = \alpha_i + \lambda_t + \delta_{placebo} \cdot D_{it}^{placebo} + \epsilon_{it}$$

A significant $\hat{\delta}_{placebo}$ when treatment has not yet occurred suggests confounding or a violation of parallel trends.

10.2.2 Placebo Treatment Groups

Assign treatment to groups that were not actually treated and estimate the DiD. If the "treatment effect" is significant for these falsely treated groups, the design has poor identification.

10.2.3 Outcome Placebo Tests

Estimate the DiD using outcomes that should not be affected by the treatment. A null result ($\hat{\delta} \approx 0$) for these placebo outcomes increases confidence that the design is not picking up spurious effects.

10.3 Sensitivity to Parallel Trends Violations

Rambachan and Roth's sensitivity analysis (2023) provides a formal framework for assessing how large a violation of parallel trends would need to be to reverse the conclusion. The key parameter $M$ captures the maximum allowable deviation from parallel trends:

$$|\Delta_{post} - \Delta_{pre}| \leq M$$

Report breakdown values of $M$, that is, the maximum deviation consistent with the estimated effect remaining statistically significant or of the correct sign.

10.4 Testing for Anticipation Effects

Estimate the event study with period $k = -1$ (the period immediately before treatment) included among the estimated coefficients:

$$Y_{it} = \alpha_i + \lambda_t + \sum_{k = -K}^{-1} \delta_k \cdot \mathbf{1}[t - T_i^* = k] \cdot D_i + \sum_{k=0}^{K} \delta_k \cdot \mathbf{1}[t - T_i^* = k] \cdot D_i + \epsilon_{it}$$

Because $k = -1$ is no longer the omitted category, another reference period is needed for the coefficients to be identified, for example an earlier lead such as $k = -2$ or the periods outside the $[-K, K]$ window. If $\delta_{-1}$ is significantly different from zero, anticipation effects may be present.

10.5 Checking for Compositional Changes

In DiD with repeated cross-sections, the composition of the treated or control groups may change between periods. If the treatment induces sample selection (e.g., a health policy causes sick people to enter/exit the workforce), the DiD estimator may be biased.

How to check:

10.6 Residual Diagnostics

Standard regression diagnostics apply to the DiD residuals:


11. Extensions: Staggered DiD and Multiple Time Periods

11.1 Staggered Treatment Adoption

In many real-world settings, different units adopt treatment at different points in time — this is called staggered (or differential timing) DiD. For example, different US states adopt a policy in different years.

The TWFE regression in this context:

$$Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \epsilon_{it}$$

The TWFE estimator $\hat{\delta}$ is a weighted average of all possible 2×2 DiD comparisons between groups defined by their treatment timing (including never-treated units, where present). This result is known as the Bacon decomposition (Goodman-Bacon, 2021).

11.2 The Bacon Decomposition

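Goodman-Bacon (2021) shows that the TWFE estimator in a staggered design decomposes as:

$$\hat{\delta}_{TWFE} = \sum_{k \neq l} s_{kl} \hat{\delta}_{kl}$$

Where $\hat{\delta}_{kl}$ is the 2×2 DiD in which the group first treated at time $k$ acts as the treated group and the group first treated at time $l$ acts as the comparison group, and $s_{kl} > 0$ are weights summing to 1. The sum covers both orderings: early adopters compared with later adopters that are not yet treated ($k < l$), and late adopters compared with earlier adopters that are already treated ($k > l$).

The problem of "forbidden comparisons": The second type of 2×2 DiD compares a late adopter group in the post-period against an early adopter group that has already been treated, using already-treated units as a control group. If treatment effects are heterogeneous and dynamic (treatment effects change over time), changes in the early adopters' treatment effects are differenced out as if they were trends, so some underlying treatment effects enter $\hat{\delta}_{TWFE}$ with negative weights. This can lead to: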

11.3 Robust Staggered DiD Estimators

Several robust estimators have been developed to address the staggered DiD problem:

11.3.1 Callaway & Sant'Anna (2021) — Cohort-Specific ATTs

Define a cohort as the set of units that first receive treatment at the same calendar time $g$. The cohort-average treatment effect on the treated is:

$$ATT(g, t) = E[Y_t(g) - Y_t(\infty) \mid G = g]$$

Where $Y_t(g)$ is the potential outcome at time $t$ if first treated at time $g$, and $Y_t(\infty)$ is the never-treated potential outcome.

Aggregation: Individual cohort-time ATTs are aggregated to form:

11.3.2 Sun & Abraham (2021) — Interaction-Weighted Estimator

Decompose the TWFE estimate using cohort × period interactions:

$$Y_{it} = \alpha_i + \lambda_t + \sum_{g \neq \infty} \sum_{k \neq -1} \delta_{gk} \cdot \mathbf{1}[G_i = g] \cdot \mathbf{1}[t - g = k] + \epsilon_{it}$$

The interaction-weighted (IW) estimator aggregates $\hat{\delta}_{gk}$ using the share of each cohort in each period as weights, producing a heterogeneity-robust estimate of the average effect.

11.3.3 de Chaisemartin & D'Haultfœuille (2020) — $\text{DID}_M$

The $\text{DID}_M$ estimator uses only "clean" comparisons, built from periods in which treatment status changes, to form the estimate:

$$\text{DID}_M = \sum_{(i,t): S_{it}=1} w_{it} \Delta Y_{it} - \sum_{(i,t): S_{it}=0} w_{it} \Delta Y_{it}$$

Where $S_{it} = 1$ if unit $i$ switches from untreated to treated between $t-1$ and $t$, and weights $w_{it}$ ensure comparability.

11.3.4 Borusyak, Jaravel & Spiess (2024) — Imputation Estimator

Imputes the untreated counterfactual from a two-way fixed effects model fitted to untreated observations:

  1. Estimate $\alpha_i$ and $\lambda_t$ using untreated observations only.
  2. Impute the counterfactual: $\hat{Y}_{it}(0) = \hat{\alpha}_i + \hat{\lambda}_t$ for treated observations.
  3. Estimate ATTs: $\hat{\tau}_{it} = Y_{it} - \hat{Y}_{it}(0)$.
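
The three imputation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the application's implementation: the function name `impute_att`, the toy panel, and the use of alternating means to fit the fixed effects are all choices made here for illustration. It assumes every unit and every period has at least one untreated observation.

```python
def impute_att(y, treated, sweeps=2000):
    """y[i][t]: outcome; treated[i][t]: True where unit i is treated at time t.

    Hypothetical helper: fits Y = alpha_i + lambda_t on untreated cells only
    (step 1), imputes Y(0) for treated cells (step 2), and returns the
    cell-level effects Y - Y_hat(0) (step 3).
    """
    n, T = len(y), len(y[0])
    alpha, lam = [0.0] * n, [0.0] * T
    # Step 1: alternate between unit means and period means of the residual,
    # using untreated cells only (converges to the least-squares fit).
    for _ in range(sweeps):
        for i in range(n):
            cells = [y[i][t] - lam[t] for t in range(T) if not treated[i][t]]
            alpha[i] = sum(cells) / len(cells)
        for t in range(T):
            cells = [y[i][t] - alpha[i] for i in range(n) if not treated[i][t]]
            lam[t] = sum(cells) / len(cells)
    # Steps 2 and 3: impute the counterfactual and difference it out.
    return {(i, t): y[i][t] - (alpha[i] + lam[t])
            for i in range(n) for t in range(T) if treated[i][t]}


# Toy panel (made-up numbers): 3 units, 4 periods, true effect of 2.0.
alpha_true = [1.0, 2.0, 3.0]
lam_true = [0.0, 0.5, 1.0, 1.5]
first_treated = [None, 2, 3]          # unit 0 is never treated
treated = [[g is not None and t >= g for t in range(4)] for g in first_treated]
y = [[alpha_true[i] + lam_true[t] + 2.0 * treated[i][t] for t in range(4)]
     for i in range(3)]

effects = impute_att(y, treated)
att = sum(effects.values()) / len(effects)
print(round(att, 6))
```

Because the toy outcomes contain no noise, the imputed effects match the true effect up to floating-point error. With real data the cell-level estimates are noisy and are usually averaged before reporting.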

11.4 Choosing Among Staggered DiD Estimators

EstimatorRobust to Effect HeterogeneityMultiple ControlsCovariatesKey Reference
TWFE❌ (negative weights possible)
Callaway-Sant'AnnaCallaway & Sant'Anna (2021)
Sun-AbrahamLimitedSun & Abraham (2021)
de Chaisemartin-D'HaultfœuilleLimitedLimiteddCH & DH (2020)
Borusyak-Jaravel-SpiessBJS (2024)

💡 For staggered designs, always report the Bacon decomposition to diagnose the extent of potentially problematic comparisons, and supplement TWFE with at least one robust estimator.


12. Extensions: Heterogeneous Treatment Effects

12.1 Why Treatment Effects May Be Heterogeneous

The standard DiD model estimates a single average treatment effect (ATT). In reality, treatment effects often vary across:

12.2 Subgroup DiD Analysis

To examine how treatment effects vary across a categorical moderator $Z_i$:

$$Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \gamma \cdot (D_{it} \times Z_i) + \mathbf{X}_{it}^T\boldsymbol{\beta} + \epsilon_{it}$$

Test of effect heterogeneity: $H_0: \gamma = 0$ (no heterogeneity). Use cluster-robust standard errors.

12.3 Dynamic Treatment Effects (Event Study)

The event study design (Section 10.1) directly estimates dynamic treatment effects:

$$\delta_k = E[Y_{it}(1) - Y_{it}(0) \mid D_i = 1, t - T_i^* = k]$$

For $k = 0, 1, 2, \dots, K$ (periods since treatment onset). Plot $\hat{\delta}_k$ with confidence intervals to visualise:

12.4 Heterogeneity-Robust Aggregation

The robust staggered DiD estimators (Section 11.3) produce cohort-specific ATTs $ATT(g, t)$ that can be aggregated in various ways:

Average across all cohorts and post-periods: $\bar{\delta} = \frac{1}{|\mathcal{S}|}\sum_{(g,t) \in \mathcal{S}} ATT(g, t)$, where $\mathcal{S}$ is the set of post-treatment $(g, t)$ cells.

Event-time average: ATT as a function of time since treatment: $\delta(k) = \frac{1}{|\{g:\, g + k \leq T\}|}\sum_{g:\, g + k \leq T} ATT(g, g + k)$, where $T$ is the last observed period.

Calendar-time average: ATT in each calendar year: $\delta(t) = \frac{1}{|\{g: g \leq t\}|}\sum_{g \leq t} ATT(g, t)$
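
The three aggregations above can be sketched as simple averages over a dictionary of cohort-time estimates. The function name `aggregate_att` and the numbers are hypothetical. The sketch uses the unweighted averages written above, whereas applied work often weights cohorts by their size.

```python
def aggregate_att(att):
    """att maps (g, t) -> ATT(g, t) for post-treatment cells (t >= g)."""
    overall = sum(att.values()) / len(att)
    by_event, by_calendar = {}, {}
    for (g, t), value in att.items():
        by_event.setdefault(t - g, []).append(value)   # event time k = t - g
        by_calendar.setdefault(t, []).append(value)    # calendar time t
    event = {k: sum(v) / len(v) for k, v in sorted(by_event.items())}
    calendar = {t: sum(v) / len(v) for t, v in sorted(by_calendar.items())}
    return overall, event, calendar


# Made-up cohort-time ATTs for two cohorts (first treated in 2004 and 2005).
att = {
    (2004, 2004): 1.0, (2004, 2005): 2.0, (2004, 2006): 3.0,
    (2005, 2005): 2.0, (2005, 2006): 4.0,
}
overall, event, calendar = aggregate_att(att)
print(overall)    # 2.4
print(event)      # {0: 1.5, 1: 3.0, 2: 3.0}
print(calendar)   # {2004: 1.0, 2005: 2.0, 2006: 3.5}
```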


13. Extensions: Continuous and Fuzzy Treatment

13.1 Continuous Treatment Intensity (Dose-Response DiD)

When the treatment variable is continuous (e.g., amount of subsidy, level of minimum wage increase, exposure to a policy) rather than binary, the DiD model becomes:

$$Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \mathbf{X}_{it}^T\boldsymbol{\beta} + \epsilon_{it}$$

Where $D_{it}$ is now a continuous variable representing the intensity of treatment. The coefficient $\delta$ represents the effect of a one-unit increase in treatment intensity on the outcome.

Dose-response curve: Plot the predicted outcome as a function of treatment intensity at different time periods to visualise the dose-response relationship.

13.2 Fuzzy DiD (Instrumental Variables DiD)

In a fuzzy DiD design, the policy change shifts the probability of treatment but does not deterministically assign treatment. For example:

The binary treatment variable $D_{it}$ measures actual take-up; the policy indicator $Z_{it}$ is the instrument (discontinuous change in eligibility or incentive).

First stage: Treatment as a function of the instrument:

$$D_{it} = \pi_0 + \pi_1 Z_{it} + \alpha_i^D + \lambda_t^D + \mathbf{X}_{it}^T\boldsymbol{\pi} + \nu_{it}$$

Second stage: Outcome as a function of predicted treatment:

$$Y_{it} = \alpha_i + \lambda_t + \delta \cdot \hat{D}_{it} + \mathbf{X}_{it}^T\boldsymbol{\beta} + \epsilon_{it}$$

The IV-DiD estimator $\hat{\delta}_{IV}$ estimates the Local Average Treatment Effect (LATE): the effect on compliers (units that switch treatment status in response to the policy change).

Fuzzy DiD estimator:

$$\hat{\delta}_{FuzzyDiD} = \frac{\hat{\delta}_{reduced\, form}}{\hat{\pi}_1} = \frac{\Delta\bar{Y}_{treated} - \Delta\bar{Y}_{control}}{\Delta\bar{D}_{treated} - \Delta\bar{D}_{control}}$$
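
As a rough illustration of the ratio above, the sketch below computes the reduced-form DiD and the first-stage DiD from four cell means each and divides them. The function name `fuzzy_did`, the dictionary layout, and the numbers are assumptions made for this example.

```python
def did(cell):
    """2x2 DiD of cell means keyed by (group, period)."""
    return ((cell[("treated", "post")] - cell[("treated", "pre")])
            - (cell[("control", "post")] - cell[("control", "pre")]))


def fuzzy_did(y_means, d_means):
    """Wald-type ratio: reduced-form DiD divided by first-stage DiD."""
    first_stage = did(d_means)
    if first_stage == 0:
        raise ValueError("The policy did not shift take-up; the ratio is undefined.")
    return did(y_means) / first_stage


# Made-up cell means: outcome Y and take-up rate D.
y_means = {("treated", "pre"): 10.0, ("treated", "post"): 13.0,
           ("control", "pre"): 10.0, ("control", "post"): 11.0}
d_means = {("treated", "pre"): 0.25, ("treated", "post"): 0.75,
           ("control", "pre"): 0.25, ("control", "post"): 0.25}

estimate = fuzzy_did(y_means, d_means)
print(estimate)  # reduced form 2.0 divided by first stage 0.5
```

A small first-stage difference makes the ratio unstable, which is why the first stage is usually reported alongside the estimate.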

13.3 Triple Differences (DDD)

Triple differences (DDD) adds a third source of variation to further control for confounders. The idea is to difference out group-specific time trends that are common to all individuals within a group:

$$Y_{igjt} = \alpha + \sum \text{(all pairwise FE)} + \delta \cdot (\text{Treated}_g \times \text{Post}_t \times \text{Eligible}_j) + \epsilon_{igjt}$$

Where $j$ is an additional dimension (e.g., age group, income group) that determines eligibility within the treated group.

DDD estimator:

$$\hat{\delta}_{DDD} = \underbrace{[(\bar{Y}_{Tr,Post,El} - \bar{Y}_{Tr,Pre,El}) - (\bar{Y}_{Tr,Post,NoEl} - \bar{Y}_{Tr,Pre,NoEl})]}_{\text{DiD among Treated Group}} - \underbrace{[(\bar{Y}_{Ctrl,Post,El} - \bar{Y}_{Ctrl,Pre,El}) - (\bar{Y}_{Ctrl,Post,NoEl} - \bar{Y}_{Ctrl,Pre,NoEl})]}_{\text{DiD among Control Group}}$$

DDD is valuable when the comparison across regions includes contamination from regional trends that differentially affect all groups in treated regions.
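
The DDD estimator can be computed directly from the eight cell means. The sketch below is illustrative only: the function name `ddd`, the key labels, and the numbers are hypothetical.

```python
def ddd(m):
    """m maps (group, period, eligibility) -> cell mean.

    group in {"Tr", "Ctrl"}, period in {"Pre", "Post"}, eligibility in {"El", "NoEl"}.
    """
    def did_within(group):
        # Eligible-minus-ineligible change within one group.
        eligible = m[(group, "Post", "El")] - m[(group, "Pre", "El")]
        ineligible = m[(group, "Post", "NoEl")] - m[(group, "Pre", "NoEl")]
        return eligible - ineligible

    return did_within("Tr") - did_within("Ctrl")


# Made-up cell means.
means = {
    ("Tr", "Pre", "El"): 10.0, ("Tr", "Post", "El"): 16.0,
    ("Tr", "Pre", "NoEl"): 10.0, ("Tr", "Post", "NoEl"): 12.0,
    ("Ctrl", "Pre", "El"): 10.0, ("Ctrl", "Post", "El"): 13.0,
    ("Ctrl", "Pre", "NoEl"): 10.0, ("Ctrl", "Post", "NoEl"): 12.0,
}
print(ddd(means))  # (6 - 2) - (3 - 2) = 3.0
```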


14. Covariates and Controls in DiD

14.1 Why Include Covariates?

Adding covariates to the DiD model serves two distinct purposes:

  1. Improving efficiency (precision): Covariates that predict the outcome reduce residual variance, shrinking standard errors and narrowing the confidence interval.

  2. Restoring conditional parallel trends: If unconditional parallel trends is implausible but trends are parallel after conditioning on observable characteristics, including covariates removes the confounding and restores identification.

14.2 Time-Invariant Covariates

In the TWFE model, time-invariant covariates $\mathbf{X}_i$ (e.g., gender, ethnicity, geographic characteristics) are absorbed by the unit fixed effect $\alpha_i$ and cannot be estimated separately. However, they can be included as interactions with the treatment or time variables to allow their effect to vary:

$$Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \gamma \cdot (D_{it} \times X_i) + \epsilon_{it}$$

14.3 Time-Varying Covariates

Time-varying covariates $X_{it}$ can be included directly in the TWFE model:

$$Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \boldsymbol{\beta}^T \mathbf{X}_{it} + \epsilon_{it}$$

⚠️ Including time-varying covariates that are themselves affected by the treatment (i.e., "bad controls" or "mediators") is a common mistake. Including such variables absorbs part of the treatment effect, leading to underestimation of $\delta$. Only include covariates that are determined before treatment or that are plausibly unaffected by treatment.

14.4 Regression Adjustment (Outcome Regression)

The regression-adjusted DiD uses the control group's pre-to-post relationship between covariates and the outcome to construct an improved counterfactual:

$$\hat{\delta}_{RA} = (\bar{Y}_{1,1} - \bar{Y}_{1,0}) - \hat{m}(\mathbf{X}_{1,0})$$

Where $\hat{m}(\mathbf{X}_{1,0})$ is the predicted counterfactual change based on the control group's covariate-outcome relationship. This improves efficiency and removes covariate-related bias.

14.5 Doubly Robust DiD (DR-DiD)

The doubly robust estimator combines propensity score weighting and outcome regression. It is consistent if either the propensity score model or the outcome regression model is correctly specified:

$$\hat{\delta}_{DR} = \frac{1}{n_1}\sum_{D_i = 1}\left[\Delta Y_i - \hat{m}(\mathbf{X}_i)\right] - \frac{1}{n_0}\sum_{D_i = 0}\hat{w}(\mathbf{X}_i)\left[\Delta Y_i - \hat{m}(\mathbf{X}_i)\right]$$

Where $\hat{w}(\mathbf{X}_i)$ are propensity score reweighting terms and $\hat{m}(\mathbf{X}_i)$ is the regression-adjusted counterfactual change.

14.6 Controlling for Pre-Treatment Trends (Linear Trend Adjustment)

When parallel trends is violated by unit-specific linear time trends, include unit-specific trend terms:

$$Y_{it} = \alpha_i + \lambda_t + \rho_i \cdot t + \delta \cdot D_{it} + \epsilon_{it}$$

Where $\rho_i \cdot t$ is a unit-specific linear time trend. This allows each unit to have its own pre-treatment trajectory, controlling for heterogeneous trends.

⚠️ Including unit-specific trends is a strong assumption (units would have continued on their pre-treatment trend indefinitely) and can overcontrol. Use only when unit-specific trends are well-established in the pre-period and when there are sufficient pre-period observations to estimate them.


15. Using the DiD Component

The Difference-in-Differences component in the DataStatPro application provides a comprehensive workflow for DiD estimation, testing, and visualisation.

Step-by-Step Guide

Step 1 — Select Dataset Choose the dataset from the "Dataset" dropdown. The dataset should have:

Step 2 — Select DiD Design Choose the DiD specification:

Step 3 — Select Variables Map the required variables from your dataset:

Step 4 — Specify Treatment Timing

Step 5 — Configure Fixed Effects Select fixed effects to include:

Step 6 — Configure Standard Errors Select the standard error type:

Step 7 — Select Staggered DiD Estimator (if applicable) For staggered designs, choose the robust estimator:

Step 8 — Configure Inference Options

Step 9 — Select Display Options Choose which outputs to display:

Step 10 — Run the Analysis Click "Run DiD Model". The application will:

  1. Construct the design matrix with appropriate interaction terms and fixed effects.
  2. Estimate the DiD coefficient(s) via OLS with specified standard errors.
  3. Compute the event study / dynamic effects coefficients.
  4. Run the pre-trends test and parallel trends diagnostics.
  5. Execute placebo tests (if requested).
  6. Compute the Bacon decomposition (for staggered designs).
  7. Generate all selected visualisations and tables.

16. Computational and Formula Details

16.1 The 2×2 DiD Estimator: Step-by-Step

Step 1: Compute group-period means

$$\bar{Y}_{gt} = \frac{1}{n_{gt}}\sum_{i \in g}\sum_{t' \in t} Y_{it'}$$

For groups $g \in \{0, 1\}$ (control, treated) and periods $t \in \{0, 1\}$ (pre, post), where $t'$ runs over the observation dates that fall in period $t$.

Step 2: First differences

$$\Delta\bar{Y}_{treated} = \bar{Y}_{1,1} - \bar{Y}_{1,0}$$

$$\Delta\bar{Y}_{control} = \bar{Y}_{0,1} - \bar{Y}_{0,0}$$

Step 3: DiD estimate

$$\hat{\delta} = \Delta\bar{Y}_{treated} - \Delta\bar{Y}_{control}$$

Step 4: Standard error (homoscedastic OLS)

With $n_{gt}$ observations per cell, $n = n_{00} + n_{01} + n_{10} + n_{11}$:

$$SE_{OLS}(\hat{\delta}) = \hat{\sigma}\sqrt{\frac{1}{n_{00}} + \frac{1}{n_{01}} + \frac{1}{n_{10}} + \frac{1}{n_{11}}}$$

Where $\hat{\sigma}^2 = \sum_{igt}(Y_{it} - \hat{Y}_{it})^2 / (n - 4)$.
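
Steps 1 to 4 can be traced with a few lines of plain Python. This is a sketch under simplifying assumptions (homoscedastic errors, made-up data, a hypothetical function name `did_2x2`); it uses the fact that the fitted values of the saturated 2×2 regression are the cell means.

```python
import math


def did_2x2(obs):
    """obs: list of (group, period, y) with group and period coded 0 or 1."""
    cells = {(g, t): [] for g in (0, 1) for t in (0, 1)}
    for g, t, y in obs:
        cells[(g, t)].append(y)
    # Step 1: group-period means
    mean = {key: sum(v) / len(v) for key, v in cells.items()}
    # Steps 2 and 3: first differences, then the difference between them
    delta = (mean[(1, 1)] - mean[(1, 0)]) - (mean[(0, 1)] - mean[(0, 0)])
    # Step 4: homoscedastic OLS standard error with n - 4 degrees of freedom
    n = len(obs)
    ssr = sum((y - mean[(g, t)]) ** 2 for g, t, y in obs)
    sigma2 = ssr / (n - 4)
    se = math.sqrt(sigma2 * sum(1 / len(v) for v in cells.values()))
    return delta, se


# Made-up data: two observations per cell.
obs = [(0, 0, 9.0), (0, 0, 11.0), (0, 1, 11.0), (0, 1, 13.0),
       (1, 0, 9.0), (1, 0, 11.0), (1, 1, 14.0), (1, 1, 16.0)]
delta, se = did_2x2(obs)
print(delta, se)  # 3.0 2.0
```

This standard error ignores within-cluster correlation, so in applied work it is normally replaced by a cluster-robust version.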

16.2 TWFE Estimation: The Demeaning Procedure

Step 1: Compute unit means, time means, and grand mean

$$\bar{Y}_i = \frac{1}{T}\sum_{t=1}^T Y_{it}, \quad \bar{Y}_t = \frac{1}{n}\sum_{i=1}^n Y_{it}, \quad \bar{Y} = \frac{1}{nT}\sum_{i,t} Y_{it}$$

Step 2: Demean all variables

$$\ddot{Y}_{it} = Y_{it} - \bar{Y}_i - \bar{Y}_t + \bar{Y}$$

$$\ddot{D}_{it} = D_{it} - \bar{D}_i - \bar{D}_t + \bar{D}$$

Step 3: Estimate TWFE by OLS on demeaned variables

$$\hat{\delta}_{TWFE} = \frac{\sum_{it}\ddot{D}_{it}\ddot{Y}_{it}}{\sum_{it}\ddot{D}_{it}^2}$$
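
The demeaning procedure can be sketched for a balanced panel with a single treatment regressor. The function name `twfe_delta` and the toy panel are hypothetical. The toy outcomes are built with a constant effect of 2.0 and no noise, so the estimator recovers it up to floating-point error.

```python
def twfe_delta(y, d):
    """TWFE coefficient on d for a balanced panel; y[i][t] and d[i][t] are numbers."""
    n, T = len(y), len(y[0])

    def demean(x):
        # Steps 1 and 2: subtract unit and time means, add back the grand mean.
        unit = [sum(row) / T for row in x]
        time = [sum(x[i][t] for i in range(n)) / n for t in range(T)]
        grand = sum(unit) / n
        return [[x[i][t] - unit[i] - time[t] + grand for t in range(T)]
                for i in range(n)]

    yd, dd = demean(y), demean(d)
    # Step 3: OLS slope of demeaned y on demeaned d.
    num = sum(yd[i][t] * dd[i][t] for i in range(n) for t in range(T))
    den = sum(dd[i][t] ** 2 for i in range(n) for t in range(T))
    return num / den


# Made-up panel: 3 units, 4 periods, staggered adoption, constant effect 2.0.
alpha = [1.0, 2.0, 3.0]
lam = [0.0, 0.5, 1.0, 1.5]
d = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
y = [[alpha[i] + lam[t] + 2.0 * d[i][t] for t in range(4)] for i in range(3)]

delta_hat = twfe_delta(y, d)
print(round(delta_hat, 6))
```

The closed form only applies to balanced panels. Unbalanced panels require iterative demeaning or explicit dummy variables.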

16.3 Cluster-Robust Variance Estimator

With $G$ clusters and the TWFE estimator:

$$\hat{V}_{cluster}(\hat{\delta}) = \frac{G}{G-1} \cdot \frac{n-1}{n-k} \cdot (\ddot{\mathbf{D}}^T\ddot{\mathbf{D}})^{-1} \left(\sum_{g=1}^G \ddot{\mathbf{D}}_g^T\hat{\boldsymbol{\epsilon}}_g\hat{\boldsymbol{\epsilon}}_g^T\ddot{\mathbf{D}}_g\right) (\ddot{\mathbf{D}}^T\ddot{\mathbf{D}})^{-1}$$

Where $\frac{G}{G-1} \cdot \frac{n-1}{n-k}$ is a small-sample correction (with $n$ observations and $k$ estimated parameters), and $\ddot{\mathbf{D}}_g$ and $\hat{\boldsymbol{\epsilon}}_g$ are the demeaned treatment vector and residuals for cluster $g$.

$$SE_{cluster}(\hat{\delta}) = \sqrt{\hat{V}_{cluster}(\hat{\delta})}$$
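
For a single treatment regressor the sandwich formula reduces to scalars, which makes it easy to sketch. The function name `cluster_se`, its argument layout, the default `k=1`, and the numbers are assumptions made for this illustration.

```python
import math


def cluster_se(d_dm, resid, k=1):
    """Cluster-robust SE for one regressor.

    d_dm[g] and resid[g] hold the demeaned treatment values and residuals of
    cluster g; k is the number of estimated parameters used in the correction.
    """
    G = len(d_dm)
    n = sum(len(cluster) for cluster in d_dm)
    bread = sum(x * x for cluster in d_dm for x in cluster)
    # "Meat": squared within-cluster sums of (demeaned treatment x residual).
    meat = sum(sum(x * e for x, e in zip(dc, ec)) ** 2
               for dc, ec in zip(d_dm, resid))
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    return math.sqrt(correction * meat / bread ** 2)


# Made-up inputs: three clusters with two observations each.
d_dm = [[-1.0, 1.0], [-1.0, 1.0], [0.0, 0.0]]
resid = [[1.0, -1.0], [-1.0, 1.0], [0.5, -0.5]]

se = cluster_se(d_dm, resid)
print(round(se, 4))  # sqrt(1.5 * 8 / 16) = sqrt(0.75)
```

Residuals are summed within each cluster before squaring, so positively correlated residuals within a cluster inflate the standard error relative to the homoscedastic formula.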

16.4 Event Study Regression: Full Specification

For a unit $i$ with treatment onset at period $T_i^*$ and balanced panel from $t = 1$ to $T$:

Define event-time dummies:

$$\mathbf{1}[k]_{it} = \mathbf{1}[t - T_i^* = k] \cdot D_i$$

For $k = -K, \dots, -2, -1, 0, 1, \dots, K$ (omitting $k = -1$ as the reference).

Stack the regression:

$$Y_{it} = \alpha_i + \lambda_t + \sum_{k=-K, k\neq-1}^{K} \delta_k \cdot \mathbf{1}[k]_{it} + \epsilon_{it}$$

Estimate by TWFE (adding unit and time FE as dummy variables or using within-transformation).

Confidence bands: For each $k$, compute $\hat{\delta}_k \pm 1.96 \times SE_{cluster}(\hat{\delta}_k)$.
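
The pointwise bands are a one-line computation per coefficient. The sketch below uses two estimates and standard errors taken from the event-study table in Example 3 of this guide. The function name `confidence_band` is hypothetical.

```python
def confidence_band(estimates, ses, z=1.96):
    """Pointwise bands: estimate +/- z * SE for each event time k."""
    return {k: (estimates[k] - z * ses[k], estimates[k] + z * ses[k])
            for k in estimates}


# Event times 0 and +1 from the Example 3 event-study table.
estimates = {0: 1.21, 1: 2.18}
ses = {0: 0.41, 1: 0.54}

bands = confidence_band(estimates, ses)
for k, (lo, hi) in bands.items():
    excludes_zero = lo > 0 or hi < 0
    print(k, round(lo, 2), round(hi, 2), excludes_zero)
# 0 0.41 2.01 True
# 1 1.12 3.24 True
```

These are pointwise intervals. They do not control the joint coverage across all event times.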

16.5 The Bacon Decomposition

For a staggered design with cohorts (groups adopting treatment at different times), the TWFE estimator decomposes as:

$$\hat{\delta}_{TWFE} = \sum_{g < g'} \hat{s}_{gg'} \hat{\delta}_{gg'} + \sum_{g} \hat{s}_{gU} \hat{\delta}_{gU}$$

Where:

The decomposition reveals how much of the TWFE estimate comes from each pairwise comparison, and which comparisons use already-treated units as controls (potentially problematic).

16.6 Pre-Trend Test Statistic

Joint F-test for pre-treatment event study coefficients:

$$F = \frac{(\mathbf{R}\hat{\boldsymbol{\delta}})^T[\mathbf{R}\hat{V}\mathbf{R}^T]^{-1}(\mathbf{R}\hat{\boldsymbol{\delta}})}{q}$$

Where $\mathbf{R}$ selects the pre-treatment coefficients ($k = -K, \dots, -2$), $\hat{\boldsymbol{\delta}}$ is the vector of event study estimates, $\hat{V}$ is their variance-covariance matrix (cluster-robust), and $q = K - 1$ is the number of pre-period restrictions.

Under $H_0$ (parallel trends in pre-period): $F \sim F_{q, G-1}$ approximately.
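
Since $\mathbf{R}$ only selects the pre-treatment coefficients, the statistic can be computed from the pre-period sub-vector and the matching block of $\hat{V}$. The sketch below uses the standard library only. The function names `solve` and `pretrend_f` and the numbers are hypothetical, and the p-value step is omitted because the F distribution is not available in the standard library.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def pretrend_f(delta_pre, v_pre):
    """Wald-type F statistic: delta' V^{-1} delta / q for the pre-period block."""
    x = solve(v_pre, delta_pre)            # x = V^{-1} delta
    return sum(d * xi for d, xi in zip(delta_pre, x)) / len(delta_pre)


# Made-up pre-period coefficients and covariance block (q = 2).
delta_pre = [1.0, -1.0]
v_pre = [[2.0, 1.0], [1.0, 2.0]]

f_stat = pretrend_f(delta_pre, v_pre)
print(f_stat)  # 1.0
```

The off-diagonal covariances matter: ignoring them and averaging squared t-statistics generally gives a different value.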

16.7 DiD with Binary Outcomes

For binary outcomes ($Y_{it} \in \{0, 1\}$), the linear probability model (LPM) DiD remains valid and interpretable:

$$E[Y_{it} \mid D_{it}, \alpha_i, \lambda_t] = \alpha_i + \lambda_t + \delta \cdot D_{it}$$

$\hat{\delta}$ estimates the probability change (in percentage points) caused by treatment. While predicted probabilities may fall outside $[0,1]$, the LPM DiD estimator of $\delta$ is unbiased under parallel trends.

For probits or logits, the DiD interpretation is more complex and non-linear. The average marginal effect from a nonlinear DiD:

$$AME_{DiD} = \frac{1}{n_{treated}}\sum_{i: D_i=1}\left[\hat{F}(\hat{\eta}_{i,post}) - \hat{F}(\hat{\eta}_{i,pre})\right] - \frac{1}{n_{control}}\sum_{i: D_i=0}\left[\hat{F}(\hat{\eta}_{i,post}) - \hat{F}(\hat{\eta}_{i,pre})\right]$$

Where $\hat{F}$ is the estimated CDF (probit or logistic) and $\hat{\eta}$ is the linear predictor. Note: the LPM is generally preferred for DiD with binary outcomes due to tractability.


17. Worked Examples

Example 1: 2×2 DiD — Effect of Minimum Wage on Employment

Research Question: Did a 20% increase in the minimum wage in State A in 2019 affect the fast-food employment rate, using State B (which had no minimum wage change) as the control?

Data: Monthly employment rates for fast-food workers in State A (treated) and State B (control), 2017–2021. For simplicity, we use 2018 as pre-period and 2019 onward as post-period.

Step 1: Group-Period Mean Table

| | Pre-2019 Mean ($\bar{Y}_{g,0}$) | Post-2019 Mean ($\bar{Y}_{g,1}$) | Change ($\Delta\bar{Y}_g$) |
|---|---|---|---|
| State A (Treated, $D=1$) | 72.4% | 70.1% | -2.3 pp |
| State B (Control, $D=0$) | 74.1% | 73.2% | -0.9 pp |
| DiD | | | -2.3 − (−0.9) = −1.4 pp |

Step 2: OLS Regression

$$\text{Employment}_{it} = \alpha + \beta \cdot \text{StateA}_i + \gamma \cdot \text{Post}_t + \delta \cdot (\text{StateA}_i \times \text{Post}_t) + \epsilon_{it}$$

| Coefficient | Estimate | Cluster-Robust SE | $t$ | $p$ | 95% CI |
|---|---|---|---|---|---|
| Intercept ($\alpha$) | 74.1 | 0.42 | 176.4 | < 0.001 | [73.2, 75.0] |
| State A ($\beta$) | -1.7 | 0.61 | -2.79 | 0.023 | [-3.1, -0.3] |
| Post ($\gamma$) | -0.9 | 0.31 | -2.90 | 0.018 | [-1.6, -0.2] |
| DiD ($\delta$) | -1.4 | 0.54 | -2.59 | 0.031 | [-2.6, -0.2] |

$R^2 = 0.218$, $n = 120$ monthly observations.

Step 3: Interpretation

The minimum wage increase in State A reduced fast-food employment by an estimated 1.4 percentage points ($p = 0.031$, 95% CI: $[-2.6, -0.2]$ pp). This represents a $1.4/72.4 = 1.9\%$ relative reduction from the pre-treatment baseline.

Step 4: Pre-Trends Check (Using 2017–2018 data)

Using quarterly 2017–2018 data, estimate a placebo DiD treating 2018 Q1 as "post":

Placebo $\hat{\delta} = 0.31$, SE = 0.48, $p = 0.524$ → No significant pre-treatment difference in trends. The parallel trends assumption is supported.

Step 5: Visualisation Summary

Pre-period trends for both states are approximately parallel (both declining slightly). Post-2019, State A's employment declines more sharply than State B's, consistent with the minimum wage effect.


Example 2: TWFE — Effect of Broadband Access on Business Formation

Research Question: Did broadband internet access (treated when broadband penetration > 50%) increase the rate of new business formation across US counties, 2000–2010?

Data: Annual panel of $n = 3{,}142$ counties, $T = 11$ years ($N = 34{,}562$ observations); outcome: log business formation rate; treatment: broadband penetration indicator.

Step 1: TWFE Regression

$$\ln(\text{BusinessRate}_{it}) = \alpha_i + \lambda_t + \delta \cdot \text{Broadband}_{it} + \beta_1\ln(\text{Population}_{it}) + \beta_2\text{Unemployment}_{it} + \epsilon_{it}$$

Step 2: Results

| Variable | Coefficient | Cluster-Robust SE (County) | $t_{3141}$ | $p$ |
|---|---|---|---|---|
| Broadband (DiD) | 0.0841 | 0.0214 | 3.93 | < 0.001 |
| ln(Population) | 0.1243 | 0.0381 | 3.26 | 0.001 |
| Unemployment | -0.0182 | 0.0051 | -3.57 | < 0.001 |
| County FE | ✅ (3,142 dummies) | | | |
| Year FE | ✅ (11 dummies) | | | |

Within $R^2 = 0.312$, $N = 34{,}562$.

Step 3: Interpretation

Broadband internet access increases the log business formation rate by 0.0841, corresponding to an $e^{0.0841} - 1 = 8.8\%$ increase in business formation. The effect is highly significant ($p < 0.001$) after controlling for county and year fixed effects and time-varying population and unemployment controls.
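
The conversion from a log-point coefficient to a percentage change can be checked in a few lines. The helper name `pct_change` is hypothetical. The interval line applies the same transformation to the endpoints of a normal-approximation interval built from the reported standard error (0.0214), which is one common convention and an assumption here.

```python
import math


def pct_change(beta):
    """Percentage change in the outcome implied by a coefficient on a log outcome."""
    return (math.exp(beta) - 1) * 100


beta, se = 0.0841, 0.0214
point = pct_change(beta)
# exp() is monotone, so transforming the interval endpoints keeps their order.
low, high = pct_change(beta - 1.96 * se), pct_change(beta + 1.96 * se)

print(round(point, 1))  # 8.8
print(low < point < high)  # True
```

For small coefficients the log-point value and the percentage change are close (0.0841 versus 8.8%), but the gap widens as the coefficient grows.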

Standardised effect:

$$d_{DiD} = \frac{0.0841}{s_{pre}} = \frac{0.0841}{0.312} = 0.27$$

A small-to-medium standardised effect.

Step 4: Event Study

Estimating event study coefficients for 3 years before and 5 years after treatment adoption:

| Period ($k$) | $\hat{\delta}_k$ | SE | $p$ | Significant? |
|---|---|---|---|---|
| $k = -3$ | 0.011 | 0.018 | 0.543 | No |
| $k = -2$ | -0.008 | 0.015 | 0.591 | No |
| $k = -1$ | (reference = 0) | | | |
| $k = 0$ | 0.041 | 0.019 | 0.031 | Yes |
| $k = 1$ | 0.072 | 0.023 | 0.002 | Yes |
| $k = 2$ | 0.084 | 0.024 | < 0.001 | Yes |
| $k = 3$ | 0.091 | 0.026 | < 0.001 | Yes |
| $k = 4$ | 0.088 | 0.027 | 0.001 | Yes |
| $k = 5$ | 0.083 | 0.029 | 0.004 | Yes |

Pre-treatment coefficients: Joint $F_{2, 3141} = 0.47$, $p = 0.624$ → No evidence of pre-trends. Post-treatment effects ramp up over 2–3 years and then stabilise, consistent with gradual adoption and business formation.


Example 3: Staggered DiD — Effect of Paid Family Leave Policies on Female Labour Force Participation

Research Question: Did the adoption of state-level paid family leave (PFL) policies affect female labour force participation (FLFP) across US states, with different states adopting at different times (2004–2016)?

Data: Annual panel of 50 states, 2000–2020; outcome: FLFP rate (%); treatment: PFL adoption indicator. 12 states adopt PFL at different times; 38 states never adopt (control).

Step 1: TWFE Estimate (Standard)

$\hat{\delta}_{TWFE} = 1.82$ pp, SE = 0.74, $p = 0.015$.

Step 2: Bacon Decomposition

| Comparison Type | Weight | 2×2 DiD Estimate |
|---|---|---|
| Early adopters vs. Never treated | 0.41 | 2.31 |
| Late adopters vs. Never treated | 0.28 | 2.04 |
| Early vs. Late (early as treated) | 0.22 | 1.41 |
| Late vs. Early (late as treated) | 0.09 | 0.63 |

The decomposition shows that 31% of the TWFE weight (0.22 + 0.09) falls on timing comparisons between early and late adopters. Of these, only the "Late vs. Early" comparison (weight 0.09) uses already-treated states as controls. Its estimate ($\hat{\delta} = 0.63$) is notably smaller than the others, which suggests heterogeneous or dynamic treatment effects across adoption cohorts.
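
A short sketch can summarise the decomposition table by comparison type. The weights and estimates are the ones reported in the table above. The grouping into "clean" (never-treated controls) and "timing" (early versus late) comparisons and the variable names are choices made for this illustration.

```python
# (label, weight, 2x2 DiD estimate, comparison type) from the table above.
components = [
    ("Early adopters vs. Never treated", 0.41, 2.31, "clean"),
    ("Late adopters vs. Never treated", 0.28, 2.04, "clean"),
    ("Early vs. Late (early as treated)", 0.22, 1.41, "timing"),
    ("Late vs. Early (late as treated)", 0.09, 0.63, "timing"),
]


def summarise(kind):
    """Total weight and weight-normalised average estimate for one comparison type."""
    rows = [(w, est) for _, w, est, k in components if k == kind]
    share = sum(w for w, _ in rows)
    average = sum(w * est for w, est in rows) / share
    return share, average


clean_share, clean_avg = summarise("clean")
timing_share, timing_avg = summarise("timing")
print(round(clean_share, 2), round(clean_avg, 2))    # 0.69 2.2
print(round(timing_share, 2), round(timing_avg, 2))  # 0.31 1.18
```

The gap between the two averages is one way to see how much the timing comparisons pull the pooled estimate down.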

Step 3: Callaway-Sant'Anna Robust Estimator

Computing $ATT(g, t)$ for each adoption cohort $g$ and time $t$:

| Cohort (First Treated Year $g$) | $n_{cohort}$ | Average Post-Treatment ATT |
|---|---|---|
| 2004 (California) | 1 | 3.21 pp |
| 2008 (New Jersey) | 1 | 2.84 pp |
| 2013 (Rhode Island) | 1 | 2.41 pp |
| 2016 (New York) | 1 | 1.98 pp |
| Other early adopters (2004–2008) | 4 | 2.68 pp |
| Other late adopters (2009–2016) | 5 | 1.72 pp |

Aggregated average ATT (Callaway-Sant'Anna): 2.31 pp (SE = 0.82, $p = 0.005$).

Observation: The Callaway-Sant'Anna estimate (2.31 pp) is larger than the TWFE estimate (1.82 pp). The gap is consistent with the downward bias from using already-treated states as controls in the "Late vs. Early" TWFE comparison.

Step 4: Dynamic Effects (Event Study)

| Event Time $k$ | CS Estimate | SE | 95% CI |
|---|---|---|---|
| $-3$ | -0.12 | 0.28 | [-0.67, 0.43] |
| $-2$ | 0.08 | 0.24 | [-0.39, 0.55] |
| $-1$ | (reference) | | |
| $0$ | 1.21 | 0.41 | [0.41, 2.01] |
| $+1$ | 2.18 | 0.54 | [1.12, 3.24] |
| $+2$ | 2.41 | 0.61 | [1.21, 3.61] |
| $+3$ | 2.58 | 0.68 | [1.25, 3.91] |

Pre-trends test: $F_{2, 49} = 0.14$, $p = 0.869$ → No evidence of pre-trends. Effects increase over the first two years post-adoption and stabilise, suggesting that businesses and workers gradually adjust to the new policy.

Conclusion: PFL policies increase female labour force participation by approximately 2.3 percentage points on average (Callaway-Sant'Anna robust estimate). The TWFE estimate is about 0.5 pp lower, consistent with downward bias from heterogeneous treatment effects across adoption cohorts. Effects appear in the adoption year (1.21 pp at $k = 0$) and roughly double over the following two years.


Example 4: Triple Differences — Effect of Health Insurance Expansion on Hospital Admissions

Research Question: Did Medicaid expansion under the ACA increase hospital admissions for low-income adults (the eligible group) compared to higher-income adults (the ineligible group), in expansion vs. non-expansion states?

Data: State-year panel; outcome: hospitalisation rate per 1,000 adults; three dimensions: State (expansion vs. non-expansion), Year (pre-2014 vs. post-2014), Income group (low-income eligible vs. higher-income ineligible).

Triple Differences Regression:

$$\text{Hosp}_{sjy} = \alpha_{sy} + \alpha_{sj} + \alpha_{jy} + \delta \cdot (\text{Exp}_s \times \text{Post}_y \times \text{LowInc}_j) + \epsilon_{sjy}$$

Where $s$ indexes states, $y$ years, and $j$ income groups, and all two-way interactions are absorbed by the corresponding two-way fixed effects (state-by-year, state-by-income-group, and income-group-by-year).

| Coefficient | Estimate | SE | $p$ |
|---|---|---|---|
| DDD ($\delta$) | 12.4 | 3.21 | < 0.001 |

The DDD estimate suggests that Medicaid expansion increased hospitalisation rates for low-income adults (who became eligible) by 12.4 per 1,000 relative to higher-income adults in expansion states, and relative to the same income groups in non-expansion states. This controls for any general time trends in hospitalisation, any state-specific income disparities, and any common shifts in healthcare utilisation across income groups.


18. Common Mistakes and How to Avoid Them

Mistake 1: Failing to Use Cluster-Robust Standard Errors

Problem: Using OLS standard errors (or even heteroscedasticity-robust HC standard errors) in a DiD design, resulting in severely underestimated standard errors and spuriously small p-values. DiD residuals are almost always serially correlated within units across time, and within groups (e.g., states) across individuals.
Solution: Always cluster standard errors at the level of treatment assignment. For state-level policies, cluster at the state level. If the number of clusters is small ($G < 30$), use the wild cluster bootstrap instead of asymptotic cluster-robust SEs.

Mistake 2: Ignoring Pre-Trends

Problem: Reporting a significant DiD estimate without testing or reporting the parallel trends assumption, leaving the identification assumption entirely unvalidated and the results unconvincing.
Solution: Always conduct and report the event study pre-trends test. Plot the event study coefficients with confidence bands for at least 2–3 pre-treatment periods. Report the joint $F$-test for pre-period coefficients and discuss any concerning patterns, even if not statistically significant.

Mistake 3: Including Bad Controls (Post-Treatment Outcomes)

Problem: Including time-varying covariates that are themselves caused by the treatment (mediators or "bad controls"), e.g., including health insurance status as a covariate when the treatment is a health policy that affects insurance. This absorbs part of the treatment effect through the covariate, biasing $\hat{\delta}$ toward zero.
Solution: Only include covariates that are predetermined (determined before treatment) or plausibly unaffected by the treatment. When in doubt, report the model with and without the covariate and check sensitivity.

Mistake 4: Applying Standard TWFE to Staggered Designs Without Checking

Problem: Using the standard TWFE estimator in a staggered design with heterogeneous treatment effects, obtaining a potentially biased or sign-reversed DiD estimate without investigating its composition.
Solution: Always run the Bacon decomposition for staggered designs to understand what the TWFE estimate represents. If heterogeneous treatment effects are plausible, supplement with (or switch to) a robust estimator: Callaway-Sant'Anna, Sun-Abraham, or Borusyak-Jaravel-Spiess. Report both for transparency.

Mistake 5: Confusing the ATT with the ATE

Problem: Interpreting the DiD estimate as the Average Treatment Effect (ATE) — the effect averaged over all units — when DiD actually identifies the ATT (Average Treatment Effect on the Treated).
Solution: Be precise in reporting: DiD estimates the ATT — the effect for the treated units specifically. The ATE (which would include the effect on control units) is not identified by DiD unless additional assumptions (or different estimators) are invoked.

Mistake 6: Treating the Pre-Trends Test as Definitive

Problem: Passing the pre-trends test (no significant pre-treatment coefficients) and concluding that parallel trends definitely holds in the post-period. This overstates the confidence in identification.
Solution: The pre-trends test is supportive evidence, not proof. Pre-treatment parallel trends do not guarantee post-treatment parallel trends. Supplement with theoretical arguments for why the groups would have trended similarly, with placebo tests, and with Rambachan-Roth sensitivity analysis. Be honest about residual uncertainty.

Mistake 7: Selecting the Control Group Retrospectively Based on Pre-Trends

Problem: Searching across many potential control groups and selecting the one that shows the most parallel pre-trends with the treated group. This "pre-trend matching" leads to data mining, overfitting, and inflated confidence in parallel trends.
Solution: Specify the control group a priori based on institutional knowledge and theoretical comparability, not post-hoc based on pre-trend patterns. If multiple control groups are plausible, report results for all of them and assess robustness.

Mistake 8: Ignoring Anticipation Effects

Problem: Treating the period immediately before the official treatment start as a clean pre-period, when in fact treated units may have already begun responding in anticipation of treatment.
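Solution: Test for anticipation effects by examining the event study coefficients just before treatment. Because $k = -1$ is the omitted reference period in the event study specification (its coefficient is normalised to zero), this requires either moving the reference period further back (e.g., to $k = -2$) and checking whether the $k = -1$ coefficient is significant, or inspecting $k = -2$ under the current normalisation. Consider extending the "pre-period" to exclude periods potentially affected by anticipation. If anticipation is present, adjust the model (e.g., redefine the treatment as starting earlier).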

Mistake 9: Using Levels When Parallel Trends Holds Only in Logs

Problem: Estimating the DiD in levels when treated and control groups are growing at the same rate (multiplicative parallel trends), rather than by the same amount (additive parallel trends). This produces spurious pre-trends on the levels scale.
Solution: Inspect raw pre-treatment trends in both levels and logs. If trends are parallel in logs but not levels, use the log outcome. Report the pre-trends test for the chosen specification and note the functional form assumption.

Mistake 10: Not Reporting the Full Event Study

Problem: Reporting only the single pooled DiD coefficient when the treatment has dynamic effects over multiple time periods, losing information about the timing, ramp-up, and persistence of the effect.
Solution: Always produce and report the full event study figure with pre- and post-treatment coefficients and confidence bands. The event study provides far more information than a single coefficient and is essential for assessing both identification (pre-trends) and the nature of the effect (dynamic patterns).


19. Troubleshooting

| Issue | Likely Cause | Solution |
|---|---|---|
| DiD coefficient has an unexpected sign | Parallel trends violation; wrong treatment assignment; data coding error | Check treatment coding; inspect raw trends plots; run event study for pre-trend evidence |
| Very large standard errors | Too few clusters ($G < 5$–$10$); very small treatment group; collinearity | Use wild cluster bootstrap; report exact p-value (randomisation inference); check VIF |
| Significant pre-trends (parallel trends violated) | Systematic pre-existing trend differences; selection into treatment based on trends | Include unit-specific linear trends; use conditional parallel trends (covariates); consider synthetic control or matching |
| Event study coefficients show a large dip just before treatment (around $k=-1$) | Anticipation effects; treatment actually starts earlier than recorded | Extend pre-period; redefine treatment onset; check institutional knowledge |
| TWFE gives negative estimate despite positive raw DiD | Heterogeneous treatment effects in staggered design (Bacon negative weights) | Run Bacon decomposition; use Callaway-Sant'Anna or other robust staggered estimator |
| Never-treated group is very small | Limited comparison group; potential control group contamination | Expand the never-treated control; use only timing-variation comparisons; consider synthetic control |
| Pre-trends test passes but Rambachan-Roth bounds are wide | Weak pre-trend evidence; short pre-period; noisy outcome | Collect more pre-periods; use better outcome measure; report sensitivity analysis bounds prominently |
| Fixed effects absorb all treatment variation | Treatment is perfectly collinear with unit or time FE; no within-unit variation in $D_{it}$ | Check whether treatment is time-varying; verify panel structure; ensure $D_{it}$ varies within units |
| Coefficient estimates change dramatically with different control sets | Model is sensitive to covariate specification; potential bad controls included | Report all specifications; identify and exclude bad controls (post-treatment variables); use doubly robust estimator |
| Wild cluster bootstrap gives $p$-value of 1 | Asymmetric bootstrap distribution; very few clusters; extreme outlier cluster | Increase bootstrap replications; use refinement bootstrap; investigate influential clusters; consider randomisation inference |
| Log outcome produces extreme predictions | Zeros in the outcome variable (log undefined) | Use inverse hyperbolic sine (IHS) transformation: $\text{arcsinh}(Y) = \ln(Y + \sqrt{Y^2+1})$; or use Poisson TWFE for count outcomes |
| DDD estimate is implausibly large | Interaction with ineligible group not clean; compositional changes | Verify eligibility classification; test for treatment effects on ineligible group (should be zero); check for spillovers |
| Staggered design: Callaway-Sant'Anna gives very wide CIs | Small cohort sizes; few pre-periods for some cohorts; sparse data | Report cohort-level ATTs separately; aggregate with caution; increase sample |
| Placebo treatment group shows significant effect | SUTVA violation (spillovers); contamination of control group | Investigate potential spillover mechanisms; redefine control group to exclude exposed units; report sensitivity to control group definition |
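
The IHS suggestion for outcomes with zeros can be illustrated with a short sketch. It assumes a hypothetical list of outcome values and uses only Python's standard library. The helper `ihs` is an illustrative name, not a DataStatPro function. It checks that the closed form $\ln(Y + \sqrt{Y^2+1})$ agrees with `math.asinh`, that zeros are handled, and that the transformation approaches $\ln(2Y)$ for large values.

```python
import math

def ihs(y: float) -> float:
    """Inverse hyperbolic sine written out as ln(y + sqrt(y^2 + 1))."""
    return math.log(y + math.sqrt(y * y + 1.0))

# Hypothetical outcome values, including a zero where log(y) is undefined.
outcomes = [0.0, 1.0, 50.0, 2000.0]

# math.asinh is the library version of the same transformation.
transformed = [math.asinh(y) for y in outcomes]

# For large y, asinh(y) is close to ln(2y), so coefficients on the
# transformed outcome read approximately like log-point changes.
gap_large = math.asinh(2000.0) - math.log(2 * 2000.0)
```

In this sketch the zero outcome maps to zero instead of raising an error, which is the property that motivates using IHS in place of the log.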

20. Quick Reference Cheat Sheet

Core DiD Formulas

| Formula | Description |
|---|---|
| $\hat{\delta}_{DiD} = (\bar{Y}_{1,1} - \bar{Y}_{1,0}) - (\bar{Y}_{0,1} - \bar{Y}_{0,0})$ | 2×2 DiD estimator |
| $Y_{it} = \alpha + \beta \cdot T_i + \gamma \cdot P_t + \delta \cdot (T_i \times P_t) + \epsilon_{it}$ | 2×2 DiD regression |
| $Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \mathbf{X}_{it}^T\boldsymbol{\beta} + \epsilon_{it}$ | TWFE DiD |
| $\ddot{Y}_{it} = Y_{it} - \bar{Y}_i - \bar{Y}_t + \bar{Y}$ | Within-transformation (demeaning) |
| $\hat{\delta}_{TWFE} = \sum_{it}\ddot{D}_{it}\ddot{Y}_{it} / \sum_{it}\ddot{D}_{it}^2$ | TWFE estimator |
| $Y_{it} = \alpha_i + \lambda_t + \sum_{k \neq -1}\delta_k \cdot \mathbf{1}[t-T_i^*=k]D_i + \epsilon_{it}$ | Event study regression |
| $t = \hat{\delta}/SE_{cluster}(\hat{\delta}) \sim t_{G-1}$ | Test statistic (clustered) |
| $\hat{\delta} \pm t_{\alpha/2, G-1} \times SE_{cluster}(\hat{\delta})$ | Confidence interval |
| $d_{DiD} = \hat{\delta}/s_{pre}$ | Standardised effect size |
| $\%\Delta = (\hat{\delta}/\bar{Y}_{treated,pre}) \times 100$ | Percent change effect |
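
The within-transformation and the TWFE estimator formula can be sketched in a few lines of plain Python. This is a minimal illustration, not the DataStatPro implementation: it assumes a balanced panel with no covariates, and the function `twfe_delta`, the unit and period effects, and the true effect of 2.0 are all hypothetical values chosen so that the result is easy to check.

```python
def twfe_delta(panel):
    """Two-way demeaning estimator for a balanced panel without covariates.

    panel maps (unit, period) -> (D, Y).
    """
    units = sorted({i for i, _ in panel})
    periods = sorted({t for _, t in panel})

    def demean(idx):
        # idx = 0 selects D, idx = 1 selects Y
        unit_mean = {i: sum(panel[i, t][idx] for t in periods) / len(periods)
                     for i in units}
        time_mean = {t: sum(panel[i, t][idx] for i in units) / len(units)
                     for t in periods}
        grand = sum(v[idx] for v in panel.values()) / len(panel)
        return {(i, t): panel[i, t][idx] - unit_mean[i] - time_mean[t] + grand
                for (i, t) in panel}

    d_dd = demean(0)
    y_dd = demean(1)
    numerator = sum(d_dd[k] * y_dd[k] for k in panel)
    denominator = sum(d_dd[k] ** 2 for k in panel)
    return numerator / denominator

# Hypothetical noiseless data: Y = unit effect + time effect + 2.0 * D.
unit_effect = {"A": 1.0, "B": 3.0, "C": 0.5, "D": 2.0}
time_effect = {0: 0.0, 1: 1.0, 2: 1.5}
treated_units = {"A", "B"}   # treated from period 1 onwards
true_delta = 2.0

panel = {}
for i, alpha in unit_effect.items():
    for t, lam in time_effect.items():
        d = 1 if (i in treated_units and t >= 1) else 0
        panel[i, t] = (d, alpha + lam + true_delta * d)

delta_hat = twfe_delta(panel)
```

Because the toy data contain no noise and treatment starts at the same time for all treated units, the demeaning removes the unit and time effects exactly and `delta_hat` recovers the assumed effect up to floating-point rounding. With unbalanced panels or covariates, a regression routine is needed instead of this closed form.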

2×2 DiD Table Template

| | Pre ($t=0$) | Post ($t=1$) | Difference |
|---|---|---|---|
| Treated ($D=1$) | $\bar{Y}_{1,0}$ | $\bar{Y}_{1,1}$ | $\bar{Y}_{1,1}-\bar{Y}_{1,0}$ |
| Control ($D=0$) | $\bar{Y}_{0,0}$ | $\bar{Y}_{0,1}$ | $\bar{Y}_{0,1}-\bar{Y}_{0,0}$ |
| Difference | $\bar{Y}_{1,0}-\bar{Y}_{0,0}$ | $\bar{Y}_{1,1}-\bar{Y}_{0,1}$ | $\hat{\delta}_{DiD}$ |
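
Filling in the template amounts to computing four cell means and differencing twice. The sketch below uses a hypothetical toy dataset of `(treated, post, outcome)` records. The helper names `cell_mean` and `did_2x2` are illustrative only.

```python
from statistics import mean

# Hypothetical records: (treated, post, outcome). Values are illustrative.
records = [
    (1, 0, 10.0), (1, 0, 12.0),   # treated, pre
    (1, 1, 18.0), (1, 1, 20.0),   # treated, post
    (0, 0, 8.0),  (0, 0, 10.0),   # control, pre
    (0, 1, 11.0), (0, 1, 13.0),   # control, post
]

def cell_mean(rows, treated, post):
    """Mean outcome for one (group, period) cell of the 2x2 table."""
    return mean(y for d, p, y in rows if d == treated and p == post)

def did_2x2(rows):
    """Change in the treated group minus change in the control group."""
    change_treated = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    change_control = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return change_treated - change_control

delta_hat = did_2x2(records)

# The same number via the column differences (post gap minus pre gap).
post_gap = cell_mean(records, 1, 1) - cell_mean(records, 0, 1)
pre_gap = cell_mean(records, 1, 0) - cell_mean(records, 0, 0)
delta_hat_columns = post_gap - pre_gap
```

Here the treated group rises from 11 to 19 and the control group from 9 to 12, so both routes through the table give a DiD of 5.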

Standard Error Selection Guide

| Setting | Recommended SE | When |
|---|---|---|
| Large $G$ ($G > 50$) | Cluster-robust (CR1, the clustered analogue of HC1) | Standard panel DiD |
| Moderate $G$ ($30$–$50$) | Cluster-robust with bias correction | Borderline case |
| Small $G$ ($10$–$30$) | Wild cluster bootstrap | Few clusters |
| Very few $G$ ($< 10$) | Randomisation inference | Only a few treated clusters |
| Cross-section with groups | Cluster at group level | Group-level treatment |
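
The guide can be encoded as a small lookup so that the choice is applied consistently across analyses. This is a sketch only: `recommend_se` is a hypothetical helper, and the handling of the boundary values (30 is treated as moderate, 10 as small) is an assumption, since the ranges in the guide touch at those points.

```python
def recommend_se(n_clusters: int) -> str:
    """Map the number of clusters G to the inference method in the guide.

    Boundary handling (G = 30 and G = 10) is an assumption of this sketch.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be a positive integer")
    if n_clusters > 50:
        return "cluster-robust"
    if n_clusters >= 30:
        return "cluster-robust with bias correction"
    if n_clusters >= 10:
        return "wild cluster bootstrap"
    return "randomisation inference"

# Hypothetical study sizes
choices = {g: recommend_se(g) for g in (120, 40, 18, 6)}
```

The thresholds are rules of thumb taken from the guide. The number of treated clusters and the balance of cluster sizes also matter, so the output is a starting point for the choice.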

Assumption Checklist

| Assumption | How to Test | If Violated |
|---|---|---|
| Parallel trends | Pre-trends test (event study); placebo tests | Add covariates; unit-specific trends; Rambachan-Roth bounds |
| No anticipation | Check the event study coefficients just before treatment (e.g. $k=-2$ when $k=-1$ is the omitted reference period) | Redefine treatment timing; extend pre-period |
| SUTVA (no spillovers) | Placebo treatment on nearby controls; spillover tests | Redefine control group; use exclusion zones |
| No compositional change | Check covariate balance across periods | Use balanced panel; control for composition |
| Exogenous treatment timing | Institutional knowledge; pre-trends test | Instrument for timing; use conditional parallel trends |
| No interference | Study design; geographic checks | Spatial correlation SEs; define clean control zones |

DiD Estimator Comparison for Staggered Designs

| Estimator | Robust to Heterogeneity | Easy to Implement | Software |
|---|---|---|---|
| TWFE | No | Yes | Any regression software |
| Bacon Decomposition | Diagnostic only | | DataStatPro, bacondecomp (R/Stata) |
| Callaway-Sant'Anna | Yes | Moderate | DataStatPro, did (R), csdid (Stata) |
| Sun-Abraham | Yes | Moderate | DataStatPro, sunab in fixest (R), eventstudyinteract (Stata) |
| de Chaisemartin-D'Haultfœuille | Yes | Moderate | DataStatPro, did_multiplegt (Stata) |
| Borusyak-Jaravel-Spiess | Yes | Moderate | DataStatPro, did_imputation (Stata) |

Effect Size Interpretation

| Measure | Formula | Unit | Interpretation |
|---|---|---|---|
| DiD coefficient ($\hat{\delta}$) | Direct estimate | Same as $Y$ | Absolute change in $Y$ |
| Log DiD | $(e^{\hat{\delta}} - 1) \times 100$ | % change | Percentage change in $Y$ |
| Percent change | $(\hat{\delta}/\bar{Y}_{treated,pre}) \times 100$ | % | Change relative to baseline |
| Cohen's $d$ | $\hat{\delta}/s_{pre}$ | SD units | Standardised effect |
| NNT (number needed to treat) | $1/\hat{\delta}$ | | |
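
The conversions for the log, percent-change, and standardised measures are simple arithmetic, sketched below with hypothetical inputs. The function names and the example values (a log-outcome coefficient of 0.10, a level coefficient of 5.0, a treated pre-period mean of 11.0, and a pre-period standard deviation of 2.5) are assumptions for illustration.

```python
import math

def log_did_to_percent(delta_log: float) -> float:
    """Percentage change implied by a DiD coefficient on a log outcome."""
    return (math.exp(delta_log) - 1.0) * 100.0

def percent_change(delta: float, treated_pre_mean: float) -> float:
    """DiD coefficient relative to the treated group's pre-period mean."""
    return delta / treated_pre_mean * 100.0

def standardised_effect(delta: float, sd_pre: float) -> float:
    """DiD coefficient in pre-period standard deviation units."""
    return delta / sd_pre

# Hypothetical estimates
pct_from_log = log_did_to_percent(0.10)      # log-outcome model
pct_of_baseline = percent_change(5.0, 11.0)  # level-outcome model
d_did = standardised_effect(5.0, 2.5)
```

A log coefficient of 0.10 corresponds to roughly a 10.5% increase, slightly more than the 10% a naive reading would suggest. The gap widens as the coefficient grows, which is why the exponential conversion is preferable for larger effects.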

Model Specification Checklist

| Feature | Recommendation | Notes |
|---|---|---|
| Unit fixed effects | ✅ Always include | Removes time-invariant confounders |
| Time fixed effects | ✅ Always include | Removes common time shocks |
| Cluster-robust SEs | ✅ Always use | Cluster at treatment assignment level |
| Pre-trends test | ✅ Always report | Event study with $\geq 2$ pre-periods |
| Bacon decomposition | ✅ For staggered | Diagnose TWFE composition |
| Robust staggered estimator | ✅ For staggered | At least one robust estimator |
| Placebo tests | ✅ Report | Placebo time, group, or outcome |
| Covariate balance table | ✅ Report | Pre-treatment balance check |
| Unit-specific trends | ⚠️ Use with caution | Only if strong pre-trend concern |
| Binary outcome LPM | ✅ Preferred | More tractable than probit DiD |
| Log outcome | ✅ If proportional trends | Check functional form |

Key Identification Assumptions

| Assumption | Formal Statement | Testable? | Diagnostic |
|---|---|---|---|
| Parallel trends | $E[\Delta Y(0) \mid D=1] = E[\Delta Y(0) \mid D=0]$ | Partially (pre-period only) | Event study pre-trends |
| No anticipation | $Y_{it}(1) = Y_{it}(0)$ for $t < T_i^*$ | Yes (last pre-periods) | Check the last estimated pre-period coefficient (e.g. $\hat{\delta}_{-2}$ when $k=-1$ is the reference) |
| SUTVA | No spillovers; one version of treatment | Partially | Geographic placebo; excluded-zone test |
| Exogenous timing | $T_i^*$ not determined by anticipated outcomes | Partially | Pre-trend test; institutional knowledge |
| Stable composition | Sample composition unchanged by treatment | Yes | Covariate balance across periods |

This tutorial provides a comprehensive foundation for understanding, applying, and interpreting Difference-in-Differences models using the DataStatPro application. For further reading, consult Angrist & Pischke's "Mostly Harmless Econometrics" (Princeton University Press, 2009), Callaway & Sant'Anna's "Difference-in-Differences with Multiple Time Periods" (Journal of Econometrics, 2021), Roth et al.'s "What's Trending in Difference-in-Differences?" (Journal of Econometrics, 2023), and Goodman-Bacon's "Difference-in-Differences with Variation in Treatment Timing" (Journal of Econometrics, 2021). For feature requests or support, contact the DataStatPro team.