Difference-in-Differences Models: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of Difference-in-Differences (DiD) estimation all the way through advanced extensions, assumption testing, heterogeneity analysis, and practical usage within the DataStatPro application. Whether you are a complete beginner or an experienced analyst, this guide is structured to build your understanding step by step.
Table of Contents
- Prerequisites and Background Concepts
- What is the Difference-in-Differences Design?
- The Mathematical Framework
- The Parallel Trends Assumption
- Identification and Causal Inference
- Standard DiD Estimation
- Hypothesis Testing and Inference
- Effect Size Measures
- Model Fit and Evaluation
- Diagnostics and Assumption Testing
- Extensions: Staggered DiD and Multiple Time Periods
- Extensions: Heterogeneous Treatment Effects
- Extensions: Continuous and Fuzzy Treatment
- Covariates and Controls in DiD
- Using the DiD Component
- Computational and Formula Details
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into Difference-in-Differences, it is helpful to be familiar with the following foundational concepts. Do not worry if you are not — each concept is briefly explained here.
1.1 Counterfactuals and the Potential Outcomes Framework
The potential outcomes framework (Rubin Causal Model) is the conceptual foundation of causal inference. For each unit $i$ and time period $t$, define:
- $Y_{it}(1)$: The potential outcome that would occur if unit $i$ received treatment at time $t$.
- $Y_{it}(0)$: The potential outcome that would occur if unit $i$ did not receive treatment at time $t$.
The individual treatment effect for unit $i$ at time $t$ is:
$$\tau_{it} = Y_{it}(1) - Y_{it}(0)$$
The fundamental problem of causal inference: We can never observe both $Y_{it}(1)$ and $Y_{it}(0)$ for the same unit at the same time. We observe only one — the realised outcome. The unobserved outcome is called the counterfactual.
The Average Treatment Effect on the Treated (ATT) is:
$$\text{ATT} = E[Y_{it}(1) - Y_{it}(0) \mid D_i = 1]$$
DiD is one of the most widely used methods for estimating the ATT using a comparison group to approximate the unobserved counterfactual.
1.2 Selection Bias
Selection bias arises when the assignment to treatment is not random — treated and control units differ systematically in ways that also affect the outcome. A naïve comparison of treated vs. untreated units confounds the treatment effect with pre-existing differences:
$$E[Y \mid D = 1] - E[Y \mid D = 0] = \underbrace{E[Y(1) - Y(0) \mid D = 1]}_{\text{ATT}} + \underbrace{E[Y(0) \mid D = 1] - E[Y(0) \mid D = 0]}_{\text{selection bias}}$$
DiD removes selection bias due to time-invariant unobserved differences between groups.
1.3 Panel Data
Panel data (also called longitudinal data) consists of observations on the same units (individuals, firms, regions, countries) over multiple time periods. It has two dimensions: a cross-sectional dimension ($N$ units) and a time dimension ($T$ periods).
Panel data are written as $\{Y_{it}\}$ for unit $i = 1, \dots, N$ and period $t = 1, \dots, T$.
DiD most naturally arises in a panel data context, though it can also be implemented with repeated cross-sections.
1.4 Fixed Effects
A unit fixed effect $\alpha_i$ captures all time-invariant characteristics of unit $i$ — both observed and unobserved — that affect the outcome. By including fixed effects in a regression, we effectively compare each unit to itself over time, removing all time-invariant confounders.
A time fixed effect $\gamma_t$ captures factors that affect all units equally at time $t$ — common macroeconomic conditions, seasonal patterns, or universal policy changes.
1.5 Ordinary Least Squares (OLS) Regression
OLS regression finds the linear relationship between predictors and outcome by minimising the sum of squared residuals:
$$\hat{\beta} = \arg\min_{\beta} \sum_{i} \left( Y_i - X_i'\beta \right)^2$$
DiD is typically implemented as an OLS regression with specific interaction terms and fixed effects, so familiarity with OLS is essential.
1.6 Treatment Assignment and Natural Experiments
A natural experiment is a situation in which the assignment of units to treatment and control conditions is determined by some external, exogenous factor — rather than by the researcher or by the units themselves. Natural experiments approximate the conditions of a randomised controlled trial. Common examples:
- A policy reform that applies to some regions but not others.
- A law change that takes effect at a specific date.
- Geographic boundaries that determine eligibility for a programme.
- Lotteries or other chance-based assignment mechanisms.
DiD is the workhorse estimator for natural experiments with a pre-period and a post-period.
2. What is the Difference-in-Differences Design?
2.1 The Core Idea
Difference-in-Differences (DiD) is a quasi-experimental research design that estimates the causal effect of a treatment or policy by comparing the change over time in the outcome for a treated group to the change over time in the outcome for an untreated (control) group.
The intuition is straightforward:
- The first difference (within the treated group, over time) removes time-invariant differences between treated units and the rest of the world.
- The second difference (between treated and control, within the same time period) removes common time trends that affect both groups equally.
By taking the difference of these two differences, DiD isolates the treatment effect from:
- Pre-existing level differences between treated and control groups.
- Common time trends affecting both groups equally.
2.2 The 2×2 DiD Setup
The simplest (canonical) DiD design has:
- Two groups: A treated group ($D_i = 1$) and a control group ($D_i = 0$).
- Two time periods: A pre-treatment period ($P_t = 0$) and a post-treatment period ($P_t = 1$).
- One treatment: Applied only to the treated group, only in the post-treatment period.
The canonical 2×2 DiD table of group-period means:
| | Pre-Period ($P = 0$) | Post-Period ($P = 1$) | Difference (Post − Pre) |
|---|---|---|---|
| Treated ($D = 1$) | $\bar{Y}_{T,\text{pre}}$ | $\bar{Y}_{T,\text{post}}$ | $\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}$ |
| Control ($D = 0$) | $\bar{Y}_{C,\text{pre}}$ | $\bar{Y}_{C,\text{post}}$ | $\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}$ |
| Difference (Treated − Control) | $\bar{Y}_{T,\text{pre}} - \bar{Y}_{C,\text{pre}}$ | $\bar{Y}_{T,\text{post}} - \bar{Y}_{C,\text{post}}$ | $\hat{\delta}_{DiD}$ |
The DiD estimate:
$$\hat{\delta}_{DiD} = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}) - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})$$
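The arithmetic of the 2×2 table can be checked directly. A minimal Python sketch with hypothetical cell means (all numbers are illustrative only):

```python
# 2x2 DiD computed from the four group-period means (hypothetical numbers).
y_treat_pre, y_treat_post = 20.0, 26.0   # treated group: pre and post means
y_ctrl_pre, y_ctrl_post = 18.0, 21.0     # control group: pre and post means

diff_treated = y_treat_post - y_treat_pre   # first difference, treated group: 6.0
diff_control = y_ctrl_post - y_ctrl_pre     # first difference, control group: 3.0
did = diff_treated - diff_control           # difference-in-differences: 3.0
print(did)  # 3.0
```

The treated group improved by 6 units and the control group by 3, so the DiD estimate attributes 3 units of the change to treatment.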
2.3 Real-World Applications
DiD is one of the most widely applied methods in empirical social science, economics, public health, and policy evaluation:
- Labour Economics: Card & Krueger (1994) — Effect of New Jersey's minimum wage increase on fast-food employment, using Pennsylvania as the control group.
- Health Policy: Effect of the Affordable Care Act (ACA) Medicaid expansion on health insurance coverage and health outcomes, comparing expansion to non-expansion states.
- Education Policy: Effect of class size reductions (STAR experiment) or school voucher programmes on student achievement.
- Environmental Economics: Effect of environmental regulations (e.g., Clean Air Act) on air pollution and health outcomes.
- Finance: Effect of financial crises, banking regulations, or central bank interventions on lending and economic activity.
- Criminology: Effect of policing policies, incarceration changes, or gun laws on crime rates.
- Public Health: Effect of vaccination campaigns, smoking bans, or lockdown policies on health outcomes.
- Development Economics: Effect of microcredit programmes, cash transfers, or infrastructure investments on household welfare.
2.4 DiD vs. Other Quasi-Experimental Methods
| Method | Key Assumption | When to Use |
|---|---|---|
| DiD | Parallel trends in absence of treatment | Panel data or repeated cross-sections; policy timing varies |
| Regression Discontinuity (RD) | No manipulation around the cutoff | Assignment determined by a continuous threshold |
| Instrumental Variables (IV) | Instrument relevance and exclusion | Endogenous treatment with a valid instrument |
| Synthetic Control | Weighted average of controls matches treated pre-trend | Single treated unit; many potential controls |
| Event Study | No pre-trends; clean identification window | Multiple time periods around treatment timing |
| Propensity Score Matching | Conditional independence (selection on observables) | Rich covariate data; no unobservable confounders |
3. The Mathematical Framework
3.1 The Canonical 2×2 DiD Model
The standard regression formulation of the 2×2 DiD model is:
$$Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 P_t + \delta \,(D_i \times P_t) + \varepsilon_{it}$$
Where:
- $Y_{it}$ = outcome for unit $i$ at time $t$.
- $D_i$ = indicator for whether unit $i$ belongs to the treated group (time-invariant).
- $P_t$ = indicator for whether time period $t$ is the post-treatment period (unit-invariant).
- $D_i \times P_t$ = the DiD interaction term.
- $\beta_0$ = baseline mean for the control group in the pre-period.
- $\beta_1$ = pre-treatment difference in levels between treated and control groups (selection bias term).
- $\beta_2$ = common time trend from pre to post period (time effect for the control group).
- $\delta$ = the DiD estimator — the causal effect of the treatment.
- $\varepsilon_{it}$ = idiosyncratic error term.
Predicted cell means from the regression:
| | Pre ($P = 0$) | Post ($P = 1$) | Difference |
|---|---|---|---|
| Control ($D = 0$) | $\beta_0$ | $\beta_0 + \beta_2$ | $\beta_2$ |
| Treated ($D = 1$) | $\beta_0 + \beta_1$ | $\beta_0 + \beta_1 + \beta_2 + \delta$ | $\beta_2 + \delta$ |
| Difference (T − C) | $\beta_1$ | $\beta_1 + \delta$ | $\delta$ |
3.2 The Two-Way Fixed Effects (TWFE) Model
The most general and widely used DiD regression extends the canonical model to panel data with unit fixed effects and time fixed effects:
$$Y_{it} = \alpha_i + \gamma_t + \delta D_{it} + X_{it}'\beta + \varepsilon_{it}$$
Where:
- $\alpha_i$ = unit (entity) fixed effect — absorbs all time-invariant unit characteristics.
- $\gamma_t$ = time fixed effect — absorbs all period-specific shocks common to all units.
- $D_{it}$ = treatment indicator (= 1 if unit $i$ is treated at time $t$).
- $X_{it}$ = vector of time-varying controls.
- $\delta$ = the DiD estimator of the Average Treatment Effect on the Treated (ATT).
Key insight: The TWFE model is the natural extension of the 2×2 DiD to multiple units and multiple time periods. The coefficient on the treatment indicator is the DiD estimate once unit and time fixed effects are included.
3.3 The Within-Estimator Interpretation
The TWFE estimator is equivalent to the within estimator (demeaning). Define:
$$\ddot{Y}_{it} = Y_{it} - \bar{Y}_i - \bar{Y}_t + \bar{Y}$$
Where $\bar{Y}_i$ (unit mean), $\bar{Y}_t$ (time mean), and $\bar{Y}$ (grand mean). Similarly define $\ddot{D}_{it}$ and $\ddot{X}_{it}$.
The TWFE estimator (without covariates) is:
$$\hat{\delta} = \frac{\sum_{i,t} \ddot{D}_{it} \ddot{Y}_{it}}{\sum_{i,t} \ddot{D}_{it}^2}$$
This identifies $\delta$ from within-unit, within-time variation in treatment status.
3.4 Potential Outcomes Representation
In the potential outcomes framework, the DiD estimand is:
$$\delta = E[Y_{i1}(1) - Y_{i1}(0) \mid D_i = 1]$$
This is the ATT in the post-treatment period — the average effect on the treated units of receiving treatment.
The DiD identification strategy replaces the unobserved counterfactual $E[Y_{i1}(0) \mid D_i = 1]$ with the observable:
$$E[Y_{i0}(0) \mid D_i = 1] + \big( E[Y_{i1}(0) \mid D_i = 0] - E[Y_{i0}(0) \mid D_i = 0] \big)$$
This is exactly the parallel trends assumption — the counterfactual trend for the treated group equals the observed trend for the control group.
4. The Parallel Trends Assumption
4.1 The Assumption Stated
The parallel trends assumption (also called the common trends or parallel paths assumption) is the key identifying assumption of DiD:
In the absence of treatment, the average outcome for the treated group would have followed the same trend as the average outcome for the control group.
Formally:
$$E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 1] = E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0]$$
Crucially: This assumption is about the counterfactual — what would have happened to the treated group had it not been treated. It is fundamentally untestable with post-treatment data, but can be supported with:
- Pre-treatment trend evidence (parallel pre-trends test).
- Institutional knowledge about why the groups are similar in their trends.
- Placebo tests.
4.2 Visualising Parallel Trends
The canonical DiD diagram plots the outcome over time for both groups:
Outcome
  |                                ● Treated (actual)
  |                               /
  |                              /    ↕ δ = Treatment Effect
  |                 ●···········○    ← Treated counterfactual (unobserved)
  |                /
  |               /
  |              ●
  |                 ●···········●
  |                /               Control (observed; its trend serves
  |               ●                as the counterfactual trend)
  |
  +-------------+---------------→ Time
        Pre           Post
                ↑
           Treatment
             begins
The treatment effect is the vertical distance between the actual treated outcome and the counterfactual treated outcome in the post-period. The control group's observed trajectory is the counterfactual trend.
4.3 When is Parallel Trends Plausible?
Parallel trends is more plausible when:
- Treatment and control groups are similar in observed characteristics and pre-treatment trends.
- Treatment is determined by a sharp, exogenous rule (geographic, legislative, administrative).
- The treatment and control groups come from the same broad population (e.g., neighbouring counties, similar industries, adjacent cohorts).
- There are no other contemporaneous changes that differentially affect treated and control groups.
Parallel trends is less plausible when:
- Groups are systematically different in ways related to the outcome trajectory (e.g., high-income vs. low-income countries).
- Treatment is self-selected based on anticipated trends (e.g., firms that chose to adopt a technology because they expected growth).
- There are anticipation effects — units change behaviour before the treatment officially starts.
4.4 Parallel Trends in Different Functional Forms
The parallel trends assumption is not scale-invariant. It may hold on the levels scale but not on the logarithmic scale (or vice versa):
- Levels scale: $E[\Delta Y(0) \mid D = 1] = E[\Delta Y(0) \mid D = 0]$ (additive parallel trends).
- Log scale: $E[\Delta \log Y(0) \mid D = 1] = E[\Delta \log Y(0) \mid D = 0]$ (multiplicative/proportional parallel trends).
The choice of outcome transformation (levels, logs, rates) should be guided by theory about the nature of the treatment effect and the plausibility of parallel trends.
4.5 Conditional Parallel Trends
The parallel trends assumption may only hold conditional on observable covariates $X_i$:
$$E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 1, X_i] = E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0, X_i]$$
When unconditional parallel trends is implausible, including covariates (Section 14) can restore the assumption by controlling for observable differences in time trends between groups.
5. Identification and Causal Inference
5.1 What DiD Identifies
Under the parallel trends assumption, the DiD regression coefficient identifies the Average Treatment Effect on the Treated (ATT) in the post-treatment period:
$$\delta = E[Y_{it}(1) - Y_{it}(0) \mid D_i = 1, \, t \geq t_0]$$
Where $t_0$ is the treatment onset period.
Not identified by DiD:
- The Average Treatment Effect (ATE) — the effect averaged over both treated and control units.
- The effect on the control group had it been treated.
- The long-run effect if treatment effects change over time (addressed in event study designs).
5.2 The No Anticipation Assumption
A supplementary assumption is no anticipation: treated units do not change their behaviour in the pre-treatment period in anticipation of receiving treatment.
Formally, for all pre-treatment periods $t < t_0$, treated units' observed outcomes equal their untreated potential outcomes: $Y_{it} = Y_{it}(0)$.
Why it matters: If treated units begin changing before the treatment officially starts (e.g., firms start investing as soon as a subsidy is announced), the pre-period outcome already reflects anticipatory responses. This violates the parallel trends assumption in the pre-period and biases the DiD estimator.
How to check: Pre-treatment placebo tests (event study coefficients for pre-period leads should be near zero).
5.3 The Stable Unit Treatment Value Assumption (SUTVA)
SUTVA has two components:
1. No interference: The treatment status of unit $j$ does not affect the potential outcomes of unit $i$ (no spillovers, general equilibrium effects, or cross-unit contamination).
2. No hidden versions of treatment: There is only one version of the treatment; all treated units receive the same treatment.
Violations: Spillovers arise when treatment of some units affects control units (e.g., a local employment policy in one area displaces workers to other areas, affecting those areas' outcomes). SUTVA violations bias the DiD estimator.
5.4 Exogeneity of Treatment Timing
In staggered DiD designs (Section 11), a key requirement is that the timing of treatment adoption is exogenous — not determined by pre-existing trends or anticipation of future outcomes. If units that were doing well adopt treatment earlier, the DiD estimator is biased.
5.5 DiD as a Special Case of the Fixed Effects Estimator
The 2×2 DiD estimator is numerically equivalent to the first-differences estimator in a two-period panel:
$$\Delta Y_i = Y_{i1} - Y_{i0} = \beta_2 + \delta D_i + \Delta\varepsilon_i$$
Where $D_i = 1$ for treated units and $D_i = 0$ for control units. OLS on this first-differenced equation produces $\hat{\delta} = \overline{\Delta Y}_{\text{treated}} - \overline{\Delta Y}_{\text{control}}$ — exactly the DiD formula.
6. Standard DiD Estimation
6.1 OLS Estimation of the Canonical DiD
The 2×2 DiD regression:
$$Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 P_t + \delta \,(D_i \times P_t) + \varepsilon_{it}$$
is estimated by OLS. The DiD coefficient:
$$\hat{\delta} = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}) - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})$$
This can be written in matrix form as:
$$\hat{\beta} = (X'X)^{-1} X'Y$$
Where $X$ contains the constant, $D_i$, $P_t$, and their interaction $D_i \times P_t$.
6.2 TWFE Estimation with Panel Data
The TWFE estimator is obtained by including unit and time dummies (or using the within-transformation):
Using dummy variables:
$$Y_{it} = \sum_{j} \alpha_j \,\mathbb{1}[i = j] + \sum_{s} \gamma_s \,\mathbb{1}[t = s] + \delta D_{it} + \varepsilon_{it}$$
Using the within (demeaning) transformation:
$$\ddot{Y}_{it} = \delta \ddot{D}_{it} + \ddot{\varepsilon}_{it}$$
Where $\ddot{Y}_{it} = Y_{it} - \bar{Y}_i - \bar{Y}_t + \bar{Y}$ and similarly for other variables.
The TWFE estimator is consistent under the parallel trends assumption and strict exogeneity of treatment given fixed effects.
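To make the TWFE mechanics concrete, here is a minimal numpy-only sketch on simulated panel data (all parameter values are illustrative assumptions; a real analysis would typically use a dedicated fixed-effects library):

```python
import numpy as np

# Simulate a balanced panel: 50 units, 6 periods, treatment from period 3
# for the first half of units, with a true effect of 2.0 (all values assumed).
rng = np.random.default_rng(0)
N, T, true_effect = 50, 6, 2.0
units = np.repeat(np.arange(N), T)
periods = np.tile(np.arange(T), N)
treated_unit = units < N // 2
post = periods >= 3
D = (treated_unit & post).astype(float)       # treatment indicator D_it

alpha = rng.normal(0, 1, N)                   # unit fixed effects
gamma = np.linspace(0, 1, T)                  # time fixed effects (common trend)
y = alpha[units] + gamma[periods] + true_effect * D + rng.normal(0, 0.5, N * T)

# TWFE via dummy variables: unit dummies, time dummies (period 0 omitted), and D
unit_dum = (units[:, None] == np.arange(N)[None, :]).astype(float)
time_dum = (periods[:, None] == np.arange(1, T)[None, :]).astype(float)
X = np.column_stack([unit_dum, time_dum, D])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_hat = beta[-1]                          # TWFE DiD estimate, close to 2.0
print(round(delta_hat, 2))
```

The estimate recovers the true effect up to sampling noise because the unit dummies absorb $\alpha_i$ and the time dummies absorb $\gamma_t$, leaving only treatment variation.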
6.3 First Differences Estimator
An alternative to demeaning is first differencing, which subtracts the previous period's observation:
$$\Delta Y_{it} = Y_{it} - Y_{i,t-1} = \Delta\gamma_t + \delta \,\Delta D_{it} + \Delta\varepsilon_{it}$$
In a two-period model, first differences and within estimation are identical. For $T > 2$, they differ in efficiency — first differences is more efficient when $\varepsilon_{it}$ follows a random walk; within estimation is more efficient when $\varepsilon_{it}$ is serially uncorrelated.
6.4 Weighted DiD
When the groups have unequal sizes or when reweighting is needed to improve comparability, a weighted DiD uses weights $w_i$:
$$\hat{\beta}_w = (X'WX)^{-1} X'WY, \qquad W = \text{diag}(w_1, \dots, w_n)$$
Common weighting schemes:
- Population weights: Weight by group size.
- Propensity score weights: Reweight control units to match the distribution of pre-treatment characteristics in the treated group (augmented inverse probability weighting — AIPW).
- Variance weights: Inverse of estimated error variance for each unit.
6.5 DiD with Repeated Cross-Sections
When panel data (the same units followed over time) are unavailable, DiD can be implemented with repeated cross-sections — independent samples drawn from the same population at each time period. The DiD regression:
$$Y_{igt} = \beta_0 + \beta_1 T_g + \beta_2 P_t + \delta \,(T_g \times P_t) + \varepsilon_{igt}$$
Where:
- $T_g$ = treated group indicator for individual $i$ in group $g$.
- $P_t$ = post-treatment period indicator.
- $\delta$ = DiD estimator (interpreted as a change in group-period means).
The DiD estimator is valid under the assumption that the cross-sectional samples are representative of the same underlying population in each period, even though different individuals are observed.
7. Hypothesis Testing and Inference
7.1 Standard Error Choices
The choice of standard errors is critical in DiD analyses. Several options are available, with different assumptions:
7.1.1 OLS Standard Errors
Valid only under homoscedasticity and no serial correlation. Almost never appropriate for DiD — treated group observations are typically serially correlated.
7.1.2 Heteroscedasticity-Robust (HC) Standard Errors
$$\widehat{\text{Var}}_{HC}(\hat{\beta}) = (X'X)^{-1} \left( \sum_i \hat{\varepsilon}_i^2 \, x_i x_i' \right) (X'X)^{-1}$$
Where $\hat{\varepsilon}_i$ is the residual for observation $i$. This accounts for heteroscedasticity but not serial correlation.
7.1.3 Cluster-Robust Standard Errors
The most recommended standard errors for DiD. Clustering at the group level (e.g., state, firm, country) allows for arbitrary heteroscedasticity and serial correlation within clusters:
$$\widehat{\text{Var}}_{CR}(\hat{\beta}) = (X'X)^{-1} \left( \sum_{g=1}^{G} X_g' \hat{u}_g \hat{u}_g' X_g \right) (X'X)^{-1}$$
Where $g$ indexes clusters, and $X_g$ and $\hat{u}_g$ are the design matrix and residuals for cluster $g$.
Critical recommendation (Bertrand, Duflo & Mullainathan, 2004): Always cluster at the level of treatment assignment (e.g., state-level policy → cluster at state level). Failure to do so leads to severely underestimated standard errors and spurious significance.
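The sandwich formula above can be sketched directly in numpy. The simulated state-level data below are hypothetical, and the helper name `cluster_robust_se` is ours:

```python
import numpy as np

def cluster_robust_se(X, resid, clusters):
    """CRVE: (X'X)^-1 [ sum_g X_g' u_g u_g' X_g ] (X'X)^-1, then sqrt of diagonal."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        Xg = X[clusters == g]
        ug = resid[clusters == g]
        sg = Xg.T @ ug                        # score sum within cluster g
        meat += np.outer(sg, sg)
    V = XtX_inv @ meat @ XtX_inv
    return np.sqrt(np.diag(V))

# Hypothetical 2x2 DiD with 20 states; state-level shocks induce
# within-cluster correlation that plain OLS standard errors would miss.
rng = np.random.default_rng(1)
n_states, n_per = 20, 30
state = np.repeat(np.arange(n_states), n_per)
treat = (state < 10).astype(float)            # states 0-9 are treated
post = rng.integers(0, 2, n_states * n_per).astype(float)
shock = rng.normal(0, 1, n_states)            # common shock per state
y = 1.0 + 0.5*treat + 0.3*post + 2.0*treat*post + shock[state] \
    + rng.normal(0, 1, n_states * n_per)

X = np.column_stack([np.ones_like(y), treat, post, treat*post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
se = cluster_robust_se(X, resid, state)       # cluster at the state level
print(round(beta[-1], 2), round(se[-1], 3))   # DiD coefficient and its clustered SE
```

Clustering is at the state level because treatment is assigned at the state level, per the Bertrand, Duflo & Mullainathan recommendation.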
7.1.4 Wild Cluster Bootstrap
When the number of clusters is small (roughly $G < 30$–$50$), cluster-robust standard errors based on asymptotic approximations can be unreliable. The wild cluster bootstrap (Cameron, Gelbach & Miller, 2008) provides more reliable inference:
- Estimate the model and obtain residuals $\hat{u}_{ig}$.
- For each bootstrap replication $b = 1, \dots, B$:
  - Draw $w_g \in \{-1, +1\}$ with equal probability for each cluster $g$ (Rademacher weights).
  - Construct bootstrap residuals $u_{ig}^* = w_g \hat{u}_{ig}$.
  - Form the bootstrap outcome: $Y_{ig}^* = X_{ig}'\hat{\beta} + u_{ig}^*$.
  - Re-estimate $\hat{\delta}^{(b)}$.
- Use the distribution of $\{\hat{\delta}^{(b)}\}_{b=1}^{B}$ to compute p-values and confidence intervals.
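A minimal numpy sketch of these steps, on hypothetical data with only 12 clusters. For brevity it imposes the null (restricted residuals) and compares bootstrap coefficients rather than t-statistics; production implementations typically bootstrap the t-statistic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical clustered 2x2 DiD data with few clusters (G = 12)
G, n_per = 12, 25
cluster = np.repeat(np.arange(G), n_per)
treat = (cluster < 6).astype(float)
post = rng.integers(0, 2, G * n_per).astype(float)
shock = rng.normal(0, 1, G)
y = 1.0 + 0.4*treat + 0.2*post + 1.5*treat*post + shock[cluster] \
    + rng.normal(0, 1, G * n_per)

X = np.column_stack([np.ones_like(y), treat, post, treat*post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_hat = beta[-1]

# Restricted fit imposing H0: delta = 0 (drop the interaction column)
Xr = X[:, :3]
beta_r, *_ = np.linalg.lstsq(Xr, y, rcond=None)
fit_r = Xr @ beta_r
resid_r = y - fit_r

B, exceed = 499, 0
for _ in range(B):
    w = rng.choice([-1.0, 1.0], size=G)        # one Rademacher weight per cluster
    y_star = fit_r + w[cluster] * resid_r      # flip residual signs cluster-wise
    b_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    if abs(b_star[-1]) >= abs(delta_hat):
        exceed += 1
p_value = (exceed + 1) / (B + 1)
print(round(delta_hat, 2), round(p_value, 3))
```

Because the weights are drawn per cluster, the bootstrap preserves within-cluster dependence while breaking the link between treatment and the error term.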
| Standard Error Type | When to Use | Key Assumption |
|---|---|---|
| OLS | Never (DiD context) | IID errors |
| HC (Robust) | Cross-sectional data; no within-cluster correlation | Heteroscedastic, no serial corr. |
| Cluster-Robust | Standard recommendation | Within-cluster correlation allowed |
| Wild Cluster Bootstrap | Few clusters ($G \lesssim 30$) | More reliable with few clusters |
| Block Bootstrap | Panel data, spatial correlation | Resamples entire clusters |
7.2 The Wald Test for the DiD Coefficient
The Wald test for the DiD effect tests $H_0: \delta = 0$:
$$t = \frac{\hat{\delta}}{\text{SE}(\hat{\delta})} \sim t_{df}$$
Where $df$ is the residual degrees of freedom. With cluster-robust standard errors, use the $t$-distribution with $G - 1$ degrees of freedom (where $G$ is the number of clusters):
$$t \sim t_{G-1}$$
A $(1 - \alpha)$ confidence interval for $\delta$:
$$\hat{\delta} \pm t_{1-\alpha/2} \,\text{SE}(\hat{\delta})$$
7.3 F-Test for Joint Significance
To jointly test whether a vector of DiD coefficients is zero (e.g., in a model with multiple treatment indicators), use:
$$F = \frac{(R\hat{\beta} - r)' \left[ R \,\widehat{\text{Var}}(\hat{\beta})\, R' \right]^{-1} (R\hat{\beta} - r)}{q}$$
Where $q$ is the number of restrictions (rows of $R$).
7.4 Inference with Few Treated Units
A common challenge is when only a few units receive treatment (e.g., one state, two firms). In such cases:
- Cluster-robust standard errors have very few clusters → unreliable asymptotic approximations.
- Randomisation inference (permutation tests): Repeatedly re-assign treatment to randomly selected units and re-estimate $\hat{\delta}$. The p-value is the fraction of placebo estimates at least as large in magnitude as the observed estimate.
- Synthetic control methods: Construct a weighted control unit that matches the treated unit's pre-period trajectory.
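Randomisation inference for a single treated unit can be sketched in a few lines. The panel below is simulated and purely illustrative (one treated state out of 20, true effect 3.0):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical two-period panel: 20 states, one truly treated (index 0)
n_states = 20
pre = rng.normal(10, 1, n_states)
post = pre + 0.5 + rng.normal(0, 0.5, n_states)   # common trend of 0.5
treated = 0
post[treated] += 3.0                              # true effect on the treated state

def did(treated_idx):
    """2x2 DiD treating `treated_idx` as the (possibly placebo) treated state."""
    ctrl = np.ones(n_states, dtype=bool)
    ctrl[treated_idx] = False
    return (post[treated_idx] - pre[treated_idx]) \
        - (post[ctrl].mean() - pre[ctrl].mean())

observed = did(treated)
# Permute the treatment label over every state and recompute the placebo DiD
placebo = np.array([did(j) for j in range(n_states)])
p_value = np.mean(np.abs(placebo) >= abs(observed))
print(round(observed, 2), round(p_value, 3))
```

With 20 states the smallest attainable p-value is 1/20 = 0.05, which illustrates why randomisation inference is honest but coarse when few placebo assignments exist.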
8. Effect Size Measures
8.1 The DiD Coefficient as Effect Size
The primary effect size in DiD is the DiD coefficient itself. Its interpretation depends on the model specification:
- Levels regression ($Y$ in original units): $\delta$ is the absolute change in the outcome caused by treatment (e.g., 3.2 percentage points, 500 USD, 2.1 hours).
- Log outcome ($\log Y$): $\delta \approx$ proportional change for small values; more precisely, a $100 \times (e^{\delta} - 1)\%$ change.
- Standardised outcome (mean 0, SD 1): $\delta$ is in standard deviation units — directly comparable to Cohen's $d$.
8.2 Percent Change Effect
When the outcome is in levels, the percent change caused by treatment is:
$$\%\Delta = 100 \times \frac{\hat{\delta}}{\bar{Y}_{T,\text{pre}}}$$
Where $\bar{Y}_{T,\text{pre}}$ is the pre-treatment mean of the treated group. This contextualises the absolute effect size relative to the pre-treatment baseline.
8.3 Standardised Effect Size (Cohen's d Analogue)
Standardise the DiD estimate by the pre-treatment standard deviation of the outcome:
$$d = \frac{\hat{\delta}}{s_{\text{pre}}}$$
Where $s_{\text{pre}}$ is the pooled pre-treatment standard deviation across treated and control groups. Benchmarks follow Cohen (1988):

| $\|d\|$ | Effect Size |
| :--- | :--- |
| 0.2 | Small |
| 0.5 | Medium |
| 0.8 | Large |
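The two effect-size calculations above can be combined in one short sketch; the DiD estimate and pre-treatment samples below are hypothetical:

```python
import numpy as np

# Hypothetical DiD estimate and pre-treatment outcome samples
delta_hat = 1.8                                   # DiD estimate in outcome units
pre_treated = np.array([9.5, 10.2, 11.0, 10.4])   # pre-period outcomes, treated
pre_control = np.array([8.9, 9.7, 10.1, 9.3, 9.8])

# Pooled pre-treatment SD across both groups
n1, n2 = len(pre_treated), len(pre_control)
s1, s2 = pre_treated.std(ddof=1), pre_control.std(ddof=1)
sd_pool = np.sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2) / (n1 + n2 - 2))

d = delta_hat / sd_pool                           # Cohen's d analogue
pct = 100 * delta_hat / pre_treated.mean()        # percent change vs. treated baseline
print(round(d, 2), round(pct, 1))
```

Reporting both measures is useful: the percent change speaks to policy relevance, while $d$ allows comparison across studies with different outcome scales.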
8.4 Relative Reduction/Increase
For outcomes where the baseline level matters (e.g., crime rates, disease incidence), report the relative effect:
$$\text{Relative effect} = \frac{\hat{\delta}}{\bar{Y}_{T,\text{pre}}}$$
Or the relative DiD, which scales the effect by the estimated counterfactual level in the post-period:
$$\frac{\hat{\delta}}{\bar{Y}_{T,\text{pre}} + (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})}$$
8.5 Number Needed to Treat (NNT)
For binary outcomes (e.g., employed/unemployed, insured/uninsured):
$$\text{NNT} = \frac{1}{|\hat{\delta}|}$$
Where $\hat{\delta}$ is the DiD estimate of the change in probability.
The NNT represents the number of units that need to be treated to produce one additional success (or prevented failure), contextualising the policy significance of the effect.
8.6 $R^2$ and Explained Variance
While not a primary effect size for DiD, the within-$R^2$ (after partialling out fixed effects) conveys how much treatment variation explains the residual variation in the outcome. Report both the overall $R^2$ and the within $R^2$ for TWFE models.
9. Model Fit and Evaluation
9.1 Goodness-of-Fit Statistics
Standard regression fit statistics apply to the DiD regression:
| Statistic | Formula | Description |
|---|---|---|
| $R^2$ | $1 - \text{SSR}/\text{SST}$ | Overall variance explained |
| Within $R^2$ | $R^2$ after demeaning by fixed effects | Variance explained within unit × time cells |
| Between $R^2$ | Based on group-time means | Variance explained between group-time cells |
| Adjusted $R^2$ | $1 - (1 - R^2)\frac{n-1}{n-k-1}$ | $R^2$ penalised for parameters |
| RMSE | $\sqrt{\text{SSR}/n}$ | Root mean squared error |
| AIC | $2k - 2\ln\hat{L}$ | Penalised fit (lower is better) |
| BIC | $k\ln n - 2\ln\hat{L}$ | Strongly penalised fit (lower is better) |
9.2 Fit of the Counterfactual
A key model evaluation step is assessing how well the control group serves as a counterfactual for the treated group's pre-treatment trajectory. Visually:
- Plot the raw pre-treatment trends for treated and control groups.
- Assess whether the trends are parallel (or conditionally parallel after covariate adjustment).
Quantitatively: Compute the pre-treatment DiD — the difference in trends during the pre-period. If this is near zero and statistically insignificant, the parallel trends assumption is supported.
9.3 Information Criteria for Model Comparison
When comparing DiD specifications (e.g., different control sets, different functional forms, different clustering levels), use AIC and BIC:
$$\text{AIC} = 2k - 2\ln\hat{L}, \qquad \text{BIC} = k\ln n - 2\ln\hat{L}$$
Where $k$ is the number of parameters and $\hat{L}$ is the maximised likelihood.
Note: AIC and BIC comparisons are only valid for models fitted to the same sample using the same outcome variable (e.g., levels vs. logs are not comparable on these criteria).
9.4 Assessing Balance on Pre-Treatment Covariates
A critical validity check is whether treated and control groups have similar pre-treatment characteristics:
- Report a balance table of pre-treatment mean differences in key covariates between groups.
- Test statistical differences using two-sample $t$-tests or non-parametric tests.
- Report standardised mean differences (SMD = mean difference / pooled SD) as effect sizes for balance.
- An absolute SMD below 0.1 is commonly used as a threshold for acceptable balance.
10. Diagnostics and Assumption Testing
10.1 Pre-Trends Test (Event Study)
The event study (dynamic DiD) is the primary tool for testing the parallel trends assumption using pre-treatment data. It estimates a separate DiD coefficient for each time period relative to treatment:
$$Y_{it} = \alpha_i + \gamma_t + \sum_{k \neq -1} \delta_k \,\mathbb{1}[t - G_i = k] + \varepsilon_{it}$$
Where:
- $G_i$ = treatment onset period for unit $i$.
- $k = t - G_i$ indexes periods relative to treatment onset (negative = pre-treatment, positive = post-treatment).
- Period $k = -1$ is the omitted base period (normalised to zero for identification).
- The event-time indicators are defined for treated units only.
Interpretation:
- Pre-treatment coefficients ($\delta_k$, $k < -1$): Should be statistically indistinguishable from zero. Significant pre-treatment coefficients indicate pre-existing differences in trends — a violation of parallel trends.
- Post-treatment coefficients ($\delta_k$, $k \geq 0$): Estimate the dynamic treatment effect at each horizon after treatment. Increasing post-treatment effects may indicate treatment ramp-up; decreasing effects may indicate decay.
Formal pre-trends test: Joint $F$-test (or Wald test) that all pre-treatment coefficients are jointly zero:
$$H_0: \delta_{-K} = \delta_{-K+1} = \dots = \delta_{-2} = 0$$
A non-significant result supports the parallel trends assumption; a significant result casts doubt.
⚠️ Passing the pre-trends test does not prove parallel trends hold in the post-period — trends may diverge after treatment for reasons unrelated to treatment. The pre-trends test is a necessary but not sufficient condition for identification.
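An event-study regression can be sketched with dummies in plain numpy. The simulated panel below (treatment at period 4, effect ramping up by 0.8 per period) is entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, t0 = 40, 8, 4                       # treatment begins at period 4
units = np.repeat(np.arange(N), T)
periods = np.tile(np.arange(T), N)
treated = units < N // 2
rel = periods - t0                        # event time relative to onset

# True dynamics: zero before treatment, ramping up by 0.8 each period after
effect = np.where(treated & (rel >= 0), 0.8 * (rel + 1), 0.0)
alpha = rng.normal(0, 1, N)
gamma = rng.normal(0, 0.3, T)
y = alpha[units] + gamma[periods] + effect + rng.normal(0, 0.4, N * T)

# Design: unit FE, time FE (period 0 omitted), lead/lag dummies (base k = -1)
ev_times = [k for k in range(-t0, T - t0) if k != -1]
unit_dum = (units[:, None] == np.arange(N)[None, :]).astype(float)
time_dum = (periods[:, None] == np.arange(1, T)[None, :]).astype(float)
ev_dum = np.column_stack([(treated & (rel == k)).astype(float) for k in ev_times])
X = np.column_stack([unit_dum, time_dum, ev_dum])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
coefs = dict(zip(ev_times, beta[-len(ev_times):]))

pre = [coefs[k] for k in ev_times if k < 0]       # leads: should be near zero
post_fx = [coefs[k] for k in ev_times if k >= 0]  # lags: dynamic effects
print([round(c, 2) for c in pre], [round(c, 2) for c in post_fx])
```

In this simulation the leads hover around zero (supporting parallel pre-trends) while the lags trace out the ramp-up, which is exactly the pattern an event-study plot visualises.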
10.2 Placebo Tests
Placebo tests assess whether the estimated DiD effect could have arisen by chance or due to confounding:
10.2.1 Placebo Time Periods
Estimate the DiD using only pre-treatment data, assigning a false treatment date (e.g., 2 years before the true treatment) and estimating a placebo coefficient $\hat{\delta}_{\text{placebo}}$.
A significant $\hat{\delta}_{\text{placebo}}$ when treatment has not yet occurred suggests confounding or a violation of parallel trends.
10.2.2 Placebo Treatment Groups
Assign treatment to groups that were not actually treated and estimate the DiD. If the "treatment effect" is significant for these falsely treated groups, the design has poor identification.
10.2.3 Outcome Placebo Tests
Estimate the DiD using outcomes that should not be affected by the treatment. A null result ($\hat{\delta} \approx 0$) for these placebo outcomes increases confidence that the design is not picking up spurious effects.
10.3 Sensitivity to Parallel Trends Violations
Rambachan and Roth's sensitivity analysis (2023) provides a formal framework for assessing how large a violation of parallel trends would need to be to reverse the conclusion. The key parameter $\bar{M}$ bounds the maximum allowable deviation from parallel trends (e.g., how much the post-treatment differential trend may deviate relative to the pre-treatment deviations).
Report breakdown values of $\bar{M}$ — the maximum deviation consistent with the estimated effect remaining statistically significant or of the correct sign.
10.4 Testing for Anticipation Effects
Estimate the event study with a coefficient for period $k = -1$ (the period immediately before treatment), using an earlier period (e.g., $k = -2$) as the omitted base.
If $\hat{\delta}_{-1}$ is significantly different from zero, anticipation effects may be present.
10.5 Checking for Compositional Changes
In DiD with repeated cross-sections, the composition of the treated or control groups may change between periods. If the treatment induces sample selection (e.g., a health policy causes sick people to enter/exit the workforce), the DiD estimator may be biased.
How to check:
- Compare the distribution of pre-treatment characteristics across groups and periods.
- Test for treatment effects on selection-related outcomes (e.g., sample size, attrition rates).
- Use balanced panel data where possible to avoid compositional issues.
10.6 Residual Diagnostics
Standard regression diagnostics apply to the DiD residuals:
- Serial correlation test (Wooldridge, 2002): Tests whether the residuals from the first-differenced equation are serially correlated. Under H₀ (no serial correlation in levels), the first-differenced residuals have first-order autocorrelation $-0.5$. Significant deviation from $-0.5$ suggests serial correlation.
- Heteroscedasticity: Breusch-Pagan or White test; motivates cluster-robust standard errors.
- Normality of residuals: Jarque-Bera test; Q-Q plot (less critical with large ).
- Outlier detection: Cook's distance; leverage ($h_{ii}$); DFFITS.
11. Extensions: Staggered DiD and Multiple Time Periods
11.1 Staggered Treatment Adoption
In many real-world settings, different units adopt treatment at different points in time — this is called staggered (or differential timing) DiD. For example, different US states adopt a policy in different years.
The TWFE regression in this context:
$$Y_{it} = \alpha_i + \gamma_t + \delta^{\text{TWFE}} D_{it} + \varepsilon_{it}$$
The TWFE estimator is a weighted average of all possible 2×2 DiD comparisons between groups that adopt treatment at different times — what Goodman-Bacon (2021) calls the Bacon decomposition.
11.2 The Bacon Decomposition
Goodman-Bacon (2021) shows that the TWFE estimator in a staggered design decomposes as:
$$\hat{\delta}^{\text{TWFE}} = \sum_{k \neq l} s_{kl} \,\hat{\delta}_{kl}^{2\times2}$$
Where $\hat{\delta}_{kl}^{2\times2}$ is the 2×2 DiD comparing early adopters (treatment at time $k$) vs. late adopters (treatment at time $l$), and $s_{kl}$ are weights summing to 1.
The problem of "forbidden comparisons": Some of these 2×2 DiDs compare a late adopter group in the post-period against an early adopter group that has already been treated — using already-treated units as a control group. If treatment effects are heterogeneous and dynamic (treatment effects change over time), this produces negative weights that can lead to:
- A significant $\hat{\delta}^{\text{TWFE}}$ of the wrong sign even when all individual group-time ATTs are positive.
- A misleading averaged effect that conceals substantial heterogeneity.
11.3 Robust Staggered DiD Estimators
Several robust estimators have been developed to address the staggered DiD problem:
11.3.1 Callaway & Sant'Anna (2021) — Cohort-Specific ATTs
Define a cohort as the set of units that first receive treatment at the same calendar time $g$. The cohort-average treatment effect on the treated is:
$$ATT(g, t) = E[Y_t(g) - Y_t(0) \mid G_i = g]$$
Where $Y_t(g)$ is the potential outcome at time $t$ if first treated at time $g$, and $Y_t(0)$ is the never-treated potential outcome.
Aggregation: Individual cohort-time ATTs are aggregated to form:
- Simple average: the average of $ATT(g, t)$ over all post-treatment $(g, t)$ cells.
- Calendar time aggregation: $\theta_C(t)$ = average ATT at calendar time $t$ across all treated cohorts.
- Event time aggregation: $\theta_E(e)$ = average ATT at event time $e = t - g$ across all cohorts.
11.3.2 Sun & Abraham (2021) — Interaction-Weighted Estimator
Decompose the TWFE estimate using cohort × period interactions:
$$Y_{it} = \alpha_i + \gamma_t + \sum_{g} \sum_{k \neq -1} \delta_{g,k} \,\mathbb{1}[G_i = g]\,\mathbb{1}[t - g = k] + \varepsilon_{it}$$
The interaction-weighted (IW) estimator aggregates using the share of each cohort in each period as weights, producing a heterogeneity-robust estimate of the average effect.
11.3.3 de Chaisemartin & D'Haultfœuille (2020) — $DID_M$
The $DID_M$ estimator uses only "clean" comparisons — periods in which treatment status changes — to form the estimate. It averages, over switching cells, the change in outcome for switchers minus the change for units whose treatment status is stable, where unit $i$ is a switcher at $t$ if it moves from untreated to treated between $t-1$ and $t$, with weights chosen to ensure comparability across cells.
11.3.4 Borusyak, Jaravel & Spiess (2024) — Imputation Estimator
Imputes the counterfactual from the two-way fixed effects model for untreated outcomes, $Y_{it}(0) = \alpha_i + \gamma_t + \varepsilon_{it}$:
- Estimate $\hat{\alpha}_i$ and $\hat{\gamma}_t$ using untreated observations only.
- Impute the counterfactual: $\hat{Y}_{it}(0) = \hat{\alpha}_i + \hat{\gamma}_t$ for treated observations.
- Estimate ATTs: $\hat{\tau}_{it} = Y_{it} - \hat{Y}_{it}(0)$, then average over treated observations.
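The three imputation steps can be sketched in numpy on a simulated staggered-free example (one treated cohort; all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, t0, tau = 30, 6, 3, 2.0
units = np.repeat(np.arange(N), T)
periods = np.tile(np.arange(T), N)
treated_unit = units < N // 2
D = treated_unit & (periods >= t0)            # treated observations

alpha = rng.normal(0, 1, N)
gamma = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y = alpha[units] + gamma[periods] + tau * D + rng.normal(0, 0.3, N * T)

# Step 1: estimate unit and time effects on UNTREATED observations only
u = ~D
unit_dum = (units[:, None] == np.arange(N)[None, :]).astype(float)
time_dum = (periods[:, None] == np.arange(1, T)[None, :]).astype(float)
X = np.column_stack([unit_dum, time_dum])
beta, *_ = np.linalg.lstsq(X[u], y[u], rcond=None)

# Step 2: impute the untreated counterfactual for every observation
y0_hat = X @ beta

# Step 3: average Y - Y0_hat over treated observations to get the ATT
att_hat = np.mean(y[D] - y0_hat[D])
print(round(att_hat, 2))
```

Because the fixed effects are fit only on untreated cells, already-treated observations never contaminate the counterfactual, which is the key difference from naive TWFE in staggered settings.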
11.4 Choosing Among Staggered DiD Estimators
| Estimator | Robust to Effect Heterogeneity | Multiple Controls | Covariates | Key Reference |
|---|---|---|---|---|
| TWFE | ❌ (negative weights possible) | ✅ | ✅ | — |
| Callaway-Sant'Anna | ✅ | ✅ | ✅ | Callaway & Sant'Anna (2021) |
| Sun-Abraham | ✅ | ✅ | Limited | Sun & Abraham (2021) |
| de Chaisemartin-D'Haultfœuille | ✅ | Limited | Limited | dCH & DH (2020) |
| Borusyak-Jaravel-Spiess | ✅ | ✅ | ✅ | BJS (2024) |
💡 For staggered designs, always report the Bacon decomposition to diagnose the extent of potentially problematic comparisons, and supplement TWFE with at least one robust estimator.
12. Extensions: Heterogeneous Treatment Effects
12.1 Why Treatment Effects May Be Heterogeneous
The standard DiD model estimates a single average treatment effect (ATT). In reality, treatment effects often vary across:
- Units: Different firms, regions, or individuals respond differently to the same treatment.
- Time: Treatment effects may grow, decay, or oscillate over time after treatment onset.
- Subgroups: Effects may differ by gender, income, size, geography, or other characteristics.
- Treatment intensity: Larger doses may produce larger effects (see Section 13).
12.2 Subgroup DiD Analysis
To examine how treatment effects vary across a categorical moderator $Z_i$, interact the DiD term with the moderator:
- $\delta_0$ = treatment effect for the reference subgroup ($Z_i = 0$).
- $\delta_z$ = differential treatment effect — how much the effect differs for subgroup $z$ relative to the reference.
- Total effect for subgroup $z$: $\delta_0 + \delta_z$.
Test of effect heterogeneity: $H_0: \delta_z = 0$ for all $z$ (no heterogeneity). Use cluster-robust standard errors.
12.3 Dynamic Treatment Effects (Event Study)
The event study design (Section 10.1) directly estimates dynamic treatment effects:
For event times $e$ (periods since treatment onset), plot $\hat{\delta}_e$ with confidence intervals to visualise:
- Immediate effects ($e = 0$): Effect in the year/period treatment begins.
- Ramp-up: Effects growing over time (learning, diffusion, cumulative investment).
- Decay: Effects diminishing over time (adaptation, spillovers, fading).
- Persistence: Effects stable over time (structural change).
12.4 Heterogeneity-Robust Aggregation
The robust staggered DiD estimators (Section 11.3) produce cohort-specific ATTs that can be aggregated in various ways:
- Overall average: $\theta$ = average of $ATT(g,t)$ across all cohorts and post-periods.
- Event-time average: $\theta_e(e)$ — ATT as a function of time since treatment, $e = t - g$.
- Calendar-time average: $\theta_c(t)$ — ATT in each calendar year $t$.
13. Extensions: Continuous and Fuzzy Treatment
13.1 Continuous Treatment Intensity (Dose-Response DiD)
When the treatment variable is continuous (e.g., amount of subsidy, level of minimum wage increase, exposure to a policy) rather than binary, the DiD model becomes:
Where $D_{it}$ is now a continuous variable representing the intensity of treatment. The coefficient $\delta$ represents the effect of a one-unit increase in treatment intensity on the outcome.
Dose-response curve: Plot the predicted outcome as a function of treatment intensity at different time periods to visualise the dose-response relationship.
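A toy sketch of dose-response DiD on exact, hypothetical data (the continuous dose plays the role of the group dimension; unit fixed effects are omitted for brevity):

```python
import numpy as np

# Hypothetical dose-response data. True model:
# y = 1.0 + 0.5*dose + 0.2*post + 0.8*(dose*post), so delta = 0.8.
dose = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
post = np.array([0,   0,   0,   1,   1,   1  ])
y = 1.0 + 0.5 * dose + 0.2 * post + 0.8 * dose * post

# OLS on [1, dose, post, dose*post]; the interaction coefficient is
# the effect of a one-unit increase in treatment intensity.
X = np.column_stack([np.ones_like(dose), dose, post, dose * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta = coef[3]
```

Evaluating the fitted line at several dose values in the post period traces out the dose-response curve described above.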
13.2 Fuzzy DiD (Instrumental Variables DiD)
In a fuzzy DiD design, the policy change shifts the probability of treatment but does not deterministically assign treatment. For example:
- A subsidy makes adoption cheaper but does not mandate it.
- An eligibility rule changes, but not all eligible units comply.
The binary treatment variable $D_{it}$ measures actual take-up; the policy indicator $Z_{it}$ is the instrument (a discontinuous change in eligibility or incentives).
First stage: Treatment as a function of the instrument: $D_{it} = \pi_0 + \pi_1 Z_{it} + \alpha_i + \lambda_t + \nu_{it}$
Second stage: Outcome as a function of predicted treatment: $Y_{it} = \alpha_i + \lambda_t + \delta_{IV} \hat{D}_{it} + \varepsilon_{it}$
The IV-DiD estimator estimates the Local Average Treatment Effect (LATE) — the effect on compliers (units that switch treatment status in response to the policy change).
Fuzzy DiD (Wald-DiD) estimator: $\hat{\delta}_{fuzzy} = \hat{\delta}^{Y}_{DiD} / \hat{\delta}^{D}_{DiD}$ — the reduced-form DiD on the outcome divided by the first-stage DiD on take-up.
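As a sketch, a two-period, group-means version of the fuzzy DiD divides the reduced-form outcome DiD by the first-stage take-up DiD (illustrative helper names):

```python
def did_2x2(pre_c, post_c, pre_t, post_t):
    """2x2 DiD from group-period means: treated change minus control change."""
    return (post_t - pre_t) - (post_c - pre_c)

def fuzzy_did(y_means, d_means):
    """Wald-DiD sketch: reduced-form outcome DiD divided by the
    first-stage take-up DiD. Each argument is a tuple of means:
    (pre_control, post_control, pre_treated, post_treated)."""
    return did_2x2(*y_means) / did_2x2(*d_means)
```

With an outcome DiD of 2.0 and a take-up DiD of 0.5, the implied LATE for compliers is 4.0.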
13.3 Triple Differences (DDD)
Triple differences (DDD) adds a third source of variation to further control for confounders. The idea is to difference out group-specific time trends that are common to all individuals within a group:
Where $E_i$ is an additional eligibility dimension (e.g., age group, income group) that determines eligibility within the treated group.
DDD estimator: $\hat{\delta}_{DDD} = \hat{\delta}^{eligible}_{DiD} - \hat{\delta}^{ineligible}_{DiD}$ — the DiD among the eligible group minus the DiD among the ineligible group.
DDD is valuable when the comparison across regions includes contamination from regional trends that differentially affect all groups in treated regions.
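A sketch of the DDD computation from group-period means (illustrative helper; each tuple holds pre/post means for the control and treated regions within one eligibility group):

```python
def ddd(eligible_means, ineligible_means):
    """Triple difference: the 2x2 DiD among the eligible group minus
    the 2x2 DiD among the ineligible group. Each argument is
    (pre_control, post_control, pre_treated, post_treated)."""
    def did(pre_c, post_c, pre_t, post_t):
        return (post_t - pre_t) - (post_c - pre_c)
    return did(*eligible_means) - did(*ineligible_means)
```

Subtracting the ineligible-group DiD removes region-specific shocks common to both eligibility groups.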
14. Covariates and Controls in DiD
14.1 Why Include Covariates?
Adding covariates to the DiD model serves two distinct purposes:
- Improving efficiency (precision): Covariates that predict the outcome reduce residual variance, shrinking standard errors and narrowing the confidence intervals.
- Restoring conditional parallel trends: If unconditional parallel trends is implausible but trends are parallel after conditioning on observable characteristics, including covariates removes the confounding and restores identification.
14.2 Time-Invariant Covariates
In the TWFE model, time-invariant covariates (e.g., gender, ethnicity, geographic characteristics) are absorbed by the unit fixed effect and cannot be estimated separately. However, they can be included as interactions with the treatment or time variables to allow their effect to vary:
14.3 Time-Varying Covariates
Time-varying covariates can be included directly in the TWFE model:
⚠️ Including time-varying covariates that are themselves affected by the treatment (i.e., "bad controls" or "mediators") is a common mistake. Including such variables absorbs part of the treatment effect, leading to underestimation of . Only include covariates that are determined before treatment or that are plausibly unaffected by treatment.
14.4 Regression Adjustment (Outcome Regression)
The regression-adjusted DiD uses the control group's pre-to-post relationship between covariates and the outcome to construct an improved counterfactual:
Where the control group's covariate-outcome relationship supplies the predicted counterfactual change for treated units. This improves efficiency and removes covariate-related bias.
14.5 Doubly Robust DiD (DR-DiD)
The doubly robust estimator combines propensity score weighting and outcome regression. It is consistent if either the propensity score model or the outcome regression model is correctly specified:
Where the propensity-score terms reweight control units to resemble the treated group, and the outcome-regression term supplies the regression-adjusted counterfactual change.
14.6 Controlling for Pre-Treatment Trends (Linear Trend Adjustment)
When parallel trends is violated by unit-specific linear time trends, include unit-specific trend terms:
Where $\gamma_i \cdot t$ is a unit-specific linear time trend. This allows each unit to have its own pre-treatment trajectory, controlling for heterogeneous trends.
⚠️ Including unit-specific trends is a strong assumption (units would have continued on their pre-treatment trend indefinitely) and can overcontrol. Use only when unit-specific trends are well-established in the pre-period and when there are sufficient pre-period observations to estimate them.
15. Using the DiD Component
The Difference-in-Differences component in the DataStatPro application provides a comprehensive workflow for DiD estimation, testing, and visualisation.
Step-by-Step Guide
Step 1 — Select Dataset Choose the dataset from the "Dataset" dropdown. The dataset should have:
- A unit identifier column (individual, firm, region, country).
- A time period column (year, quarter, month).
- An outcome variable (continuous or binary).
- A treatment indicator (binary: 0 = control, 1 = treated).
Step 2 — Select DiD Design Choose the DiD specification:
- 2×2 DiD (two groups, two periods — canonical design)
- Two-Way Fixed Effects (TWFE) (panel data, multiple periods)
- Staggered DiD (multiple treatment timing groups)
- Event Study / Dynamic DiD (multiple pre- and post-periods)
- Triple Differences (DDD) (three sources of variation)
- Fuzzy DiD / IV-DiD (non-compliance with treatment)
Step 3 — Select Variables Map the required variables from your dataset:
- Unit ID: The unique identifier for each unit (individual, firm, state).
- Time ID: The time period variable.
- Outcome (Y): The continuous or binary dependent variable.
- Treatment Group ($G_i$): Binary indicator for the treated group (time-invariant for 2×2).
- Treatment Indicator ($D_{it}$): Binary indicator for treatment status (may vary by time for TWFE/staggered).
- Covariates: Optional time-varying or time-invariant controls.
Step 4 — Specify Treatment Timing
- 2×2 DiD: Specify the pre- and post-period labels.
- TWFE/Staggered: The application detects treatment timing from the treatment indicator automatically. Review the detected cohort structure.
- Event Study: Specify the base period (omitted category, default: $e = -1$) and the number of leads and lags to include.
Step 5 — Configure Fixed Effects Select fixed effects to include:
- ✅ Unit Fixed Effects ($\alpha_i$) — strongly recommended for panel data.
- ✅ Time Fixed Effects ($\lambda_t$) — strongly recommended.
- Unit-Specific Linear Trends — optional; use when pre-trend concerns exist.
Step 6 — Configure Standard Errors Select the standard error type:
- Cluster-Robust (default and recommended) — specify the clustering variable (typically the treatment assignment unit).
- HC (Heteroscedasticity-Robust) — use when clusters are very small or unavailable.
- Wild Cluster Bootstrap — specify for few clusters (roughly $G < 30$).
- Block Bootstrap — for more general panel dependence.
Step 7 — Select Staggered DiD Estimator (if applicable) For staggered designs, choose the robust estimator:
- TWFE (standard but potentially biased with heterogeneous effects)
- Callaway-Sant'Anna
- Sun-Abraham
- Bacon Decomposition (diagnostic)
- de Chaisemartin-D'Haultfœuille
Step 8 — Configure Inference Options
- Confidence level: Default 95%.
- Bootstrap replications: For wild bootstrap (default: 999).
- Permutation replications: For randomisation inference (default: 999).
Step 9 — Select Display Options Choose which outputs to display:
- ✅ DiD Coefficient Table (estimate, SE, t, p, CI)
- ✅ Pre-Treatment Trends Plot
- ✅ Event Study Plot (with CIs)
- ✅ 2×2 DiD Table (group-period means)
- ✅ Parallel Trends Test (joint F-test on pre-period coefficients)
- ✅ Counterfactual Plot
- ✅ Placebo Test Results
- ✅ Bacon Decomposition Plot (staggered designs)
- ✅ Balance Table (pre-treatment covariate balance)
- ✅ Residual Diagnostics
- ✅ Coefficient Profile Plot across Subgroups
- ✅ Dynamic Effects Plot
Step 10 — Run the Analysis Click "Run DiD Model". The application will:
- Construct the design matrix with appropriate interaction terms and fixed effects.
- Estimate the DiD coefficient(s) via OLS with specified standard errors.
- Compute the event study / dynamic effects coefficients.
- Run the pre-trends test and parallel trends diagnostics.
- Execute placebo tests (if requested).
- Compute the Bacon decomposition (for staggered designs).
- Generate all selected visualisations and tables.
16. Computational and Formula Details
16.1 The 2×2 DiD Estimator: Step-by-Step
Step 1: Compute group-period means
For groups $g \in \{0, 1\}$ (control, treated) and periods $p \in \{0, 1\}$ (pre, post), compute the cell means $\bar{Y}_{gp}$.
Step 2: First differences
Step 3: DiD estimate
Step 4: Standard error (homoscedastic OLS)
With $n_{gp}$ observations per cell: $SE(\hat{\delta}) = \hat{\sigma}\sqrt{\tfrac{1}{n_{00}} + \tfrac{1}{n_{01}} + \tfrac{1}{n_{10}} + \tfrac{1}{n_{11}}}$
Where $\hat{\sigma}^2$ is the pooled residual variance.
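The 2×2 steps above can be sketched numerically; `did_2x2_with_se` is an illustrative helper that takes cell means and sizes plus a pooled residual-variance estimate (homoscedastic case only):

```python
import math

def did_2x2_with_se(cells, sigma2):
    """2x2 DiD and its homoscedastic SE.
    cells: dict {(g, p): (mean, n)} with g in {0, 1} (control/treated)
    and p in {0, 1} (pre/post); sigma2: pooled residual variance."""
    est = ((cells[(1, 1)][0] - cells[(1, 0)][0])
           - (cells[(0, 1)][0] - cells[(0, 0)][0]))
    se = math.sqrt(sigma2 * sum(1.0 / n for _, n in cells.values()))
    return est, se

# Hypothetical cells (25 observations each):
cells = {(0, 0): (74.1, 25), (0, 1): (73.2, 25),
         (1, 0): (72.4, 25), (1, 1): (70.1, 25)}
est, se = did_2x2_with_se(cells, sigma2=1.0)
```

In practice one would use cluster-robust standard errors (Section 16.3) rather than this homoscedastic formula.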
16.2 TWFE Estimation: The Demeaning Procedure
Step 1: Compute unit means, time means, and grand mean
Step 2: Demean all variables
Step 3: Estimate TWFE by OLS on demeaned variables
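The demeaning steps can be sketched as follows (illustrative helper names; the within-transformation is exact for balanced panels):

```python
import numpy as np

def two_way_demean(x, unit, time):
    """Within-transformation: x_it - unit mean - time mean + grand mean."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for ids in (np.asarray(unit), np.asarray(time)):
        for v in np.unique(ids):
            out[ids == v] -= x[ids == v].mean()  # subtract group mean
    return out + x.mean()  # add the grand mean back once

def twfe_delta(y, d, unit, time):
    """TWFE DiD coefficient: OLS of demeaned y on demeaned d."""
    yd = two_way_demean(y, unit, time)
    dd = two_way_demean(d, unit, time)
    return float(dd @ yd / (dd @ dd))
```

On a noiseless 2-unit, 2-period panel with a treatment effect of 3, `twfe_delta` recovers 3 exactly, and demeaning a purely additive $\alpha_i + \lambda_t$ series returns zeros.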
16.3 Cluster-Robust Variance Estimator
With clusters and the TWFE estimator:
Where $c = \frac{G}{G-1} \cdot \frac{N-1}{N-K}$ is a small-sample correction, and $\ddot{D}_g$ and $\hat{u}_g$ are the demeaned treatment vector and residuals for cluster $g$.
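A minimal sketch of the cluster-robust SE for a single demeaned regressor with the usual small-sample correction (illustrative, not DataStatPro's implementation):

```python
import numpy as np

def cluster_robust_se(x, resid, cluster, n_params=1):
    """Cluster-robust SE for one (demeaned) regressor x:
    sqrt(c * sum_g (x_g' u_g)^2 / (x'x)^2),
    with correction c = G/(G-1) * (N-1)/(N-K)."""
    x = np.asarray(x, float)
    u = np.asarray(resid, float)
    cl = np.asarray(cluster)
    G, N = len(np.unique(cl)), len(x)
    c = (G / (G - 1)) * ((N - 1) / (N - n_params))
    meat = sum((x[cl == g] @ u[cl == g]) ** 2 for g in np.unique(cl))
    bread = x @ x
    return float(np.sqrt(c * meat / bread ** 2))
```

Summing residuals within clusters before squaring is what makes the estimator robust to arbitrary within-cluster correlation.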
16.4 Event Study Regression: Full Specification
For a unit with treatment onset at period $g_i$ and a balanced panel from $t = 1$ to $T$:
Define event-time dummies: $D_{it}^{e} = \mathbf{1}[t - g_i = e]$
For $e = -K, \dots, -2, 0, 1, \dots, L$ (omitting $e = -1$ as the reference).
Stack the regression:
Estimate by TWFE (adding unit and time FE as dummy variables or using within-transformation).
Confidence bands: For each $e$, compute $\hat{\delta}_e \pm 1.96 \cdot SE_{cl}(\hat{\delta}_e)$.
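Constructing the event-time dummies can be sketched as (hypothetical helper; the `leads`/`lags` window is an assumption):

```python
def event_time_dummies(t, g, leads=3, lags=5):
    """Event-time dummies D^e = 1[t - g == e] for one observation,
    omitting e = -1 as the reference category."""
    e = t - g
    return {k: int(e == k) for k in range(-leads, lags + 1) if k != -1}
```

Stacking these dictionaries across observations (plus unit and time dummies) yields the design matrix for the event study regression.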
16.5 The Bacon Decomposition
For a staggered design with cohorts (groups adopting treatment at different times), the TWFE estimator decomposes as:
Where:
- $\hat{\delta}_{kl}$ = 2×2 DiD comparing early-adopting cohort $k$ vs. late-adopting cohort $l$ (including comparisons that use already-treated units as controls).
- $\hat{\delta}_{kU}$ = 2×2 DiD comparing cohort $k$ vs. never-treated units.
- Weights are proportional to cell sizes and treatment variance.
The decomposition reveals how much of the TWFE estimate comes from each pairwise comparison, and which comparisons use already-treated units as controls (potentially problematic).
16.6 Pre-Trend Test Statistic
Joint F-test for pre-treatment event study coefficients:
$F = \frac{1}{q}\,(R\hat{\delta})' (R \hat{V} R')^{-1} (R\hat{\delta})$
Where $R$ selects the pre-treatment coefficients ($e \le -2$), $\hat{\delta}$ is the vector of event study estimates, $\hat{V}$ is their variance-covariance matrix (cluster-robust), and $q$ is the number of pre-period restrictions.
Under $H_0$ (parallel trends in pre-period): $F \sim F_{q,\,G-1}$ approximately.
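The joint pre-trend statistic can be sketched given a vector of pre-period estimates and their covariance matrix (illustrative; p-values would come from the $F$ or $\chi^2$ reference distribution):

```python
import numpy as np

def pretrend_wald(delta_pre, V_pre):
    """Joint Wald statistic W = d' V^{-1} d for the pre-period
    event-study coefficients; returns (W, F) with F = W / q."""
    d = np.asarray(delta_pre, float)
    V = np.asarray(V_pre, float)
    W = float(d @ np.linalg.solve(V, d))  # solve avoids explicit inverse
    return W, W / len(d)
```

With two pre-period estimates of 1 and 2 and identity covariance, $W = 1^2 + 2^2 = 5$ and $F = 2.5$.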
16.7 DiD with Binary Outcomes
For binary outcomes ($Y_{it} \in \{0, 1\}$), the linear probability model (LPM) DiD remains valid and interpretable:
$\hat{\delta}$ estimates the probability change (in percentage points) caused by treatment. While predicted probabilities may fall outside $[0, 1]$, the LPM DiD estimator of $\delta$ is unbiased under parallel trends.
For probits or logits, the DiD interpretation is more complex and non-linear. The average marginal effect from a nonlinear DiD:
Where $F(\cdot)$ is the estimated CDF (probit or logistic) and $X_{it}'\hat{\beta}$ is the linear predictor. Note: the LPM is generally preferred for DiD with binary outcomes due to tractability.
17. Worked Examples
Example 1: 2×2 DiD — Effect of Minimum Wage on Employment
Research Question: Did a 20% increase in the minimum wage in State A in 2019 affect the fast-food employment rate, using State B (which had no minimum wage change) as the control?
Data: Monthly employment rates for fast-food workers in State A (treated) and State B (control), 2017–2021. For simplicity, we use 2018 as pre-period and 2019 onward as post-period.
Step 1: Group-Period Mean Table
| | Pre-2019 Mean | Post-2019 Mean | Change |
|---|---|---|---|
| State A (Treated, $G = 1$) | 72.4% | 70.1% | -2.3 pp |
| State B (Control, $G = 0$) | 74.1% | 73.2% | -0.9 pp |
| DiD | | | -2.3 − (−0.9) = −1.4 pp |
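The DiD in the table can be reproduced in a few lines (numbers taken from the group-period means above):

```python
# Group-period means from Example 1 (percentage points).
means = {
    ("A", "pre"): 72.4, ("A", "post"): 70.1,  # treated state
    ("B", "pre"): 74.1, ("B", "post"): 73.2,  # control state
}
change_treated = means[("A", "post")] - means[("A", "pre")]  # treated change
change_control = means[("B", "post")] - means[("B", "pre")]  # control change
did = change_treated - change_control  # difference-in-differences
```

The same number falls out of the interaction coefficient in the OLS regression below.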
Step 2: OLS Regression
| Coefficient | Estimate | Cluster-Robust SE | $t$ | $p$ | 95% CI |
|---|---|---|---|---|---|
| Intercept ($\beta_0$) | 74.1 | 0.42 | 176.4 | < 0.001 | [73.2, 75.0] |
| State A ($\beta_1$) | -1.7 | 0.61 | -2.79 | 0.023 | [-3.1, -0.3] |
| Post ($\beta_2$) | -0.9 | 0.31 | -2.90 | 0.018 | [-1.6, -0.2] |
| DiD ($\hat{\delta}$) | -1.4 | 0.54 | -2.59 | 0.031 | [-2.6, -0.2] |
Estimated on monthly observations pooled across both states.
Step 3: Interpretation
The minimum wage increase in State A reduced fast-food employment by an estimated 1.4 percentage points ($p = 0.031$, 95% CI: $[-2.6, -0.2]$ pp). Relative to the pre-treatment baseline of 72.4%, this is a reduction of roughly 1.9%.
Step 4: Pre-Trends Check (Using 2017–2018 data)
Using quarterly 2017–2018 data, estimate a placebo DiD treating 2018 Q1 as "post":
The placebo DiD is small and statistically indistinguishable from zero (SE = 0.48) → No significant pre-treatment difference in trends. The parallel trends assumption is supported.
Step 5: Visualisation Summary
Pre-period trends for both states are approximately parallel (both declining slightly). Post-2019, State A's employment declines more sharply than State B's, consistent with the minimum wage effect.
Example 2: TWFE — Effect of Broadband Access on Business Formation
Research Question: Did broadband internet access (treated when broadband penetration > 50%) increase the rate of new business formation across US counties, 2000–2010?
Data: Annual panel of 3,142 counties over 11 years, 2000–2010 ($N = 34{,}562$ county-year observations); outcome: log business formation rate; treatment: broadband penetration indicator.
Step 1: TWFE Regression
Step 2: Results
| Variable | Coefficient | Cluster-Robust SE (County) | ||
|---|---|---|---|---|
| Broadband (DiD) | 0.0841 | 0.0214 | 3.93 | < 0.001 |
| ln(Population) | 0.1243 | 0.0381 | 3.26 | 0.001 |
| Unemployment | -0.0182 | 0.0051 | -3.57 | < 0.001 |
| County FE | ✅ (3,142 dummies) | — | — | — |
| Year FE | ✅ (11 dummies) | — | — | — |
$N = 34{,}562$ county-year observations.
Step 3: Interpretation
Broadband internet access increases the log business formation rate by 0.0841, corresponding to an $(e^{0.0841} - 1) \approx 8.8\%$ increase in business formation. The effect is highly significant ($p < 0.001$) after controlling for county and year fixed effects and time-varying population and unemployment controls.
Standardised effect: dividing $\hat{\delta} = 0.0841$ by the standard deviation of the outcome yields a small-to-medium standardised effect.
Step 4: Event Study
Estimating event study coefficients for 3 years before and 5 years after treatment adoption:
| Period ($e$) | $\hat{\delta}_e$ | SE | $p$ | Significant? |
|---|---|---|---|---|
| $-3$ | 0.011 | 0.018 | 0.543 | No |
| $-2$ | -0.008 | 0.015 | 0.591 | No |
| $-1$ | (reference = 0) | — | — | — |
| $0$ | 0.041 | 0.019 | 0.031 | Yes |
| $+1$ | 0.072 | 0.023 | 0.002 | Yes |
| $+2$ | 0.084 | 0.024 | < 0.001 | Yes |
| $+3$ | 0.091 | 0.026 | < 0.001 | Yes |
| $+4$ | 0.088 | 0.027 | 0.001 | Yes |
| $+5$ | 0.083 | 0.029 | 0.004 | Yes |
Pre-treatment coefficients: the joint pre-trends test is statistically insignificant → No pre-trends. Post-treatment effects ramp up over 2–3 years and then stabilise — consistent with gradual adoption and cumulative business formation.
Example 3: Staggered DiD — Effect of Paid Family Leave Policies on Female Labour Force Participation
Research Question: Did the adoption of state-level paid family leave (PFL) policies affect female labour force participation (FLFP) across US states, with different states adopting at different times (2004–2016)?
Data: Annual panel of 50 states, 2000–2020; outcome: FLFP rate (%); treatment: PFL adoption indicator. 12 states adopt PFL at different times; 38 states never adopt (control).
Step 1: TWFE Estimate (Standard)
$\hat{\delta}_{TWFE} = 1.82$ pp, SE = 0.74 (cluster-robust, state level).
Step 2: Bacon Decomposition
| Comparison Type | Weight | 2×2 DiD Estimate |
|---|---|---|
| Early adopters vs. Never treated | 0.41 | 2.31 |
| Late adopters vs. Never treated | 0.28 | 2.04 |
| Early vs. Late (early as treated) | 0.22 | 1.41 |
| Late vs. Early (late as treated) | 0.09 | 0.63 |
The decomposition reveals that 31% of the TWFE weight (0.22 + 0.09) comes from comparisons between early and late adopters — using already-treated states as controls. The "Late vs. Early" comparison (0.63 pp) is notably smaller, suggesting heterogeneous treatment effects across adoption cohorts.
Step 3: Callaway-Sant'Anna Robust Estimator
Computing $ATT(g, t)$ for each adoption cohort $g$ and time $t$:
| Cohort (First Treated Year $g$) | States | Average Post-Treatment ATT |
|---|---|---|
| 2004 (California) | 1 | 3.21 pp |
| 2008 (New Jersey) | 1 | 2.84 pp |
| 2013 (Rhode Island) | 1 | 2.41 pp |
| 2016 (New York) | 1 | 1.98 pp |
| Other early adopters (2004–2008) | 4 | 2.68 pp |
| Other late adopters (2009–2016) | 5 | 1.72 pp |
Aggregated average ATT (Callaway-Sant'Anna): 2.31 pp (SE = 0.82, ).
Observation: The Callaway-Sant'Anna estimate (2.31 pp) is larger than the TWFE estimate (1.82 pp), and the discrepancy is explained by the downward-biasing effect of using already-treated states as controls in the "Late vs. Early" TWFE comparison.
Step 4: Dynamic Effects (Event Study)
| Event Time | CS Estimate | SE | 95% CI |
|---|---|---|---|
| $-3$ | -0.12 | 0.28 | [-0.67, 0.43] |
| $-2$ | 0.08 | 0.24 | [-0.39, 0.55] |
| $-1$ | (reference) | — | — |
| $0$ | 1.21 | 0.41 | [0.41, 2.01] |
| $+1$ | 2.18 | 0.54 | [1.12, 3.24] |
| $+2$ | 2.41 | 0.61 | [1.21, 3.61] |
| $+3$ | 2.58 | 0.68 | [1.25, 3.91] |
Pre-trends test: jointly insignificant → No pre-trends. Effects increase over the first two years post-adoption and stabilise, suggesting that businesses and workers gradually adjust to the new policy.
Conclusion: PFL policies increase female labour force participation by approximately 2.3 percentage points on average (Callaway-Sant'Anna robust estimate). The TWFE estimate is downward biased by about 0.5 pp due to heterogeneous treatment effects across adoption cohorts. Effects materialise immediately and grow slightly over the first two years.
Example 4: Triple Differences — Effect of Health Insurance Expansion on Hospital Admissions
Research Question: Did Medicaid expansion under the ACA increase hospital admissions for low-income adults (the eligible group) compared to higher-income adults (the ineligible group), in expansion vs. non-expansion states?
Data: State-year panel; outcome: hospitalisation rate per 1,000 adults; three dimensions: State (expansion vs. non-expansion), Year (pre-2014 vs. post-2014), Income group (low-income eligible vs. higher-income ineligible).
Triple Differences Regression:
$Y_{sgt} = \alpha_{sg} + \alpha_{st} + \alpha_{gt} + \delta\,(\text{Expansion}_s \times \text{Post}_t \times \text{LowInc}_g) + \varepsilon_{sgt}$
Where all two-way interactions are absorbed by the appropriate two-way fixed effects ($\alpha_{sg}$, $\alpha_{st}$, $\alpha_{gt}$).
| Coefficient | Estimate | SE | $p$ |
|---|---|---|---|
| DDD ($\hat{\delta}$) | 12.4 | 3.21 | < 0.001 |
The DDD estimate suggests that Medicaid expansion increased hospitalisation rates for low-income adults (who became eligible) by 12.4 per 1,000 relative to higher-income adults in expansion states, and relative to the same income groups in non-expansion states. This controls for any general time trends in hospitalisation, any state-specific income disparities, and any common shifts in healthcare utilisation across income groups.
18. Common Mistakes and How to Avoid Them
Mistake 1: Failing to Use Cluster-Robust Standard Errors
Problem: Using OLS standard errors (or even heteroscedasticity-robust HC standard errors) in a DiD design, resulting in severely underestimated standard errors and spuriously small p-values. DiD residuals are almost always serially correlated within units across time, and within groups (e.g., states) across individuals.
Solution: Always cluster standard errors at the level of treatment assignment. For state-level policies, cluster at the state level. If the number of clusters is small (roughly $G < 30$), use the wild cluster bootstrap instead of asymptotic cluster-robust SEs.
Mistake 2: Ignoring Pre-Trends
Problem: Reporting a significant DiD estimate without testing or reporting the parallel trends assumption, leaving the identification assumption entirely unvalidated and the results unconvincing.
Solution: Always conduct and report the event study pre-trends test. Plot the event study coefficients with confidence bands for at least 2–3 pre-treatment periods. Report the joint -test for pre-period coefficients and discuss any concerning patterns, even if not statistically significant.
Mistake 3: Including Bad Controls (Post-Treatment Outcomes)
Problem: Including time-varying covariates that are themselves caused by the treatment (mediators or "bad controls") — e.g., including health insurance status as a covariate when the treatment is a health policy that affects insurance. This absorbs part of the treatment effect through the covariate, biasing $\hat{\delta}$ toward zero.
Solution: Only include covariates that are predetermined (determined before treatment) or plausibly unaffected by the treatment. When in doubt, report the model with and without the covariate and check sensitivity.
Mistake 4: Applying Standard TWFE to Staggered Designs Without Checking
Problem: Using the standard TWFE estimator in a staggered design with heterogeneous treatment effects, obtaining a potentially biased or sign-reversed DiD estimate without investigating its composition.
Solution: Always run the Bacon decomposition for staggered designs to understand what the TWFE estimate represents. If heterogeneous treatment effects are plausible, supplement with (or switch to) a robust estimator: Callaway-Sant'Anna, Sun-Abraham, or Borusyak-Jaravel-Spiess. Report both for transparency.
Mistake 5: Confusing the ATT with the ATE
Problem: Interpreting the DiD estimate as the Average Treatment Effect (ATE) — the effect averaged over all units — when DiD actually identifies the ATT (Average Treatment Effect on the Treated).
Solution: Be precise in reporting: DiD estimates the ATT — the effect for the treated units specifically. The ATE (which would include the effect on control units) is not identified by DiD unless additional assumptions (or different estimators) are invoked.
Mistake 6: Treating the Pre-Trends Test as Definitive
Problem: Passing the pre-trends test (no significant pre-treatment coefficients) and concluding that parallel trends definitely holds in the post-period. This overstates the confidence in identification.
Solution: The pre-trends test is supportive evidence, not proof. Pre-treatment parallel trends do not guarantee post-treatment parallel trends. Supplement with theoretical arguments for why the groups would have trended similarly, with placebo tests, and with Rambachan-Roth sensitivity analysis. Be honest about residual uncertainty.
Mistake 7: Selecting the Control Group Retrospectively Based on Pre-Trends
Problem: Searching across many potential control groups and selecting the one that shows the most parallel pre-trends with the treated group. This "pre-trend matching" leads to data mining, overfitting, and inflated confidence in parallel trends.
Solution: Specify the control group a priori based on institutional knowledge and theoretical comparability, not post-hoc based on pre-trend patterns. If multiple control groups are plausible, report results for all of them and assess robustness.
Mistake 8: Ignoring Anticipation Effects
Problem: Treating the period immediately before the official treatment start as a clean pre-period, when in fact treated units may have already begun responding in anticipation of treatment.
Solution: Test for anticipation effects by examining whether the period just before treatment ($e = -1$) shows a significant coefficient in the event study. Consider extending the "pre-period" to exclude periods potentially affected by anticipation. If anticipation is present, adjust the model (e.g., redefine the treatment as starting earlier).
Mistake 9: Using Levels When Parallel Trends Holds Only in Logs
Problem: Estimating the DiD in levels when treated and control groups are growing at the same rate (multiplicative parallel trends), rather than by the same amount (additive parallel trends). This produces spurious pre-trends on the levels scale.
Solution: Inspect raw pre-treatment trends in both levels and logs. If trends are parallel in logs but not levels, use the log outcome. Report the pre-trends test for the chosen specification and note the functional form assumption.
Mistake 10: Not Reporting the Full Event Study
Problem: Reporting only the single pooled DiD coefficient when the treatment has dynamic effects over multiple time periods, losing information about the timing, ramp-up, and persistence of the effect.
Solution: Always produce and report the full event study figure with pre- and post-treatment coefficients and confidence bands. The event study provides far more information than a single coefficient and is essential for assessing both identification (pre-trends) and the nature of the effect (dynamic patterns).
19. Troubleshooting
| Issue | Likely Cause | Solution |
|---|---|---|
| DiD coefficient is unexpected sign | Parallel trends violation; wrong treatment assignment; data coding error | Check treatment coding; inspect raw trends plots; run event study for pre-trend evidence |
| Very large standard errors | Too few clusters (small $G$); very small treatment group; collinearity | Use wild cluster bootstrap; report exact p-value (randomisation inference); check VIF |
| Significant pre-trends (parallel trends violated) | Systematic pre-existing trend differences; selection into treatment based on trends | Include unit-specific linear trends; use conditional parallel trends (covariates); consider synthetic control or matching |
| Event study coefficients show large dip at $e = -1$ | Anticipation effects; treatment actually starts earlier than recorded | Extend pre-period; redefine treatment onset; check institutional knowledge |
| TWFE gives negative estimate despite positive raw DiD | Heterogeneous treatment effects in staggered design (Bacon negative weights) | Run Bacon decomposition; use Callaway-Sant'Anna or other robust staggered estimator |
| Never-treated group is very small | Limited comparison group; potential control group contamination | Expand the never-treated control; use only timing-variation comparisons; consider synthetic control |
| Pre-trends test passes but Rambachan-Roth bounds are wide | Weak pre-trend evidence; short pre-period; noisy outcome | Collect more pre-periods; use better outcome measure; report sensitivity analysis bounds prominently |
| Fixed effects absorb all treatment variation | Treatment is perfectly collinear with unit or time FE; no within-unit variation in $D_{it}$ | Check whether treatment is time-varying; verify panel structure; ensure $D_{it}$ varies within units |
| Coefficient estimates change dramatically with different control sets | Model is sensitive to covariate specification; potential bad controls included | Report all specifications; identify and exclude bad controls (post-treatment variables); use doubly robust estimator |
| Wild cluster bootstrap gives -value of 1 | Asymmetric bootstrap distribution; very few clusters; extreme outlier cluster | Increase bootstrap replications; use refinement bootstrap; investigate influential clusters; consider randomisation inference |
| Log outcome produces extreme predictions | Zeros in the outcome variable (log undefined) | Use inverse hyperbolic sine (IHS) transformation: $\operatorname{arcsinh}(y) = \ln(y + \sqrt{y^2 + 1})$; or use Poisson TWFE for count outcomes |
| DDD estimate is implausibly large | Interaction with ineligible group not clean; compositional changes | Verify eligibility classification; test for treatment effects on ineligible group (should be zero); check for spillovers |
| Staggered design: Callaway-Sant'Anna gives very wide CIs | Small cohort sizes; few pre-periods for some cohorts; sparse data | Report cohort-level ATTs separately; aggregate with caution; increase sample |
| Placebo treatment group shows significant effect | SUTVA violation (spillovers); contamination of control group | Investigate potential spillover mechanisms; redefine control group to exclude exposed units; report sensitivity to control group definition |
20. Quick Reference Cheat Sheet
Core DiD Formulas
| Formula | Description |
|---|---|
| $\hat{\delta} = (\bar{Y}_{T,post} - \bar{Y}_{T,pre}) - (\bar{Y}_{C,post} - \bar{Y}_{C,pre})$ | 2×2 DiD estimator |
| $Y_{it} = \beta_0 + \beta_1 G_i + \beta_2 P_t + \delta (G_i \times P_t) + \varepsilon_{it}$ | 2×2 DiD regression |
| $Y_{it} = \alpha_i + \lambda_t + \delta D_{it} + \varepsilon_{it}$ | TWFE DiD |
| $\ddot{Y}_{it} = Y_{it} - \bar{Y}_i - \bar{Y}_t + \bar{Y}$ | Within-transformation (demeaning) |
| $\hat{\delta} = \sum \ddot{D}_{it} \ddot{Y}_{it} / \sum \ddot{D}_{it}^2$ | TWFE estimator |
| $Y_{it} = \alpha_i + \lambda_t + \sum_{e \neq -1} \delta_e D_{it}^{e} + \varepsilon_{it}$ | Event study regression |
| $t = \hat{\delta} / SE_{cl}(\hat{\delta})$ | Test statistic (clustered) |
| $\hat{\delta} \pm t_{G-1,\,0.975} \cdot SE_{cl}(\hat{\delta})$ | Confidence interval |
| $d = \hat{\delta} / SD(Y)$ | Standardised effect size |
| $\%\Delta = 100\,(e^{\hat{\delta}} - 1)$ | Percent change effect |
2×2 DiD Table Template
| | Pre ($P = 0$) | Post ($P = 1$) | Difference |
|---|---|---|---|
| Treated ($G = 1$) | $\bar{Y}_{T,pre}$ | $\bar{Y}_{T,post}$ | $\Delta_T = \bar{Y}_{T,post} - \bar{Y}_{T,pre}$ |
| Control ($G = 0$) | $\bar{Y}_{C,pre}$ | $\bar{Y}_{C,post}$ | $\Delta_C = \bar{Y}_{C,post} - \bar{Y}_{C,pre}$ |
| Difference | | | $\hat{\delta}_{DiD} = \Delta_T - \Delta_C$ |
Standard Error Selection Guide
| Setting | Recommended SE | When |
|---|---|---|
| Large ($G \geq 50$) | Cluster-robust (HC1) | Standard panel DiD |
| Moderate ($30 \leq G < 50$) | Cluster-robust with bias correction | Borderline case |
| Small ($G < 30$) | Wild cluster bootstrap | Few clusters |
| Very few ($G < 10$) | Randomisation inference | Only a few treated clusters |
| Cross-section with groups | Cluster at group level | Group-level treatment |
Assumption Checklist
| Assumption | How to Test | If Violated |
|---|---|---|
| Parallel trends | Pre-trends test (event study); placebo tests | Add covariates; unit-specific trends; Rambachan-Roth bounds |
| No anticipation | Check the $e = -1$ event study coefficient | Redefine treatment timing; extend pre-period |
| SUTVA (no spillovers) | Placebo treatment on nearby controls; spillover tests | Redefine control group; use exclusion zones |
| No compositional change | Check covariate balance across periods | Use balanced panel; control for composition |
| Exogenous treatment timing | Institutional knowledge; pre-trends test | Instrument for timing; use conditional parallel trends |
| No interference | Study design; geographic checks | Spatial correlation SEs; define clean control zones |
DiD Estimator Comparison for Staggered Designs
| Estimator | Robust to Heterogeneity | Easy to Implement | Software |
|---|---|---|---|
| TWFE | ❌ | ✅ | Any regression software |
| Bacon Decomposition | Diagnostic only | ✅ | DataStatPro, bacondecomp (R/Stata) |
| Callaway-Sant'Anna | ✅ | Moderate | DataStatPro, did (R), csdid (Stata) |
| Sun-Abraham | ✅ | Moderate | DataStatPro, sunab (Stata) |
| de Chaisemartin-DH | ✅ | Moderate | DataStatPro, did_multiplegt (Stata) |
| Borusyak-Jaravel-Spiess | ✅ | Moderate | DataStatPro, did_imputation (Stata) |
Effect Size Interpretation
| Measure | Formula | Unit | Interpretation |
|---|---|---|---|
| DiD coefficient ($\hat{\delta}$) | Direct estimate | Same as $Y$ | Absolute change in $Y$ |
| Log DiD | $100\,(e^{\hat{\delta}} - 1)$ | % change | Percentage change in $Y$ |
| Percent change | $100\,\hat{\delta} / \bar{Y}_{pre}$ | % | Change relative to baseline |
| Cohen's $d$ | $\hat{\delta} / SD(Y)$ | SD units | Standardised effect |
| NNT | $1 / \lvert\hat{\delta}\rvert$ | Persons | Number needed to treat (binary outcomes) |
Model Specification Checklist
| Feature | Recommendation | Notes |
|---|---|---|
| Unit fixed effects | ✅ Always include | Removes time-invariant confounders |
| Time fixed effects | ✅ Always include | Removes common time shocks |
| Cluster-robust SEs | ✅ Always use | Cluster at treatment assignment level |
| Pre-trends test | ✅ Always report | Event study with pre-periods |
| Bacon decomposition | ✅ For staggered | Diagnose TWFE composition |
| Robust staggered estimator | ✅ For staggered | At least one robust estimator |
| Placebo tests | ✅ Report | Placebo time, group, or outcome |
| Covariate balance table | ✅ Report | Pre-treatment balance check |
| Unit-specific trends | ⚠️ Use with caution | Only if strong pre-trend concern |
| Binary outcome LPM | ✅ Preferred | More tractable than probit DiD |
| Log outcome | ✅ If proportional trends | Check functional form |
Key Identification Assumptions
| Assumption | Formal Statement | Testable? | Diagnostic |
|---|---|---|---|
| Parallel trends | $E[\Delta Y_{it}(0) \mid G_i = 1] = E[\Delta Y_{it}(0) \mid G_i = 0]$ | Partially (pre-period only) | Event study pre-trends |
| No anticipation | $Y_{it} = Y_{it}(0)$ for $t < g_i$ | Yes (pre-period $e = -1$) | Check $\hat{\delta}_{-1}$ |
| SUTVA | No spillovers; one version of treatment | Partially | Geographic placebo; excluded-zone test |
| Exogenous timing | $g_i$ not determined by anticipated outcomes | Partially | Pre-trend test; institutional knowledge |
| Stable composition | Sample composition unchanged by treatment | Yes | Covariate balance across periods |
This tutorial provides a comprehensive foundation for understanding, applying, and interpreting Difference-in-Differences Models using the DataStatPro application. For further reading, consult Angrist & Pischke's "Mostly Harmless Econometrics" (Princeton University Press, 2009), Callaway & Sant'Anna's "Difference-in-Differences with Multiple Time Periods" (Journal of Econometrics, 2021), Roth et al.'s "What's Trending in Difference-in-Differences?" (Journal of Econometrics, 2023), or Goodman-Bacon's "Difference-in-Differences with Variation in Treatment Timing" (Journal of Econometrics, 2021). For feature requests or support, contact the DataStatPro team.