Survival Analysis: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of Survival Analysis all the way through advanced model specification, estimation, evaluation, and practical usage within the DataStatPro application. Whether you are encountering survival analysis for the first time or looking to deepen your understanding of time-to-event methods, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is Survival Analysis?
- The Mathematics Behind Survival Analysis
- Assumptions of Survival Analysis
- Types of Survival Analysis Methods
- Using the Survival Analysis Component
- Data Structure and Censoring
- Non-Parametric Methods: Kaplan-Meier Estimation
- Comparing Survival Curves: Log-Rank and Related Tests
- Semi-Parametric Methods: Cox Proportional Hazards Model
- Parametric Survival Models
- Model Fit and Evaluation
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into survival analysis, it is helpful to be familiar with the following foundational concepts. Each is briefly reviewed below.
1.1 Probability and Probability Distributions
A probability distribution describes the likelihood of all possible outcomes of a random variable. In survival analysis, the primary random variable is time — specifically, the time $T$ until a specific event occurs.

The probability density function (PDF) $f(t)$ of a continuous random variable satisfies:

$$f(t) \ge 0 \quad \text{for all } t, \qquad \int_0^{\infty} f(t)\,dt = 1$$

The cumulative distribution function (CDF) $F(t)$ gives the probability that the event has occurred by time $t$:

$$F(t) = P(T \le t) = \int_0^{t} f(u)\,du$$
1.2 The Complement: Survival Probability
The survival function $S(t)$ is the complement of the CDF — it gives the probability of surviving (not experiencing the event) beyond time $t$:

$$S(t) = P(T > t) = 1 - F(t)$$

Key properties:
- $S(0) = 1$: At the start, everyone has survived (not yet experienced the event).
- $\lim_{t \to \infty} S(t) = 0$: Eventually, all individuals experience the event (assuming all eventually do).
- $S(t)$ is a non-increasing function of time: it can only fall or stay flat, never rise.
1.3 Rates and Conditional Probability
A rate measures how frequently an event occurs per unit of time. The conditional probability $P(A \mid B)$ is the probability of event $A$ given that event $B$ has already occurred:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

In survival analysis, we often ask: "Given that an individual has survived until time $t$, what is the probability they experience the event in the next small interval of time?" This is the hazard — one of the central concepts in survival analysis.
1.4 Likelihood and Maximum Likelihood Estimation
The likelihood function measures how probable the observed data are, given a set of model parameters $\theta$:

$$L(\theta) = \prod_{i=1}^{n} f(t_i \mid \theta)$$

Maximum Likelihood Estimation (MLE) finds the parameter values $\hat\theta$ that maximise $L(\theta)$ (or equivalently, maximise the log-likelihood $\ell(\theta) = \log L(\theta)$). Most survival models are estimated via MLE or partial MLE.
1.5 The Exponential Distribution — A Simple Survival Model
The exponential distribution is the simplest parametric model for survival data. If $T \sim \text{Exponential}(\lambda)$:

$$f(t) = \lambda e^{-\lambda t}, \qquad S(t) = e^{-\lambda t}, \qquad h(t) = \lambda$$

Where $\lambda$ is the rate parameter (events per unit time). The mean survival time is $E[T] = 1/\lambda$.
The exponential distribution has the memoryless property: the probability of experiencing the event in the next moment does not depend on how long an individual has already survived. This is a very strong and often unrealistic assumption — more flexible distributions are usually needed.
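The memoryless property can be checked numerically. The following minimal Python sketch (with an illustrative rate, not DataStatPro code) verifies that $P(T > s + t \mid T > s) = S(s+t)/S(s)$ equals $S(t)$:

```python
import math

def exp_survival(t, lam):
    """Survival function S(t) = exp(-lambda * t) of the exponential distribution."""
    return math.exp(-lam * t)

lam = 0.1  # illustrative rate: 0.1 events per unit time, so mean survival = 10

# Memoryless property: P(T > s + t | T > s) = S(s + t) / S(s) = S(t)
s, t = 5.0, 10.0
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
unconditional = exp_survival(t, lam)
print(conditional, unconditional)  # both equal e^{-1} ~ 0.3679
```

Having already survived 5 time units changes nothing: the conditional survival over the next 10 units matches the unconditional one exactly.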
1.6 Integration and Differentiation (Brief Review)
Survival analysis involves calculus. The key relationships to remember:

$$F(t) = \int_0^{t} f(u)\,du, \qquad f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}, \qquad S(t) = 1 - F(t)$$

These relationships connect the PDF, CDF, and survival function and are used extensively in deriving hazard functions and likelihood contributions.
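These relationships can be sanity-checked numerically. For instance, integrating a hazard function recovers the survival function via $S(t) = e^{-H(t)}$ (formalised in Section 3). A short Python sketch with hypothetical Weibull parameters integrates the hazard by the trapezoidal rule and compares against the closed form:

```python
import math

def weibull_hazard(t, lam, k):
    """Weibull hazard h(t) = (k/lam) * (t/lam)^(k-1)."""
    return (k / lam) * (t / lam) ** (k - 1)

def cumulative_hazard(h, t, steps=10_000):
    """Approximate H(t) = integral of h(u) du from 0 to t by the trapezoidal rule."""
    dt = t / steps
    total = 0.0
    for i in range(steps):
        a, b = i * dt, (i + 1) * dt
        total += 0.5 * (h(a) + h(b)) * dt
    return total

lam, k, t = 10.0, 2.0, 5.0  # hypothetical scale/shape, evaluation time
H = cumulative_hazard(lambda u: weibull_hazard(u, lam, k), t)
S_from_hazard = math.exp(-H)               # S(t) = exp(-H(t))
S_closed_form = math.exp(-(t / lam) ** k)  # Weibull survival directly
print(H, S_from_hazard, S_closed_form)     # H = 0.25; the two survivals agree
```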
2. What is Survival Analysis?
2.1 The Core Question
Survival analysis is a branch of statistics that analyses the time until a specific event of interest occurs. The defining questions are:
- How long until the event occurs?
- What is the probability of surviving beyond a given time?
- Do different groups have different survival experiences?
- What factors are associated with shorter or longer time to the event?
Despite the name "survival analysis," the event of interest does not have to be death. It can be any well-defined, non-repeating event:
| Field | Event of Interest | Time Variable |
|---|---|---|
| Medicine | Death, disease recurrence, hospital discharge | Months from diagnosis |
| Engineering | Machine failure, component breakdown | Hours of operation |
| Finance | Loan default, customer churn, bankruptcy | Days since account opening |
| Psychology | Relapse after treatment, onset of disorder | Weeks since therapy |
| Marketing | Customer purchase, subscription cancellation | Days since sign-up |
| Social Science | Marriage, unemployment ending, graduation | Years from event start |
| Manufacturing | Product defect, warranty claim | Cycles of use |
2.2 What Makes Survival Data Special?
Survival data has two unique characteristics that make it unsuitable for standard regression or t-tests:
1. Censoring: Not all individuals experience the event during the observation period. An individual who is still alive (or event-free) at the end of the study has a censored observation — we know they survived at least until their censoring time $c$, but we do not know their full survival time.
2. Skewed, positive-only time distributions: Survival times are always positive and often highly right-skewed (many short times, few very long times). Standard methods that assume normality are inappropriate.
These characteristics require special methods — the tools of survival analysis — to produce valid estimates and inferences.
2.3 The Three Central Functions
Every survival analysis revolves around three mathematically related functions. Understanding all three is essential:
| Function | Symbol | Question Answered | Range |
|---|---|---|---|
| Survival Function | $S(t)$ | What is the probability of surviving past time $t$? | $[0, 1]$ |
| Hazard Function | $h(t)$ | At time $t$, how fast are events occurring among survivors? | $[0, \infty)$ |
| Cumulative Hazard | $H(t)$ | Total accumulated risk of the event up to time $t$ | $[0, \infty)$ |
These three functions are mathematically equivalent — knowing one fully determines the other two.
2.4 Survival Analysis vs. Standard Regression
| Feature | Standard Regression | Survival Analysis |
|---|---|---|
| Outcome variable | Continuous, binary, count | Time to event |
| Censoring handled? | No | Yes — fundamental feature |
| Distribution assumed | Normal (OLS) | Exponential, Weibull, etc. |
| Negative outcomes | Possible | Never (time $t > 0$) |
| Primary interest | Mean or probability | Survival function, hazard |
| Key model | Linear / Logistic | Kaplan-Meier, Cox, Parametric |
3. The Mathematics Behind Survival Analysis
3.1 The Survival Function
The survival function $S(t)$ is the probability that the event time $T$ exceeds a specified time $t$:

$$S(t) = P(T > t)$$
Key mathematical properties:
- $S(0) = 1$ (all subjects start event-free).
- $\lim_{t \to \infty} S(t) = 0$ (all eventually experience the event, assuming no cure).
- $S(t)$ is right-continuous and non-increasing.
- $\hat S(t)$ is a step function when estimated non-parametrically (Kaplan-Meier).
3.2 The Hazard Function
The hazard function $h(t)$ (also called the hazard rate or instantaneous failure rate) is the conditional rate at which the event occurs at time $t$, given survival up to $t$:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

This can be re-expressed using the PDF and survival function:

$$h(t) = \frac{f(t)}{S(t)}$$

Interpretation: $h(t)\,\Delta t \approx P(\text{event in } [t, t + \Delta t) \mid T \ge t)$ for small $\Delta t$.
Key properties of the hazard function:
- $h(t) \ge 0$ for all $t$ (it is a rate, never negative).
- It has no upper bound (unlike a probability).
- It is not a probability — it is a rate (events per unit time among those at risk).
- Different distributions produce different hazard shapes:
| Distribution | Hazard Shape |
|---|---|
| Exponential | Constant (flat) |
| Weibull ($k > 1$) | Monotonically increasing |
| Weibull ($k < 1$) | Monotonically decreasing |
| Weibull ($k = 1$) | Constant (= Exponential) |
| Log-normal | Initially increases then decreases (hump-shaped) |
| Log-logistic | Initially increases then decreases (hump-shaped) |
| Gompertz | Monotonically increasing (exponentially) |
| Piecewise constant | Constant within intervals, steps between |
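The Weibull rows of this table can be illustrated directly. A small Python sketch (hypothetical scale $\lambda = 10$) evaluates the Weibull hazard $h(t) = (k/\lambda)(t/\lambda)^{k-1}$ for three shape values:

```python
def weibull_hazard(t, lam=10.0, k=1.0):
    """Weibull hazard; k < 1 decreasing, k = 1 constant, k > 1 increasing."""
    return (k / lam) * (t / lam) ** (k - 1)

times = [1.0, 2.0, 4.0, 8.0]
decreasing = [weibull_hazard(t, k=0.5) for t in times]  # early-failure shape
constant   = [weibull_hazard(t, k=1.0) for t in times]  # exponential special case
increasing = [weibull_hazard(t, k=2.0) for t in times]  # wear-out shape
print(decreasing)  # falls toward 0
print(constant)    # flat at 1/lam = 0.1
print(increasing)  # rises linearly in t
```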
3.3 The Cumulative Hazard Function
The cumulative hazard function $H(t)$ is the integral of the hazard function over time:

$$H(t) = \int_0^{t} h(u)\,du$$

It represents the total accumulated risk of the event from time 0 to time $t$.
3.4 The Fundamental Relationship Between $S(t)$, $h(t)$, and $H(t)$
The three functions are mathematically equivalent through the following relationships:
From hazard to survival:

$$S(t) = \exp\left(-\int_0^{t} h(u)\,du\right) = e^{-H(t)}$$

From survival to hazard:

$$h(t) = -\frac{d}{dt}\,\log S(t)$$

From survival to cumulative hazard:

$$H(t) = -\log S(t)$$

From PDF to survival:

$$f(t) = h(t)\,S(t)$$
These relationships form the mathematical backbone of survival analysis. Every parametric model defines one of these functions — the others follow automatically.
3.5 The Mean and Median Survival Time
The mean survival time (restricted to time $\tau$) is the area under the survival curve:

$$\mu_{\tau} = \int_0^{\tau} S(t)\,dt$$

The median survival time is the time at which exactly half of the subjects have experienced the event (i.e., where the survival curve crosses 0.50):

$$t_{0.5} = \inf\{t : S(t) \le 0.50\}$$

Similarly, the $p$-th quantile of survival is:

$$t_{p} = \inf\{t : S(t) \le 1 - p\}$$
💡 The median is usually preferred over the mean for survival data because the mean is sensitive to a few very long survival times (right tail), and may be undefined if the survival curve never reaches zero (i.e., if not all subjects experience the event).
3.6 The Likelihood Contribution of Censored Observations
A key mathematical challenge in survival analysis is constructing the likelihood function correctly for censored observations. Each observation contributes to the likelihood differently:
Uncensored (event occurred at time $t_i$):

$$L_i = f(t_i)$$

Right-censored (event not observed; known to have survived to $t_i$):

$$L_i = S(t_i)$$

General likelihood contribution:

$$L_i = f(t_i)^{\delta_i}\, S(t_i)^{1 - \delta_i}$$

Where $\delta_i = 1$ if the event was observed (uncensored) and $\delta_i = 0$ if censored.

Full likelihood for $n$ independent observations:

$$L(\theta) = \prod_{i=1}^{n} f(t_i)^{\delta_i}\, S(t_i)^{1 - \delta_i}$$

Log-likelihood:

$$\ell(\theta) = \sum_{i=1}^{n} \left[\delta_i \log f(t_i) + (1 - \delta_i) \log S(t_i)\right] = \sum_{i=1}^{n} \left[\delta_i \log h(t_i) - H(t_i)\right]$$
This formulation correctly accounts for the information provided by both events and censored observations.
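For the exponential model this likelihood has a closed-form maximum: $\ell(\lambda) = d \log \lambda - \lambda \sum_i t_i$, so $\hat\lambda = d / \sum_i t_i$, i.e. observed events divided by total follow-up time. A minimal Python sketch with made-up data:

```python
import math

def exponential_mle(times, events):
    """Closed-form MLE for the exponential rate under right-censoring:
    lambda_hat = (number of events) / (total follow-up time)."""
    d = sum(events)          # observed events (delta_i = 1)
    total_time = sum(times)  # events and censored observations both contribute time
    lam_hat = d / total_time
    # log-likelihood: sum_i [delta_i * log(lam)] - lam * sum_i t_i
    loglik = d * math.log(lam_hat) - lam_hat * total_time
    return lam_hat, loglik

times  = [3, 5, 7, 10, 14]   # illustrative follow-up times
events = [1, 0, 1, 1, 0]     # 1 = event observed, 0 = censored
lam_hat, ll = exponential_mle(times, events)
print(lam_hat)  # 3 events / 39 time units ~ 0.0769
```

Note how the censored subjects (5 and 14) lower the estimated rate: they add follow-up time to the denominator without adding events to the numerator.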
3.7 Common Parametric Survival Distributions
Exponential Distribution
$$f(t) = \lambda e^{-\lambda t}, \qquad S(t) = e^{-\lambda t}, \qquad h(t) = \lambda$$

Parameter: $\lambda$ (rate). Mean $= 1/\lambda$.
Weibull Distribution
The Weibull is the most widely used parametric survival model due to its flexibility:

$$S(t) = \exp\left(-\left(\frac{t}{\lambda}\right)^{k}\right), \qquad h(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}$$

Parameters: $\lambda$ (scale), $k$ (shape).
- $k = 1$: Reduces to exponential (constant hazard).
- $k > 1$: Increasing hazard (e.g., ageing, wear-out).
- $k < 1$: Decreasing hazard (e.g., infant mortality, early failure).
Log-Normal Distribution
$$S(t) = 1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right)$$

Parameters: $\mu$ (log-scale mean), $\sigma$ (log-scale standard deviation); $\Phi$ is the standard normal CDF. Produces a hump-shaped hazard (increases then decreases).
Log-Logistic Distribution
$$S(t) = \frac{1}{1 + (t/\alpha)^{\beta}}$$

Parameters: $\alpha$ (scale), $\beta$ (shape). Produces a hump-shaped hazard when $\beta > 1$ and a monotone decreasing hazard when $\beta \le 1$.
Gompertz Distribution
$$h(t) = \lambda e^{\gamma t}$$

Parameters: $\lambda$ (initial hazard), $\gamma$ (rate of increase). Widely used in actuarial science and demography. Produces a monotonically increasing hazard.
3.8 Summary Table of Parametric Distributions
| Distribution | Parameters | Hazard Shape | Best For |
|---|---|---|---|
| Exponential | $\lambda$ | Constant | Simple baseline; memoryless events |
| Weibull | $\lambda, k$ | Monotone (increasing/decreasing/flat) | Most general-purpose analyses |
| Log-Normal | $\mu, \sigma$ | Hump (up then down) | Medical remission, biological processes |
| Log-Logistic | $\alpha, \beta$ | Hump (or decreasing) | Similar to log-normal; has closed-form $S(t)$ |
| Gompertz | $\lambda, \gamma$ | Monotone increasing | Mortality, ageing studies |
| Generalised Gamma | 3 parameters | Very flexible | Model selection; nests Weibull, log-normal |
4. Assumptions of Survival Analysis
4.1 Non-Informative Censoring
The most critical assumption in survival analysis is that censoring is non-informative — the reason an observation is censored must be independent of the true (unobserved) event time.
Examples of non-informative censoring (acceptable):
- Study ends on a fixed calendar date and the participant has not yet experienced the event.
- Participant is still being followed at the time of analysis.
- Participant was lost to follow-up for administrative reasons (e.g., moved country).
Examples of informative censoring (problematic):
- Participants drop out of the study because they are getting sicker (their censoring time is related to their imminent event time).
- A machine is removed from service because it is showing signs of failure.
⚠️ Informative censoring leads to biased estimates of the survival function and hazard. There is no purely statistical remedy — the study design must ensure non-informative censoring.
4.2 Independence of Observations
Each individual's survival time must be independent of all others. This is violated in:
- Clustered data: Patients treated at the same hospital, family members, same litter animals.
- Recurrent events: Multiple events per individual.
- Matched data: Matched pairs or matched sets.
Solutions include frailty models (random effects), marginal models, and conditional models.
4.3 Proportional Hazards (Cox Model)
When using the Cox proportional hazards model (Section 10), the key assumption is that the hazard ratio between any two individuals is constant over time:

$$\frac{h_i(t)}{h_j(t)} = \text{constant for all } t$$

This means the hazard functions of any two individuals are proportional — they never cross. This is a strong assumption that must be checked (see Section 10.6).
4.4 Correct Parametric Form (Parametric Models)
When using parametric models, the assumed distribution must adequately describe the data. Choosing an incorrect parametric family leads to biased estimates. This must be checked via:
- Graphical diagnostics (log-log plots, hazard plots).
- Formal goodness-of-fit tests.
- AIC/BIC comparison between distributions.
4.5 Consistent Definition of Time Origin and Event
The time origin (time zero) must be consistently and clearly defined for all subjects:
- Date of diagnosis.
- Date of surgery.
- Date of study entry.
- Date of birth.
Similarly, the event must be defined unambiguously — the same event criterion must be applied to all subjects.
⚠️ Inconsistent definitions of time origin or event across subjects will produce biased survival estimates and hazard ratios. This is a design issue, not a statistical one.
4.6 Sufficient Follow-Up and Events
Survival analysis requires:
- Sufficient follow-up time so that a meaningful proportion of subjects experience the event.
- A minimum number of observed events (not total subjects) for reliable estimation:
  - Kaplan-Meier curves: At least 20–30 events for stable estimates.
  - Cox model: At least 10 events per predictor variable (EPV).
  - Parametric models: At least 50–100 events for reliable parameter estimation.
5. Types of Survival Analysis Methods
Survival analysis methods are broadly classified by the degree of parametric assumption they make:
5.1 Non-Parametric Methods
Non-parametric methods make no assumptions about the shape of the survival distribution or hazard function. They let the data speak for themselves.
| Method | Purpose |
|---|---|
| Kaplan-Meier Estimator | Estimate the survival curve for one or more groups |
| Nelson-Aalen Estimator | Estimate the cumulative hazard function |
| Log-Rank Test | Compare survival curves between two or more groups |
| Wilcoxon (Breslow) Test | Compare survival curves with emphasis on early times |
| Peto-Peto Test | Compare survival curves with modified weighting |
Advantages: No distributional assumptions; easy to visualise.
Disadvantages: Cannot adjust for multiple covariates simultaneously; no regression
coefficients.
5.2 Semi-Parametric Methods
Semi-parametric methods make some parametric assumptions (about the covariate effects) but leave the baseline hazard unspecified.
| Method | Purpose |
|---|---|
| Cox Proportional Hazards Model | Assess covariate effects on survival while controlling for confounders |
| Cox Model with Time-Varying Covariates | Covariates that change in value over follow-up |
| Stratified Cox Model | Allow baseline hazard to differ across strata |
| Frailty Models (Semi-Parametric) | Account for unobserved heterogeneity and clustering |
Advantages: Handles multiple covariates; yields hazard ratios; robust (no distribution
assumption).
Disadvantages: Cannot directly predict survival times; hazard ratios assume proportionality.
5.3 Parametric Methods
Parametric methods assume a specific distributional form for the survival times. They are more efficient (better precision) when the assumed distribution is correct.
| Method | Purpose |
|---|---|
| Exponential Regression | Simplest parametric model; constant hazard |
| Weibull Regression | Flexible monotone hazard; generalises exponential |
| Log-Normal Regression | Hump-shaped hazard; log-transformed normal model |
| Log-Logistic Regression | Hump-shaped hazard; has closed-form survival function |
| Gompertz Regression | Exponentially increasing hazard; ageing/mortality |
| Generalised Gamma | Very flexible; nests several other distributions |
| Accelerated Failure Time (AFT) Models | Covariate effects accelerate or decelerate time |
Advantages: Can predict absolute survival times; extrapolate beyond observed data; more
efficient with correct specification.
Disadvantages: Wrong distributional assumption leads to biased results; more sensitive to
model misspecification.
5.4 The Method Selection Framework
```
Subject 1: |—————————————X|                    Event at t=12
Subject 2: |————————————————————O|             Censored at t=24
Subject 3: |——————X|                           Event at t=6
Subject 4: |————————————————————————————O|     Censored at t=36
Subject 5: |—————————————————X|                Event at t=18

X = Event observed    O = Censored
```
💡 Always plot the event timeline before analysis. It reveals data quality issues such as negative times, very short follow-up, or suspiciously high censoring proportions.
7.4 The Risk Set
The risk set at time $t$ is the set of all subjects who are:
- Still under observation at time $t$ (have not been censored before $t$), AND
- Have not yet experienced the event before time $t$.
The number at risk decreases over time as subjects experience the event or are censored. A number-at-risk table is routinely displayed below Kaplan-Meier plots to communicate how many subjects contribute information at each time point.
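A risk set is simple to compute from observed times. The sketch below (Python, with illustrative data matching the worked example of Section 8.2) counts the subjects whose observed time, whether event or censoring, is at least $t$:

```python
def risk_set(t, observed_times):
    """Indices of subjects still at risk just before time t:
    those whose observed time (event or censoring) is >= t."""
    return [i for i, obs in enumerate(observed_times) if obs >= t]

# Illustrative observed times for 10 subjects (events and censorings mixed)
observed = [3, 5, 7, 10, 10, 14, 18, 22, 25, 30]
print(len(risk_set(3, observed)))   # 10: everyone is at risk at the first event
print(len(risk_set(10, observed)))  # 7
print(len(risk_set(25, observed)))  # 2
```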
8. Non-Parametric Methods: Kaplan-Meier Estimation
8.1 The Kaplan-Meier (Product-Limit) Estimator
The Kaplan-Meier (KM) estimator is the most widely used method for estimating the survival function non-parametrically. It is also called the product-limit estimator because it computes $\hat S(t)$ as a product of conditional survival probabilities at each event time.

Let $t_1 < t_2 < \dots < t_k$ be the distinct ordered event times (only times at which events occur, not censoring times). At each event time $t_j$, let:
- $d_j$ = number of events (deaths) occurring at $t_j$.
- $n_j$ = number of subjects at risk just before $t_j$ (the risk set size).

The conditional probability of the event at $t_j$ (given survival to $t_j$):

$$q_j = \frac{d_j}{n_j}$$

The conditional probability of surviving past $t_j$ (given survival to $t_j$):

$$p_j = 1 - \frac{d_j}{n_j}$$

The Kaplan-Meier survival estimate at time $t$ is the product of all conditional survival probabilities at event times up to and including $t$:

$$\hat S(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$$
8.2 Worked Computation of the KM Estimator
Suppose $n = 10$ subjects are followed, with the following event/censoring times (sorted); the tenth subject is still event-free when follow-up ends after $t = 25$:

| Time $t_j$ | Status | $n_j$ | $d_j$ | $1 - d_j/n_j$ | $\hat S(t_j)$ |
|---|---|---|---|---|---|
| 3 | 1 (event) | 10 | 1 | 0.9000 | 0.9000 |
| 5 | 0 (censor) | — | — | — | 0.9000 |
| 7 | 1 (event) | 8 | 1 | 0.8750 | 0.7875 |
| 10 | 1 (event) | 7 | 2 | 0.7143 | 0.5625 |
| 14 | 0 (censor) | — | — | — | 0.5625 |
| 18 | 1 (event) | 4 | 1 | 0.7500 | 0.4219 |
| 22 | 0 (censor) | — | — | — | 0.4219 |
| 25 | 1 (event) | 2 | 1 | 0.5000 | 0.2109 |
Key notes:
- The subject censored at $t = 5$ contributed to the risk set at $t = 3$ but not at $t = 7$ (they left before $t = 7$).
- The KM estimate changes only at event times, not at censoring times.
- Between event times, $\hat S(t)$ remains constant (step function).
- At $t = 10$, two events occurred simultaneously ($d_j = 2$); $n_j = 7$ is the number at risk just before $t = 10$.
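The computation above can be reproduced in a few lines of Python. This is a minimal product-limit sketch for illustration, not DataStatPro's implementation; it follows the convention that subjects censored exactly at an event time remain in that risk set, and the tenth subject's censoring time is taken as 30 purely to complete the data:

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns {event_time: S_hat(event_time)}."""
    s, curve = 1.0, {}
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for obs in times if obs >= t)  # at risk just before t
        d = sum(1 for obs, e in zip(times, events) if obs == t and e == 1)
        s *= 1 - d / n                           # multiply in the conditional survival
        curve[t] = s
    return curve

# Data matching the worked example: events at 3, 7, 10 (x2), 18, 25
times  = [3, 5, 7, 10, 10, 14, 18, 22, 25, 30]
events = [1, 0, 1, 1, 1, 0, 1, 0, 1, 0]
km = kaplan_meier(times, events)
print(km)  # {3: 0.9, 7: 0.7875, 10: 0.5625, 18: 0.421875, 25: 0.2109375}
```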
8.3 Confidence Intervals for the KM Estimator
Greenwood's Formula for the variance of $\hat S(t)$:

$$\widehat{\operatorname{Var}}\left[\hat S(t)\right] = \hat S(t)^2 \sum_{j:\, t_j \le t} \frac{d_j}{n_j (n_j - d_j)}$$

Plain (linear) confidence interval:

$$\hat S(t) \pm 1.96\, \sqrt{\widehat{\operatorname{Var}}\left[\hat S(t)\right]}$$

This can produce intervals outside $[0, 1]$ for extreme times. The log-log transformed confidence interval is preferred because it always stays within $[0, 1]$:

Log-log (complementary log-log) transformation:

Let $v(t) = \log\left(-\log \hat S(t)\right)$. The variance is approximated by:

$$\widehat{\operatorname{Var}}\left[v(t)\right] = \frac{\sum_{j:\, t_j \le t} \frac{d_j}{n_j (n_j - d_j)}}{\left[\log \hat S(t)\right]^2}$$

The 95% CI for $v(t)$ is $v(t) \pm 1.96\,\widehat{\operatorname{SE}}[v(t)]$, which transforms back to a CI for $\hat S(t)$ that stays within $[0, 1]$:

$$\left[\hat S(t)^{\exp\left(1.96\,\widehat{\operatorname{SE}}[v(t)]\right)},\;\; \hat S(t)^{\exp\left(-1.96\,\widehat{\operatorname{SE}}[v(t)]\right)}\right]$$
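Greenwood's variance and the log-log interval can be folded into the same pass over the event times. A Python sketch (illustrative only, using the data from the worked example of Section 8.2):

```python
import math

def km_with_loglog_ci(times, events, z=1.96):
    """KM estimate with Greenwood sum and log-log 95% CI at each event time.
    Returns {event_time: (S_hat, lower, upper)}."""
    s, gw_sum, out = 1.0, 0.0, {}
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for obs in times if obs >= t)
        d = sum(1 for obs, e in zip(times, events) if obs == t and e == 1)
        s *= 1 - d / n
        gw_sum += d / (n * (n - d))                   # Greenwood accumulator
        if 0 < s < 1:
            se_v = math.sqrt(gw_sum) / abs(math.log(s))  # SE on log(-log S) scale
            lo = s ** math.exp(z * se_v)              # transform back: stays in (0, 1)
            hi = s ** math.exp(-z * se_v)
            out[t] = (s, lo, hi)
    return out

times  = [3, 5, 7, 10, 10, 14, 18, 22, 25, 30]
events = [1, 0, 1, 1, 1, 0, 1, 0, 1, 0]
ci = km_with_loglog_ci(times, events)
for t, (s, lo, hi) in ci.items():
    print(t, round(s, 4), round(lo, 4), round(hi, 4))
```

Both CI bounds remain inside $(0, 1)$ by construction, which is exactly why the log-log form is preferred.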
8.4 The Nelson-Aalen Estimator of the Cumulative Hazard
An alternative non-parametric estimator is the Nelson-Aalen estimator of the cumulative hazard function:

$$\hat H(t) = \sum_{j:\, t_j \le t} \frac{d_j}{n_j}$$

The corresponding survival estimate derived from the Nelson-Aalen estimator is:

$$\tilde S(t) = e^{-\hat H(t)}$$
The Nelson-Aalen estimator is less biased than the KM estimator in small samples and is particularly useful for:
- Plotting the cumulative hazard to assess parametric distribution fit (see Section 12).
- Computing the baseline cumulative hazard in the Cox model.
- Assessing the proportional hazards assumption graphically.
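A Nelson-Aalen sketch in Python (same illustrative data as the Kaplan-Meier example), together with the survival estimate $e^{-\hat H(t)}$ derived from it:

```python
import math

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard: H_hat(t) = sum over event times <= t of d_j/n_j."""
    H, curve = 0.0, {}
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for obs in times if obs >= t)  # at risk just before t
        d = sum(1 for obs, e in zip(times, events) if obs == t and e == 1)
        H += d / n
        curve[t] = H
    return curve

times  = [3, 5, 7, 10, 10, 14, 18, 22, 25, 30]
events = [1, 0, 1, 1, 1, 0, 1, 0, 1, 0]
na = nelson_aalen(times, events)
S_fh = math.exp(-na[25])   # survival derived from the cumulative hazard
print(na[25], S_fh)        # H(25) = 1/10 + 1/8 + 2/7 + 1/4 + 1/2 ~ 1.261
```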
8.5 Median Survival Time and Confidence Interval
The KM estimate of the median survival time is the smallest time $t$ such that $\hat S(t) \le 0.50$:

$$\hat t_{0.5} = \min\{t : \hat S(t) \le 0.50\}$$

The 95% confidence interval for the median is computed using the Brookmeyer-Crowley method, which inverts the confidence band for $\hat S(t)$ to find the times corresponding to the upper and lower CI bounds crossing $0.50$.
⚠️ The median survival time is undefined (cannot be estimated) if the KM curve never drops to or below 0.50. This happens when fewer than half the subjects experience the event.
8.6 Reading and Reporting a Kaplan-Meier Curve
A complete KM plot includes:
- The step function — drops at each event time.
- Confidence band (typically 95%) shown as a shaded region or dashed lines.
- Censoring marks — small vertical tick marks at censored times on the survival curve.
- Number at risk table — shown below the x-axis at selected time points.
- Median survival time — horizontal dashed line at $\hat S(t) = 0.50$, with the median read off the x-axis.
9. Comparing Survival Curves: Log-Rank and Related Tests
9.1 The Log-Rank Test
The log-rank test (also called the Mantel-Cox test) is the standard non-parametric test for comparing survival curves between two or more groups. It tests the null hypothesis:

$$H_0: S_1(t) = S_2(t) = \dots = S_k(t) \quad \text{for all } t$$

At each distinct event time in the combined dataset, the test compares the observed number of events in each group to the expected number under $H_0$ (if all groups had the same survival function).

For each event time $t_j$ and group $g$, the expected number of events under $H_0$ is:

$$E_{gj} = \frac{n_{gj}\, d_j}{n_j}$$

Where:
- $n_{gj}$ = number at risk in group $g$ at time $t_j$.
- $d_j$ = total number of events at $t_j$ across all groups.
- $n_j$ = total number at risk at $t_j$ across all groups.

The log-rank test statistic for two groups ($g = 1, 2$) is:

$$\chi^2_{\text{LR}} = \frac{\left[\sum_j \left(O_{1j} - E_{1j}\right)\right]^2}{\sum_j V_j}$$

Where $V_j$ is the hypergeometric variance at event time $t_j$:

$$V_j = \frac{n_{1j}\, n_{2j}\, d_j\, (n_j - d_j)}{n_j^2\, (n_j - 1)}$$

Under $H_0$, the statistic follows approximately a $\chi^2$ distribution with $k - 1$ degrees of freedom (where $k$ is the number of groups).
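The two-group statistic can be assembled directly from these formulas. A self-contained Python sketch with hypothetical data; the 1-df $\chi^2$ tail probability uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$:

```python
import math

def logrank_two_groups(times1, events1, times2, events2):
    """Two-group log-rank test. Returns (chi2, p) with 1 degree of freedom."""
    all_event_times = sorted(
        {t for t, e in zip(times1 + times2, events1 + events2) if e == 1})
    O1 = E1 = V = 0.0
    for t in all_event_times:
        n1 = sum(1 for obs in times1 if obs >= t)   # at risk, group 1
        n2 = sum(1 for obs in times2 if obs >= t)   # at risk, group 2
        d1 = sum(1 for obs, e in zip(times1, events1) if obs == t and e == 1)
        d2 = sum(1 for obs, e in zip(times2, events2) if obs == t and e == 1)
        n, d = n1 + n2, d1 + d2
        O1 += d1
        E1 += n1 * d / n                            # expected events in group 1
        if n > 1:
            V += n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))  # hypergeometric variance
    chi2 = (O1 - E1) ** 2 / V
    p = math.erfc(math.sqrt(chi2 / 2))              # chi-square tail, 1 df
    return chi2, p

# Hypothetical groups: group 1 fails early, group 2 later (all events observed)
chi2, p = logrank_two_groups([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(chi2, p)  # O1 = 3, E1 = 1.15, V = 0.6775 -> chi2 ~ 5.05, p < 0.05
```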
9.2 Weighted Log-Rank Tests
The log-rank test gives equal weight to all event times. Alternative tests weight earlier or later event times more heavily:
| Test | Weight $w_j$ | Emphasises | Best When |
|---|---|---|---|
| Log-Rank (Mantel-Cox) | $w_j = 1$ | All times equally | Groups differ throughout |
| Breslow (Wilcoxon) | $w_j = n_j$ | Early times | Groups differ early |
| Peto-Peto | $w_j = \tilde S(t_j)$ | Early times | Groups differ early (modified) |
| Fleming-Harrington | $w_j = \hat S(t_j)^{p}\left(1 - \hat S(t_j)\right)^{q}$ | Flexible | User-specified weighting |
The Fleming-Harrington test with parameters $p$ and $q$ is the most flexible:
- $p = 0, q = 0$: Log-rank test.
- $p = 1, q = 0$: Breslow (Wilcoxon)-type test.
- $p = 0, q = 1$: Emphasises late differences.
- $p = 1, q = 1$: Emphasises middle differences.
9.3 Stratified Log-Rank Test
When groups differ on a confounding variable (a variable that affects survival and is unevenly distributed across groups), the stratified log-rank test controls for the confounder by computing the log-rank statistic separately within each stratum and then combining:

$$\chi^2_{\text{strat}} = \frac{\left[\sum_s \sum_j \left(O_{1js} - E_{1js}\right)\right]^2}{\sum_s \sum_j V_{js}}$$

Where $s$ indexes the strata.
9.4 Interpreting the Log-Rank Test
| Result | Interpretation |
|---|---|
| $p < 0.05$ | Reject $H_0$ — significant evidence that survival curves differ between groups |
| $p \ge 0.05$ | Fail to reject $H_0$ — insufficient evidence of a difference |
⚠️ The log-rank test is most powerful when hazard ratios are constant over time (proportional hazards). If survival curves cross (non-proportional hazards), the log-rank test may be underpowered. In this case, use the Fleming-Harrington test with appropriate weights, or report the weighted tests alongside the standard log-rank.
9.5 Pairwise Comparisons After a Significant Log-Rank Test
When comparing $k > 2$ groups and the overall log-rank test is significant, pairwise log-rank tests can identify which specific groups differ. Apply a Bonferroni correction (or similar multiple comparison adjustment) to control the family-wise error rate:

$$\alpha^{*} = \frac{\alpha}{m}, \qquad m = \frac{k(k-1)}{2} \text{ pairwise comparisons}$$

For example, with $k = 4$ groups: $m = 6$, so $\alpha^{*} = 0.05 / 6 \approx 0.0083$.
10. Semi-Parametric Methods: Cox Proportional Hazards Model
10.1 The Cox Model
The Cox proportional hazards model (Cox, 1972) is the most widely used regression model in survival analysis. It relates the hazard at time $t$ for individual $i$ to their covariate values $x_{i1}, \dots, x_{ip}$:

$$h_i(t) = h_0(t)\, \exp\left(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}\right)$$

Or equivalently:

$$\log \frac{h_i(t)}{h_0(t)} = \beta_1 x_{i1} + \dots + \beta_p x_{ip} = \boldsymbol\beta^\top \mathbf{x}_i$$

Where:
- $h_0(t)$ is the baseline hazard function — the hazard for an individual with all covariates equal to zero. It is left completely unspecified (non-parametric) — this is what makes the Cox model semi-parametric.
- $\boldsymbol\beta = (\beta_1, \dots, \beta_p)$ is the vector of regression coefficients to be estimated.
- $\exp(\boldsymbol\beta^\top \mathbf{x}_i)$ is the relative risk (risk multiplier) for individual $i$ relative to the baseline.
10.2 The Proportional Hazards Property
The ratio of hazards between any two individuals $i$ and $j$ is constant over time:

$$\frac{h_i(t)}{h_j(t)} = \frac{h_0(t)\, e^{\boldsymbol\beta^\top \mathbf{x}_i}}{h_0(t)\, e^{\boldsymbol\beta^\top \mathbf{x}_j}} = e^{\boldsymbol\beta^\top (\mathbf{x}_i - \mathbf{x}_j)}$$

The $h_0(t)$ terms cancel — the hazard ratio depends only on the covariate difference, not on time. This is the proportional hazards assumption: the hazard functions of any two individuals are proportional to each other (they never cross).
10.3 The Hazard Ratio (HR)
For a single binary covariate $x$ (e.g., treatment vs. control, coded 1 vs. 0):

$$\text{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}$$
Interpretation of the hazard ratio:
| HR | Interpretation |
|---|---|
| $\text{HR} > 1$ | Covariate increases the hazard (increases risk of event) |
| $\text{HR} = 1$ | Covariate has no effect on the hazard |
| $\text{HR} < 1$ | Covariate decreases the hazard (protective effect) |
For a continuous covariate $x$: $e^{\beta}$ is the HR for a one-unit increase in $x$, holding all other covariates constant.

The confidence interval for the HR is:

$$\exp\left(\hat\beta \pm 1.96\, \widehat{\operatorname{SE}}(\hat\beta)\right)$$

If the CI does not include 1, the covariate is statistically significant at the 5% level.
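Converting a fitted coefficient into a hazard ratio with its CI is a single exponentiation. A tiny Python sketch with hypothetical estimates ($\hat\beta = -0.693$, $\text{SE} = 0.2$):

```python
import math

def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio and 95% CI from a Cox coefficient and its standard error."""
    hr = math.exp(beta)
    return hr, math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical Cox output for a treatment indicator
hr, lo, hi = hazard_ratio_ci(-0.693, 0.2)
print(hr, lo, hi)  # HR ~ 0.50, CI ~ (0.34, 0.74): excludes 1, protective effect
```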
10.4 The Partial Likelihood
Because the baseline hazard $h_0(t)$ is left unspecified, standard MLE cannot be applied directly to the Cox model. Instead, Cox (1972) proposed the partial likelihood — a likelihood that eliminates $h_0(t)$ and depends only on $\boldsymbol\beta$.

The partial likelihood is constructed by considering, at each event time $t_{(i)}$, the probability that it was individual $(i)$ who experienced the event, given that exactly one event occurred at $t_{(i)}$ among all individuals at risk:

$$L_{(i)}(\boldsymbol\beta) = \frac{e^{\boldsymbol\beta^\top \mathbf{x}_{(i)}}}{\sum_{j \in R(t_{(i)})} e^{\boldsymbol\beta^\top \mathbf{x}_j}}$$

The full partial likelihood (product over all event times):

$$L_P(\boldsymbol\beta) = \prod_{i=1}^{D} \frac{e^{\boldsymbol\beta^\top \mathbf{x}_{(i)}}}{\sum_{j \in R(t_{(i)})} e^{\boldsymbol\beta^\top \mathbf{x}_j}}$$

The log partial likelihood:

$$\ell_P(\boldsymbol\beta) = \sum_{i=1}^{D} \left[\boldsymbol\beta^\top \mathbf{x}_{(i)} - \log \sum_{j \in R(t_{(i)})} e^{\boldsymbol\beta^\top \mathbf{x}_j}\right]$$

Where $D$ is the total number of events and $R(t_{(i)})$ is the risk set at $t_{(i)}$.
The coefficient estimates $\hat{\boldsymbol\beta}$ are found by maximising $\ell_P(\boldsymbol\beta)$ using iterative numerical methods (e.g., Newton-Raphson).
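For a single covariate without tied event times, the Newton-Raphson iteration is short enough to write out in full. The Python sketch below (toy data, illustrative only) maximises the log partial likelihood by iterating $\beta \leftarrow \beta + U(\beta)/I(\beta)$, where $U$ is the score (sum of covariate residuals at event times) and $I$ the observed information:

```python
import math

def cox_newton(times, events, x, iters=25):
    """Newton-Raphson maximisation of the Cox log partial likelihood
    for a single covariate, assuming no tied event times."""
    def score_info(beta):
        U, I = 0.0, 0.0
        for ti, ei, xi in zip(times, events, x):
            if ei != 1:
                continue                      # censored subjects enter risk sets only
            risk = [(xj, math.exp(beta * xj))
                    for tj, xj in zip(times, x) if tj >= ti]
            w = sum(r for _, r in risk)
            xbar  = sum(xj * r for xj, r in risk) / w        # risk-weighted mean of x
            x2bar = sum(xj * xj * r for xj, r in risk) / w
            U += xi - xbar                    # score contribution at this event
            I += x2bar - xbar ** 2            # information (weighted variance of x)
        return U, I

    beta = 0.0
    for _ in range(iters):
        U, I = score_info(beta)
        beta += U / I                         # Newton step
    return beta, score_info(beta)[0]

times  = [1, 2, 3, 4, 5, 6]   # toy data, no ties
events = [1, 1, 1, 1, 1, 1]
x      = [1, 0, 1, 0, 1, 0]
beta_hat, score_at_max = cox_newton(times, events, x)
print(beta_hat, score_at_max)  # beta_hat ~ 0.63; score ~ 0 at the maximum
```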
10.5 Handling Ties
When multiple events occur at the same time (tied event times), the exact partial likelihood becomes computationally intractable. Three commonly used approximations are:
Breslow's approximation (default in most software):

$$L_B(\boldsymbol\beta) = \prod_{j} \frac{e^{\boldsymbol\beta^\top \mathbf{s}_j}}{\left[\sum_{l \in R(t_j)} e^{\boldsymbol\beta^\top \mathbf{x}_l}\right]^{d_j}}$$

Where $\mathbf{s}_j$ is the sum of covariate vectors for all individuals who experienced the event at $t_j$.

Efron's approximation (more accurate with many ties):

$$L_E(\boldsymbol\beta) = \prod_{j} \frac{e^{\boldsymbol\beta^\top \mathbf{s}_j}}{\prod_{r=0}^{d_j - 1} \left[\sum_{l \in R(t_j)} e^{\boldsymbol\beta^\top \mathbf{x}_l} - \frac{r}{d_j} \sum_{l \in D_j} e^{\boldsymbol\beta^\top \mathbf{x}_l}\right]}$$

Where $D_j$ is the set of individuals experiencing the event at $t_j$.
Exact (discrete) method: Computes the exact combinatorial probability — accurate but computationally very expensive for large numbers of ties.
💡 Efron's approximation is generally preferred over Breslow's when there are many tied event times. DataStatPro uses Efron's approximation by default.
10.6 Testing the Proportional Hazards Assumption
The proportional hazards (PH) assumption is critical. If violated, hazard ratio estimates are biased and not meaningful. Multiple methods exist to test it:
Method 1: Schoenfeld Residuals Test
The Schoenfeld residual for individual $i$ (with an event at time $t_i$) and covariate $k$ is the difference between the observed covariate value and its expected value under the model:

$$r_{ik} = x_{ik} - \frac{\sum_{j \in R(t_i)} x_{jk}\, e^{\hat{\boldsymbol\beta}^\top \mathbf{x}_j}}{\sum_{j \in R(t_i)} e^{\hat{\boldsymbol\beta}^\top \mathbf{x}_j}}$$

If PH holds, the Schoenfeld residuals for covariate $k$ should be uncorrelated with time.

The Grambsch-Therneau test formally tests this by assessing the correlation between the scaled Schoenfeld residuals and (a transformation of) time.

A significant test ($p < 0.05$) for covariate $k$ indicates a violation of the PH assumption for that covariate.

Graphically: Plot the scaled Schoenfeld residuals against time (or log-time). A flat, horizontal smoothed line supports PH; a clear trend suggests violation.
Method 2: Log-Log Survival Plot
Plot $\log\left(-\log \hat S(t)\right)$ (the complementary log-log of the KM survival estimate) against $\log t$ for each group. Under proportional hazards, these lines should be approximately parallel:

$$\log\left(-\log S(t \mid x = 1)\right) - \log\left(-\log S(t \mid x = 0)\right) = \beta \quad \text{(constant over } t\text{)}$$

If the log-log curves are parallel, the PH assumption holds for that grouping variable.
Method 3: Time-Covariate Interaction
Add an interaction between each covariate and time (or $\log t$) to the Cox model:

$$h_i(t) = h_0(t)\, \exp\left(\beta_1 x_i + \beta_2\, x_i \cdot g(t)\right)$$

Where $g(t)$ is a function of time (e.g., $t$ or $\log t$). A significant $\beta_2$ indicates that the effect of covariate $x$ changes over time (PH violated).
Remedies When PH is Violated
| Violation Type | Remedy |
|---|---|
| One covariate violates PH | Stratify by that covariate (stratified Cox model) |
| PH violated for all covariates | Use parametric AFT model instead |
| Time-varying effect | Add time × covariate interaction; use time-varying Cox model |
| Crossing hazards | Use restricted mean survival time (RMST) as effect measure |
10.7 Residuals in the Cox Model
Several types of residuals are available for diagnosing the Cox model:
Martingale Residuals:

$$\hat M_i = \delta_i - \hat H_0(t_i)\, e^{\hat{\boldsymbol\beta}^\top \mathbf{x}_i}$$

Range from $-\infty$ to $1$. Used to:
- Assess functional form of continuous covariates (plot $\hat M_i$ vs. $x_k$ — a non-linear pattern suggests a non-linear transformation is needed).
- Identify outliers (very negative values indicate subjects whose event occurred much later than the model predicted).

Deviance Residuals:

$$D_i = \operatorname{sign}(\hat M_i)\, \sqrt{-2\left[\hat M_i + \delta_i \log\left(\delta_i - \hat M_i\right)\right]}$$

Approximately normally distributed around 0. Values with $|D_i|$ greater than roughly 2 suggest potential outliers.

Score (Schoenfeld) Residuals:

As described in Section 10.6. Used for testing the PH assumption.

Dfbeta Residuals (Influence Diagnostics):

Estimate how much $\hat\beta_k$ would change if observation $i$ were deleted. Large absolute values indicate influential observations.
10.8 The Baseline Survival Function
After estimating $\hat{\boldsymbol\beta}$, the baseline cumulative hazard is estimated using the Breslow estimator, and the baseline survival function follows:

$$\hat H_0(t) = \sum_{j:\, t_j \le t} \frac{d_j}{\sum_{l \in R(t_j)} e^{\hat{\boldsymbol\beta}^\top \mathbf{x}_l}}, \qquad \hat S_0(t) = e^{-\hat H_0(t)}$$

The survival function for an individual with covariates $\mathbf{x}$ is then:

$$\hat S(t \mid \mathbf{x}) = \hat S_0(t)^{\exp\left(\hat{\boldsymbol\beta}^\top \mathbf{x}\right)}$$
10.9 The Stratified Cox Model
When a covariate violates the PH assumption, it can be used as a stratifying variable instead of a covariate. The stratified Cox model allows a separate baseline hazard $h_{0s}(t)$ for each stratum $s$, while constraining the regression coefficients $\boldsymbol\beta$ to be equal across strata:

$$h_{is}(t) = h_{0s}(t)\, e^{\boldsymbol\beta^\top \mathbf{x}_i}$$
This approach:
- Relaxes the PH assumption for the stratifying variable.
- Still assumes PH for all included covariates.
- Does not estimate a coefficient for the stratifying variable (it is absorbed into the baseline).
11. Parametric Survival Models
11.1 Parametric Proportional Hazards (PH) Formulation
In parametric PH models, the hazard function is:

$$h(t \mid \mathbf{x}) = h_0(t; \boldsymbol\alpha)\, e^{\boldsymbol\beta^\top \mathbf{x}}$$

Where $h_0(t; \boldsymbol\alpha)$ is a fully specified baseline hazard with parameters $\boldsymbol\alpha$ (e.g., the Weibull shape parameter $k$). Both $\boldsymbol\alpha$ and $\boldsymbol\beta$ are estimated simultaneously via MLE.

Weibull PH model (the most common parametric PH model):

$$h(t \mid \mathbf{x}) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} e^{\boldsymbol\beta^\top \mathbf{x}}$$
11.2 The Accelerated Failure Time (AFT) Model
An alternative parametric formulation is the Accelerated Failure Time (AFT) model, which models the effect of covariates on the log of the survival time directly:

$$\log T_i = \alpha_0 + \alpha_1 x_{i1} + \dots + \alpha_p x_{ip} + \sigma\, \varepsilon_i$$

Or equivalently:

$$T_i = T_{0i}\, e^{\boldsymbol\alpha^\top \mathbf{x}_i}$$

Where $T_{0i}$ is the baseline survival time and $\sigma$ is a scale parameter.

The survival function in the AFT model:

$$S(t \mid \mathbf{x}) = S_0\left(t\, e^{-\boldsymbol\alpha^\top \mathbf{x}}\right)$$

The acceleration factor for covariate $x_k$:

$$\text{AF}_k = e^{\alpha_k}$$
Interpretation of AFT coefficients:
| Coefficient | Interpretation |
|---|---|
| $\alpha_k > 0$ | Covariate slows down the event (time is stretched — beneficial) |
| $\alpha_k = 0$ | No effect on survival time |
| $\alpha_k < 0$ | Covariate speeds up the event (time is compressed — harmful) |
💡 The AFT model is more intuitive when covariates do not satisfy the PH assumption. Instead of multiplicative effects on the hazard, it gives multiplicative effects on time itself.
11.3 Relationship Between PH and AFT Formulations
The Weibull model has the unique property of satisfying both the PH and AFT formulations:

| Formulation | Coefficient Relationship |
|---|---|
| PH form: $h(t \mid x) = h_0(t)\, e^{\beta x}$ | $\beta$ acts on the log hazard |
| AFT form: $\log T = \alpha x + \dots$ | $\alpha$ acts on $\log(\text{time})$ |
| Relationship | $\beta = -k\, \alpha$ |

Where $k$ is the Weibull shape parameter.
For the exponential, log-normal, log-logistic, and generalised gamma, the AFT formulation is the standard one. For the Weibull and Gompertz, both formulations are available.
11.4 Parametric Model Likelihood
For parametric models, the full MLE log-likelihood for $n$ subjects is:

$$\ell(\boldsymbol\theta) = \sum_{i=1}^{n} \left[\delta_i \log h(t_i; \boldsymbol\theta) - H(t_i; \boldsymbol\theta)\right]$$

Where $\boldsymbol\theta$ contains all model parameters.

For the Weibull model with shape $k$ and scale $\lambda$:

$$\ell(\lambda, k) = \sum_{i=1}^{n} \left[\delta_i \left(\log k - k \log \lambda + (k - 1) \log t_i\right) - \left(\frac{t_i}{\lambda}\right)^{k}\right]$$
11.5 Predicting Survival Times From Parametric Models
A major advantage of parametric models over the Cox model is the ability to predict survival times for new individuals.
For an individual with covariates $x$:

Predicted survival function:

$$\hat S(t \mid x) = \exp\!\left(-\hat\lambda\, e^{\hat\beta^\top x}\, t^{\hat k}\right)$$

Predicted median survival time:

Solve $\hat S(t_{\mathrm{med}} \mid x) = 0.5$:

$$\hat t_{\mathrm{med}} = \left(\frac{\log 2}{\hat\lambda\, e^{\hat\beta^\top x}}\right)^{1/\hat k}$$

Predicted mean survival time (Weibull):

$$\hat E[T \mid x] = \left(\hat\lambda\, e^{\hat\beta^\top x}\right)^{-1/\hat k}\, \Gamma(1 + 1/\hat k)$$

Where $\Gamma(\cdot)$ is the gamma function.
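These prediction formulas can be computed directly with the standard library; a minimal sketch assuming the Weibull PH parameterisation above, with hypothetical fitted values:

```python
import math

def weibull_predict(lam, k, beta, x, t=None):
    """Predicted quantities from a Weibull PH model with
    S(t|x) = exp(-lam * exp(beta.x) * t**k)."""
    rate = lam * math.exp(sum(b * xi for b, xi in zip(beta, x)))
    median = (math.log(2) / rate) ** (1 / k)          # solve S(t) = 0.5
    mean = rate ** (-1 / k) * math.gamma(1 + 1 / k)   # Weibull mean
    surv = math.exp(-rate * t ** k) if t is not None else None
    return {"median": median, "mean": mean, "survival": surv}

# Hypothetical fitted values: lam = 0.05, shape k = 1.5, one covariate with beta = -0.4
pred = weibull_predict(lam=0.05, k=1.5, beta=[-0.4], x=[1], t=12)
```

With $k = 1$ the formulas collapse to the exponential case (median $= \log 2 / \lambda$, mean $= 1/\lambda$), which is a convenient sanity check.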
12. Model Fit and Evaluation
12.1 Graphical Diagnostics for Parametric Distribution Selection
Before fitting a parametric model, graphical methods help identify which distribution is most appropriate.
Log-Log (Weibull) Plot:

If the Weibull distribution is appropriate, a plot of $\log[-\log \hat S(t)]$ against $\log t$ should be approximately linear:

$$\log[-\log S(t)] = \log \lambda + k \log t$$

This is a straight line with slope $k$ and intercept $\log \lambda$.

Log-Normal Plot:

If the log-normal distribution is appropriate, a plot of $\Phi^{-1}[1 - \hat S(t)]$ against $\log t$ should be approximately linear.

Log-Logistic Plot:

If the log-logistic distribution is appropriate, a plot of $\log\left[\frac{1 - \hat S(t)}{\hat S(t)}\right]$ against $\log t$ should be approximately linear.

Cumulative Hazard Plot:

Plot $\hat H(t)$ (Nelson-Aalen estimate) against $t$, or $\log \hat H(t)$ against $\log t$:

- $\hat H(t)$ linear against $t$ → Exponential.
- $\log \hat H(t)$ linear against $\log t$ → Weibull.
- S-shaped → Log-logistic or log-normal.
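As a sanity check on the log-log diagnostic, survival values generated from an exact Weibull survival function give a perfectly linear $\log[-\log S(t)]$ vs. $\log t$ relationship whose least-squares slope recovers the shape parameter. A minimal sketch (the $\lambda$ and $k$ values are illustrative):

```python
import math

# Exact Weibull survival values: S(t) = exp(-lam * t**k), lam = 0.02, k = 1.8
lam, k = 0.02, 1.8
times = [2, 4, 8, 16, 32]
S = [math.exp(-lam * t ** k) for t in times]

# Transform: y = log(-log S(t)) should be linear in x = log t with slope k
xs = [math.log(t) for t in times]
ys = [math.log(-math.log(s)) for s in S]

# Ordinary least-squares slope and intercept
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx  # should recover log(lam)
```

In practice $\hat S(t)$ comes from the Kaplan-Meier estimate, so the points will scatter around a line rather than fall exactly on it.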
12.2 AIC and BIC for Model Selection
The AIC and BIC are used to compare parametric models with different distributional assumptions (or different numbers of parameters):

$$\mathrm{AIC} = -2\ell(\hat\theta) + 2p \qquad \mathrm{BIC} = -2\ell(\hat\theta) + p \log n$$

Where $p$ is the number of estimated parameters and $n$ is the sample size (the number of events rather than total observations, in some implementations).
Lower AIC/BIC indicates a better-fitting, more parsimonious model.
💡 AIC and BIC can compare models from different distributional families (e.g., Weibull vs. log-normal) as long as they use the same outcome and data — unlike the likelihood ratio test, which requires nested models.
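A minimal computation of AIC/BIC and the resulting model ranking; the log-likelihood and sample-size values below are illustrative:

```python
import math

def aic(loglik, p):
    """Akaike Information Criterion: -2*loglik + 2*p."""
    return -2 * loglik + 2 * p

def bic(loglik, p, n):
    """Bayesian Information Criterion: -2*loglik + p*log(n)."""
    return -2 * loglik + p * math.log(n)

# Hypothetical fits to the same data: (log-likelihood, number of parameters)
candidates = {"exponential": (-241.3, 2), "weibull": (-233.8, 3)}
aics = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
best = min(aics, key=aics.get)  # lowest AIC wins
```

Note that both models must be fitted to exactly the same observations for the comparison to be valid.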
12.3 Likelihood Ratio Test for Nested Models
When comparing nested parametric models (one model is a restricted version of another), the likelihood ratio test (LRT) formally tests whether the additional parameters significantly improve fit:

$$\Lambda = -2\left[\ell(\hat\theta_{\text{restricted}}) - \ell(\hat\theta_{\text{full}})\right]$$

Under $H_0$, $\Lambda$ follows a $\chi^2_q$ distribution with $q$ degrees of freedom ($q$ = number of additional parameters in the full model).

Example: Testing whether the Weibull model fits significantly better than the exponential (which is a special case with $k = 1$):

$$\Lambda = -2\left[\ell_{\text{exponential}} - \ell_{\text{Weibull}}\right] \sim \chi^2_1 \ \text{under } H_0$$
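For a single extra parameter the LRT p-value needs no statistics library, since the $\chi^2_1$ survival function is $\operatorname{erfc}(\sqrt{x/2})$. A sketch with illustrative log-likelihoods:

```python
import math

def lrt_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic: -2 * (l_restricted - l_full)."""
    return -2 * (loglik_restricted - loglik_full)

def chi2_sf_df1(x):
    """Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2))

# Exponential (restricted, k = 1) vs. Weibull (full): hypothetical log-likelihoods
stat = lrt_statistic(-241.3, -233.8)
p = chi2_sf_df1(stat)
```

For more than one restricted parameter, use a proper $\chi^2_q$ survival function (e.g., `scipy.stats.chi2.sf`).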
12.4 Cox-Snell Residuals for Overall Fit
The Cox-Snell residuals can be used to assess the overall fit of any survival model (Cox or parametric):

$$r_i = \hat H(t_i \mid x_i)$$

If the model fits well, the Cox-Snell residuals should follow a unit exponential distribution (i.e., $\mathrm{Exp}(1)$).

Check: Plot the Nelson-Aalen estimate of the cumulative hazard of the Cox-Snell residuals against the residuals themselves. If the model fits well, this plot should fall approximately on the line $y = x$ (the 45-degree line).
12.5 Concordance Index (C-statistic)
The concordance index (C-index) measures the discriminative ability of a survival model — how well it ranks individuals by their predicted risk. It is the survival analysis analogue of the AUC for binary outcomes:

$$C = \frac{\sum_{i,j} \mathbb{1}(t_i < t_j)\, \mathbb{1}(\hat\eta_i > \hat\eta_j)\, \delta_i}{\sum_{i,j} \mathbb{1}(t_i < t_j)\, \delta_i}$$

Where $\hat\eta_i$ is the predicted linear predictor (risk score) for individual $i$.

| C-index | Interpretation |
|---|---|
| $0.5$ | No discrimination (random) |
| $0.5$–$0.6$ | Poor |
| $0.6$–$0.7$ | Acceptable |
| $0.7$–$0.8$ | Good |
| $0.8$–$0.9$ | Excellent |
| $> 0.9$ | Outstanding |
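Harrell's C can be computed by counting usable pairs directly; a minimal sketch on toy data (real implementations also handle tied times and weighting):

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: among usable pairs (the earlier time is an observed event),
    count pairs where the higher risk score failed first; ties count half."""
    concordant = ties = usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:  # pair is usable
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / usable

# Toy data: higher risk score should mean earlier event; subject 3 is censored
c = concordance_index(times=[2, 4, 6, 8], events=[1, 1, 0, 1],
                      risk_scores=[3.0, 2.0, 1.5, 1.0])
```

Here the ranking is perfect, so `c` is 1.0; a model that ranks subjects backwards would score 0.0.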
12.6 The Global Likelihood Ratio Test (Cox Model)
For the Cox model, three omnibus tests assess whether any covariate is significantly associated with survival:
Likelihood Ratio (LR) Test:

$$\Lambda_{LR} = -2\left[\ell(0) - \ell(\hat\beta)\right] \sim \chi^2_p$$

Wald Test:

$$\Lambda_W = \hat\beta^\top \left[\widehat{\mathrm{Var}}(\hat\beta)\right]^{-1} \hat\beta \sim \chi^2_p$$

Score (Log-Rank) Test:

$$\Lambda_S = U(0)^\top\, \mathcal{I}(0)^{-1}\, U(0) \sim \chi^2_p$$

Where $U(\beta)$ is the score vector and $\mathcal{I}(\beta)$ is the information matrix, both evaluated at $\beta = 0$.

All three tests are asymptotically equivalent. The LR test is generally considered the most reliable in finite samples.
12.7 Calibration: Observed vs. Predicted Survival
Calibration assesses whether the predicted survival probabilities match the observed survival rates. A well-calibrated model should show that, among individuals predicted to have (for example) a 70% chance of surviving 5 years, approximately 70% actually do survive 5 years.
The Hosmer-Lemeshow-style calibration test for survival groups subjects by predicted risk (deciles) and compares observed and expected survival in each group; the D'Agostino-Nam test formalises this comparison for censored survival data.
13. Advanced Topics
13.1 Time-Varying Covariates
In some studies, covariate values change during follow-up (e.g., treatment dose, blood pressure, smoking status). These are time-varying covariates and must be incorporated into the Cox model using a counting process data format.
In the counting process format, each period of follow-up during which a covariate value is constant becomes a separate row:
| ID | Start | Stop | Event | Treatment | Dose |
|---|---|---|---|---|---|
| 1 | 0 | 6 | 0 | A | 50mg |
| 1 | 6 | 12 | 1 | A | 75mg |
| 2 | 0 | 10 | 0 | B | 100mg |
| 2 | 10 | 24 | 0 | B | 80mg |
The counting process Cox model for time-varying covariates $x(t)$:

$$h(t \mid x(t)) = h_0(t)\, \exp\!\left(\beta^\top x(t)\right)$$
⚠️ Careful: Internal time-varying covariates (values affected by the individual's own disease process — e.g., a biomarker) can introduce bias. Only external covariates (unaffected by the subject's status) can be safely included as time-varying covariates without special considerations.
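Restructuring a subject's record into counting-process rows can be sketched as follows; the helper name and input layout are illustrative, not a library API:

```python
def to_counting_process(subject_id, end_time, status, covariate_changes):
    """Expand one subject into (id, start, stop, event, value) rows.
    covariate_changes: [(change_time, value), ...] sorted, first entry at time 0.
    The event flag is set only on the final interval."""
    rows = []
    for i, (start, value) in enumerate(covariate_changes):
        last = i + 1 == len(covariate_changes)
        stop = end_time if last else covariate_changes[i + 1][0]
        event = status if last else 0
        rows.append((subject_id, start, stop, event, value))
    return rows

# Subject 1 from the table above: dose raised at month 6, event at month 12
rows = to_counting_process(1, 12, 1, [(0, "50mg"), (6, "75mg")])
```

The resulting rows match the first two lines of the counting-process table: `(1, 0, 6, 0, "50mg")` and `(1, 6, 12, 1, "75mg")`.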
13.2 Frailty Models (Random Effects for Survival)
Frailty models extend the Cox model to account for unobserved heterogeneity among subjects or clustering (e.g., patients within hospitals, multiple events per subject). A random frailty term $u_i$ is introduced:

$$h_i(t \mid x_i, u_i) = u_i\, h_0(t)\, e^{\beta^\top x_i}$$

Where $u_i$ is a random variable (the "frailty") assumed to follow a specific distribution:

- Gamma frailty (most common): $u_i \sim \mathrm{Gamma}(1/\theta, 1/\theta)$, with $E[u_i] = 1$ and $\mathrm{Var}(u_i) = \theta$. The marginal survival function has a closed form.
- Log-normal frailty: $\log u_i \sim N(0, \sigma^2)$.
- Inverse-Gaussian frailty.

The frailty variance $\theta$ (or $\sigma^2$) quantifies the degree of unobserved heterogeneity. If $\theta = 0$, the frailty model reduces to the standard Cox model.

The marginal survival function (Gamma frailty):

$$S(t \mid x) = \left[1 + \theta\, H_0(t)\, e^{\beta^\top x}\right]^{-1/\theta}$$
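The closed-form gamma-frailty marginal survival is easy to evaluate, and letting $\theta \to 0$ recovers the ordinary Cox survival; a minimal sketch with illustrative inputs:

```python
import math

def marginal_survival_gamma_frailty(H0_t, theta, lin_pred=0.0):
    """Marginal S(t|x) = (1 + theta * H0(t) * exp(beta.x)) ** (-1/theta).
    As theta -> 0, this tends to exp(-H0(t) * exp(beta.x)) (no frailty)."""
    z = H0_t * math.exp(lin_pred)
    if theta == 0:
        return math.exp(-z)
    return (1 + theta * z) ** (-1 / theta)

# Illustrative values: baseline cumulative hazard 0.5, frailty variance 1
s_frail = marginal_survival_gamma_frailty(H0_t=0.5, theta=1.0)
s_cox = marginal_survival_gamma_frailty(H0_t=0.5, theta=0.0)
```

Note `s_frail > s_cox` here: marginalising over frailty flattens the population-level hazard, because high-frailty subjects fail early and leave a hardier risk set.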
13.3 Competing Risks
Competing risks arise when multiple mutually exclusive event types are possible, and the occurrence of one event precludes the observation of others. For example, in a study of cancer-specific mortality, deaths from other causes are competing risks.
In the presence of competing risks, the standard KM estimator overestimates the probability of the event of interest because it treats competing events as independent censoring (which they are not — they actually prevent the event of interest from occurring).
The Cause-Specific Hazard for event type $k$:

$$h_k(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\ D = k \mid T \ge t)}{\Delta t}$$

The Cumulative Incidence Function (CIF) (also called the subdistribution or cause-specific CIF) gives the probability of experiencing event $k$ by time $t$ in the presence of competing risks:

$$\mathrm{CIF}_k(t) = \int_0^t S(u)\, h_k(u)\, du$$

Where $S(u)$ is the overall survival function.

Note that $\sum_k \mathrm{CIF}_k(\infty) = 1$ (if all events are eventual) and individual CIFs do not sum to 1 at any finite time $t$.
The Gray Test compares CIFs between groups in the presence of competing risks (analogous to the log-rank test for the standard case).
Fine and Gray's Subdistribution Hazard Model directly models the effect of covariates on the CIF for event type $k$ via the subdistribution hazard:

$$h_k^{\mathrm{sd}}(t \mid x) = h_{k,0}^{\mathrm{sd}}(t)\, e^{\beta^\top x}$$
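At discrete event times the CIF integral becomes a running sum, $\mathrm{CIF}_k(t) = \sum_{t_i \le t} \hat S(t_{i-1})\, d_{ki}/n_i$, where the overall $\hat S$ uses events of all types. A minimal Aalen-Johansen-style sketch on a toy risk table (two event types, ten subjects):

```python
def cumulative_incidence(records):
    """CIF for two competing event types from (time, n_at_risk, d_event1, d_event2)
    rows sorted by time; censoring shows up only through shrinking risk sets."""
    surv = 1.0            # overall survival just before each event time
    cif1 = cif2 = 0.0
    out = []
    for t, n, d1, d2 in records:
        cif1 += surv * d1 / n          # CIF increment: S(t-) * d_k / n
        cif2 += surv * d2 / n
        surv *= 1 - (d1 + d2) / n      # overall KM step uses all event types
        out.append((t, cif1, cif2, surv))
    return out

# Toy risk table: 10 subjects, event type 1 at t=1 and t=3, type 2 at t=2
table = cumulative_incidence([(1, 10, 1, 0), (2, 9, 0, 2), (3, 7, 1, 0)])
```

With no censoring before the last event time, the two CIFs plus overall survival sum to 1, which is the identity the naive one-cause KM estimator violates.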
13.4 Restricted Mean Survival Time (RMST)
When the proportional hazards assumption is violated and hazard ratios are not interpretable, the Restricted Mean Survival Time (RMST) provides an alternative summary measure:

$$\mathrm{RMST}(\tau) = \int_0^\tau S(t)\, dt$$

Where $\tau$ is a pre-specified restriction time (the maximum follow-up horizon of interest).

RMST is the area under the survival curve up to time $\tau$. It has a direct interpretation as the average event-free time up to $\tau$.

Difference in RMST between two groups (e.g., treatment vs. control):

$$\Delta(\tau) = \mathrm{RMST}_{\text{treatment}}(\tau) - \mathrm{RMST}_{\text{control}}(\tau)$$

This represents the average additional event-free time gained by the treatment group up to time $\tau$, regardless of proportionality.

Estimated RMST:

$$\widehat{\mathrm{RMST}}(\tau) = \int_0^\tau \hat S(t)\, dt$$

The variance is estimated using the delta method applied to Greenwood's formula.
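Because the KM curve is a step function, the RMST integral is just a sum of rectangle areas; a minimal sketch with a hypothetical two-step curve:

```python
def rmst(times, surv, tau):
    """Area under a step survival curve up to tau.
    times/surv: KM drop times and the survival values after each drop;
    S(t) = 1 before the first drop."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)     # final rectangle up to tau
    return area

# Toy KM curve: drops to 0.8 at t = 2 and to 0.5 at t = 5; restriction tau = 8
value = rmst([2, 5], [0.8, 0.5], tau=8)
```

Here the area is $1 \times 2 + 0.8 \times 3 + 0.5 \times 3 = 5.9$ event-free months out of a possible 8.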
14. Worked Examples
Example 1: Kaplan-Meier Estimation and Log-Rank Test — Two Treatment Groups
A clinical trial of patients with advanced lung cancer randomises participants to either a new chemotherapy regimen (Group A, $n = 20$) or standard care (Group B, $n = 20$). Survival time (in months) is recorded.
Study Summary:
- Group A: 14 events, 6 censored.
- Group B: 17 events, 3 censored.
Step 1 — Compute the Kaplan-Meier Estimates (abbreviated)
Group A (selected event times):
| Time (m) | $n_i$ | $d_i$ | $1 - d_i/n_i$ | $\hat S(t)$ | 95% CI |
|---|---|---|---|---|---|
| 2 | 20 | 1 | 0.950 | 0.950 | (0.828, 1.000) |
| 5 | 18 | 2 | 0.889 | 0.844 | (0.674, 0.944) |
| 9 | 15 | 1 | 0.933 | 0.788 | (0.608, 0.906) |
| 14 | 12 | 1 | 0.917 | 0.722 | (0.528, 0.862) |
| 22 | 8 | 2 | 0.750 | 0.542 | (0.337, 0.723) |
| 31 | 4 | 1 | 0.750 | 0.406 | (0.197, 0.636) |
Group B (selected event times):
| Time (m) | $n_i$ | $d_i$ | $1 - d_i/n_i$ | $\hat S(t)$ | 95% CI |
|---|---|---|---|---|---|
| 1 | 20 | 2 | 0.900 | 0.900 | (0.755, 0.978) |
| 4 | 17 | 2 | 0.882 | 0.794 | (0.617, 0.912) |
| 7 | 14 | 2 | 0.857 | 0.680 | (0.489, 0.824) |
| 12 | 10 | 2 | 0.800 | 0.544 | (0.347, 0.724) |
| 18 | 6 | 2 | 0.667 | 0.363 | (0.182, 0.586) |
| 25 | 3 | 1 | 0.667 | 0.242 | (0.071, 0.524) |
Median Survival Time:
- Group A: $\hat t_{\mathrm{med}} = 26$ months (95% CI: 18, —)
- Group B: $\hat t_{\mathrm{med}} = 16$ months (95% CI: 10, 24)
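The product-limit values in the Group A table can be reproduced with a few lines; a minimal sketch using the (time, at-risk, events) triples from the table above:

```python
def kaplan_meier(risk_table):
    """Product-limit estimate from (time, n_at_risk, d_events) rows,
    sorted by event time. Censoring enters via the shrinking n_at_risk."""
    s = 1.0
    out = []
    for t, n, d in risk_table:
        s *= 1 - d / n              # multiply in the conditional survival
        out.append((t, round(s, 3)))
    return out

# Group A risk table from the worked example
group_a = kaplan_meier([(2, 20, 1), (5, 18, 2), (9, 15, 1),
                        (14, 12, 1), (22, 8, 2), (31, 4, 1)])
```

The output matches the $\hat S(t)$ column: 0.950, 0.844, 0.788, 0.722, 0.542, 0.406.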
Step 2 — Log-Rank Test
: The survival distributions are identical in Groups A and B.
Computing observed and expected events at each event time in the combined dataset:

$$O_A = \sum_i d_{Ai} \quad \text{(observed events in Group A)}$$

$$E_A = \sum_i \frac{n_{Ai}\, d_i}{n_i} \quad \text{(expected events in Group A under } H_0\text{)}$$

Interpretation: The log-rank test is not statistically significant at the 5% level ($p > 0.05$), though there is a trend favouring Group A. The median survival time is 10 months longer for Group A (26 vs. 16 months). With a larger sample, this difference might reach statistical significance. The Kaplan-Meier curves show higher survival probabilities for Group A at all time points.
Example 2: Cox Proportional Hazards Model — Predicting Time to Readmission
A hospital study follows patients after discharge from heart failure hospitalisation. The event of interest is 30-day readmission. Predictors include age (years), sex (0 = female, 1 = male), number of comorbidities (count), and treatment type (A = reference, B, C).
Model Results:
| Variable | $\hat\beta$ | SE | $z$ | $p$ | HR $= e^{\hat\beta}$ | 95% CI for HR |
|---|---|---|---|---|---|---|
| Age (years) | 0.031 | 0.011 | 2.82 | 0.005 | 1.031 | (1.009, 1.054) |
| Sex (Male) | 0.287 | 0.142 | 2.02 | 0.043 | 1.332 | (1.009, 1.759) |
| Comorbidities | 0.198 | 0.063 | 3.14 | 0.002 | 1.219 | (1.078, 1.379) |
| Treatment B | -0.412 | 0.158 | -2.61 | 0.009 | 0.662 | (0.487, 0.900) |
| Treatment C | -0.681 | 0.171 | -3.98 | 0.001 | 0.506 | (0.362, 0.707) |
Global LR Test ($\chi^2$ with 5 df): statistically significant — at least one covariate significantly predicts readmission.
C-index: good discriminative ability.
Interpretation of Key Coefficients:
- Age: For each additional year of age, the hazard of readmission increases by 3.1% (95% CI: 0.9%, 5.4%), holding all other covariates constant. This effect is statistically significant ($p = 0.005$).
- Sex (Male): Males have a 33.2% higher hazard of readmission than females (HR $= 1.332$, 95% CI: 1.009, 1.759; $p = 0.043$), after adjusting for age, comorbidities, and treatment.
- Comorbidities: Each additional comorbidity increases the hazard of readmission by 21.9% (HR $= 1.219$, 95% CI: 1.078, 1.379; $p = 0.002$).
- Treatment B: Patients on Treatment B have a 33.8% lower hazard of readmission compared to Treatment A (HR $= 0.662$, 95% CI: 0.487, 0.900; $p = 0.009$). Treatment B is significantly protective relative to Treatment A.
- Treatment C: Patients on Treatment C have a 49.4% lower hazard compared to Treatment A (HR $= 0.506$, 95% CI: 0.362, 0.707; $p = 0.001$). Treatment C provides the strongest protection against readmission among the three treatments.
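The HR and CI columns in the results table follow directly from each $\hat\beta$ and its SE; a minimal check using the age coefficient from the table above:

```python
import math

def hazard_ratio_ci(beta, se, z=1.96):
    """Wald-type HR and 95% CI: exp(beta), exp(beta - z*se), exp(beta + z*se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Age row: beta = 0.031, SE = 0.011
hr, lo, hi = hazard_ratio_ci(0.031, 0.011)
```

Rounded to three decimals this reproduces the reported HR of 1.031 with CI (1.009, 1.054); the same one-liner checks every other row.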
Proportional Hazards Assessment:
Grambsch-Therneau test results:
| Variable | $\hat\rho$ (correlation with time) | $\chi^2$ | $p$ |
|---|---|---|---|
| Age | 0.041 | 0.28 | 0.597 |
| Sex | -0.063 | 0.67 | 0.413 |
| Comorbidities | 0.082 | 1.12 | 0.290 |
| Treatment B | 0.051 | 0.44 | 0.507 |
| Treatment C | 0.097 | 1.58 | 0.209 |
| GLOBAL | — | 4.82 | 0.437 |
All individual tests and the global test are non-significant — the proportional hazards assumption is supported for all covariates.
Predicted Survival:
For a 65-year-old male with 3 comorbidities on Treatment C:

$$\hat\eta = 0.031(65) + 0.287(1) + 0.198(3) - 0.681 = 2.215$$

$$\hat S(30 \mid x) = \hat S_0(30)^{\exp(2.215)}$$

Where $\hat S_0(30)$ is the estimated baseline 30-day survival probability.
Example 3: Weibull Parametric Model — Time to Equipment Failure
An engineering study monitors industrial machines for failure. Follow-up is 18 months. The predictor is machine type (Standard = reference, Enhanced).
Distribution Selection (AIC comparison):
| Distribution | Log-Likelihood | Parameters | AIC |
|---|---|---|---|
| Exponential | -241.3 | 2 | 486.6 |
| Weibull | -233.8 | 3 | 473.6 |
| Log-Normal | -237.1 | 3 | 480.2 |
| Log-Logistic | -235.9 | 3 | 477.8 |
| Gompertz | -234.5 | 3 | 475.0 |
Decision: Weibull has the lowest AIC (473.6) → select Weibull model.
LRT: Weibull vs. Exponential:

$$\Lambda = -2\left[(-241.3) - (-233.8)\right] = 15.0, \quad p < 0.001 \ (\chi^2_1)$$

The Weibull fits significantly better than the exponential — the hazard is not constant.
Weibull Model Results:
| Parameter | Estimate | SE | 95% CI |
|---|---|---|---|
| Intercept ($\mu$) | 2.841 | 0.201 | (2.447, 3.235) |
| Enhanced vs. Standard ($\alpha$) | 0.624 | 0.183 | (0.265, 0.983) |
| Shape ($k$) | 1.82 | 0.189 | (1.481, 2.230) |
Shape parameter: $\hat k = 1.82 > 1$ → Increasing hazard — failure risk increases over time, consistent with mechanical wear-out.
Acceleration Factor (AFT interpretation):

$$\widehat{\mathrm{AF}} = e^{\hat\alpha} = e^{0.624} = 1.867$$

Enhanced machines have failure times that are 86.7% longer than Standard machines on average (95% CI: 1.303, 2.674). In other words, enhanced machines last approximately 1.87 times as long.
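The acceleration factor and its CI follow directly from the AFT coefficient and SE reported above; a minimal check:

```python
import math

# Enhanced vs. Standard from the Weibull results table: alpha = 0.624, SE = 0.183
alpha, se = 0.624, 0.183
af = math.exp(alpha)                                  # acceleration factor
ci = (math.exp(alpha - 1.96 * se), math.exp(alpha + 1.96 * se))
```

Exponentiating the coefficient's Wald CI endpoints reproduces the reported time-ratio CI to rounding.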
Predicted Median Failure Times:
- Standard: $\hat t_{\mathrm{med}} = 11.7$ months
- Enhanced: $\hat t_{\mathrm{med}} = 21.8$ months

Standard machines have a predicted median failure time of 11.7 months, while Enhanced machines have a predicted median of 21.8 months.
15. Common Mistakes and How to Avoid Them
Mistake 1: Treating Censored Observations as Events or Excluding Them
Problem: A common error among beginners is either incorrectly coding censored observations as
events (which dramatically overestimates the hazard) or excluding censored observations entirely
(which introduces severe selection bias — the excluded subjects are typically the longest-surviving
ones).
Solution: Include all observations and code censored subjects correctly (status = 0). The
statistical methods are specifically designed to use the partial information provided by censored
observations.
Mistake 2: Using the Wrong Time Origin
Problem: Using different time origins for different subjects (e.g., some measured from
diagnosis, others from treatment start) without harmonising the time scale. This creates an
inconsistent and uninterpretable survival analysis.
Solution: Define the time origin before data collection and apply it consistently to all
subjects. Document the chosen time origin clearly in the methods section.
Mistake 3: Confusing Censoring and Missing Data
Problem: Treating censored observations as missing (removing them) or treating truly missing
data as censored (including them with a specific censoring time) are both errors that lead to
biased results.
Solution:
- Censored = subject left the study without experiencing the event during follow-up → include with status = 0.
- Missing = event status or time is unknown for administrative/data quality reasons → treat as missing data (impute or exclude with justification), not censored.
Mistake 4: Not Checking the Proportional Hazards Assumption
Problem: Reporting Cox model hazard ratios without testing the PH assumption. If the assumption
is violated, the hazard ratio is a biased average of a time-varying effect and is not meaningfully
interpretable.
Solution: Always perform and report Schoenfeld residual tests and log-log plots for
all covariates. If PH is violated, use stratification, time-varying coefficients, or switch to
an AFT model.
Mistake 5: Applying the Standard KM Estimator With Competing Risks
Problem: Using the Kaplan-Meier estimator to estimate the probability of an event when
competing risks are present. KM treats competing events as independent censoring, which
overestimates the cumulative incidence of the event of interest.
Solution: Use the Cumulative Incidence Function (CIF) estimator (Aalen-Johansen method)
when competing risks are present. Use the Gray test for group comparisons and the Fine-Gray model
for regression.
Mistake 6: Using Too Few Events for the Cox Model (Overfitting)
Problem: Fitting a Cox model with many covariates but few events (e.g., 5 covariates with only
20 events). This leads to overfitting — coefficients and hazard ratios are unstable and will not
replicate.
Solution: Follow the 10 Events Per Variable (EPV) rule. With 20 events, include at most 2
covariates. For small event counts, use penalised regression (ridge, LASSO) or Bayesian methods.
Mistake 7: Treating the Log-Rank Test as the Only Comparison Method
Problem: Reporting only the log-rank test when survival curves are non-proportional (crossing
curves). The log-rank test is least powerful precisely when hazard ratios are not constant.
Solution: Always plot the KM curves first. If curves cross or show non-proportional patterns,
report the Fleming-Harrington test with appropriate weights, or compare using RMST
differences, which do not require the PH assumption.
Mistake 8: Ignoring the Effect of Left Truncation
Problem: In prevalence or registry studies, subjects are often only enrolled after surviving
a certain threshold (e.g., alive at time of registration). Ignoring this creates immortal time
bias — subjects who died early are systematically excluded.
Solution: Use left-truncated survival data methods. In the counting process format, specify the
entry time (left truncation time) for each subject. This correctly adjusts the risk sets to only
include subjects who were observable at each event time.
Mistake 9: Extrapolating Parametric Models Without Caution
Problem: Using parametric survival models to predict survival far beyond the observed
follow-up period without acknowledging the uncertainty of such extrapolation.
Solution: Always explicitly state the observed follow-up range. Clearly label any predictions
beyond observed follow-up as extrapolations with wide uncertainty. Validate predictions in
external data when possible. Use sensitivity analysis with alternative distributions.
Mistake 10: Not Reporting Confidence Intervals for Survival Estimates
Problem: Reporting point estimates of $\hat S(t)$ or hazard ratios without confidence
intervals, giving a false impression of precision.
Solution: Always report confidence intervals for all survival estimates: KM survival
probabilities (Greenwood/log-log CIs), median survival times (Brookmeyer-Crowley CIs), and
hazard ratios (Wald CIs). Include the number at risk at key time points on KM plots.
16. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| Negative or zero survival times | Data entry errors; incorrect time origin | Clean data; check for recording errors; redefine time origin |
| KM curve never reaches 0.50 | Fewer than 50% of subjects experience the event | Report that median is not estimable; report restricted mean or 75th percentile |
| KM curve drops abruptly to 0 at the end | Last few observations are all events with no censored subjects near the end | Usually fine; report last observed survival probability |
| Log-rank test significant but KM curves overlap early | Late separation of curves (Weibull with increasing hazard) | Use Fleming-Harrington test with $\rho = 0, \gamma = 1$ to emphasise late differences |
| Log-rank test non-significant but curves look different | Crossing curves; small sample | Report RMST; test for specific crossing patterns; increase sample size |
| Cox model fails to converge | Perfect separation (a covariate perfectly predicts events); very small event count | Remove problematic covariate; use Firth's penalised partial likelihood; reduce covariates |
| Hazard ratios are extremely large or extremely small | Perfect or near-perfect separation; very few events in one category | Check event rates by category; collapse sparse categories; use penalised likelihood |
| Schoenfeld test significant for one covariate | PH assumption violated for that covariate | Stratify by that covariate; add time × covariate interaction; use AFT model |
| All Schoenfeld tests significant | Fundamental PH violation across all covariates | Switch to AFT parametric model or use RMST |
| Deviance residuals show large outliers | Influential observations with unexpected event timing | Investigate outliers clinically; check data quality; use dfbeta diagnostics |
| Very high C-index with few events | Overfitting; possible data leakage | Validate in independent sample; apply shrinkage; check for inadvertent inclusion of outcome-proximate predictors |
| AIC similar across all parametric distributions | Data are not informative enough to discriminate between distributions | Use the most theoretically appropriate distribution; report all models; use the simplest (most parsimonious) model |
| Competing risks suspected but ignored | Multiple event types recorded but not modelled | Use cumulative incidence functions and the Fine-Gray model |
| Left truncation not accounted for | Registry or prevalent cohort study design | Restructure data with entry time; use counting process format |
| Large number of tied event times | Discrete-time outcome (e.g., daily data recorded as monthly) | Use Efron or exact partial likelihood; consider discrete-time survival models |
17. Quick Reference Cheat Sheet
Core Equations
| Formula | Description |
|---|---|
| $S(t) = P(T > t) = 1 - F(t)$ | Survival function |
| $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$ | Hazard function |
| $H(t) = \int_0^t h(u)\, du$ | Cumulative hazard |
| $S(t) = e^{-H(t)}$ | Survival from cumulative hazard |
| $\hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$ | Kaplan-Meier estimator |
| $\hat H(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$ | Nelson-Aalen estimator |
| $\widehat{\mathrm{Var}}[\hat S(t)] = \hat S(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$ | Greenwood's formula |
| $\chi^2 = \frac{(O_1 - E_1)^2}{\widehat{\mathrm{Var}}(O_1 - E_1)}$ | Log-rank test statistic |
| $h(t \mid x) = h_0(t)\, e^{\beta^\top x}$ | Cox PH model |
| $\mathrm{HR} = e^{\beta_j}$ | Hazard ratio |
| $\ell(\beta) = \sum_{i: \delta_i = 1} \left[\beta^\top x_i - \log \sum_{j \in R(t_i)} e^{\beta^\top x_j}\right]$ | Cox partial log-likelihood |
| $\hat S(t \mid x) = \hat S_0(t)^{\exp(\hat\beta^\top x)}$ | Cox predicted survival |
| $\delta_i \log f(t_i) + (1 - \delta_i) \log S(t_i)$ | Parametric likelihood contribution |
| $\mathrm{RMST}(\tau) = \int_0^\tau S(t)\, dt$ | Restricted mean survival time |
| $\mathrm{AIC} = -2\ell + 2p$ | Akaike Information Criterion |
| $\mathrm{BIC} = -2\ell + p \log n$ | Bayesian Information Criterion |
Parametric Distribution Quick Reference
| Distribution | Hazard $h(t)$ | Hazard Shape | Parameters |
|---|---|---|---|
| Exponential | $\lambda$ | Constant | $\lambda$ |
| Weibull | $\lambda k t^{k-1}$ | Monotone ↑/↓/flat | $\lambda, k$ |
| Log-Normal | $f(t)/S(t)$ (no closed form) | Hump-shaped | $\mu, \sigma$ |
| Log-Logistic | $\frac{\lambda k t^{k-1}}{1 + \lambda t^k}$ | Hump or decreasing | $\lambda, k$ |
| Gompertz | $\lambda e^{\gamma t}$ | Monotone ↑ | $\lambda, \gamma$ |
Hazard Ratio Interpretation
| HR | Meaning |
|---|---|
| $> 1$ | Higher hazard (increased risk) relative to reference |
| $= 1$ | No effect on hazard |
| $< 1$ | Lower hazard (protective effect) relative to reference |
C-Index Benchmarks
| C-Index | Discrimination |
|---|---|
| $0.5$ | None (random) |
| $0.5$–$0.6$ | Poor |
| $0.6$–$0.7$ | Acceptable |
| $0.7$–$0.8$ | Good |
| $0.8$–$0.9$ | Excellent |
| $> 0.9$ | Outstanding |
Method Selection Guide
| Scenario | Recommended Method |
|---|---|
| Describe survival in one group | Kaplan-Meier |
| Compare survival across groups (unadjusted) | Log-Rank Test |
| Compare when groups differ on a confounder | Stratified Log-Rank |
| Non-proportional hazards in group comparison | Fleming-Harrington / RMST |
| Covariate-adjusted survival analysis | Cox PH Model |
| PH assumption violated | Stratified Cox or AFT Model |
| Predict survival times | Parametric Model |
| Competing events present | Cumulative Incidence Function + Fine-Gray Model |
| Clustered or repeated events | Frailty Model |
| Covariates change over time | Time-Varying Cox Model |
Proportional Hazards Testing Summary
| Method | Tool | PH Holds If |
|---|---|---|
| Schoenfeld residuals | Grambsch-Therneau test | $p > 0.05$ for all covariates |
| Log-log plot | Visual inspection | Lines approximately parallel |
| Time × covariate interaction | LR or Wald test | $p > 0.05$ for all interactions |
Events Per Variable (EPV) Requirements
| Model | Minimum EPV | Recommended EPV |
|---|---|---|
| Cox PH | 10 | 15–20 |
| Parametric | 10 | 15–20 |
| Kaplan-Meier (stable curve) | 20–30 events total | 50+ events total |
| Fine-Gray (competing risks) | 10 per event type | 15–20 per event type |
Censoring Types Summary
| Censoring Type | Description | Example |
|---|---|---|
| Right | Event not yet occurred at end of follow-up | Study ends; patient still alive |
| Left | Event occurred before observation began | Pre-existing infection |
| Interval | Event occurred between two assessment points | Screening study |
| Left truncation | Subject only enters study after surviving to entry time | Registry study |
This tutorial provides a comprehensive foundation for understanding, performing, and interpreting Survival Analysis using the DataStatPro application. For further reading, consult Kleinbaum & Klein's "Survival Analysis: A Self-Learning Text" (3rd ed., 2012), Hosmer, Lemeshow & May's "Applied Survival Analysis" (2nd ed., 2008), or Therneau & Grambsch's "Modeling Survival Data: Extending the Cox Model" (2000). For feature requests or support, contact the DataStatPro team.