Chi-Square Test of Association: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of categorical data analysis all the way through advanced interpretation, reporting, assumption checking, and practical usage within the DataStatPro application. Whether you are encountering the chi-square test of association for the first time or deepening your understanding of relationships between categorical variables, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is a Chi-Square Test of Association?
- The Mathematics Behind the Chi-Square Test of Association
- Assumptions of the Chi-Square Test of Association
- Variants of the Chi-Square Test of Association
- Using the Chi-Square Test of Association Calculator Component
- Step-by-Step Procedure
- Interpreting the Output
- Effect Sizes for the Chi-Square Test of Association
- Confidence Intervals
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into the chi-square test of association, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.
1.1 Categorical Variables and Frequency Data
Unlike continuous variables (measured on interval or ratio scales), categorical variables assign each observation to a discrete, mutually exclusive category. The chi-square test of association operates on frequency counts — the number of observations falling into each combination of category levels.
- Nominal variables: Categories with no intrinsic order (e.g., blood type: A, B, AB, O; political party affiliation; treatment group).
- Ordinal variables: Categories with a meaningful order but no equal spacing (e.g., education level: primary, secondary, tertiary; satisfaction: low, medium, high).
⚠️ The chi-square test treats all categorical data as nominal. If your variables are ordinal, consider tests that exploit the ordering, such as the Jonckheere–Terpstra test or a linear-by-linear association test, as they are more powerful.
1.2 Contingency Tables
A contingency table (also called a cross-tabulation or crosstab) organises the joint frequency distribution of two (or more) categorical variables. For two variables X (with r row categories) and Y (with c column categories), the contingency table has dimensions r × c.
Each cell contains the observed frequency O_ij: the count of observations belonging to category i of X and category j of Y.
Example (2 × 2 table):
| | Y = 1 | Y = 2 | Row Total |
|---|---|---|---|
| X = 1 | O₁₁ | O₁₂ | R₁ |
| X = 2 | O₂₁ | O₂₂ | R₂ |
| Column Total | C₁ | C₂ | N |
Where N is the total sample size.
1.3 The Concept of Statistical Independence
Two categorical variables X and Y are statistically independent if knowledge of an observation's category on X gives no information about its category on Y. Formally, independence requires:
P(X = i, Y = j) = P(X = i) × P(Y = j) for all i, j
Equivalently, the conditional distribution of Y given X = i is the same for all values of i. The chi-square test of association tests whether the observed data are consistent with this independence assumption.
1.4 Expected Frequencies Under Independence
If X and Y are independent, the expected frequency for cell (i, j) is the product of the corresponding marginal probabilities multiplied by the total sample size:
E_ij = N × P(X = i) × P(Y = j)
Since the true marginal probabilities are unknown, they are estimated from the sample:
P̂(X = i) = R_i / N and P̂(Y = j) = C_j / N
Yielding the fundamental formula for expected frequencies:
E_ij = (R_i × C_j) / N
Where R_i is the i-th row total and C_j is the j-th column total.
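To make the formula concrete, here is a minimal Python sketch that computes the expected-frequency table from observed counts, using the 2 × 2 table from the worked example later in this guide:

```python
def expected_frequencies(observed):
    """E_ij = R_i * C_j / N for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]        # R_i
    col_totals = [sum(col) for col in zip(*observed)]  # C_j
    n = sum(row_totals)                                # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]

obs = [[40, 60],
       [20, 80]]
print(expected_frequencies(obs))  # [[30.0, 70.0], [30.0, 70.0]]
```

Note that each row of the expected table sums to the same row total R_i as the observed table, which is the verification step recommended in the manual procedure below.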
1.5 The Chi-Square Distribution
The chi-square distribution is a right-skewed probability distribution parameterised by its degrees of freedom k. Key properties:
- Defined only for non-negative values (x ≥ 0).
- Skewed right, becoming more symmetric as k increases.
- Mean = k; Variance = 2k.
- As k → ∞, the chi-square distribution approaches a normal distribution.
- Is the sum of k squared independent standard normal variables: χ²_k = Z₁² + Z₂² + … + Z_k², where each Z_i ~ N(0, 1).
The chi-square statistic measures the overall discrepancy between observed and expected frequencies — large values indicate strong departure from independence.
1.6 The Null and Alternative Hypotheses in Categorical Tests
The chi-square test of association operates within the hypothesis testing framework:
- H₀: The two categorical variables are statistically independent (no association).
- H₁: The two categorical variables are not statistically independent (an association exists).
Note that H₁ is always non-directional for the chi-square test — it simply states that some association exists, without specifying the form or direction of the relationship.
1.7 The p-Value and Significance Level
As in all hypothesis tests, the p-value is the probability of obtaining a test statistic as extreme or more extreme than observed, assuming H₀ is true. The significance level (conventionally α = 0.05) is the threshold below which we reject H₀.
Because large χ² values indicate departure from independence, the p-value is always computed from the right tail of the chi-square distribution:
p = P(χ²_df ≥ χ²_obs)
⚠️ Rejecting H₀ tells you that an association exists; it says nothing about the strength, direction, or practical importance of that association. Always accompany chi-square results with appropriate effect size measures.
1.8 Degrees of Freedom in Contingency Tables
For a two-way contingency table with r rows and c columns, the degrees of freedom are:
df = (r − 1)(c − 1)
This reflects the number of cells that are free to vary once the marginal totals are fixed. In a 2 × 2 table, df = 1; in a 3 × 3 table, df = 4.
2. What is a Chi-Square Test of Association?
2.1 The Core Question
The chi-square test of association (also called Pearson's chi-square test of independence) is a non-parametric inferential test that determines whether two categorical variables measured on the same set of observations are statistically associated with one another.
The test compares the observed cell frequencies in a contingency table against the expected cell frequencies that would arise if the two variables were completely independent. A large discrepancy between observed and expected frequencies provides evidence against independence.
2.2 The General Logic
The test quantifies how different the observed table is from the table we would expect under perfect independence:
χ² = Σ_ij (O_ij − E_ij)² / E_ij
Each term in the sum is a standardised squared residual: the squared difference between what was observed and what was expected, scaled by the expected frequency. When O_ij ≈ E_ij for all cells (as would be expected under independence), χ² will be small. Systematic departures from independence produce large χ².
2.3 When to Use the Chi-Square Test of Association
| Condition | Requirement |
|---|---|
| Research design | Two categorical variables measured on the same observations |
| Variable scale | Both variables nominal or ordinal (treated as nominal) |
| Data format | Frequency counts in a contingency table |
| Sample size | Adequate expected frequencies (see Assumptions) |
| Observations | Independent of each other |
| Hypothesis | Test of association, not prediction or causation |
2.4 Real-World Applications
| Field | Research Question | Variables |
|---|---|---|
| Epidemiology | Is smoking status associated with lung cancer diagnosis? | Smoking (yes/no) × Disease (yes/no) |
| Marketing | Is product preference associated with age group? | Product (A/B/C) × Age (18–34/35–54/55+) |
| Education | Is passing rate associated with teaching method? | Method (lecture/flipped/hybrid) × Result (pass/fail) |
| Clinical Psychology | Is treatment type associated with recovery status? | Treatment (CBT/medication/combined) × Recovery (yes/no) |
| Genetics | Is a genotype associated with a disease phenotype? | Genotype (AA/Aa/aa) × Disease (affected/unaffected) |
| Sociology | Is gender associated with voting preference? | Gender (M/F/NB) × Party (Democrat/Republican/Other) |
| Public Health | Is vaccination status associated with infection outcome? | Vaccinated (yes/no) × Infected (yes/no) |
| Quality Control | Is production shift associated with defect rate? | Shift (morning/afternoon/night) × Defect (yes/no) |
2.5 Distinguishing from Related Tests
| Situation | Correct Test |
|---|---|
| Two categorical variables, independent samples | Chi-square test of association |
| Two categorical variables, paired samples | McNemar's test |
| One categorical variable vs. known distribution | Chi-square goodness-of-fit test |
| Two categorical variables, small expected frequencies | Fisher's exact test |
| Ordered categorical variables | Linear-by-linear association test |
| Three or more categorical variables | Log-linear models |
| One binary outcome, continuous predictor | Logistic regression |
| Two proportions only (2 × 2 table) | Two-proportion z-test (equivalent result) |
3. The Mathematics Behind the Chi-Square Test of Association
3.1 The Pearson Chi-Square Statistic
Given an r × c contingency table with observed cell frequencies O_ij, the Pearson chi-square statistic is:
χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (O_ij − E_ij)² / E_ij
Where the expected frequency for cell (i, j) under the null hypothesis of independence is:
E_ij = (R_i × C_j) / N
With:
- R_i = Σ_j O_ij — the i-th row marginal total
- C_j = Σ_i O_ij — the j-th column marginal total
- N = Σ_ij O_ij — the grand total
Under H₀ (independence), χ² asymptotically follows a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
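The full computation (marginal totals, expected frequencies, the Pearson statistic, and df) fits in a short Python function. This is a sketch for illustration, not the DataStatPro implementation:

```python
def pearson_chi_square(observed):
    """Return (chi2, df) for an r x c table of observed counts."""
    r, c = len(observed), len(observed[0])
    row = [sum(x) for x in observed]        # R_i
    col = [sum(x) for x in zip(*observed)]  # C_j
    n = sum(row)                            # N
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            e = row[i] * col[j] / n         # E_ij = R_i * C_j / N
            chi2 += (observed[i][j] - e) ** 2 / e
    return chi2, (r - 1) * (c - 1)

chi2, df = pearson_chi_square([[40, 60], [20, 80]])
# chi2 ≈ 9.52 with df = 1, matching the worked example in Section 12
```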
3.2 Degrees of Freedom
For an r × c table:
df = (r − 1)(c − 1)
Intuition: A contingency table has r × c cells. Given the row totals and column totals (which sum to N), only (r − 1)(c − 1) cells are free to vary — the remaining cells are determined by the marginal constraints. Each constraint consumes one degree of freedom.
| Table Dimensions | Degrees of Freedom |
|---|---|
| 2 × 2 | 1 |
| 2 × 3 | 2 |
| 3 × 3 | 4 |
| 3 × 4 | 6 |
| 4 × 4 | 9 |
3.3 Computing the p-Value
The p-value is always computed from the right tail of the chi-square distribution:
p = P(χ²_df ≥ χ²_obs) = 1 − F(χ²_obs; df)
Where F is the cumulative distribution function (CDF) of the chi-square distribution with df degrees of freedom. The chi-square test is inherently non-directional — departures from independence in any direction accumulate in the right tail.
3.4 Critical Values
Reject H₀ if χ²_obs ≥ χ²_crit(df, α).
Common upper-tail critical values:
| df | α = 0.05 | α = 0.01 | α = 0.001 |
|---|---|---|---|
| 1 | 3.841 | 6.635 | 10.828 |
| 2 | 5.991 | 9.210 | 13.816 |
| 3 | 7.815 | 11.345 | 16.266 |
| 4 | 9.488 | 13.277 | 18.467 |
| 6 | 12.592 | 16.812 | 22.458 |
| 9 | 16.919 | 21.666 | 27.877 |
| 12 | 21.026 | 26.217 | 32.909 |
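With the statistic and df in hand, the decision rule is a simple lookup. A sketch using a few of the critical values above:

```python
# Upper-tail critical values chi2_crit(df, alpha), taken from the table above
CRITICAL = {
    (1, 0.05): 3.841, (1, 0.01): 6.635, (1, 0.001): 10.828,
    (2, 0.05): 5.991, (4, 0.05): 9.488,
}

def decide(chi2_obs, df, alpha=0.05):
    """Reject H0 when the observed statistic reaches the critical value."""
    return "reject H0" if chi2_obs >= CRITICAL[(df, alpha)] else "fail to reject H0"

print(decide(9.52, df=1))  # reject H0 (9.52 >= 3.841)
```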
3.5 Standardised and Adjusted Residuals
Beyond the overall statistic, residuals reveal which specific cells deviate most from independence.
Raw residual:
d_ij = O_ij − E_ij
Standardised residual (Pearson residual):
r_ij = (O_ij − E_ij) / √E_ij
Note that χ² = Σ_ij r_ij².
Adjusted standardised residual (Haberman's; approximately standard normal):
r_ij^adj = (O_ij − E_ij) / √(E_ij (1 − R_i/N)(1 − C_j/N))
Adjusted standardised residuals with |r_ij^adj| > 1.96 indicate that cell (i, j) deviates significantly from independence at α = 0.05. These are essential for locating the source of a significant chi-square result.
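The adjusted residual formula can be verified cell by cell. A sketch (in a 2 × 2 table, all four adjusted residuals share the same magnitude, √χ²):

```python
import math

def adjusted_residuals(observed):
    """Adjusted standardised residuals, approximately N(0, 1) under H0."""
    row = [sum(x) for x in observed]
    col = [sum(x) for x in zip(*observed)]
    n = sum(row)
    res = []
    for i, r_i in enumerate(row):
        res.append([])
        for j, c_j in enumerate(col):
            e = r_i * c_j / n
            denom = math.sqrt(e * (1 - r_i / n) * (1 - c_j / n))
            res[i].append((observed[i][j] - e) / denom)
    return res

res = adjusted_residuals([[40, 60], [20, 80]])
# res[0][0] ≈ +3.09, res[0][1] ≈ -3.09, and so on
```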
3.6 Yates' Continuity Correction (2 × 2 Tables)
For 2 × 2 tables, the chi-square statistic is a continuous approximation to a discrete distribution. Yates' continuity correction improves the approximation:
χ²_Yates = Σ_ij (|O_ij − E_ij| − 0.5)² / E_ij
Yates' correction makes the test more conservative (reduces Type I error). However, it is controversial — many statisticians consider it overly conservative and recommend using Fisher's exact test for small samples instead. DataStatPro reports both the uncorrected and Yates-corrected chi-square for 2 × 2 tables.
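A sketch of the corrected statistic. On the worked-example table the correction lowers χ² from about 9.52 to about 8.60, illustrating its conservatism:

```python
def yates_chi_square(observed):
    """Yates continuity-corrected chi-square for a 2 x 2 table."""
    row = [sum(x) for x in observed]
    col = [sum(x) for x in zip(*observed)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / n
            chi2 += (abs(observed[i][j] - e) - 0.5) ** 2 / e
    return chi2

print(round(yates_chi_square([[40, 60], [20, 80]]), 3))  # 8.595
```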
3.7 The Likelihood Ratio Chi-Square (G-Test)
An alternative to Pearson's χ² is the likelihood ratio statistic G (also called the G-test or log-likelihood ratio):
G = 2 Σ_ij O_ij ln(O_ij / E_ij)
Under H₀, G also follows a chi-square distribution with (r − 1)(c − 1) df. The G-test is preferred in some fields (particularly genetics and log-linear modelling) because it is directly derived from maximum likelihood theory. For moderate to large samples, G and χ² yield nearly identical results. For small samples, G can be less accurate.
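A sketch of the G statistic. On the worked-example table, G ≈ 9.66 versus Pearson χ² ≈ 9.52, illustrating how close the two usually are:

```python
import math

def g_statistic(observed):
    """Likelihood-ratio statistic G = 2 * sum O_ij * ln(O_ij / E_ij)."""
    row = [sum(x) for x in observed]
    col = [sum(x) for x in zip(*observed)]
    n = sum(row)
    g = 0.0
    for i in range(len(row)):
        for j in range(len(col)):
            o = observed[i][j]
            if o > 0:  # 0 * ln(0) is taken as 0
                g += o * math.log(o / (row[i] * col[j] / n))
    return 2 * g
```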
3.8 Effect Size — Cramér's V
Cramér's V is the most widely used effect size measure for the chi-square test of association. It scales the chi-square statistic to range from 0 to 1:
V = √(χ² / (N × min(r − 1, c − 1)))
Where min(r − 1, c − 1) is the smaller of the number of rows minus one and the number of columns minus one. For 2 × 2 tables, V = φ (the phi coefficient). V is interpretable as the average association strength across all possible pairs of category combinations.
3.9 Effect Size — Phi Coefficient (φ) for 2 × 2 Tables
For 2 × 2 tables specifically, the phi coefficient is:
φ = √(χ² / N)
The phi coefficient is equivalent to the Pearson product-moment correlation between two binary variables and ranges from −1 to +1 (though its magnitude is the same as V for 2 × 2 tables). A signed version conveying direction can be computed as:
φ = (O₁₁O₂₂ − O₁₂O₂₁) / √(R₁R₂C₁C₂)
The relationship between φ, χ², and N is:
χ² = N × φ²
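Both coefficients are one-liners once χ² is known. A sketch:

```python
import math

def cramers_v(chi2, n, r, c):
    """Cramer's V = sqrt(chi2 / (N * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(r - 1, c - 1)))

def phi_signed(a, b, c, d):
    """Signed phi for a 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# For a 2 x 2 table, |phi_signed| equals cramers_v, and chi2 = N * phi**2
v = cramers_v(9.5238, 200, 2, 2)  # ≈ 0.218
p = phi_signed(40, 60, 20, 80)    # ≈ +0.218
```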
3.10 Statistical Power
Power for the chi-square test is the probability of detecting a true association of given magnitude. Under H₁, the chi-square statistic follows a non-central chi-square distribution with non-centrality parameter:
λ = N Σ_ij (π_ij − π_i·π_·j)² / (π_i·π_·j)
Where π_ij are the true cell probabilities and π_i·, π_·j are the true marginal probabilities. In terms of Cramér's V:
λ = N × V² × min(r − 1, c − 1)
Required sample size for desired power at significance level α:
N = λ_required(df, α, power) / (V² × min(r − 1, c − 1))
Required N for a 2 × 2 table (df = 1), α = 0.05, at conventional effect sizes:
| Cramér's | Power = 0.80 | Power = 0.90 | Power = 0.95 |
|---|---|---|---|
| 0.10 (small) | 785 | 1046 | 1294 |
| 0.20 | 197 | 263 | 325 |
| 0.30 (medium) | 88 | 117 | 145 |
| 0.50 (large) | 32 | 42 | 53 |
| 0.70 | 17 | 22 | 28 |
4. Assumptions of the Chi-Square Test of Association
4.1 Independence of Observations
Each observation must contribute to exactly one cell of the contingency table. This means each participant or unit is counted once and only once. Independence is a design assumption, not testable from the data.
Common violations:
- Multiple responses from the same participant counted as separate observations.
- Paired or matched data analysed as independent (use McNemar's test instead).
- Clustered sampling where observations within clusters are correlated.
When violated: Use McNemar's test (matched pairs), Cochran's Q test (repeated measures), or mixed-effects models for clustered categorical data.
4.2 Adequate Expected Frequencies
The chi-square approximation is only valid when expected frequencies are sufficiently large. The most widely cited guidelines are:
| Guideline | Rule |
|---|---|
| Cochran (1954) | All E_ij ≥ 1; no more than 20% of cells with E_ij < 5 |
| Yates (1934) | All E_ij ≥ 5 (strict rule) |
| Agresti (2007) | Most E_ij ≥ 5; use Fisher's exact for 2 × 2 tables with any E_ij < 5 |
When expected frequencies are inadequate:
- For 2 × 2 tables: Use Fisher's exact test (computes exact p-values without the chi-square approximation).
- For larger tables: Combine categories (where theoretically justifiable), collect more data, or use the exact multinomial test.
⚠️ Adequate expected frequencies are about E_ij, not O_ij. A cell can have a large observed count but a small expected count — always check E_ij directly.
4.3 Fixed Marginal Totals (Study Design Consideration)
The chi-square test technically assumes that the row totals are fixed by the study design (sampling from pre-specified groups). If both margins are random, the test remains valid asymptotically, but Fisher's exact test (which conditions on both margins) is more appropriate for small samples.
In practice:
- Fixed row margins: Prospective study (e.g., 50 smokers and 50 non-smokers recruited in advance).
- Fixed grand total only: Cross-sectional survey where only N is fixed.
- Both designs yield valid chi-square tests for large samples.
4.4 Nominal or Ordinal Scale of Measurement
Both variables must be categorical. The chi-square test makes no use of any ordering information in ordinal variables (all ordering is discarded). If both variables are ordinal, more powerful tests exploiting the ordering (linear-by-linear association, Spearman's ρ) should be considered alongside or instead of chi-square.
4.5 Sufficiently Large Sample Size
In addition to adequate cell expectations, the overall sample size must be large enough for the asymptotic chi-square approximation to hold. A common guideline is N ≥ 20 for a 2 × 2 table; larger tables require proportionally larger N.
When violated: Use Fisher's exact test (2 × 2 tables) or the exact multinomial test (larger tables).
4.6 No Structural Zeros
A structural zero is a cell that is logically impossible (e.g., "males who are pregnant"). Structural zeros violate the independence model and require special treatment (structural equation models or quasi-independence models).
4.7 Assumption Summary
| Assumption | How to Check | Remedy if Violated |
|---|---|---|
| Independence of observations | Study design review | McNemar's test; multilevel models |
| Adequate E_ij | Inspect expected frequency table | Fisher's exact test; collapse categories |
| Sufficient overall N | Count total observations | Fisher's exact; collect more data |
| Categorical variables | Measurement review | Use appropriate scale-specific tests |
| No structural zeros | Theoretical review | Quasi-independence models |
5. Variants of the Chi-Square Test of Association
5.1 Pearson Chi-Square Test of Independence (Standard)
The classic form: compare observed to expected frequencies in a two-way contingency table using the Pearson statistic χ².
5.2 Fisher's Exact Test
When expected frequencies are small (especially in 2 × 2 tables), Fisher's exact test computes the exact probability of observing the data or a more extreme table, given the observed marginal totals:
P = (R₁! R₂! C₁! C₂!) / (N! O₁₁! O₁₂! O₂₁! O₂₂!)
The p-value is the sum of hypergeometric probabilities for all tables as extreme as or more extreme than observed. Fisher's exact test is always valid (not an approximation) but extends with difficulty to larger tables (requires exact conditional multinomial computation).
5.3 Chi-Square Goodness-of-Fit Test
Although not a test of association, this closely related test assesses whether the observed distribution of a single categorical variable matches a theoretically specified distribution π₁, …, π_K:
χ² = Σ_k (O_k − Nπ_k)² / (Nπ_k)
Degrees of freedom: df = K − 1 (number of categories minus one).
5.4 McNemar's Test (Paired Nominal Data)
For paired or matched binary categorical data (e.g., before/after designs), McNemar's test is the appropriate alternative to chi-square. It focuses on the discordant pairs:
χ² = (b − c)² / (b + c)
Where b and c are the off-diagonal cells of the matched-pairs table. With continuity correction: χ² = (|b − c| − 1)² / (b + c).
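McNemar's statistic needs only the two discordant counts. A sketch with illustrative counts b = 25, c = 10 (hypothetical, not from this guide's examples):

```python
def mcnemar_statistic(b, c, corrected=True):
    """McNemar chi-square (df = 1) from the discordant counts b and c."""
    if corrected:
        return (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected
    return (b - c) ** 2 / (b + c)

print(round(mcnemar_statistic(25, 10, corrected=False), 3))  # 6.429
```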
5.5 Linear-by-Linear Association Test
When both variables are ordinal, the linear-by-linear association test assigns integer scores to the ordered categories and tests for a trend:
M² = (N − 1) r²
Where r is the correlation between the assigned row and column scores. This test has 1 degree of freedom regardless of the table size and is more powerful than the overall chi-square when the true association is monotone.
5.6 Cochran-Mantel-Haenszel Test (Stratified Tables)
When the relationship between two binary variables must be assessed across multiple strata (e.g., the same 2 × 2 table measured in several hospitals), the Mantel–Haenszel test combines evidence across strata:
χ²_MH = (|Σ_k (a_k − E(a_k))| − 0.5)² / Σ_k Var(a_k)
Where a_k is the (1, 1) cell of the table in stratum k.
This controls for the stratifying variable and estimates a common odds ratio across strata, assuming the association is homogeneous.
5.7 Bayesian Chi-Square Analysis
The Bayesian approach to contingency table analysis estimates a Bayes Factor comparing the association model (H₁) to the independence model (H₀). Under a symmetric Dirichlet prior, the Bayes Factor can be approximated via the Bayesian Information Criterion (BIC):
BF₁₀ ≈ exp((χ² − df × ln N) / 2)
BF₁₀ > 3 indicates moderate evidence for an association; BF₁₀ < 1/3 indicates moderate evidence for independence.
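A sketch of the BIC approximation. Using the Pearson chi-square in place of the likelihood-ratio statistic is an illustrative simplification; exact Bayes Factors require integrating over the prior:

```python
import math

def bf10_bic(chi2, df, n):
    """BIC-approximated Bayes Factor: BF10 ~ exp((chi2 - df * ln N) / 2)."""
    return math.exp((chi2 - df * math.log(n)) / 2)

bf = bf10_bic(9.52, df=1, n=200)  # roughly 8.3: moderate-to-strong evidence for H1
```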
6. Using the Chi-Square Test of Association Calculator Component
The Chi-Square Test of Association Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, and reporting tests of association in contingency tables.
Step-by-Step Guide
Step 1 — Select the Test
Navigate to Statistical Tests → Chi-Square Tests → Chi-Square Test of Association.
Step 2 — Input Method
Choose how to provide data:
- Raw data: Upload or paste two categorical variable columns. DataStatPro automatically constructs the contingency table, computing row totals, column totals, and the grand total.
- Contingency table: Enter the observed frequency counts directly into the interactive table grid. Specify row and column labels. Add or remove rows and columns using the grid's controls.
- Summary proportions: Enter proportions and a total to reconstruct the table.
Step 3 — Define the Table Structure
- Specify the number of rows r and columns c.
- Label the row variable, column variable, and each category.
- DataStatPro automatically computes all marginal totals and expected frequencies E_ij.
Step 4 — Select the Alternative Test (if applicable)
DataStatPro automatically detects violated assumptions and suggests alternatives:
- If any E_ij < 5 in a 2 × 2 table → Fisher's exact test is recommended.
- If both variables are ordinal → Linear-by-linear association test is offered.
- If data are paired → McNemar's test is prompted.
Step 5 — Set Significance Level
Default: α = 0.05. DataStatPro simultaneously reports decisions at the other conventional significance levels (α = 0.01 and α = 0.001).
Step 6 — Select Display Options
- ✅ Observed and expected frequency tables with percentage breakdowns.
- ✅ Pearson χ², df, exact p-value, and decision.
- ✅ Yates' continuity-corrected χ² (for 2 × 2 tables).
- ✅ Fisher's exact test p-value (for 2 × 2 tables).
- ✅ Likelihood ratio G statistic.
- ✅ Cramér's V (all table sizes) and phi (2 × 2 only) with 95% CI.
- ✅ Standardised and adjusted standardised residuals with significance flags.
- ✅ Contribution of each cell to χ² (heat-mapped).
- ✅ Mosaic plot and clustered bar chart visualisations.
- ✅ Chi-square distribution diagram with observed statistic and critical region.
- ✅ Power analysis: current power and required N for 80%, 90%, 95% power.
- ✅ Bayesian analysis (Bayes Factor BF₁₀).
- ✅ APA 7th edition results paragraph (auto-generated).
Step 7 — Run the Analysis
Click "Run Chi-Square Test of Association". DataStatPro will:
- Compute all E_ij and verify the assumption of adequate expected frequencies.
- Compute χ², df, the critical value, and the exact p-value.
- Apply Yates' correction and compute Fisher's exact p-value (for 2 × 2 tables).
- Compute all effect size measures (V, φ) with confidence intervals.
- Compute standardised and adjusted residuals for all cells.
- Generate all selected visualisations.
- Estimate post-hoc power and produce sample size recommendations.
- Output an APA-compliant results paragraph.
7. Step-by-Step Procedure
7.1 Full Manual Procedure
Step 1 — State the Hypotheses
H₀: [Variable X] and [Variable Y] are statistically independent.
H₁: [Variable X] and [Variable Y] are not statistically independent (an association exists).
Step 2 — Construct the Contingency Table
Tally observations into an r × c table. Record:
- Each cell's observed frequency O_ij.
- Row totals R_i.
- Column totals C_j.
- Grand total N.
Step 3 — Check Assumptions
- Verify independence of observations (design review).
- Compute all expected frequencies: E_ij = (R_i × C_j) / N.
- Confirm no more than 20% of cells have E_ij < 5 and all E_ij ≥ 1.
- If assumptions are violated, switch to Fisher's exact test or collapse categories.
Step 4 — Compute Expected Frequencies
E_ij = (R_i × C_j) / N
Verify: Σ_j E_ij = R_i and Σ_i E_ij = C_j (marginal totals are preserved).
Step 5 — Compute the Chi-Square Statistic
χ² = Σ_ij (O_ij − E_ij)² / E_ij
Step 6 — Determine Degrees of Freedom
df = (r − 1)(c − 1)
Step 7 — Compute the p-Value
p = P(χ²_df ≥ χ²_obs)
Reject H₀ if p < α.
Step 8 — Compute Effect Size
Cramér's V (all tables):
V = √(χ² / (N × min(r − 1, c − 1)))
Phi coefficient (2 × 2 tables only):
φ = √(χ² / N)
Step 9 — Compute Standardised Residuals
r_ij^adj = (O_ij − E_ij) / √(E_ij (1 − R_i/N)(1 − C_j/N))
Flag cells where |r_ij^adj| > 1.96 (significant at α = 0.05).
Step 10 — Interpret and Report
Use the APA reporting template in Section 15. Always report χ², df, N, p, Cramér's V (or φ) with 95% CI, and a table of observed frequencies with expected frequencies (or at minimum percentage breakdowns).
8. Interpreting the Output
8.1 The Chi-Square Statistic
| χ² Relative to χ²_crit | Interpretation |
|---|---|
| χ² < χ²_crit | Fail to reject H₀; no significant association at α |
| χ² ≥ χ²_crit | Reject H₀; significant association detected at α |
| Large χ² with large N | Can be significant even for very weak associations |
| Small χ² with small N | May be non-significant even for large associations (low power) |
8.2 The p-Value
| p-Value | Conventional Interpretation |
|---|---|
| p ≥ 0.10 | No evidence against H₀ (independence) |
| 0.05 ≤ p < 0.10 | Marginal evidence of association (trend) |
| p < 0.05 | Significant association at α = 0.05 |
| p < 0.01 | Significant association at α = 0.01 |
| p < 0.001 | Significant association at α = 0.001 |
⚠️ A significant p-value only indicates that some departure from independence exists. It does not indicate the strength, direction, or practical importance of the association. Always examine effect sizes, residuals, and percentage breakdowns to understand the nature of the association.
8.3 Expected and Observed Frequency Tables
Comparing observed and expected frequencies reveals the pattern of association:
| Cell Pattern | Interpretation |
|---|---|
| O_ij > E_ij (large positive residual) | This combination occurs more often than independence predicts |
| O_ij < E_ij (large negative residual) | This combination occurs less often than independence predicts |
| O_ij ≈ E_ij for all cells | Data are consistent with independence |
| One or two cells drive χ² | Association is localised; examine residuals |
8.4 Cramér's V — Magnitude Interpretation
Cohen's (1988) benchmarks for Cramér's V:
These benchmarks depend on the minimum dimension m = min(r − 1, c − 1):
| | m = 1 (incl. 2 × 2) | m = 2 | m = 3 | m = 4 |
|---|---|---|---|---|
| Small | 0.10 | 0.07 | 0.06 | 0.05 |
| Medium | 0.30 | 0.21 | 0.17 | 0.15 |
| Large | 0.50 | 0.35 | 0.29 | 0.25 |
For the 2 × 2 case (phi = Cramér's V):
| φ or V | Verbal Label |
|---|---|
| < 0.10 | Negligible |
| 0.10 – 0.20 | Small |
| 0.20 – 0.30 | Small to medium |
| 0.30 – 0.50 | Medium |
| 0.50 – 0.70 | Large |
| ≥ 0.70 | Very large |
⚠️ Cohen's benchmarks were developed for the behavioural sciences and are conventions of last resort, not universal standards. In epidemiology, an odds ratio of 1.5 (corresponding to a small φ) may be highly practically significant; in genetics, very small V values can be of great theoretical importance. Always contextualise effect sizes within your specific domain.
8.5 Standardised Residuals: Locating the Source of Association
After a significant global chi-square, examine adjusted standardised residuals r_ij^adj:
| Magnitude of r_ij^adj | Interpretation |
|---|---|
| < 1.96 | Cell does not significantly deviate from independence (α = 0.05) |
| ≥ 1.96 | Significant deviation at α = 0.05 |
| ≥ 2.58 | Significant deviation at α = 0.01 |
| ≥ 3.29 | Significant deviation at α = 0.001 |
Positive residuals: the combination occurs more than expected under independence. Negative residuals: the combination occurs less than expected.
⚠️ When examining residuals across multiple cells, apply a multiple comparisons correction (e.g., Bonferroni: compare each residual to the standard normal critical value for α / (r × c)) to control the familywise error rate.
9. Effect Sizes for the Chi-Square Test of Association
9.1 Phi Coefficient (φ) — for 2 × 2 Tables
φ = √(χ² / N)
Or signed (to indicate direction of association):
φ = (O₁₁O₂₂ − O₁₂O₂₁) / √(R₁R₂C₁C₂)
Interpretation: Equivalent to the Pearson correlation between two binary variables. Ranges from −1 (perfect negative association) to +1 (perfect positive association); φ = 0 indicates independence.
9.2 Cramér's V — for All Table Sizes
V = √(χ² / (N × min(r − 1, c − 1)))
Interpretation: Average association strength rescaled to [0, 1]. V = 0 indicates independence; V = 1 indicates perfect association (each row category uniquely determines the column category). For 2 × 2 tables, V = |φ|.
9.3 Tschuprow's T
An alternative to Cramér's V for non-square tables:
T = √(χ² / (N × √((r − 1)(c − 1))))
T penalises tables with very unequal dimensions more heavily than V. For square tables (r = c), T = V. T achieves its maximum of 1 only for square tables; for non-square tables, the maximum is less than 1.
9.4 Odds Ratio (OR) — for 2 × 2 Tables
For 2 × 2 tables with binary exposure (rows) and binary outcome (columns), labelling the cells a, b, c, d:
OR = (a × d) / (b × c)
Interpretation: The ratio of the odds of the outcome among the exposed to the odds of the outcome among the unexposed. The odds ratio is the preferred effect size in clinical and epidemiological research.
- OR = 1: No association.
- OR > 1: Exposure is positively associated with the outcome.
- OR < 1: Exposure is negatively associated with the outcome.
Approximate 95% CI for ln(OR):
ln(OR) ± 1.96 × √(1/a + 1/b + 1/c + 1/d)
Exponentiate the bounds to obtain the CI on the OR scale.
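The OR and its Woolf interval are easy to compute directly. A sketch using the worked-example cells a = 40, b = 60, c = 20, d = 80 (the exact Cornfield interval DataStatPro reports will differ slightly):

```python
import math

def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with Woolf's approximate 95% CI."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_woolf_ci(40, 60, 20, 80)
# or_ ≈ 2.67, CI roughly [1.42, 5.02]
```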
9.5 Relative Risk (RR) — for Prospective 2 × 2 Studies
When row margins are fixed by design (prospective/experimental study):
RR = [a / (a + b)] / [c / (c + d)]
Interpretation: The ratio of the probability of the outcome in group 1 to the probability in group 2. Ranges from 0 to ∞; RR = 1 indicates no association.
⚠️ Relative risk is only interpretable when row margins are fixed (i.e., group sizes are pre-specified). For cross-sectional or case-control designs, use the odds ratio.
9.6 Effect Size Summary Table
| Effect Size | Formula | Range | Interpretation |
|---|---|---|---|
| Phi (φ) | √(χ²/N), signed: (ad − bc)/√(R₁R₂C₁C₂) | −1 to +1 | Correlation for binary variables; 2 × 2 only |
| Cramér's V | √(χ²/(N × min(r − 1, c − 1))) | 0 to 1 | Association strength; all table sizes |
| Tschuprow's T | √(χ²/(N√((r − 1)(c − 1)))) | 0 to 1 | Conservative alternative to V |
| Odds ratio (OR) | ad/bc | 0 to ∞ | Clinical/epi effect; 2 × 2 only |
| Relative risk (RR) | [a/(a + b)]/[c/(c + d)] | 0 to ∞ | Prospective studies; 2 × 2 only |
10. Confidence Intervals
10.1 CI for Cramér's V
An asymptotic 95% CI for V uses the non-central chi-square distribution. The non-centrality parameter is estimated as λ̂ = max(0, χ² − df) (adjusted for bias). Finding λ_L and λ_U such that the observed statistic sits at the 97.5th and 2.5th percentiles of the corresponding non-central chi-square distributions, the bounds are converted back to the V scale via V = √((λ + df) / (N × min(r − 1, c − 1))).
DataStatPro computes exact CIs numerically. An approximate 95% CI uses:
V ± 1.96 × SE(V) (adequate for large N)
10.2 CI for the Odds Ratio
Exact (Cornfield) 95% CI for OR:
Computed iteratively by finding the interval on the OR scale within which the conditional (hypergeometric) distribution of O₁₁ covers 95% probability. DataStatPro computes this exactly.
Approximate (Woolf's) 95% CI:
exp(ln(OR) ± 1.96 × √(1/a + 1/b + 1/c + 1/d))
10.3 CI for Proportions and Risk Difference
For 2 × 2 tables, the risk difference (absolute risk reduction) and its CI directly quantify the practical importance of the association:
RD = p̂₁ − p̂₂
95% CI for RD (Wald form shown; Newcombe's method recommended over Wald):
RD ± 1.96 × √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂)
Where p̂₁ = a / (a + b), p̂₂ = c / (c + d), n₁ = a + b, and n₂ = c + d.
10.4 Equivalence and Confidence Intervals
If the goal is to establish that the association is negligibly small (near- independence), use an equivalence testing approach:
- Specify a maximum tolerable effect size V₀ (e.g., V₀ = 0.10).
- Conclude practical equivalence if the entire 95% CI for V falls below V₀.
- If the upper CI bound exceeds V₀, equivalence cannot be concluded.
11. Advanced Topics
11.1 Multiple Chi-Square Tests on the Same Dataset
Testing associations between multiple pairs of categorical variables in the same dataset inflates the familywise error rate:
For k independent tests: FWER = 1 − (1 − α)^k.
Correction strategies:
- Bonferroni: α_adjusted = α / k. Simple but conservative.
- Holm-Bonferroni: Sequential adjustment — less conservative than Bonferroni.
- Benjamini-Hochberg: Controls the False Discovery Rate (FDR) — appropriate for large-scale exploratory analyses (e.g., genome-wide association studies).
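The inflation and the Bonferroni remedy are both one-line calculations:

```python
def familywise_error(alpha, k):
    """FWER = 1 - (1 - alpha)^k for k independent tests."""
    return 1 - (1 - alpha) ** k

fwer = familywise_error(0.05, 10)  # ≈ 0.40 across ten tests
bonferroni_alpha = 0.05 / 10       # per-test threshold 0.005
```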
11.2 Partitioning Chi-Square in Larger Tables
For tables larger than 2 × 2, a significant overall χ² tells you that some association exists somewhere in the table, but not where. Beyond residual analysis, the overall χ² can be partitioned into independent subtables (using Helmert or polynomial contrast coding), each with 1 df, subject to the constraint that the partition df sum to the total df.
This allows focused hypothesis tests about specific row or column comparisons (e.g., "Do groups A and B differ from group C?" tested separately from "Do groups A and B differ from each other?").
11.3 Measures of Agreement vs. Measures of Association
For square tables where both variables classify the same objects into the same categories (inter-rater agreement), use Cohen's kappa (κ) rather than χ² or V:
κ = (P_o − P_e) / (1 − P_e)
Where P_o is the observed agreement and P_e is the expected agreement by chance. κ = 1 → perfect agreement; κ = 0 → agreement at chance level.
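A sketch of κ for a square agreement table (the 2 × 2 counts here are hypothetical):

```python
def cohens_kappa(table):
    """Cohen's kappa = (P_o - P_e) / (1 - P_e) for a square agreement table."""
    n = sum(sum(row) for row in table)
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    p_o = sum(table[i][i] for i in range(len(table))) / n           # observed agreement
    p_e = sum(row[i] * col[i] for i in range(len(table))) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

print(round(cohens_kappa([[20, 5], [10, 15]]), 3))  # 0.4
```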
11.4 Log-Linear Models for Multi-Way Tables
For three or more categorical variables, pairwise chi-square tests are inadequate — they cannot distinguish direct associations from indirect ones mediated by a third variable. Log-linear models model the joint distribution of all variables and allow testing for higher-order interactions:
ln(μ_ijk) = λ + λ_i^X + λ_j^Y + λ_k^Z + λ_ij^XY + λ_ik^XZ + λ_jk^YZ + λ_ijk^XYZ
Model selection (backward elimination or BIC-based) identifies the most parsimonious model that fits the data, isolating which associations are genuine versus artefactual.
11.5 Simpson's Paradox
Simpson's Paradox occurs when an association observed in the overall table reverses or disappears when the data are stratified by a third variable. This is one of the most important reasons to never rely on the marginal chi-square test alone when potential confounders exist.
Classic example: Drug A appears superior to Drug B overall, but within each hospital stratum, Drug B is superior — the reversal is caused by the confounding of hospital quality with drug assignment.
Detection: Use the Cochran-Mantel-Haenszel procedure to compute stratum-adjusted estimates and compare to the unadjusted association.
11.6 The Relationship Between Chi-Square and Other Tests
The chi-square test of association is algebraically equivalent to several other tests in special cases:
| Special Case | Equivalent Test |
|---|---|
| 2 × 2 table | Two-proportion z-test (z² = χ²) |
| 2 × 2 table, small N | Fisher's exact test |
| Both variables ordinal | Spearman correlation significance test |
| One binary, one continuous | Point-biserial correlation; independent t-test |
| 1 × K table | Chi-square goodness-of-fit test |
11.7 Bayesian Chi-Square Analysis
The Bayesian approach quantifies evidence rather than making a binary decision. The BIC-approximated Bayes Factor is:

$$BF_{10} \approx \exp\!\left(\frac{G^2 - df \ln N}{2}\right)$$

where $G^2$ is the likelihood-ratio statistic (the Pearson $\chi^2$ yields a similar approximation). Interpreting $BF_{10}$:
| Evidence for $H_1$ (Association) over $H_0$ (Independence) | $BF_{10}$ |
|---|---|
| Extreme | $> 100$ |
| Very strong | $30$–$100$ |
| Strong | $10$–$30$ |
| Moderate | $3$–$10$ |
| Anecdotal | $1$–$3$ |
| No evidence | $= 1$ |
| Anecdotal evidence for $H_0$ | $1/3$–$1$ |
| Moderate evidence for $H_0$ (independence) | $1/10$–$1/3$ |
Key advantage: $BF_{10} < 1/3$ constitutes positive evidence for independence — something p-values cannot provide.
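A minimal pure-Python sketch of this computation, using the likelihood-ratio statistic $G^2$ in the BIC approximation (the function name is illustrative, not DataStatPro's API):

```python
import math

def bic_bayes_factor(table):
    """BF10 for association vs. independence via the BIC approximation,
    using the likelihood-ratio statistic G^2 (a rough sketch, not exact)."""
    n = sum(sum(row) for row in table)
    r, c = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    g2 = 0.0
    for i in range(r):
        for j in range(c):
            e = row_tot[i] * col_tot[j] / n          # expected count under H0
            if table[i][j] > 0:
                g2 += 2 * table[i][j] * math.log(table[i][j] / e)
    df = (r - 1) * (c - 1)
    return math.exp((g2 - df * math.log(n)) / 2)     # BF10 ≈ exp((G² − df·ln N)/2)

# Smoking × lung disease counts from Example 1 in this tutorial
bf10 = bic_bayes_factor([[40, 60], [20, 80]])
print(f"BF10 ≈ {bf10:.2f}")   # → BF10 ≈ 8.87, moderate evidence for an association
```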
12. Worked Examples
Example 1: Smoking Status and Lung Disease (2 × 2)
A public health researcher surveys $N = 200$ adults, recording smoking status (smoker/non-smoker) and presence of chronic lung disease (yes/no).
Contingency Table (Observed):
| | Lung Disease: Yes | Lung Disease: No | Row Total |
|---|---|---|---|
| Smoker | 40 | 60 | 100 |
| Non-Smoker | 20 | 80 | 100 |
| Column Total | 60 | 140 | 200 |
Step 1 — Hypotheses:

$H_0$: Smoking status and lung disease are independent.

$H_1$: Smoking status and lung disease are associated.

Step 2 — Expected Frequencies:

$$E_{11} = \frac{100 \times 60}{200} = 30, \quad E_{12} = \frac{100 \times 140}{200} = 70, \quad E_{21} = 30, \quad E_{22} = 70$$

All $E_{ij} \geq 5$ (smallest is 30) — assumption of adequate expected frequencies is satisfied.

Step 3 — Chi-Square Statistic:

$$\chi^2 = \frac{(40-30)^2}{30} + \frac{(60-70)^2}{70} + \frac{(20-30)^2}{30} + \frac{(80-70)^2}{70} = 3.33 + 1.43 + 3.33 + 1.43 = 9.52$$

Step 4 — Degrees of Freedom and p-Value:

$$df = (2-1)(2-1) = 1, \qquad p = P(\chi^2_1 \geq 9.52) = .002$$

Step 5 — Effect Sizes:

$$\phi = \sqrt{\frac{9.52}{200}} = .22, \qquad OR = \frac{40 \times 80}{60 \times 20} = 2.67$$

95% CI for $OR$: $[1.38, 5.16]$

Step 6 — Adjusted Standardised Residuals:

$$z_{11} = \frac{40 - 30}{\sqrt{30\,(1 - 100/200)(1 - 60/200)}} = +3.09$$

By symmetry: $z_{12} = -3.09$; $z_{21} = -3.09$; $z_{22} = +3.09$.

All cells show significant deviations ($|z| > 2.58$, significant at $\alpha = .01$).
Summary:
| Statistic | Value | Interpretation |
|---|---|---|
| $\chi^2(1, N = 200)$ | 9.52 ($p = .002$, two-tailed) | Highly significant |
| $\phi$ | .22 | Small-to-medium association |
| $OR$ | 2.67 | Smokers 2.67× more likely to have lung disease |
| 95% CI for $OR$ | [1.38, 5.16] | Excludes 1; confirms significant association |
APA write-up: "A chi-square test of association revealed a significant association between smoking status and lung disease, $\chi^2(1, N = 200) = 9.52$, $p = .002$, $\phi = .22$. The odds of lung disease were 2.67 times higher for smokers than non-smokers (95% CI: [1.38, 5.16]). Adjusted standardised residuals indicated that smokers showed more lung disease ($z = +3.09$) and non-smokers showed less lung disease ($z = -3.09$) than expected under independence."
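Example 1's figures can be reproduced with a short pure-Python sketch (the helper below is illustrative, not DataStatPro code):

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, phi, and odds ratio for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row = [a + b, c + d]                      # row marginals
    col = [a + c, b + d]                      # column marginals
    obs = [[a, b], [c, d]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / n           # expected frequency under independence
            chi2 += (obs[i][j] - e) ** 2 / e
    phi = math.sqrt(chi2 / n)
    odds_ratio = (a * d) / (b * c)
    return chi2, phi, odds_ratio

chi2, phi, or_ = chi_square_2x2(40, 60, 20, 80)   # Example 1 counts
print(f"chi2 = {chi2:.2f}, phi = {phi:.2f}, OR = {or_:.2f}")
# → chi2 = 9.52, phi = 0.22, OR = 2.67
```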
Example 2: Teaching Method and Pass/Fail Outcome (3 × 2)
An education researcher compares three teaching methods (lecture, flipped classroom, online) on student pass/fail outcomes for $N = 210$ students (70 per method).
Contingency Table (Observed):
| | Pass | Fail | Row Total |
|---|---|---|---|
| Lecture | 45 | 25 | 70 |
| Flipped | 55 | 15 | 70 |
| Online | 38 | 32 | 70 |
| Col. Total | 138 | 72 | 210 |
Step 1 — Expected Frequencies:

For every method (row total 70): $E_{\text{Pass}} = \dfrac{70 \times 138}{210} = 46$, $E_{\text{Fail}} = \dfrac{70 \times 72}{210} = 24$.

All $E_{ij} \geq 5$ — assumption satisfied.

Step 2 — Chi-Square Statistic:

$$\chi^2 = \frac{(45-46)^2}{46} + \frac{(25-24)^2}{24} + \frac{(55-46)^2}{46} + \frac{(15-24)^2}{24} + \frac{(38-46)^2}{46} + \frac{(32-24)^2}{24} = 9.26$$

Step 3 — p-Value:

$$df = (3-1)(2-1) = 2, \qquad p = P(\chi^2_2 \geq 9.26) = .010$$

Step 4 — Cramér's $V$:

$$V = \sqrt{\frac{9.26}{210 \times \min(3-1,\, 2-1)}} = \sqrt{\frac{9.26}{210}} = .21$$

Step 5 — Adjusted Standardised Residuals:

Using $z_{ij} = \dfrac{O_{ij} - E_{ij}}{\sqrt{E_{ij}\,(1 - n_{i+}/N)(1 - n_{+j}/N)}}$:
| Cell | $z$ | Significant ($\alpha = .05$)? |
| :--- | :------- | :-------------------------- |
| Lecture/Pass | $-0.31$ | No |
| Lecture/Fail | $+0.31$ | No |
| Flipped/Pass | $+2.78$ | Yes (more than expected) |
| Flipped/Fail | $-2.78$ | Yes (fewer than expected) |
| Online/Pass | $-2.47$ | Yes (fewer than expected) |
| Online/Fail | $+2.47$ | Yes (more than expected) |
Interpretation: Teaching method is significantly associated with pass/fail outcome, $\chi^2(2, N = 210) = 9.26$, $p = .010$, $V = .21$. The flipped classroom exceeds the expected pass rate (more passes, fewer fails than expected), while online learning underperforms (fewer passes, more fails than expected). The lecture method does not significantly deviate from independence.

APA write-up: "A chi-square test of association indicated a significant association between teaching method and pass/fail outcome, $\chi^2(2, N = 210) = 9.26$, $p = .010$, $V = .21$. Adjusted standardised residuals revealed that the flipped classroom had significantly more passes and fewer fails than expected ($z = +2.78$ and $z = -2.78$, respectively), while the online method had significantly fewer passes and more fails than expected ($z = -2.47$ and $z = +2.47$, respectively). The lecture method did not deviate significantly from independence."
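The residuals above can be reproduced with a small pure-Python sketch (the helper name is illustrative, not DataStatPro's API):

```python
import math

def adjusted_residuals(table):
    """Adjusted standardised residuals z_ij = (O - E) / sqrt(E(1 - p_row)(1 - p_col))."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    z = [[0.0] * c for _ in range(r)]
    for i in range(r):
        for j in range(c):
            e = row_tot[i] * col_tot[j] / n   # expected count under independence
            denom = math.sqrt(e * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            z[i][j] = (table[i][j] - e) / denom
    return z

table = [[45, 25], [55, 15], [38, 32]]        # Example 2: lecture, flipped, online
for method, row in zip(["Lecture", "Flipped", "Online"], adjusted_residuals(table)):
    print(method, [round(v, 2) for v in row])
# → Lecture [-0.31, 0.31], Flipped [2.78, -2.78], Online [-2.47, 2.47]
```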
Example 3: Fisher's Exact Test — Rare Side Effect
A clinical trial investigates whether a new drug is associated with a rare side effect. Only $N = 30$ participants are available.
Contingency Table (Observed):
| | Side Effect: Yes | Side Effect: No | Row Total |
|---|---|---|---|
| Treatment | 6 | 9 | 15 |
| Placebo | 2 | 13 | 15 |
| Column Total | 8 | 22 | 30 |
Step 1 — Check Assumptions:

Expected frequencies in the "Side Effect: Yes" column are $E = \dfrac{15 \times 8}{30} = 4$ for both rows. Both cells involving side effects have $E < 5$ → Fisher's exact test is required rather than the standard chi-square approximation.

Step 2 — Fisher's Exact Test p-Value:

The one-tailed p-value (testing $H_1$: treatment increases side effects) is computed as the sum of hypergeometric probabilities for tables as extreme as or more extreme than observed:

$$p_{\text{one-tailed}} = P(X = 6) + P(X = 7) + P(X = 8) = .0898 + .0165 + .0011 = .107$$

Because the margins here are symmetric, the two-tailed p-value is $2 \times .107 = .215$.

Step 3 — Odds Ratio:

$$OR = \frac{6 \times 13}{9 \times 2} = 4.33$$

95% CI for $OR$ (exact Cornfield): $[0.71, 36.85]$

Interpretation: Despite a more than fourfold increase in the odds of a side effect in the treatment group, the small sample size provides insufficient evidence to conclude a statistically significant association, $p = .215$ (Fisher's exact, two-tailed), $\phi = .30$, $OR = 4.33$ [95% CI: 0.71, 36.85]. The wide confidence interval reflects substantial uncertainty due to the small sample. A larger study is warranted.

APA write-up: "Due to small expected cell frequencies ($E = 4 < 5$), Fisher's exact test was used. No statistically significant association was found between treatment condition and side effect occurrence ($p = .215$, two-tailed), though the effect size was small-to-medium ($\phi = .30$, $OR = 4.33$, 95% CI: [0.71, 36.85]). The wide confidence interval and low power indicate that the study was substantially underpowered for this effect size; these findings should be interpreted with caution and a larger replication study is recommended."
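Fisher's exact p-values for this table can be reproduced from the hypergeometric distribution in a few lines of Python (an illustrative sketch, not DataStatPro code):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """One- and two-tailed Fisher's exact p-values for the table [[a, b], [c, d]],
    summing hypergeometric probabilities over tables with the same margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    def p_table(k):                       # P(top-left cell = k) given fixed margins
        return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = {k: p_table(k) for k in range(lo, hi + 1)}
    p_one = sum(p for k, p in probs.items() if k >= a)            # right tail
    p_two = sum(p for p in probs.values() if p <= probs[a] + 1e-12)
    return p_one, p_two

p_one, p_two = fisher_exact_2x2(6, 9, 2, 13)   # Example 3 counts
print(f"one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
# → one-tailed p = 0.107, two-tailed p = 0.215
```

The two-tailed value follows the common convention of summing all tables whose probability does not exceed that of the observed table.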
13. Common Mistakes and How to Avoid Them
Mistake 1: Using Chi-Square with Non-Independent Observations
Problem: Entering repeated measurements, matched pairs, or clustered data into a standard chi-square test. This violates the independence assumption and produces inflated Type I error rates.
Solution: For matched pairs or pre-post binary data, use McNemar's test. For clustered data, use generalised estimating equations (GEE) or mixed-effects logistic regression. For repeated binary measures, use Cochran's Q test.
Mistake 2: Ignoring Small Expected Frequencies
Problem: Running a standard chi-square test when several cells have expected frequencies below 5, leading to an unreliable chi-square approximation with inflated or deflated p-values.
Solution: Always inspect the expected frequency table before interpreting results. Use Fisher's exact test for $2 \times 2$ tables with small expected frequencies. For larger tables, collapse theoretically justifiable categories, or use the exact multinomial test.
Mistake 3: Treating Chi-Square as Directional
Problem: Interpreting a significant chi-square result as indicating a specific direction (e.g., "Group A is higher than Group B"). The chi-square test is omnibus and non-directional — it only indicates that some association exists.
Solution: After a significant omnibus chi-square, examine adjusted standardised residuals to identify which specific cells deviate from independence and in which direction. Report percentage breakdowns and odds ratios to characterise the direction of the association.
Mistake 4: Confusing Association with Causation
Problem: Concluding that $X$ causes $Y$ because a significant association was found. Chi-square only establishes statistical association; causal inference requires experimental design or causal modelling.
Solution: Use appropriate causal language ("is associated with" rather than "causes") unless the study design (randomised experiment) supports causal claims. Consider potential confounders and use stratified analyses (CMH test) or log-linear models to adjust for them.
Mistake 5: Reporting Only the p-Value Without Effect Size
Problem: Reporting only "$\chi^2$ = [value], $p$ = [value]" without an effect size is insufficient. A significant $\chi^2$ driven purely by a large $N$ may correspond to a trivially small $V$ that has no practical importance.

Solution: Always report Cramér's $V$ (or $\phi$ for $2 \times 2$ tables) with its 95% CI. For $2 \times 2$ tables in clinical or epidemiological contexts, also report the odds ratio and its CI.
Mistake 6: Using Chi-Square for Continuous Data
Problem: Dichotomising a continuous variable (e.g., age → young/old) to use chi-square instead of a more appropriate parametric test. Dichotomisation discards information and dramatically reduces statistical power.
Solution: Use the continuous variable in a correlation, regression, or t-test where appropriate. Only categorise variables when the categorical form is theoretically meaningful (e.g., clinical threshold).
Mistake 7: Misinterpreting a Non-Significant Result as Evidence of Independence
Problem: Concluding that $p > .05$ means the variables are independent. As with all hypothesis tests, a non-significant result means insufficient evidence against $H_0$, not that $H_0$ is true. With small $N$, almost no association will reach significance.

Solution: Report power analysis and the 95% CI for $V$. Use the Bayesian chi-square test ($BF_{01}$) or a TOST equivalence procedure to positively support independence.
Mistake 8: Applying Row or Column Percentages Inconsistently
Problem: Reporting column percentages when rows represent the grouping variable (or vice versa) makes patterns hard to interpret. Mixing row and column percentages within the same table causes confusion.
Solution: When rows represent the independent variable (groups), report row percentages (each row sums to 100%). When the marginal distributions are both random (cross-sectional survey), report both row and column percentages and let the research question guide interpretation.
14. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| Chi-square is extremely large (e.g., $\chi^2 > 100$ for a $2 \times 2$ table) | Very large $N$; even negligible associations become significant | Focus on $V$ or $OR$; a large $\chi^2$ may correspond to $V < .10$ |
| $p = 0$ exactly | Software rounds to zero; extremely large $\chi^2$ | Report as $p < .001$ per APA; investigate effect size |
| Expected frequency $< 5$ in one or more cells | Very small cell counts or extreme marginal imbalance | Use Fisher's exact test ($2 \times 2$); collapse categories; collect more data |
| Chi-square statistic equals 0 | Observed frequencies exactly equal expected ($O_{ij} = E_{ij}$ for all cells) | Verify data entry; this would mean perfect independence |
| Cramér's $V > 1$ | Computation error or incorrect $N$ or $\min(r-1, c-1)$ | Verify formula; $V \in [0, 1]$ is bounded by construction |
| Fisher's exact test and chi-square give very different $p$-values | Small sample or extreme marginals | Prefer Fisher's exact; chi-square approximation is unreliable with small $N$ |
| All adjusted residuals are small but $\chi^2$ is significant | Association is diffuse — spread uniformly across all cells, not localised | Report the overall result; diffuse associations may be artefacts of sparse data |
| Large $V$ but $p > .05$ | Small $N$ (low power) | Study is underpowered; $V$ may reflect a real but undetected effect; report power |
| Negative odds ratio or $V$ | Computation error ($OR$ and $V$ cannot be negative) | Verify cell order and formula; $OR \in (0, \infty)$, $V \in [0, 1]$ |
| McNemar result differs substantially from chi-square | Data are paired, not independent | Use McNemar's test; the standard chi-square is incorrect for paired data |
| $G^2$-statistic and $\chi^2$ disagree substantially | Very small expected frequencies or highly asymmetric tables | Use Fisher's exact; both $\chi^2$ and $G^2$ are unreliable for very small samples |
| Structural zero in one cell (count = 0 by design) | Logically impossible cell combination | Use quasi-independence model; do not include structural zeros in standard chi-square |
15. Quick Reference Cheat Sheet
Core Equations
| Formula | Description |
|---|---|
| $E_{ij} = \dfrac{n_{i+}\, n_{+j}}{N}$ | Expected frequency for cell $(i, j)$ |
| $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ | Pearson chi-square statistic |
| $G^2 = 2 \sum_{i,j} O_{ij} \ln\!\left(\dfrac{O_{ij}}{E_{ij}}\right)$ | Likelihood ratio statistic |
| $df = (r - 1)(c - 1)$ | Degrees of freedom |
| $p = P(\chi^2_{df} \geq \chi^2_{\text{obs}})$ | Right-tail p-value |
| $V = \sqrt{\dfrac{\chi^2}{N \min(r-1,\, c-1)}}$ | Cramér's $V$ (all table sizes) |
| $\phi = \sqrt{\dfrac{\chi^2}{N}}$ | Phi coefficient ($2 \times 2$ only) |
| $z_{ij} = \dfrac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1 - n_{i+}/N)(1 - n_{+j}/N)}}$ | Adjusted standardised residual |
| $OR = \dfrac{ad}{bc}$ | Odds ratio ($2 \times 2$ only) |
| $N \approx \dfrac{7.85}{w^2}$ | Approximate $N$ required for 80% power ($\alpha = .05$, $df = 1$) |
Decision Guide
| Condition | Recommended Test |
|---|---|
| Two categorical variables, independent observations, adequate expected frequencies ($E_{ij} \geq 5$) | Chi-square test of association |
| $2 \times 2$ table with any $E_{ij} < 5$ | Fisher's exact test |
| Paired binary data (pre/post) | McNemar's test |
| Both variables ordinal | Linear-by-linear association test |
| Three or more categorical variables | Log-linear model |
| Establishing independence (not just failing to reject) | Bayesian chi-square ($BF_{01}$) or TOST equivalence |
| One categorical variable vs. known distribution | Chi-square goodness-of-fit test |
Cramér's $V$ Benchmarks (Cohen, 1988)

| Table Size ($\min(r, c)$) | Small | Medium | Large |
|---|---|---|---|
| 2 (includes $\phi$) | .10 | .30 | .50 |
| 3 | .07 | .21 | .35 |
| 4 | .06 | .17 | .29 |
| 5 | .05 | .15 | .25 |
Required Sample Size (2 × 2 Table, $\alpha = .05$)

| $w$ | Power = 0.80 | Power = 0.90 |
|---|---|---|
| 0.10 | 785 | 1046 |
| 0.20 | 197 | 263 |
| 0.30 | 88 | 117 |
| 0.50 | 32 | 42 |
| 0.70 | 17 | 22 |
Adjusted Standardised Residual Thresholds
| Significance Level | Critical value of $\lvert z \rvert$ |
|---|---|
| $\alpha = .05$ | 1.96 |
| $\alpha = .01$ | 2.58 |
| $\alpha = .001$ | 3.29 |
APA 7th Edition Reporting Templates
Standard (all table sizes): "A chi-square test of association revealed a [significant / non-significant] association between [Variable X] and [Variable Y], $\chi^2$([df], $N$ = [N]) = [value], $p$ = [value], $V$ = [value] [95% CI: LB, UB]."

With odds ratio (2 × 2 tables): "... The odds of [outcome] were [OR value] times higher in [group 1] than [group 2] (95% CI: [LB, UB])."

With residuals (larger tables): "... Adjusted standardised residuals indicated that [cell description] occurred significantly more/less frequently than expected ($z$ = [value])."

Fisher's exact test: "Due to small expected cell frequencies, Fisher's exact test was used. [Result statement], $p$ = [value] (Fisher's exact, two-tailed), $\phi$ = [value], $OR$ = [value] [95% CI: LB, UB]."

With Bayesian analysis: "The Bayesian chi-square test yielded $BF_{10}$ = [value], indicating [moderate / strong / extreme] evidence for [an association / independence]."
Reporting Checklist
| Item | Required |
|---|---|
| Chi-square statistic ($\chi^2$) | ✅ Always |
| Degrees of freedom | ✅ Always |
| Sample size (in parentheses with $df$) | ✅ Always |
| Exact p-value | ✅ Always |
| Observed frequency table (or percentage breakdown) | ✅ Always |
| Expected frequency table | ✅ When $N$ is small or any $E_{ij}$ is near the threshold of 5 |
| Cramér's $V$ or phi | ✅ Always |
| 95% CI for effect size | ✅ Always |
| Odds ratio and 95% CI | ✅ For $2 \times 2$ tables in clinical/epi contexts |
| Adjusted standardised residuals | ✅ For tables larger than $2 \times 2$ |
| Fisher's exact test | ✅ When any $E_{ij} < 5$ |
| Assumption check (expected frequencies) | ✅ Always |
| Note on independence of observations | ✅ Always |
| Power analysis | ✅ For non-significant results; underpowered studies |
| Bayes Factor | Recommended for null (non-significant) results |
| TOST equivalence test | ✅ When claiming independence |
This tutorial provides a comprehensive foundation for understanding, conducting, and reporting chi-square tests of association within the DataStatPro application. For further reading, consult Agresti's "An Introduction to Categorical Data Analysis" (3rd ed., 2018), Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988), Everitt's "The Analysis of Contingency Tables" (2nd ed., 1992), and Bishop, Fienberg & Holland's "Discrete Multivariate Analysis" (1975). For feature requests or support, contact the DataStatPro team.