Chi-Square Test

Step-by-step guide to conducting chi-square tests using DataStatPro.

Chi-Square Test of Association: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of categorical data analysis all the way through advanced interpretation, reporting, assumption checking, and practical usage within the DataStatPro application. Whether you are encountering the chi-square test of association for the first time or deepening your understanding of relationships between categorical variables, this guide builds your knowledge systematically from the ground up.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is a Chi-Square Test of Association?
  3. The Mathematics Behind the Chi-Square Test of Association
  4. Assumptions of the Chi-Square Test of Association
  5. Variants of the Chi-Square Test of Association
  6. Using the Chi-Square Test of Association Calculator Component
  7. Step-by-Step Procedure
  8. Interpreting the Output
  9. Effect Sizes for the Chi-Square Test of Association
  10. Confidence Intervals
  11. Advanced Topics
  12. Worked Examples
  13. Common Mistakes and How to Avoid Them
  14. Troubleshooting
  15. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into the chi-square test of association, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 Categorical Variables and Frequency Data

Unlike continuous variables (measured on interval or ratio scales), categorical variables assign each observation to a discrete, mutually exclusive category. The chi-square test of association operates on frequency counts — the number of observations falling into each combination of category levels.

  • Nominal variables: Categories with no intrinsic order (e.g., blood type: A, B, AB, O; political party affiliation; treatment group).
  • Ordinal variables: Categories with a meaningful order but no equal spacing (e.g., education level: primary, secondary, tertiary; satisfaction: low, medium, high).

⚠️ The chi-square test treats all categorical data as nominal. If your variables are ordinal, consider tests that exploit the ordering, such as the Jonckheere–Terpstra test or a linear-by-linear association test, as they are more powerful.

1.2 Contingency Tables

A contingency table (also called a cross-tabulation or crosstab) organises the joint frequency distribution of two (or more) categorical variables. For two variables $X$ (with $r$ rows) and $Y$ (with $c$ columns), the contingency table has dimensions $r \times c$.

Each cell $(i, j)$ contains the observed frequency $O_{ij}$: the count of observations belonging to category $i$ of $X$ and category $j$ of $Y$.

Example (2 × 2 table):

| | $Y = 1$ | $Y = 2$ | Row Total |
|---|---|---|---|
| $X = 1$ | $O_{11}$ | $O_{12}$ | $R_1$ |
| $X = 2$ | $O_{21}$ | $O_{22}$ | $R_2$ |
| Column Total | $C_1$ | $C_2$ | $N$ |

Where $N = \sum_{i}\sum_{j} O_{ij}$ is the total sample size.
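Building a contingency table from raw paired observations takes only a few lines of Python. The sketch below is illustrative, not DataStatPro's own code, and the data are invented for the example:

```python
from collections import Counter

def crosstab(x, y):
    """Build an r x c contingency table plus marginal totals from paired data."""
    cells = Counter(zip(x, y))
    rows = sorted(set(x))
    cols = sorted(set(y))
    table = [[cells[(i, j)] for j in cols] for i in rows]
    row_totals = [sum(row) for row in table]            # R_i
    col_totals = [sum(col) for col in zip(*table)]      # C_j
    return rows, cols, table, row_totals, col_totals, sum(row_totals)

# Hypothetical data: smoking status vs. disease outcome
x = ["smoker"] * 6 + ["non-smoker"] * 4
y = ["yes", "yes", "yes", "yes", "no", "no", "yes", "no", "no", "no"]
rows, cols, table, R, C, N = crosstab(x, y)
```

Each unit contributes to exactly one cell, and the row totals, column totals, and grand total fall out of the same pass over the data.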

1.3 The Concept of Statistical Independence

Two categorical variables $X$ and $Y$ are statistically independent if knowledge of an observation's category on $X$ gives no information about its category on $Y$. Formally, independence requires:

$$P(X = i \text{ and } Y = j) = P(X = i) \times P(Y = j) \quad \text{for all } i, j$$

Equivalently, the conditional distribution of $Y$ given $X = i$ is the same for all values of $i$. The chi-square test of association tests whether the observed data are consistent with this independence assumption.

1.4 Expected Frequencies Under Independence

If $X$ and $Y$ are independent, the expected frequency for cell $(i, j)$ is the product of the corresponding marginal probabilities multiplied by the total sample size:

$$E_{ij} = N \times P(X = i) \times P(Y = j)$$

Since the true marginal probabilities are unknown, they are estimated from the sample:

$$\hat{P}(X = i) = \frac{R_i}{N}, \qquad \hat{P}(Y = j) = \frac{C_j}{N}$$

Yielding the fundamental formula for expected frequencies:

$$E_{ij} = \frac{R_i \times C_j}{N}$$

Where $R_i$ is the $i$-th row total and $C_j$ is the $j$-th column total.
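The expected-frequency formula above can be sketched directly in Python (a minimal illustration with made-up counts, not DataStatPro's implementation):

```python
def expected_frequencies(table):
    """E_ij = R_i * C_j / N under the independence hypothesis."""
    R = [sum(row) for row in table]              # row totals
    C = [sum(col) for col in zip(*table)]        # column totals
    N = sum(R)                                   # grand total
    return [[r * c / N for c in C] for r in R]

observed = [[30, 20], [10, 40]]                  # hypothetical 2x2 counts
expected = expected_frequencies(observed)
# Row totals 50/50, column totals 40/60, N = 100 -> E = [[20, 30], [20, 30]]
```

Note that the expected table preserves the observed marginal totals exactly, which is the constraint behind the degrees-of-freedom formula later on.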

1.5 The Chi-Square Distribution

The chi-square distribution $\chi^2_\nu$ is a right-skewed probability distribution parameterised by degrees of freedom $\nu$. Key properties:

  • Defined only for non-negative values ($\chi^2 \geq 0$).
  • Skewed right, becoming more symmetric as $\nu$ increases.
  • Mean $= \nu$; variance $= 2\nu$.
  • As $\nu \to \infty$, the chi-square distribution approaches a normal distribution.
  • Is the sum of $\nu$ squared independent standard normal variables: $\chi^2_\nu = \sum_{k=1}^\nu Z_k^2$, where $Z_k \sim \mathcal{N}(0,1)$.

The chi-square statistic measures the overall discrepancy between observed and expected frequencies — large values indicate strong departure from independence.

1.6 The Null and Alternative Hypotheses in Categorical Tests

The chi-square test of association operates within the hypothesis testing framework:

  • $H_0$: The two categorical variables are statistically independent (no association).
  • $H_1$: The two categorical variables are not statistically independent (an association exists).

Note that $H_1$ is always non-directional for the chi-square test — it simply states that some association exists, without specifying the form or direction of the relationship.

1.7 The p-Value and Significance Level

As in all hypothesis tests, the p-value is the probability of obtaining a test statistic as extreme or more extreme than observed, assuming $H_0$ is true. The significance level $\alpha$ (conventionally $.05$) is the threshold below which we reject $H_0$.

Because large $\chi^2$ values indicate departure from independence, the p-value is always computed from the right tail of the chi-square distribution:

$$p = P(\chi^2_\nu \geq \chi^2_{obs})$$

⚠️ Rejecting $H_0$ tells you that an association exists; it says nothing about the strength, direction, or practical importance of that association. Always accompany chi-square results with appropriate effect size measures.

1.8 Degrees of Freedom in Contingency Tables

For a two-way contingency table with $r$ rows and $c$ columns, the degrees of freedom are:

$$\nu = (r - 1)(c - 1)$$

This reflects the number of cells that are free to vary once the marginal totals are fixed. In a $2 \times 2$ table, $\nu = 1$; in a $3 \times 4$ table, $\nu = 6$.


2. What is a Chi-Square Test of Association?

2.1 The Core Question

The chi-square test of association (also called Pearson's chi-square test of independence) is a non-parametric inferential test that determines whether two categorical variables measured on the same set of observations are statistically associated with one another.

The test compares the observed cell frequencies in a contingency table against the expected cell frequencies that would arise if the two variables were completely independent. A large discrepancy between observed and expected frequencies provides evidence against independence.

2.2 The General Logic

The test quantifies how different the observed table is from the table we would expect under perfect independence:

$$\chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Each term in the sum is a standardised squared residual: the squared difference between what was observed and what was expected, scaled by the expected frequency. When $O_{ij} \approx E_{ij}$ for all cells (as would be expected under independence), $\chi^2$ will be small. Systematic departures from independence produce large $\chi^2$.

2.3 When to Use the Chi-Square Test of Association

| Condition | Requirement |
|---|---|
| Research design | Two categorical variables measured on the same observations |
| Variable scale | Both variables nominal or ordinal (treated as nominal) |
| Data format | Frequency counts in a contingency table |
| Sample size | Adequate expected frequencies (see Assumptions) |
| Observations | Independent of each other |
| Hypothesis | Test of association, not prediction or causation |

2.4 Real-World Applications

| Field | Research Question | Variables |
|---|---|---|
| Epidemiology | Is smoking status associated with lung cancer diagnosis? | Smoking (yes/no) × Disease (yes/no) |
| Marketing | Is product preference associated with age group? | Product (A/B/C) × Age (18–34/35–54/55+) |
| Education | Is passing rate associated with teaching method? | Method (lecture/flipped/hybrid) × Result (pass/fail) |
| Clinical Psychology | Is treatment type associated with recovery status? | Treatment (CBT/medication/combined) × Recovery (yes/no) |
| Genetics | Is a genotype associated with a disease phenotype? | Genotype (AA/Aa/aa) × Disease (affected/unaffected) |
| Sociology | Is gender associated with voting preference? | Gender (M/F/NB) × Party (Democrat/Republican/Other) |
| Public Health | Is vaccination status associated with infection outcome? | Vaccinated (yes/no) × Infected (yes/no) |
| Quality Control | Is production shift associated with defect rate? | Shift (morning/afternoon/night) × Defect (yes/no) |
If your design does not match these conditions, a related test is usually more appropriate:

| Situation | Correct Test |
|---|---|
| Two categorical variables, independent samples | Chi-square test of association |
| Two categorical variables, paired samples | McNemar's test |
| One categorical variable vs. known distribution | Chi-square goodness-of-fit test |
| Two categorical variables, small expected frequencies | Fisher's exact test |
| Ordered categorical variables | Linear-by-linear association test |
| Three or more categorical variables | Log-linear models |
| One binary outcome, continuous predictor | Logistic regression |
| Two proportions only (2 × 2 table) | Two-proportion z-test (equivalent result) |

3. The Mathematics Behind the Chi-Square Test of Association

3.1 The Pearson Chi-Square Statistic

Given an $r \times c$ contingency table with observed cell frequencies $O_{ij}$, the Pearson chi-square statistic is:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Where the expected frequency for cell $(i,j)$ under the null hypothesis of independence is:

$$E_{ij} = \frac{R_i \times C_j}{N}$$

With:

  • $R_i = \sum_{j=1}^c O_{ij}$ — the $i$-th row marginal total
  • $C_j = \sum_{i=1}^r O_{ij}$ — the $j$-th column marginal total
  • $N = \sum_{i=1}^r \sum_{j=1}^c O_{ij}$ — the grand total

Under $H_0$ (independence), $\chi^2$ asymptotically follows a chi-square distribution with $\nu = (r-1)(c-1)$ degrees of freedom.
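The Pearson statistic and its degrees of freedom can be computed in a few lines of plain Python. This is a sketch with hypothetical counts, not DataStatPro's implementation:

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    R = [sum(row) for row in observed]            # row totals
    C = [sum(col) for col in zip(*observed)]      # column totals
    N = sum(R)
    stat = sum((o - r * c / N) ** 2 / (r * c / N)     # (O - E)^2 / E per cell
               for row, r in zip(observed, R)
               for o, c in zip(row, C))
    df = (len(R) - 1) * (len(C) - 1)
    return stat, df

stat, df = chi_square([[30, 20], [10, 40]])
# E = [[20, 30], [20, 30]]; chi2 = 100/20 + 100/30 + 100/20 + 100/30 = 16.67, df = 1
```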

3.2 Degrees of Freedom

For an $r \times c$ table:

$$\nu = (r-1)(c-1)$$

Intuition: A contingency table has $r \times c$ cells. Given the $r$ row totals and $c$ column totals (which sum to $N$), only $(r-1)(c-1)$ cells are free to vary — the remaining cells are determined by the marginal constraints. Each constraint consumes one degree of freedom.

| Table Dimensions | Degrees of Freedom |
|---|---|
| $2 \times 2$ | 1 |
| $2 \times 3$ | 2 |
| $3 \times 3$ | 4 |
| $3 \times 4$ | 6 |
| $4 \times 4$ | 9 |
| $r \times c$ | $(r-1)(c-1)$ |

3.3 Computing the p-Value

The p-value is always computed from the right tail of the chi-square distribution:

$$p = P(\chi^2_\nu \geq \chi^2_{obs}) = 1 - F_{\chi^2_\nu}(\chi^2_{obs})$$

Where $F_{\chi^2_\nu}$ is the cumulative distribution function (CDF) of the chi-square distribution with $\nu$ degrees of freedom. The chi-square test is inherently non-directional — departures from independence in any direction accumulate in the right tail.

3.4 Critical Values

Reject $H_0$ if $\chi^2_{obs} \geq \chi^2_{crit,\; \alpha,\; \nu}$.

Common critical values:

| $\nu$ | $\chi^2_{crit}$ ($\alpha=.05$) | $\chi^2_{crit}$ ($\alpha=.01$) | $\chi^2_{crit}$ ($\alpha=.001$) |
|---|---|---|---|
| 1 | 3.841 | 6.635 | 10.828 |
| 2 | 5.991 | 9.210 | 13.816 |
| 3 | 7.815 | 11.345 | 16.266 |
| 4 | 9.488 | 13.277 | 18.467 |
| 6 | 12.592 | 16.812 | 22.458 |
| 9 | 16.919 | 21.666 | 27.877 |
| 12 | 21.026 | 26.217 | 32.909 |
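The chi-square survival function has closed forms for $\nu = 1$ (via the complementary error function) and for even $\nu$ (a Poisson tail sum), so p-values for common table sizes need no statistics library. A sketch, with the caveat that odd $\nu > 1$ is deliberately left out:

```python
import math

def chi2_sf(x, df):
    """P(chi2_df >= x), i.e. the right-tail p-value, for df = 1 or even df."""
    if df == 1:
        # P(chi2_1 >= x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        # Closed-form Poisson tail sum for even degrees of freedom
        k = df // 2
        return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i)
                                      for i in range(k))
    raise ValueError("odd df > 1 not handled in this sketch")

p1 = chi2_sf(3.841, 1)    # ~.050: 3.841 is the alpha = .05 critical value at df = 1
p2 = chi2_sf(12.592, 6)   # ~.050: 12.592 is the alpha = .05 critical value at df = 6
```

For general $\nu$, a library routine such as a regularised incomplete gamma function is the usual choice; the closed forms above are enough to verify a critical-value table by hand.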

3.5 Standardised and Adjusted Residuals

Beyond the overall $\chi^2$ statistic, residuals reveal which specific cells deviate most from independence.

Raw residual: $e_{ij} = O_{ij} - E_{ij}$

Standardised residual (Pearson residual): $r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}$

Note that $\chi^2 = \sum_{i,j} r_{ij}^2$.

Adjusted standardised residual (Haberman's residual; approximately $\mathcal{N}(0,1)$): $z_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1 - R_i/N)(1 - C_j/N)}}$

Adjusted standardised residuals with $|z_{ij}| > 1.96$ indicate that cell $(i,j)$ deviates significantly from independence at $\alpha = .05$. These are essential for locating the source of a significant chi-square result.
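Adjusted standardised residuals follow directly from the formula above. A minimal sketch with hypothetical counts (not DataStatPro code):

```python
import math

def adjusted_residuals(observed):
    """Adjusted standardised residuals; |z| > 1.96 flags a cell at alpha = .05."""
    R = [sum(row) for row in observed]
    C = [sum(col) for col in zip(*observed)]
    N = sum(R)
    z = []
    for i, row in enumerate(observed):
        z.append([])
        for j, o in enumerate(row):
            e = R[i] * C[j] / N
            se = math.sqrt(e * (1 - R[i] / N) * (1 - C[j] / N))
            z[i].append((o - e) / se)
    return z

z = adjusted_residuals([[30, 20], [10, 40]])
# In a 2x2 table all four residuals have the same magnitude; here |z| ~ 4.08,
# so every cell is flagged at alpha = .05
```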

3.6 Yates' Continuity Correction (2 × 2 Tables)

For $2 \times 2$ tables, the chi-square statistic is a continuous approximation to a discrete distribution. Yates' continuity correction improves the approximation:

$$\chi^2_{Yates} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}$$

Yates' correction makes the test more conservative (reduces Type I error). However, it is controversial — many statisticians consider it overly conservative and recommend using Fisher's exact test for small samples instead. DataStatPro reports both the uncorrected and Yates-corrected chi-square for $2 \times 2$ tables.
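The correction shrinks each absolute deviation by 0.5 before squaring, which always lowers the statistic. A sketch using the same hypothetical table as earlier examples:

```python
def chi_square_yates(observed):
    """Yates continuity-corrected chi-square for a 2x2 table."""
    R = [sum(row) for row in observed]
    C = [sum(col) for col in zip(*observed)]
    N = sum(R)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = R[i] * C[j] / N
            stat += (abs(o - e) - 0.5) ** 2 / e   # shrink |O - E| by 0.5
    return stat

corrected = chi_square_yates([[30, 20], [10, 40]])
# Uncorrected chi2 = 16.67; corrected = 9.5^2 * (2/20 + 2/30) = 15.04
```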

3.7 The Likelihood Ratio Chi-Square ($G$-Test)

An alternative to Pearson's $\chi^2$ is the likelihood ratio statistic $G$ (also called the $G$-test or log-likelihood ratio):

$$G = 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij} \ln\!\left(\frac{O_{ij}}{E_{ij}}\right)$$

Under $H_0$, $G$ also follows a chi-square distribution with $\nu = (r-1)(c-1)$ degrees of freedom. The $G$-test is preferred in some fields (particularly genetics and log-linear modelling) because it is directly derived from maximum likelihood theory. For moderate to large samples, $G$ and $\chi^2$ yield nearly identical results. For small samples, $G$ can be less accurate.

3.8 Effect Size — Cramér's $V$

Cramér's $V$ is the most widely used effect size measure for the chi-square test of association. It scales the chi-square statistic to range from 0 to 1:

$$V = \sqrt{\frac{\chi^2}{N \times \min(r-1,\; c-1)}}$$

Where $\min(r-1, c-1)$ is the smaller of the number of rows minus one and the number of columns minus one. For $2 \times 2$ tables, $V = \phi$. $V$ is interpretable as the average association strength across all possible pairs of category combinations.
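Computing $V$ from an already-computed chi-square statistic is a one-liner; the sketch below uses the $\chi^2 \approx 16.67$, $N = 100$ example table from earlier (illustrative values, not DataStatPro output):

```python
import math

def cramers_v(chi2, n, r, c):
    """Cramer's V from the chi-square statistic and table dimensions."""
    return math.sqrt(chi2 / (n * min(r - 1, c - 1)))

v = cramers_v(50 / 3, 100, 2, 2)   # for a 2x2 table this equals |phi|
# sqrt(16.67 / 100) ~ 0.41 -- a medium-to-large association by Cohen's benchmarks
```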

3.9 Effect Size — Phi Coefficient (ϕ\phi) for 2 × 2 Tables

For $2 \times 2$ tables specifically, the phi coefficient is:

$$\phi = \sqrt{\frac{\chi^2}{N}}$$

The phi coefficient is equivalent to the Pearson product-moment correlation between two binary variables and ranges from $-1$ to $+1$ (its magnitude equals $V$ for $2 \times 2$ tables). A signed version conveying direction can be computed as:

$$\phi = \frac{O_{11}O_{22} - O_{12}O_{21}}{\sqrt{R_1 R_2 C_1 C_2}}$$

The relationship between $\phi$, $\chi^2$, and $N$ is:

$$\chi^2 = N\phi^2 \implies \phi = \sqrt{\chi^2/N}$$

3.10 Statistical Power

Power for the chi-square test is the probability of detecting a true association of given magnitude. Under $H_1$, the chi-square statistic follows a non-central chi-square distribution with non-centrality parameter:

$$\lambda = N \times \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(P_{ij} - P_{i\cdot}P_{\cdot j})^2}{P_{i\cdot}P_{\cdot j}}$$

Where $P_{ij}$ are the true cell probabilities and $P_{i\cdot}$, $P_{\cdot j}$ are the true marginal probabilities. In terms of $V$:

$$\lambda = N \times V^2 \times \min(r-1,\; c-1)$$

For $\nu = 1$, the non-centrality parameter needed for power $1 - \beta$ at significance level $\alpha$ is approximately $(z_{1-\alpha/2} + z_{1-\beta})^2$, giving the required sample size:

$$N \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\min(r-1,\;c-1) \times V^2}$$

Required $N$ for a $2 \times 2$ table, $\alpha = .05$, at conventional effect sizes:

| Cramér's $V$ | Power = 0.80 | Power = 0.90 | Power = 0.95 |
|---|---|---|---|
| 0.10 (small) | 785 | 1046 | 1294 |
| 0.20 | 197 | 263 | 325 |
| 0.30 (medium) | 88 | 117 | 145 |
| 0.50 (large) | 32 | 42 | 53 |
| 0.70 | 17 | 22 | 28 |
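The $\nu = 1$ sample-size approximation can be checked against the table above in a few lines. This is a sketch of the approximation only (real planning tools use the non-central chi-square distribution directly); the z constants are the standard normal quantiles for $\alpha = .05$ two-sided and the chosen power:

```python
import math

def required_n(v, z_alpha=1.960, z_power=0.842):
    """Approximate N for a 2x2 (df = 1) chi-square test of association.

    z_alpha: two-sided critical z at alpha = .05
    z_power: z for the target power (0.842 -> 80%, 1.282 -> 90%, 1.645 -> 95%)
    """
    return math.ceil((z_alpha + z_power) ** 2 / v ** 2)

n80 = required_n(0.30)                    # ~88 for a medium effect at 80% power
n95 = required_n(0.30, z_power=1.645)     # ~145 at 95% power
```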

4. Assumptions of the Chi-Square Test of Association

4.1 Independence of Observations

Each observation must contribute to exactly one cell of the contingency table. This means each participant or unit is counted once and only once. Independence is a design assumption, not testable from the data.

Common violations:

  • Multiple responses from the same participant counted as separate observations.
  • Paired or matched data analysed as independent (use McNemar's test instead).
  • Clustered sampling where observations within clusters are correlated.

When violated: Use McNemar's test (matched pairs), Cochran's Q test (repeated measures), or mixed-effects models for clustered categorical data.

4.2 Adequate Expected Frequencies

The chi-square approximation is only valid when expected frequencies are sufficiently large. The most widely cited guidelines are:

| Guideline | Rule |
|---|---|
| Cochran (1954) | All $E_{ij} \geq 1$; no more than 20% of cells with $E_{ij} < 5$ |
| Yates (1934) | All $E_{ij} \geq 5$ (strict rule) |
| Agresti (2007) | Most $E_{ij} \geq 5$; use Fisher's exact for $2 \times 2$ with any $E_{ij} < 5$ |

When expected frequencies are inadequate:

  • For 2 × 2 tables: Use Fisher's exact test (computes exact p-values without the chi-square approximation).
  • For larger tables: Combine categories (where theoretically justifiable), collect more data, or use the exact multinomial test.

⚠️ Adequate expected frequencies are about $E_{ij}$, not $O_{ij}$. A cell can have a large observed count but a small expected count — always check $E_{ij}$ directly.

4.3 Fixed Marginal Totals (Study Design Consideration)

The chi-square test technically assumes that the row totals are fixed by the study design (sampling from pre-specified groups). If both margins are random, the test remains valid asymptotically, but Fisher's exact test (which conditions on both margins) is more appropriate for small samples.

In practice:

  • Fixed row margins: Prospective study (e.g., 50 smokers and 50 non-smokers recruited in advance).
  • Fixed grand total only: Cross-sectional survey where only $N$ is fixed.
  • Both designs yield valid chi-square tests for large samples.

4.4 Nominal or Ordinal Scale of Measurement

Both variables must be categorical. The chi-square test makes no use of any ordering information in ordinal variables (all ordering is discarded). If both variables are ordinal, more powerful tests exploiting the ordering (linear-by-linear association, Spearman's $\rho$) should be considered alongside or instead of chi-square.

4.5 Sufficiently Large Sample Size

In addition to adequate cell expectations, the overall sample size $N$ must be large enough for the asymptotic chi-square approximation to hold. A common guideline is $N \geq 20$ for a $2 \times 2$ table; larger tables require proportionally larger $N$.

When violated: Use Fisher's exact test ($2 \times 2$) or the exact multinomial test (larger tables).

4.6 No Structural Zeros

A structural zero is a cell that is logically impossible (e.g., "males who are pregnant"). Structural zeros violate the independence model and require special treatment (quasi-independence log-linear models).

4.7 Assumption Summary

| Assumption | How to Check | Remedy if Violated |
|---|---|---|
| Independence of observations | Study design review | McNemar's test; multilevel models |
| Adequate $E_{ij} \geq 5$ | Inspect expected frequency table | Fisher's exact test; collapse categories |
| Sufficient overall $N$ | Count total observations | Fisher's exact; collect more data |
| Categorical variables | Measurement review | Use appropriate scale-specific tests |
| No structural zeros | Theoretical review | Quasi-independence models |

5. Variants of the Chi-Square Test of Association

5.1 Pearson Chi-Square Test of Independence (Standard)

The classic form: compare observed to expected frequencies in a two-way contingency table using the Pearson statistic $\chi^2 = \sum (O-E)^2/E$.

5.2 Fisher's Exact Test

When expected frequencies are small (especially in $2 \times 2$ tables), Fisher's exact test computes the exact probability of observing the data or a more extreme table, given the observed marginal totals:

$$p = \frac{\binom{R_1}{O_{11}}\binom{R_2}{O_{21}}}{\binom{N}{C_1}} = \frac{R_1!\,R_2!\,C_1!\,C_2!}{N!\,O_{11}!\,O_{12}!\,O_{21}!\,O_{22}!}$$

The p-value is the sum of hypergeometric probabilities for all tables as extreme as or more extreme than observed. Fisher's exact test is always valid (not an approximation) but extends with difficulty to larger tables (it requires exact conditional multinomial computation).
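For a $2 \times 2$ table, the enumeration is small enough to write out directly. The sketch below sums the hypergeometric probabilities of all tables no more probable than the observed one (one common definition of the two-sided p-value); the counts are invented for the example:

```python
from math import comb

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table: sum the hypergeometric
    probabilities of every table with the same margins that is no more
    probable than the observed table."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def p_table(k):                       # hypergeometric P(first cell = k)
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_obs = p_table(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(p_table(k) for k in range(lo, hi + 1)
               if p_table(k) <= p_obs + 1e-12)

p = fisher_exact_2x2([[3, 1], [1, 3]])   # hypothetical small-sample table
# With margins 4/4/4/4: p = (1 + 16 + 16 + 1) / 70 ~ 0.486
```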

5.3 Chi-Square Goodness-of-Fit Test

Although not a test of association, this closely related test assesses whether the observed distribution of a single categorical variable matches a theoretically specified distribution $\{p_{0,1}, p_{0,2}, \ldots, p_{0,k}\}$:

$$\chi^2 = \sum_{j=1}^k \frac{(O_j - E_j)^2}{E_j}, \quad E_j = N \times p_{0,j}$$

Degrees of freedom: $\nu = k - 1$ (number of categories minus one).

5.4 McNemar's Test (Paired Nominal Data)

For paired or matched binary categorical data (e.g., before/after designs), McNemar's test is the appropriate alternative to chi-square. It focuses on the discordant pairs:

$$\chi^2_{McNemar} = \frac{(b - c)^2}{b + c}$$

Where $b$ and $c$ are the off-diagonal cells of the $2 \times 2$ matched-pairs table. With continuity correction: $(|b-c| - 1)^2 / (b+c)$.
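Because only the discordant counts enter the statistic, the computation is tiny. A sketch with invented discordant-pair counts:

```python
def mcnemar(b, c, correction=True):
    """McNemar chi-square (df = 1) from the discordant-pair counts b and c."""
    num = (abs(b - c) - 1) ** 2 if correction else (b - c) ** 2
    return num / (b + c)

stat = mcnemar(15, 5)                     # corrected: (|15 - 5| - 1)^2 / 20 = 4.05
raw = mcnemar(15, 5, correction=False)    # uncorrected: (15 - 5)^2 / 20 = 5.0
```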

5.5 Linear-by-Linear Association Test

When both variables are ordinal, the linear-by-linear association test assigns integer scores to the ordered categories and tests for a trend:

$$\chi^2_{linear} = (N-1) \times r^2$$

Where $r$ is the Pearson correlation between the scores assigned to the two ordinal variables. This test has 1 degree of freedom regardless of the table size and is more powerful than the overall chi-square when the true association is monotone.

5.6 Cochran-Mantel-Haenszel Test (Stratified Tables)

When the relationship between two binary variables must be assessed across multiple strata (e.g., the same 2 × 2 table measured in several hospitals), the Cochran–Mantel–Haenszel test combines evidence across strata:

$$\chi^2_{MH} = \frac{\left(\sum_k (O_{11k} - E_{11k})\right)^2}{\sum_k \text{Var}(O_{11k})}$$

This controls for the stratifying variable and estimates a common odds ratio across strata, assuming the association is homogeneous.

5.7 Bayesian Chi-Square Analysis

The Bayesian approach to contingency table analysis estimates a Bayes Factor comparing the association model ($H_1$) to the independence model ($H_0$). Under a symmetric Dirichlet prior, the Bayes Factor can be approximated via the Bayesian Information Criterion (BIC):

$$BF_{10} \approx \exp\!\left(\frac{\chi^2 - \nu \ln N}{2}\right)$$

$BF_{10} > 3$ indicates moderate evidence for an association; $BF_{10} < 1/3$ indicates moderate evidence for independence.


6. Using the Chi-Square Test of Association Calculator Component

The Chi-Square Test of Association Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, and reporting tests of association in contingency tables.

Step-by-Step Guide

Step 1 — Select the Test

Navigate to Statistical Tests → Chi-Square Tests → Chi-Square Test of Association.

Step 2 — Input Method

Choose how to provide data:

  • Raw data: Upload or paste two categorical variable columns. DataStatPro automatically constructs the contingency table, computing row totals, column totals, and the grand total.
  • Contingency table: Enter the observed frequency counts directly into the interactive table grid. Specify row and column labels. Add or remove rows and columns using the + and − controls.
  • Summary proportions: Enter proportions and a total $N$ to reconstruct the table.

Step 3 — Define the Table Structure

  • Specify the number of rows $r$ and columns $c$.
  • Label the row variable, column variable, and each category.
  • DataStatPro automatically computes all marginal totals and expected frequencies $E_{ij}$.

Step 4 — Select the Alternative Test (if applicable)

DataStatPro automatically detects violated assumptions and suggests alternatives:

  • If any $E_{ij} < 5$ in a $2 \times 2$ table → Fisher's exact test is recommended.
  • If both variables are ordinal → Linear-by-linear association test is offered.
  • If data are paired → McNemar's test is prompted.

Step 5 — Set Significance Level

Default: $\alpha = .05$. DataStatPro simultaneously reports results at $\alpha = .01$ and $\alpha = .001$.

Step 6 — Select Display Options

  • ✅ Observed and expected frequency tables with percentage breakdowns.
  • ✅ Pearson $\chi^2$, $df$, exact p-value, and decision.
  • ✅ Yates' continuity-corrected $\chi^2$ (for $2 \times 2$ tables).
  • ✅ Fisher's exact test p-value (for $2 \times 2$ tables).
  • ✅ Likelihood ratio $G$ statistic.
  • ✅ Cramér's $V$ (all table sizes) and phi $\phi$ ($2 \times 2$ only) with 95% CI.
  • ✅ Standardised and adjusted standardised residuals with significance flags.
  • ✅ Contribution of each cell to $\chi^2$ (heat-mapped).
  • ✅ Mosaic plot and clustered bar chart visualisations.
  • ✅ Chi-square distribution diagram with observed statistic and critical region.
  • ✅ Power analysis: current power and required $N$ for 80%, 90%, 95% power.
  • ✅ Bayesian analysis (Bayes Factor $BF_{10}$).
  • ✅ APA 7th edition results paragraph (auto-generated).

Step 7 — Run the Analysis

Click "Run Chi-Square Test of Association". DataStatPro will:

  1. Compute all $E_{ij}$ and verify the assumption of adequate expected frequencies.
  2. Compute $\chi^2$, $G$, $df$, and the exact p-value.
  3. Apply Yates' correction and compute Fisher's exact p-value (for $2 \times 2$ tables).
  4. Compute all effect size measures ($V$, $\phi$) with confidence intervals.
  5. Compute standardised and adjusted residuals for all cells.
  6. Generate all selected visualisations.
  7. Estimate post-hoc power and produce sample size recommendations.
  8. Output an APA-compliant results paragraph.

7. Step-by-Step Procedure

7.1 Full Manual Procedure

Step 1 — State the Hypotheses

$H_0$: [Variable $X$] and [Variable $Y$] are statistically independent.

$H_1$: [Variable $X$] and [Variable $Y$] are not statistically independent (an association exists).

Step 2 — Construct the Contingency Table

Tally observations into an $r \times c$ table. Record:

  • Each cell's observed frequency $O_{ij}$.
  • Row totals $R_i = \sum_j O_{ij}$.
  • Column totals $C_j = \sum_i O_{ij}$.
  • Grand total $N = \sum_i R_i = \sum_j C_j$.

Step 3 — Check Assumptions

  • Verify independence of observations (design review).
  • Compute all expected frequencies: $E_{ij} = R_i C_j / N$.
  • Confirm no more than 20% of cells have $E_{ij} < 5$ and all $E_{ij} \geq 1$.
  • If assumptions are violated, switch to Fisher's exact test or collapse categories.

Step 4 — Compute Expected Frequencies

Eij=Ri×CjNfor all i=1,,r and j=1,,cE_{ij} = \frac{R_i \times C_j}{N} \quad \text{for all } i = 1,\ldots,r \text{ and } j = 1,\ldots,c

Verify: jEij=Ri\sum_j E_{ij} = R_i and iEij=Cj\sum_i E_{ij} = C_j (marginal totals are preserved).

Step 5 — Compute the Chi-Square Statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Step 6 — Determine Degrees of Freedom

$$\nu = (r-1)(c-1)$$

Step 7 — Compute the p-Value

$$p = P(\chi^2_\nu \geq \chi^2_{obs}) = 1 - F_{\chi^2_\nu}(\chi^2_{obs})$$

Reject $H_0$ if $p \leq \alpha$.

Step 8 — Compute Effect Size

Cramér's $V$ (all tables):

$$V = \sqrt{\frac{\chi^2}{N \times \min(r-1,\; c-1)}}$$

Phi coefficient ($2 \times 2$ tables only):

$$\phi = \sqrt{\frac{\chi^2}{N}}$$

Step 9 — Compute Standardised Residuals

$$z_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1 - R_i/N)(1 - C_j/N)}}$$

Flag cells where $|z_{ij}| > 1.96$ (significant at $\alpha = .05$).

Step 10 — Interpret and Report

Use the APA reporting template in Section 15. Always report $\chi^2$, $\nu$, $p$, $N$, Cramér's $V$ (or $\phi$) with 95% CI, and a table of observed frequencies with expected frequencies (or at minimum percentage breakdowns).


8. Interpreting the Output

8.1 The Chi-Square Statistic

| $\chi^2_{obs}$ Relative to $\chi^2_{crit}$ | Interpretation |
|---|---|
| $\chi^2_{obs} < \chi^2_{crit}$ | Fail to reject $H_0$; no significant association at $\alpha$ |
| $\chi^2_{obs} \geq \chi^2_{crit}$ | Reject $H_0$; significant association detected at $\alpha$ |
| Large $\chi^2$ with large $N$ | Can be significant even for very weak associations |
| Small $\chi^2$ with small $N$ | May be non-significant even for large associations (low power) |

8.2 The p-Value

| p-Value | Conventional Interpretation |
|---|---|
| $p > .10$ | No evidence against $H_0$ (independence) |
| $.05 < p \leq .10$ | Marginal evidence of association (trend) |
| $.01 < p \leq .05$ | Significant association at $\alpha = .05$ |
| $.001 < p \leq .01$ | Significant association at $\alpha = .01$ |
| $p \leq .001$ | Significant association at $\alpha = .001$ |

⚠️ A significant p-value only indicates that some departure from independence exists. It does not indicate the strength, direction, or practical importance of the association. Always examine effect sizes, residuals, and percentage breakdowns to understand the nature of the association.

8.3 Expected and Observed Frequency Tables

Comparing observed and expected frequencies reveals the pattern of association:

| Cell Pattern | Interpretation |
|---|---|
| $O_{ij} \gg E_{ij}$ (large positive residual) | This combination occurs more often than independence predicts |
| $O_{ij} \ll E_{ij}$ (large negative residual) | This combination occurs less often than independence predicts |
| $O_{ij} \approx E_{ij}$ for all cells | Data are consistent with independence |
| One or two cells drive $\chi^2$ | Association is localised; examine residuals |

8.4 Cramér's $V$ — Magnitude Interpretation

Cohen's (1988) benchmarks for Cramér's $V$:

These benchmarks depend on the minimum dimension $k = \min(r, c)$:

| $V$ | $k = 2$ (incl. $2 \times 2$) | $k = 3$ | $k = 4$ | $k = 5$ |
|---|---|---|---|---|
| Small | 0.10 | 0.07 | 0.06 | 0.05 |
| Medium | 0.30 | 0.21 | 0.17 | 0.15 |
| Large | 0.50 | 0.35 | 0.29 | 0.25 |

For the $2 \times 2$ case (phi = Cramér's $V$):

| $\vert\phi\vert$ or $V$ | Verbal Label |
|---|---|
| $< 0.10$ | Negligible |
| $0.10 - 0.19$ | Small |
| $0.20 - 0.29$ | Small to medium |
| $0.30 - 0.49$ | Medium |
| $0.50 - 0.69$ | Large |
| $\geq 0.70$ | Very large |

⚠️ Cohen's benchmarks were developed for the behavioural sciences and are conventions of last resort, not universal standards. In epidemiology, an odds ratio of 1.5 (corresponding to a small $\phi$) may be highly practically significant; in genetics, very small $V$ values can be of great theoretical importance. Always contextualise effect sizes within your specific domain.

8.5 Standardised Residuals: Locating the Source of Association

After a significant global chi-square, examine adjusted standardised residuals $z_{ij}$:

| $\vert z_{ij} \vert$ | Interpretation |
|---|---|
| $< 1.96$ | Cell does not significantly deviate from independence ($\alpha = .05$) |
| $1.96 - 2.58$ | Significant deviation at $\alpha = .05$ |
| $2.58 - 3.29$ | Significant deviation at $\alpha = .01$ |
| $> 3.29$ | Significant deviation at $\alpha = .001$ |

Positive residuals: the combination occurs more than expected under independence. Negative residuals: the combination occurs less than expected.

⚠️ When examining residuals across multiple cells, apply a multiple comparisons correction (e.g., Bonferroni: compare $|z_{ij}|$ to $z_{1-\alpha/(2rc)}$) to control the familywise error rate.


9. Effect Sizes for the Chi-Square Test of Association

9.1 Phi Coefficient ($\phi$) — for 2 × 2 Tables

$$\phi = \sqrt{\frac{\chi^2}{N}}$$

Or signed (to indicate direction of association):

$$\phi = \frac{O_{11}O_{22} - O_{12}O_{21}}{\sqrt{R_1 R_2 C_1 C_2}}$$

Interpretation: Equivalent to the Pearson correlation between two binary variables. Ranges from $-1$ (perfect negative association) to $+1$ (perfect positive association); $0$ = independence.

9.2 Cramér's VV — for All Table Sizes

V=χ2N×min(r1,  c1)V = \sqrt{\frac{\chi^2}{N \times \min(r-1,\; c-1)}}

Interpretation: Average association strength rescaled to [0,1][0, 1]. V=0V = 0 indicates independence; V=1V = 1 indicates perfect association (each row category uniquely determines the column category). For 2×22 \times 2 tables, V=ϕV = |\phi|.
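The two formulas above can be sketched in a few lines of Python (illustrative helper names, plain list-of-lists tables; not DataStatPro's API):

```python
# Illustrative helpers (plain list-of-lists input), not DataStatPro's API.
def chi_square_stat(table):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    r, c = len(table), len(table[0])
    rows = [sum(row) for row in table]
    cols = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(rows)
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(r) for j in range(c)
    )


def cramers_v(table):
    """V = sqrt(chi2 / (N * min(r-1, c-1))); equals |phi| for 2x2 tables."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    return (chi_square_stat(table) / (n * min(r - 1, c - 1))) ** 0.5


# 2x2 smoking example from Section 12: chi2 = 9.524, V = |phi| = 0.218
chi2 = chi_square_stat([[40, 60], [20, 80]])
v = cramers_v([[40, 60], [20, 80]])
```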

9.3 Tschuprow's $T$

An alternative to Cramér's $V$ for non-square tables:

$$T = \sqrt{\frac{\chi^2}{N \times \sqrt{(r-1)(c-1)}}}$$

$T$ penalises tables with very unequal dimensions more heavily than $V$. For square tables ($r = c$), $T = V$. $T$ achieves its maximum of 1 only for square tables; for non-square tables, the maximum is less than 1.

9.4 Odds Ratio ($OR$) — for 2 × 2 Tables

For $2 \times 2$ tables with binary exposure ($X$) and binary outcome ($Y$):

$$OR = \frac{O_{11} \times O_{22}}{O_{12} \times O_{21}}$$

Interpretation: The ratio of the odds of $Y = 1$ given $X = 1$ to the odds of $Y = 1$ given $X = 0$. The odds ratio is the preferred effect size in clinical and epidemiological research.

  • $OR = 1$: No association.
  • $OR > 1$: Exposure is positively associated with the outcome.
  • $OR < 1$: Exposure is negatively associated with the outcome.

Approximate 95% CI for $\ln(OR)$:

$$\ln(OR) \pm 1.96 \times \sqrt{\frac{1}{O_{11}} + \frac{1}{O_{12}} + \frac{1}{O_{21}} + \frac{1}{O_{22}}}$$

Exponentiate the bounds to obtain the CI on the OR scale.
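A minimal sketch of the odds ratio with its Woolf log-scale CI (illustrative function name, not DataStatPro's API):

```python
# Assumes all four cells are non-zero; with a zero cell, a common fix is
# adding 0.5 to every cell (Haldane-Anscombe correction).
from math import exp, log, sqrt


def odds_ratio_ci(o11, o12, o21, o22, z=1.96):
    """Odds ratio with Woolf (log-scale) confidence interval."""
    or_ = (o11 * o22) / (o12 * o21)
    se = sqrt(1 / o11 + 1 / o12 + 1 / o21 + 1 / o22)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


# Smoking example from Section 12: OR = 2.667, 95% CI roughly [1.42, 5.02]
or_, lo, hi = odds_ratio_ci(40, 60, 20, 80)
```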

9.5 Relative Risk ($RR$) — for Prospective 2 × 2 Studies

When row margins are fixed by design (prospective/experimental study):

$$RR = \frac{O_{11}/R_1}{O_{21}/R_2}$$

Interpretation: The ratio of the probability of the outcome in group 1 to the probability in group 2. Ranges from 0 to $\infty$; $RR = 1$ indicates no association.

⚠️ Relative risk is only interpretable when row margins are fixed (i.e., group sizes are pre-specified). For cross-sectional or case-control designs, use the odds ratio.

9.6 Effect Size Summary Table

| Effect Size | Formula | Range | Interpretation |
| :-- | :-- | :-- | :-- |
| Phi ($\phi$) | $\sqrt{\chi^2/N}$ | $[0, 1]$ ($[-1, 1]$ signed) | Correlation for binary variables; $2 \times 2$ only |
| Cramér's $V$ | $\sqrt{\chi^2/(N\min(r-1,c-1))}$ | $[0, 1]$ | Association strength; all table sizes |
| Tschuprow's $T$ | $\sqrt{\chi^2/(N\sqrt{(r-1)(c-1)})}$ | $[0, 1]$ | Conservative alternative to $V$ |
| Odds ratio ($OR$) | $O_{11}O_{22}/(O_{12}O_{21})$ | $(0, \infty)$ | Clinical/epi effect; $2 \times 2$ only |
| Relative risk ($RR$) | $(O_{11}/R_1)/(O_{21}/R_2)$ | $(0, \infty)$ | Prospective studies; $2 \times 2$ only |

10. Confidence Intervals

10.1 CI for Cramér's $V$

An asymptotic 95% CI for $V$ uses the non-central chi-square distribution with non-centrality parameter $\hat{\lambda} = \chi^2 - \nu$ (adjusted for bias). Given the limits $\lambda_L$ and $\lambda_U$ obtained from the non-central chi-square distribution:

$$V_L = \sqrt{\frac{\lambda_L}{N \times \min(r-1,\; c-1)}}, \qquad V_U = \sqrt{\frac{\lambda_U}{N \times \min(r-1,\; c-1)}}$$

DataStatPro computes exact CIs numerically. An approximate 95% CI uses:

$$SE_V \approx \sqrt{\frac{V^2}{\chi^2} \times \frac{\nu + 2}{2N}}$$

$V \pm 1.96 \times SE_V$ (adequate for $N > 50$)

10.2 CI for the Odds Ratio

Exact (Cornfield) 95% CI for $OR$:

Computed iteratively by finding the interval on the $OR$ scale within which the conditional distribution of $O_{11}$ covers 95% probability. DataStatPro computes this exactly.

Approximate (Woolf's) 95% CI:

$$OR \times \exp\!\left(\pm 1.96 \times \sqrt{\frac{1}{O_{11}} + \frac{1}{O_{12}} + \frac{1}{O_{21}} + \frac{1}{O_{22}}}\right)$$

10.3 CI for Proportions and Risk Difference

For $2 \times 2$ tables, the risk difference (absolute risk reduction) and its CI directly quantify the practical importance of the association:

$$RD = \frac{O_{11}}{R_1} - \frac{O_{21}}{R_2}$$

Approximate (Wald) 95% CI for $RD$ (Newcombe's method is recommended for better coverage, especially in small samples):

$$RD \pm 1.96 \times \sqrt{\frac{p_1(1-p_1)}{R_1} + \frac{p_2(1-p_2)}{R_2}}$$

Where $p_1 = O_{11}/R_1$ and $p_2 = O_{21}/R_2$.
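The Wald form above can be sketched as follows (Newcombe's hybrid-score method, which the text recommends, requires per-proportion Wilson limits and is longer; this shows the simple Wald version only):

```python
# The simple Wald form shown above; Newcombe's hybrid-score method (which
# the text recommends) needs per-proportion Wilson limits and is longer.
from math import sqrt


def risk_difference_ci(o11, r1, o21, r2, z=1.96):
    """Risk difference p1 - p2 with a Wald 95% CI."""
    p1, p2 = o11 / r1, o21 / r2
    rd = p1 - p2
    se = sqrt(p1 * (1 - p1) / r1 + p2 * (1 - p2) / r2)
    return rd, rd - z * se, rd + z * se


# Smoking example from Section 12: RD = 0.40 - 0.20 = 0.20
rd, lo, hi = risk_difference_ci(40, 100, 20, 100)
```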

10.4 Equivalence and Confidence Intervals

If the goal is to establish that the association is negligibly small (near-independence), use an equivalence testing approach:

  1. Specify a maximum tolerable effect size $V_{max}$ (e.g., $V_{max} = 0.10$).
  2. Conclude practical equivalence if the entire 95% CI for $V$ falls below $V_{max}$.
  3. If the upper CI bound exceeds $V_{max}$, equivalence cannot be concluded.

11. Advanced Topics

11.1 Multiple Chi-Square Tests on the Same Dataset

Testing associations between multiple pairs of categorical variables in the same dataset inflates the familywise error rate:

$$FWER = 1 - (1-\alpha)^k$$

For $k = 10$ tests: $FWER = 1 - (0.95)^{10} = .401$.

Correction strategies:

  • Bonferroni: $\alpha' = \alpha/k$. Simple but conservative.
  • Holm-Bonferroni: Sequential adjustment — less conservative than Bonferroni.
  • Benjamini-Hochberg: Controls the False Discovery Rate (FDR) — appropriate for large-scale exploratory analyses (e.g., genome-wide association studies).
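The three strategies can be sketched in pure Python as adjusted-p-value helpers (illustrative implementations; statsmodels' `multipletests` provides production versions of the same corrections):

```python
# Pure-Python sketches of the three corrections, returning adjusted
# p-values in the original order.
def bonferroni(pvals):
    return [min(1.0, p * len(pvals)) for p in pvals]


def holm(pvals):
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])  # ascending p
    adj, running_max = [0.0] * k, 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (k - rank) * pvals[i])
        adj[i] = min(1.0, running_max)
    return adj


def benjamini_hochberg(pvals):
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i], reverse=True)  # descending p
    adj, running_min = [0.0] * k, 1.0
    for rank, i in enumerate(order):
        running_min = min(running_min, pvals[i] * k / (k - rank))
        adj[i] = running_min
    return adj
```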

11.2 Partitioning Chi-Square in Larger Tables

For tables larger than $2 \times 2$, a significant overall $\chi^2$ tells you that some association exists somewhere in the table, but not where. Beyond residual analysis, the overall $\chi^2$ can be partitioned into independent $2 \times 2$ subtables (using Helmert or polynomial contrast coding), each with 1 df, subject to the constraint that the partition df sum to the total df.

This allows focused hypothesis tests about specific row or column comparisons (e.g., "Do groups A and B differ from group C?" tested separately from "Do groups A and B differ from each other?").

11.3 Measures of Agreement vs. Measures of Association

For $r \times r$ square tables where both variables classify the same objects into the same categories (inter-rater agreement), use Cohen's kappa ($\kappa$) rather than $\chi^2$ or $V$:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

Where $P_o = \sum_i O_{ii}/N$ is the observed agreement and $P_e = \sum_i (R_i C_i)/N^2$ is the expected agreement by chance. $\kappa = 1$ → perfect agreement; $\kappa = 0$ → agreement at chance level.
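A minimal sketch of Cohen's kappa computed from a square agreement table (illustrative helper, not DataStatPro's API):

```python
# Illustrative helper, not DataStatPro's API; rows = rater 1, columns = rater 2.
def cohens_kappa(table):
    """kappa = (P_o - P_e) / (1 - P_e) for a square agreement table."""
    r = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(table[i][j] for i in range(r)) for j in range(r)]
    p_o = sum(table[i][i] for i in range(r)) / n           # observed agreement
    p_e = sum(rows[i] * cols[i] for i in range(r)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Two raters classifying 100 items; 80 diagonal agreements
kappa = cohens_kappa([[45, 5], [15, 35]])  # ~0.6
```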

11.4 Log-Linear Models for Multi-Way Tables

For three or more categorical variables, pairwise chi-square tests are inadequate — they cannot distinguish direct associations from indirect ones mediated by a third variable. Log-linear models describe the joint distribution of all variables and allow testing for higher-order interactions:

$$\ln(E_{ijk}) = \mu + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_{ij}^{AB} + \lambda_{ik}^{AC} + \lambda_{jk}^{BC} + \lambda_{ijk}^{ABC}$$

Model selection (backward elimination or BIC-based) identifies the most parsimonious model that fits the data, isolating which associations are genuine versus artefactual.

11.5 Simpson's Paradox

Simpson's Paradox occurs when an association observed in the overall table reverses or disappears when the data are stratified by a third variable. This is one of the most important reasons to never rely on the marginal chi-square test alone when potential confounders exist.

Classic example: Drug A appears superior to Drug B overall, but within each hospital stratum, Drug B is superior — the reversal is caused by the confounding of hospital quality with drug assignment.

Detection: Use the Cochran-Mantel-Haenszel procedure to compute stratum-adjusted estimates and compare to the unadjusted association.

11.6 The Relationship Between Chi-Square and Other Tests

The chi-square test of association is algebraically equivalent to several other tests in special cases:

| Special Case | Equivalent Test |
| :-- | :-- |
| $2 \times 2$ table | Two-proportion z-test: $z^2 = \chi^2$ |
| $2 \times 2$, small $N$ | Fisher's exact test |
| Both variables ordinal | Spearman correlation significance test |
| One binary, one continuous | Point-biserial correlation; independent t-test |
| $1 \times k$ table | Chi-square goodness-of-fit test |

11.7 Bayesian Chi-Square Analysis

The Bayesian approach quantifies evidence rather than making a binary decision. The BIC-approximated Bayes Factor is:

$$BF_{10} \approx \exp\!\left(\frac{\chi^2 - \nu \ln N}{2}\right)$$

Interpreting $BF_{10}$:

| $BF_{10}$ | Evidence for $H_1$ (Association) over $H_0$ (Independence) |
| :-- | :-- |
| $> 100$ | Extreme |
| $30 - 100$ | Very strong |
| $10 - 30$ | Strong |
| $3 - 10$ | Moderate |
| $1 - 3$ | Anecdotal |
| $1$ | No evidence |
| $1/3 - 1$ | Anecdotal evidence for $H_0$ |
| $< 1/3$ | Moderate evidence for $H_0$ (independence) |

Key advantage: $BF_{10} < 1/3$ constitutes positive evidence for independence — something p-values cannot provide.
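The BIC approximation above is one line of code; a hedged sketch (illustrative name, not DataStatPro's API):

```python
# Sketch of the BIC approximation above; cut-offs follow the table in 11.7.
from math import exp, log


def bf10_bic(chi2, df, n):
    """BF10 ~ exp((chi2 - df * ln N) / 2)."""
    return exp((chi2 - df * log(n)) / 2)


# Smoking example from Section 12: chi2 = 9.524, df = 1, N = 200
bf = bf10_bic(9.524, 1, 200)  # ~8.3: moderate evidence for an association
```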


12. Worked Examples

Example 1: Smoking Status and Lung Disease (2 × 2)

A public health researcher surveys $N = 200$ adults, recording smoking status (smoker/non-smoker) and presence of chronic lung disease (yes/no).

Contingency Table (Observed):

| | Lung Disease: Yes | Lung Disease: No | Row Total |
| :-- | :-- | :-- | :-- |
| Smoker | 40 | 60 | 100 |
| Non-Smoker | 20 | 80 | 100 |
| Column Total | 60 | 140 | 200 |

Step 1 — Hypotheses:

$H_0$: Smoking status and lung disease are independent.

$H_1$: Smoking status and lung disease are associated.

Step 2 — Expected Frequencies:

$E_{11} = (100 \times 60)/200 = 30.0$

$E_{12} = (100 \times 140)/200 = 70.0$

$E_{21} = (100 \times 60)/200 = 30.0$

$E_{22} = (100 \times 140)/200 = 70.0$

All $E_{ij} = 30$ or $70$ — the assumption of adequate expected frequencies is satisfied.

Step 3 — Chi-Square Statistic:

$$\chi^2 = \frac{(40-30)^2}{30} + \frac{(60-70)^2}{70} + \frac{(20-30)^2}{30} + \frac{(80-70)^2}{70}$$

$$= \frac{100}{30} + \frac{100}{70} + \frac{100}{30} + \frac{100}{70}$$

$$= 3.333 + 1.429 + 3.333 + 1.429 = 9.524$$

Step 4 — Degrees of Freedom and p-Value:

$\nu = (2-1)(2-1) = 1$

$p = P(\chi^2_1 \geq 9.524) = .002$

Step 5 — Effect Sizes:

$\phi = \sqrt{9.524/200} = \sqrt{0.0476} = 0.218$

$OR = (40 \times 80)/(60 \times 20) = 3200/1200 = 2.667$

95% CI for $OR$: $\exp\!\left(\ln(2.667) \pm 1.96\sqrt{1/40 + 1/60 + 1/20 + 1/80}\right)$

$= \exp(0.981 \pm 1.96 \times 0.323) = \exp(0.981 \pm 0.633) = [1.42, 5.02]$

Step 6 — Adjusted Standardised Residuals:

$z_{11} = (40-30)/\sqrt{30(1-100/200)(1-60/200)} = 10/\sqrt{30 \times 0.5 \times 0.7} = 10/\sqrt{10.5} = 3.086$

By symmetry: $z_{22} = 3.086$; $z_{12} = -3.086$; $z_{21} = -3.086$.

All cells show significant deviations ($|z| = 3.086 > 2.58$, significant at $\alpha = .01$).

Summary:

| Statistic | Value | Interpretation |
| :-- | :-- | :-- |
| $\chi^2(1)$ | $9.524$ | |
| $p$ (two-tailed) | $.002$ | Highly significant |
| $N$ | $200$ | |
| $\phi$ | $0.218$ | Small-to-medium association |
| $OR$ | $2.667$ | Smokers 2.67× more likely to have lung disease |
| 95% CI for $OR$ | $[1.42, 5.02]$ | Excludes 1; confirms significant association |

APA write-up: "A chi-square test of association revealed a significant association between smoking status and lung disease, $\chi^2(1, N = 200) = 9.52$, $p = .002$, $\phi = 0.22$. The odds of lung disease were 2.67 times higher for smokers than non-smokers (95% CI: [1.42, 5.02]). Adjusted standardised residuals indicated that smokers showed more lung disease ($z = 3.09$) and non-smokers showed less lung disease ($z = -3.09$) than expected under independence."
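The entire Example 1 computation can be reproduced with the standard library as a sanity check (not DataStatPro internals — just the tutorial's own formulas):

```python
# Reproducing Example 1 end-to-end from the tutorial's formulas
# (a sanity check, not DataStatPro internals).
from math import exp, log, sqrt

O = [[40, 60], [20, 80]]
rows = [sum(r) for r in O]
cols = [O[0][j] + O[1][j] for j in range(2)]
n = sum(rows)

E = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
chi2 = sum((O[i][j] - E[i][j]) ** 2 / E[i][j] for i in range(2) for j in range(2))
phi = sqrt(chi2 / n)
or_ = O[0][0] * O[1][1] / (O[0][1] * O[1][0])
se = sqrt(sum(1 / O[i][j] for i in range(2) for j in range(2)))
ci = (exp(log(or_) - 1.96 * se), exp(log(or_) + 1.96 * se))
# chi2 ~ 9.524, phi ~ 0.218, OR ~ 2.667, CI roughly (1.42, 5.02)
```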


Example 2: Teaching Method and Pass/Fail Outcome (3 × 2)

An education researcher compares three teaching methods (lecture, flipped classroom, online) on student pass/fail outcomes for $N = 210$ students (70 per method).

Contingency Table (Observed):

| | Pass | Fail | Row Total |
| :-- | :-- | :-- | :-- |
| Lecture | 45 | 25 | 70 |
| Flipped | 55 | 15 | 70 |
| Online | 38 | 32 | 70 |
| Col. Total | 138 | 72 | 210 |

Step 1 — Expected Frequencies:

$E_{ij} = R_i C_j / N$:

$E_{Lec, Pass} = 70 \times 138/210 = 46.0$

$E_{Lec, Fail} = 70 \times 72/210 = 24.0$

$E_{Flip, Pass} = 70 \times 138/210 = 46.0$

$E_{Flip, Fail} = 70 \times 72/210 = 24.0$

$E_{Online, Pass} = 70 \times 138/210 = 46.0$

$E_{Online, Fail} = 70 \times 72/210 = 24.0$

All $E_{ij} \geq 24$ — assumption satisfied.

Step 2 — Chi-Square Statistic ($\nu = (3-1)(2-1) = 2$):

$$\chi^2 = \frac{(45-46)^2}{46} + \frac{(25-24)^2}{24} + \frac{(55-46)^2}{46} + \frac{(15-24)^2}{24} + \frac{(38-46)^2}{46} + \frac{(32-24)^2}{24}$$

$$= 0.022 + 0.042 + 1.761 + 3.375 + 1.391 + 2.667 = 9.257$$

Step 3 — p-Value:

$p = P(\chi^2_2 \geq 9.257) = .010$

Step 4 — Cramér's $V$:

$V = \sqrt{9.257/(210 \times \min(2,1))} = \sqrt{9.257/210} = \sqrt{0.0441} = 0.210$

Step 5 — Adjusted Standardised Residuals:

Using $z_{ij} = (O_{ij}-E_{ij})/\sqrt{E_{ij}(1-R_i/N)(1-C_j/N)}$ (note that with only two columns, the two residuals in each row are necessarily equal in magnitude and opposite in sign):

| Cell | $z_{ij}$ | Significant ($|z| > 1.96$)? |
| :--- | :------- | :-------------------------- |
| Lecture/Pass | $-0.31$ | No |
| Lecture/Fail | $+0.31$ | No |
| Flipped/Pass | $+2.78$ | Yes ($\alpha = .01$) |
| Flipped/Fail | $-2.78$ | Yes ($\alpha = .01$) |
| Online/Pass | $-2.47$ | Yes ($\alpha = .05$) |
| Online/Fail | $+2.47$ | Yes ($\alpha = .05$) |

Interpretation: Teaching method is significantly associated with pass/fail outcome, $\chi^2(2) = 9.26$, $p = .010$, $V = 0.21$. The flipped classroom exceeds the expected pass rate (more passes, fewer fails than expected), while online learning underperforms (fewer passes, more fails than expected). The lecture method does not significantly deviate from independence.

APA write-up: "A chi-square test of association indicated a significant association between teaching method and pass/fail outcome, $\chi^2(2, N = 210) = 9.26$, $p = .010$, $V = 0.21$. Adjusted standardised residuals revealed that the flipped classroom had significantly more passes and fewer fails than expected ($z = 2.78$ and $z = -2.78$, respectively), while the online method had significantly fewer passes and more fails than expected ($z = -2.47$ and $z = 2.47$, respectively). The lecture method did not deviate significantly from independence."
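As with Example 1, the statistics can be re-derived directly from the tutorial's formulas (a verification sketch only):

```python
# Re-deriving the Example 2 statistics from the tutorial's formulas
# (a verification sketch only).
from math import sqrt

O = [[45, 25], [55, 15], [38, 32]]
rows = [sum(r) for r in O]
cols = [sum(O[i][j] for i in range(3)) for j in range(2)]
n = sum(rows)

E = [[rows[i] * cols[j] / n for j in range(2)] for i in range(3)]
chi2 = sum((O[i][j] - E[i][j]) ** 2 / E[i][j] for i in range(3) for j in range(2))
v = sqrt(chi2 / (n * 1))  # min(r-1, c-1) = 1 for a 3x2 table

z = [[(O[i][j] - E[i][j]) / sqrt(E[i][j] * (1 - rows[i] / n) * (1 - cols[j] / n))
      for j in range(2)] for i in range(3)]
# chi2 ~ 9.26, V ~ 0.21; Pass-column residuals: Lecture -0.31, Flipped +2.78, Online -2.47
```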


Example 3: Fisher's Exact Test — Rare Side Effect

A clinical trial investigates whether a new drug is associated with a rare side effect. Only $N = 30$ participants are available.

Contingency Table (Observed):

| | Side Effect: Yes | Side Effect: No | Row Total |
| :-- | :-- | :-- | :-- |
| Treatment | 6 | 9 | 15 |
| Placebo | 2 | 13 | 15 |
| Column Total | 8 | 22 | 30 |

Step 1 — Check Assumptions:

$E_{Treatment, Yes} = (15 \times 8)/30 = 4.0$

$E_{Placebo, Yes} = (15 \times 8)/30 = 4.0$

Both cells involving side effects have $E_{ij} = 4.0 < 5$ — Fisher's exact test is required rather than the standard chi-square approximation.

Step 2 — Fisher's Exact Test p-Value:

The one-tailed p-value (testing $H_1$: treatment increases side effects) is computed as the sum of hypergeometric probabilities for tables as extreme as or more extreme than observed:

$$p_{one-tailed} = P(O_{11} \geq 6 \mid R_1 = 15, R_2 = 15, C_1 = 8, N = 30)$$

$$= P(O_{11} = 6) + P(O_{11} = 7) + P(O_{11} = 8)$$

$$= \frac{\binom{15}{6}\binom{15}{2}}{\binom{30}{8}} + \frac{\binom{15}{7}\binom{15}{1}}{\binom{30}{8}} + \frac{\binom{15}{8}\binom{15}{0}}{\binom{30}{8}}$$

$$= 0.0898 + 0.0165 + 0.0011 = 0.1074$$

Because the row margins are equal, the hypergeometric distribution is symmetric here, so $p_{two-tailed} = 2 \times 0.1074 = .215$.

Step 3 — Odds Ratio:

$OR = (6 \times 13)/(9 \times 2) = 78/18 = 4.33$

95% CI for $OR$ (exact Cornfield): $[0.71, 36.85]$

Interpretation: Despite a fourfold increase in the odds of a side effect in the treatment group, the small sample size provides insufficient evidence to conclude a statistically significant association, $p = .215$ (Fisher's exact, two-tailed), $\phi = 0.30$, $OR = 4.33$ [95% CI: 0.71, 36.85]. The wide confidence interval reflects substantial uncertainty due to the small sample. A larger study is warranted.

APA write-up: "Due to small expected cell frequencies ($E < 5$), Fisher's exact test was used. No statistically significant association was found between treatment condition and side effect occurrence ($p = .215$, two-tailed), though the effect size was medium ($\phi = 0.30$, $OR = 4.33$, 95% CI: [0.71, 36.85]). The wide confidence interval and low statistical power indicate that the study was substantially underpowered for this effect size; these findings should be interpreted with caution and a larger replication study is recommended."
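The hypergeometric sums behind Fisher's exact test can be reproduced with `math.comb` (a sketch of what the exact test computes, not DataStatPro's implementation):

```python
# The hypergeometric sums behind Fisher's exact test, with math.comb only
# (a sketch of what the exact test computes, not DataStatPro's implementation).
from math import comb


def hypergeom_p(o11, r1, r2, c1):
    """P(O11 = o11) under fixed margins R1, R2, C1."""
    return comb(r1, o11) * comb(r2, c1 - o11) / comb(r1 + r2, c1)


# One-tailed: P(O11 >= 6 | R1 = R2 = 15, C1 = 8)
p_one = sum(hypergeom_p(k, 15, 15, 8) for k in range(6, 9))
# Two-tailed: sum the probabilities of all tables no more likely than observed
p_obs = hypergeom_p(6, 15, 15, 8)
p_two = sum(hypergeom_p(k, 15, 15, 8) for k in range(0, 9)
            if hypergeom_p(k, 15, 15, 8) <= p_obs + 1e-12)
```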


13. Common Mistakes and How to Avoid Them

Mistake 1: Using Chi-Square with Non-Independent Observations

Problem: Entering repeated measurements, matched pairs, or clustered data into a standard chi-square test. This violates the independence assumption and produces inflated Type I error rates.

Solution: For matched pairs or pre-post binary data, use McNemar's test. For clustered data, use generalised estimating equations (GEE) or mixed-effects logistic regression. For $k$ repeated binary measures, use Cochran's Q test.


Mistake 2: Ignoring Small Expected Frequencies

Problem: Running a standard chi-square test when several cells have expected frequencies below 5, leading to an unreliable chi-square approximation with inflated or deflated p-values.

Solution: Always inspect the expected frequency table before interpreting results. Use Fisher's exact test for $2 \times 2$ tables with small $E_{ij}$. For larger tables, collapse theoretically justifiable categories, or use the exact multinomial test.


Mistake 3: Treating Chi-Square as Directional

Problem: Interpreting a significant chi-square result as indicating a specific direction (e.g., "Group A is higher than Group B"). The chi-square test is omnibus and non-directional — it only indicates that some association exists.

Solution: After a significant omnibus chi-square, examine adjusted standardised residuals to identify which specific cells deviate from independence and in which direction. Report percentage breakdowns and odds ratios to characterise the direction of the association.


Mistake 4: Confusing Association with Causation

Problem: Concluding that $X$ causes $Y$ because a significant association was found. Chi-square only establishes statistical association; causal inference requires experimental design or causal modelling.

Solution: Use appropriate causal language ("is associated with" rather than "causes") unless the study design (randomised experiment) supports causal claims. Consider potential confounders and use stratified analyses (CMH test) or log-linear models to adjust for them.


Mistake 5: Reporting Only the p-Value Without Effect Size

Problem: Reporting "$\chi^2(2) = 22.4$, $p < .001$" without an effect size is insufficient. A $\chi^2$ driven purely by large $N$ may correspond to a trivially small $V = 0.04$ that has no practical importance.

Solution: Always report Cramér's $V$ (or $\phi$ for $2 \times 2$ tables) with its 95% CI. For $2 \times 2$ tables in clinical or epidemiological contexts, also report the odds ratio and its CI.


Mistake 6: Using Chi-Square for Continuous Data

Problem: Dichotomising a continuous variable (e.g., age → young/old) to use chi-square instead of a more appropriate parametric test. Dichotomisation discards information and dramatically reduces statistical power.

Solution: Use the continuous variable in a correlation, regression, or t-test where appropriate. Only categorise variables when the categorical form is theoretically meaningful (e.g., clinical threshold).


Mistake 7: Misinterpreting a Non-Significant Result as Evidence of Independence

Problem: Concluding that $p = .43$ means the variables are independent. As with all hypothesis tests, a non-significant result means insufficient evidence against $H_0$, not that $H_0$ is true. With $N = 20$, almost no association will reach significance.

Solution: Report a power analysis and the 95% CI for $V$. Use the Bayesian chi-square test ($BF_{10} < 1/3$) or a TOST equivalence procedure to positively support independence.


Mistake 8: Applying Row or Column Percentages Inconsistently

Problem: Reporting column percentages when rows represent the grouping variable (or vice versa) makes patterns hard to interpret. Mixing row and column percentages within the same table causes confusion.

Solution: When rows represent the independent variable (groups), report row percentages (each row sums to 100%). When the marginal distributions are both random (cross-sectional survey), report both row and column percentages and let the research question guide interpretation.


14. Troubleshooting

| Problem | Likely Cause | Solution |
| :-- | :-- | :-- |
| Chi-square is extremely large ($\chi^2 > 50$ for a $2 \times 2$ table) | Very large $N$; even negligible associations become significant | Focus on $V$ or $\phi$; a large $\chi^2$ may correspond to $V < 0.05$ |
| $p = 0.000$ exactly | Software rounds to zero; extremely large $\chi^2$ | Report as $p < .001$ per APA; investigate effect size |
| Expected frequency $< 1$ in one or more cells | Very small cell counts or extreme marginal imbalance | Use Fisher's exact test ($2 \times 2$); collapse categories; collect more data |
| Chi-square statistic equals 0 | Observed frequencies exactly equal expected ($O_{ij} = E_{ij}$ for all cells) | Verify data entry; this would mean perfect independence |
| Cramér's $V > 1$ | Computation error or incorrect $N$ or $\min(r-1,c-1)$ | Verify formula; $V$ is bounded $[0,1]$ by construction |
| Fisher's exact test and chi-square give very different $p$-values | Small sample or extreme marginals | Prefer Fisher's exact; the chi-square approximation is unreliable with small $E_{ij}$ |
| All adjusted residuals are small but $\chi^2$ is significant | Association is diffuse — spread uniformly across all cells, not localised | Report the overall result; diffuse associations may be artefacts of sparse data |
| Large $V$ but $p > .05$ | Small $N$ (low power) | Study is underpowered; $V$ may reflect a real but undetected effect; report power |
| Negative odds ratio | Computation error (the OR cannot be negative) | Verify cell order and formula; $OR = O_{11}O_{22}/(O_{12}O_{21})$ |
| McNemar result differs substantially from chi-square | Data are paired, not independent | Use McNemar's test; the standard chi-square is incorrect for paired data |
| $G$-statistic and $\chi^2$ disagree substantially | Very small expected frequencies or highly asymmetric tables | Use Fisher's exact; both $G$ and $\chi^2$ are unreliable for very small samples |
| Structural zero in one cell (count = 0 by design) | Logically impossible cell combination | Use a quasi-independence model; do not include structural zeros in standard chi-square |

15. Quick Reference Cheat Sheet

Core Equations

| Formula | Description |
| :-- | :-- |
| $E_{ij} = R_i C_j / N$ | Expected frequency for cell $(i,j)$ |
| $\chi^2 = \sum (O_{ij}-E_{ij})^2/E_{ij}$ | Pearson chi-square statistic |
| $G = 2\sum O_{ij}\ln(O_{ij}/E_{ij})$ | Likelihood ratio statistic |
| $\nu = (r-1)(c-1)$ | Degrees of freedom |
| $p = P(\chi^2_\nu \geq \chi^2_{obs})$ | Right-tail p-value |
| $V = \sqrt{\chi^2/(N\min(r-1,c-1))}$ | Cramér's $V$ (all table sizes) |
| $\phi = \sqrt{\chi^2/N}$ | Phi coefficient ($2 \times 2$ only) |
| $z_{ij} = (O_{ij}-E_{ij})/\sqrt{E_{ij}(1-R_i/N)(1-C_j/N)}$ | Adjusted standardised residual |
| $OR = O_{11}O_{22}/(O_{12}O_{21})$ | Odds ratio ($2 \times 2$ only) |
| $N \approx \chi^2_{crit}/(\min(r-1,c-1) \times V^2)$ | Approximate required $N$ for 80% power |

Decision Guide

| Condition | Recommended Test |
| :-- | :-- |
| Two categorical variables, independent, adequate $N$ | Chi-square test of association |
| $2 \times 2$ table with any $E_{ij} < 5$ | Fisher's exact test |
| Paired binary data (pre/post) | McNemar's test |
| Both variables ordinal | Linear-by-linear association test |
| Three or more categorical variables | Log-linear model |
| Establishing independence (not just failing to reject) | Bayesian chi-square ($BF_{10}$) or TOST equivalence |
| One categorical variable vs. known distribution | Chi-square goodness-of-fit test |

Cramér's $V$ Benchmarks (Cohen, 1988)

| Table Size ($\min(r,c)$) | Small | Medium | Large |
| :-- | :-- | :-- | :-- |
| 2 (includes $2 \times 2$) | 0.10 | 0.30 | 0.50 |
| 3 | 0.07 | 0.21 | 0.35 |
| 4 | 0.06 | 0.17 | 0.29 |
| 5 | 0.05 | 0.15 | 0.25 |

Required Sample Size (2 × 2 Table, α=.05\alpha = .05)

| $V$ | Power = 0.80 | Power = 0.90 |
| :-- | :-- | :-- |
| 0.10 | 785 | 1046 |
| 0.20 | 197 | 263 |
| 0.30 | 88 | 117 |
| 0.50 | 32 | 42 |
| 0.70 | 17 | 22 |

Adjusted Standardised Residual Thresholds

| $\vert z_{ij} \vert$ | Significance Level |
| :-- | :-- |
| $> 1.96$ | $\alpha = .05$ |
| $> 2.58$ | $\alpha = .01$ |
| $> 3.29$ | $\alpha = .001$ |

APA 7th Edition Reporting Templates

Standard (all table sizes): "A chi-square test of association revealed a [significant / non-significant] association between [Variable X] and [Variable Y], $\chi^2(\nu, N = \text{[value]}) = \text{[value]}$, $p = \text{[value]}$, $V = \text{[value]}$ [95% CI: LB, UB]."

With odds ratio (2 × 2 tables): "... The odds of [outcome] were [OR value] times higher in [group 1] than [group 2] (95% CI: [LB, UB])."

With residuals (larger tables): "... Adjusted standardised residuals indicated that [cell description] occurred significantly more/less frequently than expected ($z = \text{[value]}$)."

Fisher's exact test: "Due to small expected cell frequencies, Fisher's exact test was used. [Result statement], $p = \text{[value]}$ (Fisher's exact, two-tailed), $\phi = \text{[value]}$, $OR = \text{[value]}$ [95% CI: LB, UB]."

With Bayesian analysis: "The Bayesian chi-square test yielded $BF_{10} = \text{[value]}$, indicating [moderate / strong / extreme] evidence for [an association / independence]."

Reporting Checklist

| Item | Required |
| :-- | :-- |
| Chi-square statistic | ✅ Always |
| Degrees of freedom $\nu = (r-1)(c-1)$ | ✅ Always |
| Sample size $N$ (in parentheses with $df$) | ✅ Always |
| Exact p-value | ✅ Always |
| Observed frequency table (or percentage breakdown) | ✅ Always |
| Expected frequency table | ✅ When $N < 100$ or any $E_{ij}$ near threshold |
| Cramér's $V$ or phi $\phi$ | ✅ Always |
| 95% CI for effect size | ✅ Always |
| Odds ratio and 95% CI | ✅ For $2 \times 2$ in clinical/epi contexts |
| Adjusted standardised residuals | ✅ For tables larger than $2 \times 2$ |
| Fisher's exact test | ✅ When any $E_{ij} < 5$ |
| Assumption check (expected frequencies) | ✅ Always |
| Note on independence of observations | ✅ Always |
| Power analysis | ✅ For non-significant results; underpowered studies |
| Bayes Factor | Recommended for null (non-significant) results |
| TOST equivalence test | ✅ When claiming independence |

This tutorial provides a comprehensive foundation for understanding, conducting, and reporting chi-square tests of association within the DataStatPro application. For further reading, consult Agresti's "An Introduction to Categorical Data Analysis" (3rd ed., 2018), Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988), Everitt's "The Analysis of Contingency Tables" (2nd ed., 1992), and Bishop, Fienberg & Holland's "Discrete Multivariate Analysis" (1975). For feature requests or support, contact the DataStatPro team.