Chi-Square Test of Association: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of categorical data analysis all the way through advanced interpretation, reporting, assumption checking, and practical usage within the DataStatPro application. Whether you are encountering the chi-square test of association for the first time or deepening your understanding of relationships between categorical variables, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is a Chi-Square Test of Association?
- The Mathematics Behind the Chi-Square Test of Association
- Assumptions of the Chi-Square Test of Association
- Variants of the Chi-Square Test of Association
- Using the Chi-Square Test of Association Calculator Component
- Step-by-Step Procedure
- Interpreting the Output
- Effect Sizes for the Chi-Square Test of Association
- Confidence Intervals
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into the chi-square test of association, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.
1.1 Categorical Variables and Frequency Data
Unlike continuous variables (measured on interval or ratio scales), categorical variables assign each observation to a discrete, mutually exclusive category. The chi-square test of association operates on frequency counts — the number of observations falling into each combination of category levels.
- Nominal variables: Categories with no intrinsic order (e.g., blood type: A, B, AB, O; political party affiliation; treatment group).
- Ordinal variables: Categories with a meaningful order but no equal spacing (e.g., education level: primary, secondary, tertiary; satisfaction: low, medium, high).
⚠️ The chi-square test treats all categorical data as nominal. If your variables are ordinal, consider tests that exploit the ordering, such as the Jonckheere–Terpstra test or a linear-by-linear association test, as they are more powerful.
1.2 Contingency Tables
A contingency table (also called a cross-tabulation or crosstab) organises the joint frequency distribution of two (or more) categorical variables. For two variables X (with r row categories) and Y (with c column categories), the contingency table has dimensions r × c.
Each cell contains the observed frequency O_ij: the count of observations belonging to category i of X and category j of Y.
Example (2 × 2 table):
| | Y = 1 | Y = 2 | Row Total |
|---|---|---|---|
| X = 1 | O₁₁ | O₁₂ | R₁ |
| X = 2 | O₂₁ | O₂₂ | R₂ |
| Column Total | C₁ | C₂ | N |
Where N is the total sample size.
1.3 The Concept of Statistical Independence
Two categorical variables X and Y are statistically independent if knowledge of an observation's category on X gives no information about its category on Y. Formally, independence requires:
P(X = i, Y = j) = P(X = i) × P(Y = j) for all i, j
Equivalently, the conditional distribution of Y given X = i is the same for all values of i. The chi-square test of association tests whether the observed data are consistent with this independence assumption.
1.4 Expected Frequencies Under Independence
If X and Y are independent, the expected frequency for cell (i, j) is the product of the corresponding marginal probabilities multiplied by the total sample size:
E_ij = N × P(X = i) × P(Y = j)
Since the true marginal probabilities are unknown, they are estimated from the sample:
P̂(X = i) = R_i / N and P̂(Y = j) = C_j / N
Yielding the fundamental formula for expected frequencies:
E_ij = (R_i × C_j) / N
Where R_i is the i-th row total and C_j is the j-th column total.
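To make the formula concrete, here is a minimal Python sketch that computes the expected-frequency table from observed counts, using the 2 × 2 table from the worked example later in this guide:

```python
def expected_frequencies(observed):
    """E_ij = R_i * C_j / N for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]        # R_i
    col_totals = [sum(col) for col in zip(*observed)]  # C_j
    n = sum(row_totals)                                # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]

obs = [[40, 60],
       [20, 80]]
print(expected_frequencies(obs))  # [[30.0, 70.0], [30.0, 70.0]]
```

Note that each row of the expected table sums to the same row total R_i as the observed table, which is the verification step recommended in the manual procedure below.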
1.5 The Chi-Square Distribution
The chi-square distribution is a right-skewed probability distribution parameterised by its degrees of freedom k. Key properties:
- Defined only for non-negative values (x ≥ 0).
- Skewed right, becoming more symmetric as k increases.
- Mean = k; Variance = 2k.
- As k → ∞, the chi-square distribution approaches a normal distribution.
- Is the sum of k squared independent standard normal variables: χ²_k = Z₁² + Z₂² + … + Z_k², where each Z_i ~ N(0, 1).
The chi-square statistic measures the overall discrepancy between observed and expected frequencies — large values indicate strong departure from independence.
1.6 The Null and Alternative Hypotheses in Categorical Tests
The chi-square test of association operates within the hypothesis testing framework:
- H₀: The two categorical variables are statistically independent (no association).
- H₁: The two categorical variables are not statistically independent (an association exists).
Note that H₁ is always non-directional for the chi-square test — it simply states that some association exists, without specifying the form or direction of the relationship.
1.7 The p-Value and Significance Level
As in all hypothesis tests, the p-value is the probability of obtaining a test statistic as extreme or more extreme than observed, assuming H₀ is true. The significance level (conventionally α = 0.05) is the threshold below which we reject H₀.
Because large χ² values indicate departure from independence, the p-value is always computed from the right tail of the chi-square distribution:
p = P(χ²_df ≥ χ²_obs)
⚠️ Rejecting H₀ tells you that an association exists; it says nothing about the strength, direction, or practical importance of that association. Always accompany chi-square results with appropriate effect size measures.
1.8 Degrees of Freedom in Contingency Tables
For a two-way contingency table with r rows and c columns, the degrees of freedom are:
df = (r − 1)(c − 1)
This reflects the number of cells that are free to vary once the marginal totals are fixed. In a 2 × 2 table, df = 1; in a 3 × 3 table, df = 4.
2. What is a Chi-Square Test of Association?
2.1 The Core Question
The chi-square test of association (also called Pearson's chi-square test of independence) is a non-parametric inferential test that determines whether two categorical variables measured on the same set of observations are statistically associated with one another.
The test compares the observed cell frequencies in a contingency table against the expected cell frequencies that would arise if the two variables were completely independent. A large discrepancy between observed and expected frequencies provides evidence against independence.
2.2 The General Logic
The test quantifies how different the observed table is from the table we would expect under perfect independence:
χ² = Σ_ij (O_ij − E_ij)² / E_ij
Each term in the sum is a standardised squared residual: the squared difference between what was observed and what was expected, scaled by the expected frequency. When O_ij ≈ E_ij for all cells (as would be expected under independence), χ² will be small. Systematic departures from independence produce large χ².
2.3 When to Use the Chi-Square Test of Association
| Condition | Requirement |
|---|---|
| Research design | Two categorical variables measured on the same observations |
| Variable scale | Both variables nominal or ordinal (treated as nominal) |
| Data format | Frequency counts in a contingency table |
| Sample size | Adequate expected frequencies (see Assumptions) |
| Observations | Independent of each other |
| Hypothesis | Test of association, not prediction or causation |
2.4 Real-World Applications
| Field | Research Question | Variables |
|---|---|---|
| Epidemiology | Is smoking status associated with lung cancer diagnosis? | Smoking (yes/no) × Disease (yes/no) |
| Marketing | Is product preference associated with age group? | Product (A/B/C) × Age (18–34/35–54/55+) |
| Education | Is passing rate associated with teaching method? | Method (lecture/flipped/hybrid) × Result (pass/fail) |
| Clinical Psychology | Is treatment type associated with recovery status? | Treatment (CBT/medication/combined) × Recovery (yes/no) |
| Genetics | Is a genotype associated with a disease phenotype? | Genotype (AA/Aa/aa) × Disease (affected/unaffected) |
| Sociology | Is gender associated with voting preference? | Gender (M/F/NB) × Party (Democrat/Republican/Other) |
| Public Health | Is vaccination status associated with infection outcome? | Vaccinated (yes/no) × Infected (yes/no) |
| Quality Control | Is production shift associated with defect rate? | Shift (morning/afternoon/night) × Defect (yes/no) |
2.5 Distinguishing from Related Tests
| Situation | Correct Test |
|---|---|
| Two categorical variables, independent samples | Chi-square test of association |
| Two categorical variables, paired samples | McNemar's test |
| One categorical variable vs. known distribution | Chi-square goodness-of-fit test |
| Two categorical variables, small expected frequencies | Fisher's exact test |
| Ordered categorical variables | Linear-by-linear association test |
| Three or more categorical variables | Log-linear models |
| One binary outcome, continuous predictor | Logistic regression |
| Two proportions only (2 × 2 table) | Two-proportion z-test (equivalent result) |
3. The Mathematics Behind the Chi-Square Test of Association
3.1 The Pearson Chi-Square Statistic
Given an r × c contingency table with observed cell frequencies O_ij, the Pearson chi-square statistic is:
χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (O_ij − E_ij)² / E_ij
Where the expected frequency for cell (i, j) under the null hypothesis of independence is:
E_ij = (R_i × C_j) / N
With:
- R_i = Σ_j O_ij — the i-th row marginal total
- C_j = Σ_i O_ij — the j-th column marginal total
- N = Σ_ij O_ij — the grand total
Under H₀ (independence), χ² asymptotically follows a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
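The full computation (marginal totals, expected frequencies, the Pearson statistic, and df) fits in a short Python function. This is a sketch for illustration, not the DataStatPro implementation:

```python
def pearson_chi_square(observed):
    """Return (chi2, df) for an r x c table of observed counts."""
    r, c = len(observed), len(observed[0])
    row = [sum(x) for x in observed]        # R_i
    col = [sum(x) for x in zip(*observed)]  # C_j
    n = sum(row)                            # N
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            e = row[i] * col[j] / n         # E_ij = R_i * C_j / N
            chi2 += (observed[i][j] - e) ** 2 / e
    return chi2, (r - 1) * (c - 1)

chi2, df = pearson_chi_square([[40, 60], [20, 80]])
# chi2 ≈ 9.52 with df = 1, matching the worked example in Section 12
```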
3.2 Degrees of Freedom
For an r × c table:
df = (r − 1)(c − 1)
Intuition: A contingency table has r × c cells. Given the row totals and column totals (which sum to N), only (r − 1)(c − 1) cells are free to vary — the remaining cells are determined by the marginal constraints. Each constraint consumes one degree of freedom.
| Table Dimensions | Degrees of Freedom |
|---|---|
| 2 × 2 | 1 |
| 2 × 3 | 2 |
| 3 × 3 | 4 |
| 3 × 4 | 6 |
| 4 × 4 | 9 |
3.3 Computing the p-Value
The p-value is always computed from the right tail of the chi-square distribution:
p = P(χ²_df ≥ χ²_obs) = 1 − F(χ²_obs; df)
Where F is the cumulative distribution function (CDF) of the chi-square distribution with df degrees of freedom. The chi-square test is inherently non-directional — departures from independence in any direction accumulate in the right tail.
3.4 Critical Values
Reject H₀ if χ²_obs ≥ χ²_crit(df, α).
Common upper-tail critical values:
| df | α = 0.05 | α = 0.01 | α = 0.001 |
|---|---|---|---|
| 1 | 3.841 | 6.635 | 10.828 |
| 2 | 5.991 | 9.210 | 13.816 |
| 3 | 7.815 | 11.345 | 16.266 |
| 4 | 9.488 | 13.277 | 18.467 |
| 6 | 12.592 | 16.812 | 22.458 |
| 9 | 16.919 | 21.666 | 27.877 |
| 12 | 21.026 | 26.217 | 32.909 |
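With the statistic and df in hand, the decision rule is a simple lookup. A sketch using a few of the critical values above:

```python
# Upper-tail critical values chi2_crit(df, alpha), taken from the table above
CRITICAL = {
    (1, 0.05): 3.841, (1, 0.01): 6.635, (1, 0.001): 10.828,
    (2, 0.05): 5.991, (4, 0.05): 9.488,
}

def decide(chi2_obs, df, alpha=0.05):
    """Reject H0 when the observed statistic reaches the critical value."""
    return "reject H0" if chi2_obs >= CRITICAL[(df, alpha)] else "fail to reject H0"

print(decide(9.52, df=1))  # reject H0 (9.52 >= 3.841)
```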
3.5 Standardised and Adjusted Residuals
Beyond the overall statistic, residuals reveal which specific cells deviate most from independence.
Raw residual:
d_ij = O_ij − E_ij
Standardised residual (Pearson residual):
r_ij = (O_ij − E_ij) / √E_ij
Note that χ² = Σ_ij r_ij².
Adjusted standardised residual (Haberman's; approximately standard normal):
r_ij^adj = (O_ij − E_ij) / √(E_ij (1 − R_i/N)(1 − C_j/N))
Adjusted standardised residuals with |r_ij^adj| > 1.96 indicate that cell (i, j) deviates significantly from independence at α = 0.05. These are essential for locating the source of a significant chi-square result.
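The adjusted residual formula can be verified cell by cell. A sketch (in a 2 × 2 table, all four adjusted residuals share the same magnitude, √χ²):

```python
import math

def adjusted_residuals(observed):
    """Adjusted standardised residuals, approximately N(0, 1) under H0."""
    row = [sum(x) for x in observed]
    col = [sum(x) for x in zip(*observed)]
    n = sum(row)
    res = []
    for i, r_i in enumerate(row):
        res.append([])
        for j, c_j in enumerate(col):
            e = r_i * c_j / n
            denom = math.sqrt(e * (1 - r_i / n) * (1 - c_j / n))
            res[i].append((observed[i][j] - e) / denom)
    return res

res = adjusted_residuals([[40, 60], [20, 80]])
# res[0][0] ≈ +3.09, res[0][1] ≈ -3.09, and so on
```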
3.6 Yates' Continuity Correction (2 × 2 Tables)
For 2 × 2 tables, the chi-square statistic is a continuous approximation to a discrete distribution. Yates' continuity correction improves the approximation:
χ²_Yates = Σ_ij (|O_ij − E_ij| − 0.5)² / E_ij
Yates' correction makes the test more conservative (reduces Type I error). However, it is controversial — many statisticians consider it overly conservative and recommend using Fisher's exact test for small samples instead. DataStatPro reports both the uncorrected and Yates-corrected chi-square for 2 × 2 tables.
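A sketch of the corrected statistic. On the worked-example table the correction lowers χ² from about 9.52 to about 8.60, illustrating its conservatism:

```python
def yates_chi_square(observed):
    """Yates continuity-corrected chi-square for a 2 x 2 table."""
    row = [sum(x) for x in observed]
    col = [sum(x) for x in zip(*observed)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / n
            chi2 += (abs(observed[i][j] - e) - 0.5) ** 2 / e
    return chi2

print(round(yates_chi_square([[40, 60], [20, 80]]), 3))  # 8.595
```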
3.7 The Likelihood Ratio Chi-Square (G-Test)
An alternative to Pearson's χ² is the likelihood ratio statistic G (also called the G-test or log-likelihood ratio):
G = 2 Σ_ij O_ij ln(O_ij / E_ij)
Under H₀, G also follows a chi-square distribution with (r − 1)(c − 1) df. The G-test is preferred in some fields (particularly genetics and log-linear modelling) because it is directly derived from maximum likelihood theory. For moderate to large samples, G and χ² yield nearly identical results. For small samples, G can be less accurate.
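A sketch of the G statistic. On the worked-example table, G ≈ 9.66 versus Pearson χ² ≈ 9.52, illustrating how close the two usually are:

```python
import math

def g_statistic(observed):
    """Likelihood-ratio statistic G = 2 * sum O_ij * ln(O_ij / E_ij)."""
    row = [sum(x) for x in observed]
    col = [sum(x) for x in zip(*observed)]
    n = sum(row)
    g = 0.0
    for i in range(len(row)):
        for j in range(len(col)):
            o = observed[i][j]
            if o > 0:  # 0 * ln(0) is taken as 0
                g += o * math.log(o / (row[i] * col[j] / n))
    return 2 * g
```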
3.8 Effect Size — Cramér's V
Cramér's V is the most widely used effect size measure for the chi-square test of association. It scales the chi-square statistic to range from 0 to 1:
V = √(χ² / (N × min(r − 1, c − 1)))
Where min(r − 1, c − 1) is the smaller of the number of rows minus one and the number of columns minus one. For 2 × 2 tables, V = φ (the phi coefficient). V is interpretable as the average association strength across all possible pairs of category combinations.
3.9 Effect Size — Phi Coefficient (φ) for 2 × 2 Tables
For 2 × 2 tables specifically, the phi coefficient is:
φ = √(χ² / N)
The phi coefficient is equivalent to the Pearson product-moment correlation between two binary variables and ranges from −1 to +1 (though its magnitude is the same as V for 2 × 2 tables). A signed version conveying direction can be computed as:
φ = (O₁₁O₂₂ − O₁₂O₂₁) / √(R₁R₂C₁C₂)
The relationship between φ, χ², and N is:
χ² = N × φ²
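Both coefficients are one-liners once χ² is known. A sketch:

```python
import math

def cramers_v(chi2, n, r, c):
    """Cramer's V = sqrt(chi2 / (N * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(r - 1, c - 1)))

def phi_signed(a, b, c, d):
    """Signed phi for a 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# For a 2 x 2 table, |phi_signed| equals cramers_v, and chi2 = N * phi**2
v = cramers_v(9.5238, 200, 2, 2)  # ≈ 0.218
p = phi_signed(40, 60, 20, 80)    # ≈ +0.218
```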
3.10 Statistical Power
Power for the chi-square test is the probability of detecting a true association of given magnitude. Under H₁, the chi-square statistic follows a non-central chi-square distribution with non-centrality parameter:
λ = N Σ_ij (π_ij − π_i·π_·j)² / (π_i·π_·j)
Where π_ij are the true cell probabilities and π_i·, π_·j are the true marginal probabilities. In terms of Cramér's V:
λ = N × V² × min(r − 1, c − 1)
Required sample size for desired power at significance level α:
N = λ_required(df, α, power) / (V² × min(r − 1, c − 1))
Required N for a 2 × 2 table (df = 1), α = 0.05, at conventional effect sizes:
| Cramér's | Power = 0.80 | Power = 0.90 | Power = 0.95 |
|---|---|---|---|
| 0.10 (small) | 785 | 1046 | 1294 |
| 0.20 | 197 | 263 | 325 |
| 0.30 (medium) | 88 | 117 | 145 |
| 0.50 (large) | 32 | 42 | 53 |
| 0.70 | 17 | 22 | 28 |
4. Assumptions of the Chi-Square Test of Association
4.1 Independence of Observations
Each observation must contribute to exactly one cell of the contingency table. This means each participant or unit is counted once and only once. Independence is a design assumption, not testable from the data.
Common violations:
- Multiple responses from the same participant counted as separate observations.
- Paired or matched data analysed as independent (use McNemar's test instead).
- Clustered sampling where observations within clusters are correlated.
When violated: Use McNemar's test (matched pairs), Cochran's Q test (repeated measures), or mixed-effects models for clustered categorical data.
4.2 Adequate Expected Frequencies
The chi-square approximation is only valid when expected frequencies are sufficiently large. The most widely cited guidelines are:
| Guideline | Rule |
|---|---|
| Cochran (1954) | All E_ij ≥ 1; no more than 20% of cells with E_ij < 5 |
| Yates (1934) | All E_ij ≥ 5 (strict rule) |
| Agresti (2007) | Most E_ij ≥ 5; use Fisher's exact for 2 × 2 tables with any E_ij < 5 |
When expected frequencies are inadequate:
- For 2 × 2 tables: Use Fisher's exact test (computes exact p-values without the chi-square approximation).
- For larger tables: Combine categories (where theoretically justifiable), collect more data, or use the exact multinomial test.
⚠️ Adequate expected frequencies are about E_ij, not O_ij. A cell can have a large observed count but a small expected count — always check E_ij directly.
4.3 Fixed Marginal Totals (Study Design Consideration)
The chi-square test technically assumes that the row totals are fixed by the study design (sampling from pre-specified groups). If both margins are random, the test remains valid asymptotically, but Fisher's exact test (which conditions on both margins) is more appropriate for small samples.
In practice:
- Fixed row margins: Prospective study (e.g., 50 smokers and 50 non-smokers recruited in advance).
- Fixed grand total only: Cross-sectional survey where only N is fixed.
- Both designs yield valid chi-square tests for large samples.
4.4 Nominal or Ordinal Scale of Measurement
Both variables must be categorical. The chi-square test makes no use of any ordering information in ordinal variables (all ordering is discarded). If both variables are ordinal, more powerful tests exploiting the ordering (linear-by-linear association, Spearman's ρ) should be considered alongside or instead of chi-square.
4.5 Sufficiently Large Sample Size
In addition to adequate cell expectations, the overall sample size must be large enough for the asymptotic chi-square approximation to hold. A common guideline is N ≥ 20 for a 2 × 2 table; larger tables require proportionally larger N.
When violated: Use Fisher's exact test (2 × 2 tables) or the exact multinomial test (larger tables).
4.6 No Structural Zeros
A structural zero is a cell that is logically impossible (e.g., "males who are pregnant"). Structural zeros violate the independence model and require special treatment (structural equation models or quasi-independence models).
4.7 Assumption Summary
| Assumption | How to Check | Remedy if Violated |
|---|---|---|
| Independence of observations | Study design review | McNemar's test; multilevel models |
| Adequate E_ij | Inspect expected frequency table | Fisher's exact test; collapse categories |
| Sufficient overall N | Count total observations | Fisher's exact; collect more data |
| Categorical variables | Measurement review | Use appropriate scale-specific tests |
| No structural zeros | Theoretical review | Quasi-independence models |
5. Variants of the Chi-Square Test of Association
5.1 Pearson Chi-Square Test of Independence (Standard)
The classic form: compare observed to expected frequencies in a two-way contingency table using the Pearson statistic χ².
5.2 Fisher's Exact Test
When expected frequencies are small (especially in 2 × 2 tables), Fisher's exact test computes the exact probability of observing the data or a more extreme table, given the observed marginal totals:
P = (R₁! R₂! C₁! C₂!) / (N! O₁₁! O₁₂! O₂₁! O₂₂!)
The p-value is the sum of hypergeometric probabilities for all tables as extreme as or more extreme than observed. Fisher's exact test is always valid (not an approximation) but extends with difficulty to larger tables (requires exact conditional multinomial computation).
5.3 Chi-Square Goodness-of-Fit Test
Although not a test of association, this closely related test assesses whether the observed distribution of a single categorical variable matches a theoretically specified distribution π₁, …, π_K:
χ² = Σ_k (O_k − Nπ_k)² / (Nπ_k)
Degrees of freedom: df = K − 1 (number of categories minus one).
5.4 McNemar's Test (Paired Nominal Data)
For paired or matched binary categorical data (e.g., before/after designs), McNemar's test is the appropriate alternative to chi-square. It focuses on the discordant pairs:
χ² = (b − c)² / (b + c)
Where b and c are the off-diagonal cells of the matched-pairs table. With continuity correction: χ² = (|b − c| − 1)² / (b + c).
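McNemar's statistic needs only the two discordant counts. A sketch with illustrative counts b = 25, c = 10 (hypothetical, not from this guide's examples):

```python
def mcnemar_statistic(b, c, corrected=True):
    """McNemar chi-square (df = 1) from the discordant counts b and c."""
    if corrected:
        return (abs(b - c) - 1) ** 2 / (b + c)  # continuity-corrected
    return (b - c) ** 2 / (b + c)

print(round(mcnemar_statistic(25, 10, corrected=False), 3))  # 6.429
```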
5.5 Linear-by-Linear Association Test
When both variables are ordinal, the linear-by-linear association test assigns integer scores to the ordered categories and tests for a trend:
M² = (N − 1) r²
Where r is the correlation between the assigned row and column scores. This test has 1 degree of freedom regardless of the table size and is more powerful than the overall chi-square when the true association is monotone.
5.6 Cochran-Mantel-Haenszel Test (Stratified Tables)
When the relationship between two binary variables must be assessed across multiple strata (e.g., the same 2 × 2 table measured in several hospitals), the Mantel–Haenszel test combines evidence across strata:
χ²_MH = (|Σ_k (a_k − E(a_k))| − 0.5)² / Σ_k Var(a_k)
Where a_k is the (1, 1) cell of the table in stratum k.
This controls for the stratifying variable and estimates a common odds ratio across strata, assuming the association is homogeneous.
5.7 Bayesian Chi-Square Analysis
The Bayesian approach to contingency table analysis estimates a Bayes Factor comparing the association model (H₁) to the independence model (H₀). Under a symmetric Dirichlet prior, the Bayes Factor can be approximated via the Bayesian Information Criterion (BIC):
BF₁₀ ≈ exp((χ² − df × ln N) / 2)
BF₁₀ > 3 indicates moderate evidence for an association; BF₁₀ < 1/3 indicates moderate evidence for independence.
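A sketch of the BIC approximation. Using the Pearson chi-square in place of the likelihood-ratio statistic is an illustrative simplification; exact Bayes Factors require integrating over the prior:

```python
import math

def bf10_bic(chi2, df, n):
    """BIC-approximated Bayes Factor: BF10 ~ exp((chi2 - df * ln N) / 2)."""
    return math.exp((chi2 - df * math.log(n)) / 2)

bf = bf10_bic(9.52, df=1, n=200)  # roughly 8.3: moderate-to-strong evidence for H1
```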
6. Using the Chi-Square Test of Association Calculator Component
The Chi-Square Test of Association Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, and reporting tests of association in contingency tables.
Step-by-Step Guide
Step 1 — Select the Test
Navigate to Statistical Tests → Chi-Square Tests → Chi-Square Test of Association.
Step 2 — Input Method
Choose how to provide data:
- Raw data: Upload or paste two categorical variable columns. DataStatPro automatically constructs the contingency table, computing row totals, column totals, and the grand total.
- Contingency table: Enter the observed frequency counts directly into the interactive table grid. Specify row and column labels. Add or remove rows and columns using the grid's controls.
- Summary proportions: Enter proportions and a total to reconstruct the table.
Step 3 — Define the Table Structure
- Specify the number of rows r and columns c.
- Label the row variable, column variable, and each category.
- DataStatPro automatically computes all marginal totals and expected frequencies E_ij.
Step 4 — Select the Alternative Test (if applicable)
DataStatPro automatically detects violated assumptions and suggests alternatives:
- If any E_ij < 5 in a 2 × 2 table → Fisher's exact test is recommended.
- If both variables are ordinal → Linear-by-linear association test is offered.
- If data are paired → McNemar's test is prompted.
Step 5 — Set Significance Level
Default: α = 0.05. DataStatPro simultaneously reports decisions at the other conventional significance levels (α = 0.01 and α = 0.001).
Step 6 — Select Display Options
- ✅ Observed and expected frequency tables with percentage breakdowns.
- ✅ Pearson χ², df, exact p-value, and decision.
- ✅ Yates' continuity-corrected χ² (for 2 × 2 tables).
- ✅ Fisher's exact test p-value (for 2 × 2 tables).
- ✅ Likelihood ratio G statistic.
- ✅ Cramér's V (all table sizes) and phi (2 × 2 only) with 95% CI.
- ✅ Standardised and adjusted standardised residuals with significance flags.
- ✅ Contribution of each cell to χ² (heat-mapped).
- ✅ Mosaic plot and clustered bar chart visualisations.
- ✅ Chi-square distribution diagram with observed statistic and critical region.
- ✅ Power analysis: current power and required N for 80%, 90%, 95% power.
- ✅ Bayesian analysis (Bayes Factor BF₁₀).
- ✅ APA 7th edition results paragraph (auto-generated).
Step 7 — Run the Analysis
Click "Run Chi-Square Test of Association". DataStatPro will:
- Compute all E_ij and verify the assumption of adequate expected frequencies.
- Compute χ², df, the critical value, and the exact p-value.
- Apply Yates' correction and compute Fisher's exact p-value (for 2 × 2 tables).
- Compute all effect size measures (V, φ) with confidence intervals.
- Compute standardised and adjusted residuals for all cells.
- Generate all selected visualisations.
- Estimate post-hoc power and produce sample size recommendations.
- Output an APA-compliant results paragraph.
7. Step-by-Step Procedure
7.1 Full Manual Procedure
Step 1 — State the Hypotheses
H₀: [Variable X] and [Variable Y] are statistically independent.
H₁: [Variable X] and [Variable Y] are not statistically independent (an association exists).
Step 2 — Construct the Contingency Table
Tally observations into an r × c table. Record:
- Each cell's observed frequency O_ij.
- Row totals R_i.
- Column totals C_j.
- Grand total N.
Step 3 — Check Assumptions
- Verify independence of observations (design review).
- Compute all expected frequencies: E_ij = (R_i × C_j) / N.
- Confirm no more than 20% of cells have E_ij < 5 and all E_ij ≥ 1.
- If assumptions are violated, switch to Fisher's exact test or collapse categories.
Step 4 — Compute Expected Frequencies
E_ij = (R_i × C_j) / N
Verify: Σ_j E_ij = R_i and Σ_i E_ij = C_j (marginal totals are preserved).
Step 5 — Compute the Chi-Square Statistic
χ² = Σ_ij (O_ij − E_ij)² / E_ij
Step 6 — Determine Degrees of Freedom
df = (r − 1)(c − 1)
Step 7 — Compute the p-Value
p = P(χ²_df ≥ χ²_obs)
Reject H₀ if p < α.
Step 8 — Compute Effect Size
Cramér's V (all tables):
V = √(χ² / (N × min(r − 1, c − 1)))
Phi coefficient (2 × 2 tables only):
φ = √(χ² / N)
Step 9 — Compute Standardised Residuals
r_ij^adj = (O_ij − E_ij) / √(E_ij (1 − R_i/N)(1 − C_j/N))
Flag cells where |r_ij^adj| > 1.96 (significant at α = 0.05).
Step 10 — Interpret and Report
Use the APA reporting template in Section 15. Always report χ², df, N, p, Cramér's V (or φ) with 95% CI, and a table of observed frequencies with expected frequencies (or at minimum percentage breakdowns).
8. Interpreting the Output
8.1 The Chi-Square Statistic
| χ² Relative to χ²_crit | Interpretation |
|---|---|
| χ² < χ²_crit | Fail to reject H₀; no significant association at α |
| χ² ≥ χ²_crit | Reject H₀; significant association detected at α |
| Large χ² with large N | Can be significant even for very weak associations |
| Small χ² with small N | May be non-significant even for large associations (low power) |
8.2 The p-Value
| p-Value | Conventional Interpretation |
|---|---|
| p ≥ 0.10 | No evidence against H₀ (independence) |
| 0.05 ≤ p < 0.10 | Marginal evidence of association (trend) |
| p < 0.05 | Significant association at α = 0.05 |
| p < 0.01 | Significant association at α = 0.01 |
| p < 0.001 | Significant association at α = 0.001 |
⚠️ A significant p-value only indicates that some departure from independence exists. It does not indicate the strength, direction, or practical importance of the association. Always examine effect sizes, residuals, and percentage breakdowns to understand the nature of the association.
8.3 Expected and Observed Frequency Tables
Comparing observed and expected frequencies reveals the pattern of association:
| Cell Pattern | Interpretation |
|---|---|
| O_ij > E_ij (large positive residual) | This combination occurs more often than independence predicts |
| O_ij < E_ij (large negative residual) | This combination occurs less often than independence predicts |
| O_ij ≈ E_ij for all cells | Data are consistent with independence |
| One or two cells drive χ² | Association is localised; examine residuals |
8.4 Cramér's V — Magnitude Interpretation
Cohen's (1988) benchmarks for Cramér's V:
These benchmarks depend on the minimum dimension m = min(r − 1, c − 1):
| | m = 1 (incl. 2 × 2) | m = 2 | m = 3 | m = 4 |
|---|---|---|---|---|
| Small | 0.10 | 0.07 | 0.06 | 0.05 |
| Medium | 0.30 | 0.21 | 0.17 | 0.15 |
| Large | 0.50 | 0.35 | 0.29 | 0.25 |
For the 2 × 2 case (phi = Cramér's V):
| φ or V | Verbal Label |
|---|---|
| < 0.10 | Negligible |
| 0.10 – 0.20 | Small |
| 0.20 – 0.30 | Small to medium |
| 0.30 – 0.50 | Medium |
| 0.50 – 0.70 | Large |
| ≥ 0.70 | Very large |
⚠️ Cohen's benchmarks were developed for the behavioural sciences and are conventions of last resort, not universal standards. In epidemiology, an odds ratio of 1.5 (corresponding to a small φ) may be highly practically significant; in genetics, very small V values can be of great theoretical importance. Always contextualise effect sizes within your specific domain.
8.5 Standardised Residuals: Locating the Source of Association
After a significant global chi-square, examine adjusted standardised residuals r_ij^adj:
| Magnitude of r_ij^adj | Interpretation |
|---|---|
| < 1.96 | Cell does not significantly deviate from independence (α = 0.05) |
| ≥ 1.96 | Significant deviation at α = 0.05 |
| ≥ 2.58 | Significant deviation at α = 0.01 |
| ≥ 3.29 | Significant deviation at α = 0.001 |
Positive residuals: the combination occurs more than expected under independence. Negative residuals: the combination occurs less than expected.
⚠️ When examining residuals across multiple cells, apply a multiple comparisons correction (e.g., Bonferroni: compare each residual to the standard normal critical value for α / (r × c)) to control the familywise error rate.
9. Effect Sizes for the Chi-Square Test of Association
9.1 Phi Coefficient (φ) — for 2 × 2 Tables
φ = √(χ² / N)
Or signed (to indicate direction of association):
φ = (O₁₁O₂₂ − O₁₂O₂₁) / √(R₁R₂C₁C₂)
Interpretation: Equivalent to the Pearson correlation between two binary variables. Ranges from −1 (perfect negative association) to +1 (perfect positive association); φ = 0 indicates independence.
9.2 Cramér's V — for All Table Sizes
V = √(χ² / (N × min(r − 1, c − 1)))
Interpretation: Average association strength rescaled to [0, 1]. V = 0 indicates independence; V = 1 indicates perfect association (each row category uniquely determines the column category). For 2 × 2 tables, V = |φ|.
9.3 Tschuprow's T
An alternative to Cramér's V for non-square tables:
T = √(χ² / (N × √((r − 1)(c − 1))))
T penalises tables with very unequal dimensions more heavily than V. For square tables (r = c), T = V. T achieves its maximum of 1 only for square tables; for non-square tables, the maximum is less than 1.
9.4 Odds Ratio (OR) — for 2 × 2 Tables
For 2 × 2 tables with binary exposure (rows) and binary outcome (columns), labelling the cells a, b, c, d:
OR = (a × d) / (b × c)
Interpretation: The ratio of the odds of the outcome among the exposed to the odds of the outcome among the unexposed. The odds ratio is the preferred effect size in clinical and epidemiological research.
- OR = 1: No association.
- OR > 1: Exposure is positively associated with the outcome.
- OR < 1: Exposure is negatively associated with the outcome.
Approximate 95% CI for ln(OR):
ln(OR) ± 1.96 × √(1/a + 1/b + 1/c + 1/d)
Exponentiate the bounds to obtain the CI on the OR scale.
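The OR and its Woolf interval are easy to compute directly. A sketch using the worked-example cells a = 40, b = 60, c = 20, d = 80 (the exact Cornfield interval DataStatPro reports will differ slightly):

```python
import math

def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with Woolf's approximate 95% CI."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_woolf_ci(40, 60, 20, 80)
# or_ ≈ 2.67, CI roughly [1.42, 5.02]
```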
9.5 Relative Risk (RR) — for Prospective 2 × 2 Studies
When row margins are fixed by design (prospective/experimental study):
RR = [a / (a + b)] / [c / (c + d)]
Interpretation: The ratio of the probability of the outcome in group 1 to the probability in group 2. Ranges from 0 to ∞; RR = 1 indicates no association.
⚠️ Relative risk is only interpretable when row margins are fixed (i.e., group sizes are pre-specified). For cross-sectional or case-control designs, use the odds ratio.
9.6 Effect Size Summary Table
| Effect Size | Formula | Range | Interpretation |
|---|---|---|---|
| Phi (φ) | √(χ²/N), signed: (ad − bc)/√(R₁R₂C₁C₂) | −1 to +1 | Correlation for binary variables; 2 × 2 only |
| Cramér's V | √(χ²/(N × min(r − 1, c − 1))) | 0 to 1 | Association strength; all table sizes |
| Tschuprow's T | √(χ²/(N√((r − 1)(c − 1)))) | 0 to 1 | Conservative alternative to V |
| Odds ratio (OR) | ad/bc | 0 to ∞ | Clinical/epi effect; 2 × 2 only |
| Relative risk (RR) | [a/(a + b)]/[c/(c + d)] | 0 to ∞ | Prospective studies; 2 × 2 only |
10. Confidence Intervals
10.1 CI for Cramér's V
An asymptotic 95% CI for V uses the non-central chi-square distribution. The non-centrality parameter is estimated as λ̂ = max(0, χ² − df) (adjusted for bias). Finding λ_L and λ_U such that the observed statistic sits at the 97.5th and 2.5th percentiles of the corresponding non-central chi-square distributions, the bounds are converted back to the V scale via V = √((λ + df) / (N × min(r − 1, c − 1))).
DataStatPro computes exact CIs numerically. An approximate 95% CI uses:
V ± 1.96 × SE(V) (adequate for large N)
10.2 CI for the Odds Ratio
Exact (Cornfield) 95% CI for OR:
Computed iteratively by finding the interval on the OR scale within which the conditional (hypergeometric) distribution of O₁₁ covers 95% probability. DataStatPro computes this exactly.
Approximate (Woolf's) 95% CI:
exp(ln(OR) ± 1.96 × √(1/a + 1/b + 1/c + 1/d))
10.3 CI for Proportions and Risk Difference
For 2 × 2 tables, the risk difference (absolute risk reduction) and its CI directly quantify the practical importance of the association:
RD = p̂₁ − p̂₂
95% CI for RD (Wald form shown; Newcombe's method recommended over Wald):
RD ± 1.96 × √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂)
Where p̂₁ = a / (a + b), p̂₂ = c / (c + d), n₁ = a + b, and n₂ = c + d.
10.4 Equivalence and Confidence Intervals
If the goal is to establish that the association is negligibly small (near- independence), use an equivalence testing approach:
- Specify a maximum tolerable effect size V₀ (e.g., V₀ = 0.10).
- Conclude practical equivalence if the entire 95% CI for V falls below V₀.
- If the upper CI bound exceeds V₀, equivalence cannot be concluded.
11. Advanced Topics
11.1 Multiple Chi-Square Tests on the Same Dataset
Testing associations between multiple pairs of categorical variables in the same dataset inflates the familywise error rate:
For k independent tests: FWER = 1 − (1 − α)^k.
Correction strategies:
- Bonferroni: α_adjusted = α / k. Simple but conservative.
- Holm-Bonferroni: Sequential adjustment — less conservative than Bonferroni.
- Benjamini-Hochberg: Controls the False Discovery Rate (FDR) — appropriate for large-scale exploratory analyses (e.g., genome-wide association studies).
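The inflation and the Bonferroni remedy are both one-line calculations:

```python
def familywise_error(alpha, k):
    """FWER = 1 - (1 - alpha)^k for k independent tests."""
    return 1 - (1 - alpha) ** k

fwer = familywise_error(0.05, 10)  # ≈ 0.40 across ten tests
bonferroni_alpha = 0.05 / 10       # per-test threshold 0.005
```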
11.2 Partitioning Chi-Square in Larger Tables
For tables larger than 2 × 2, a significant overall χ² tells you that some association exists somewhere in the table, but not where. Beyond residual analysis, the overall χ² can be partitioned into independent subtables (using Helmert or polynomial contrast coding), each with 1 df, subject to the constraint that the partition df sum to the total df.
This allows focused hypothesis tests about specific row or column comparisons (e.g., "Do groups A and B differ from group C?" tested separately from "Do groups A and B differ from each other?").
11.3 Measures of Agreement vs. Measures of Association
For square tables where both variables classify the same objects into the same categories (inter-rater agreement), use Cohen's kappa (κ) rather than χ² or V:
κ = (P_o − P_e) / (1 − P_e)
Where P_o is the observed agreement and P_e is the expected agreement by chance. κ = 1 → perfect agreement; κ = 0 → agreement at chance level.
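A sketch of κ for a square agreement table (the 2 × 2 counts here are hypothetical):

```python
def cohens_kappa(table):
    """Cohen's kappa = (P_o - P_e) / (1 - P_e) for a square agreement table."""
    n = sum(sum(row) for row in table)
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    p_o = sum(table[i][i] for i in range(len(table))) / n           # observed agreement
    p_e = sum(row[i] * col[i] for i in range(len(table))) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

print(round(cohens_kappa([[20, 5], [10, 15]]), 3))  # 0.4
```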
11.4 Log-Linear Models for Multi-Way Tables
For three or more categorical variables, pairwise chi-square tests are inadequate — they cannot distinguish direct associations from indirect ones mediated by a third variable. Log-linear models model the joint distribution of all variables and allow testing for higher-order interactions:
ln(μ_ijk) = λ + λ_i^X + λ_j^Y + λ_k^Z + λ_ij^XY + λ_ik^XZ + λ_jk^YZ + λ_ijk^XYZ
Model selection (backward elimination or BIC-based) identifies the most parsimonious model that fits the data, isolating which associations are genuine versus artefactual.
11.5 Simpson's Paradox
Simpson's Paradox occurs when an association observed in the overall table reverses or disappears when the data are stratified by a third variable. This is one of the most important reasons to never rely on the marginal chi-square test alone when potential confounders exist.
Classic example: Drug A appears superior to Drug B overall, but within each hospital stratum, Drug B is superior — the reversal is caused by the confounding of hospital quality with drug assignment.
Detection: Use the Cochran-Mantel-Haenszel procedure to compute stratum-adjusted estimates and compare to the unadjusted association.
11.6 The Relationship Between Chi-Square and Other Tests
The chi-square test of association is algebraically equivalent to several other tests in special cases:
| Special Case | Equivalent Test |
|---|---|
| 2 × 2 table | Two-proportion z-test (z² = χ²) |
| 2 × 2 table, small N | Fisher's exact test |
| Both variables ordinal | Spearman correlation significance test |
| One binary, one continuous | Point-biserial correlation; independent t-test |
| 1 × K table | Chi-square goodness-of-fit test |
11.7 Bayesian Chi-Square Analysis
The Bayesian approach quantifies evidence rather than making a binary decision. The BIC-approximated Bayes Factor is:

$$BF_{10} \approx \exp\!\left(\frac{G^2 - df \ln N}{2}\right)$$

where $G^2$ is the likelihood-ratio statistic (the Pearson $\chi^2$ yields a similar approximation). Interpreting $BF_{10}$:
| Evidence for $H_1$ (Association) over $H_0$ (Independence) | $BF_{10}$ |
|---|---|
| Extreme | $> 100$ |
| Very strong | $30$–$100$ |
| Strong | $10$–$30$ |
| Moderate | $3$–$10$ |
| Anecdotal | $1$–$3$ |
| No evidence | $= 1$ |
| Anecdotal evidence for $H_0$ | $1/3$–$1$ |
| Moderate evidence for $H_0$ (independence) | $1/10$–$1/3$ |
Key advantage: $BF_{10} < 1/3$ constitutes positive evidence for independence — something p-values cannot provide.
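A minimal pure-Python sketch of this computation, using the likelihood-ratio statistic $G^2$ in the BIC approximation (the function name is illustrative, not DataStatPro's API):

```python
import math

def bic_bayes_factor(table):
    """BF10 for association vs. independence via the BIC approximation,
    using the likelihood-ratio statistic G^2 (a rough sketch, not exact)."""
    n = sum(sum(row) for row in table)
    r, c = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    g2 = 0.0
    for i in range(r):
        for j in range(c):
            e = row_tot[i] * col_tot[j] / n          # expected count under H0
            if table[i][j] > 0:
                g2 += 2 * table[i][j] * math.log(table[i][j] / e)
    df = (r - 1) * (c - 1)
    return math.exp((g2 - df * math.log(n)) / 2)     # BF10 ≈ exp((G² − df·ln N)/2)

# Smoking × lung disease counts from Example 1 in this tutorial
bf10 = bic_bayes_factor([[40, 60], [20, 80]])
print(f"BF10 ≈ {bf10:.2f}")   # → BF10 ≈ 8.87, moderate evidence for an association
```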
12. Worked Examples
Example 1: Smoking Status and Lung Disease (2 × 2)
A public health researcher surveys $N = 200$ adults, recording smoking status (smoker/non-smoker) and presence of chronic lung disease (yes/no).
Contingency Table (Observed):
| | Lung Disease: Yes | Lung Disease: No | Row Total |
|---|---|---|---|
| Smoker | 40 | 60 | 100 |
| Non-Smoker | 20 | 80 | 100 |
| Column Total | 60 | 140 | 200 |
Step 1 — Hypotheses:

$H_0$: Smoking status and lung disease are independent.

$H_1$: Smoking status and lung disease are associated.

Step 2 — Expected Frequencies:

$$E_{11} = \frac{100 \times 60}{200} = 30, \quad E_{12} = \frac{100 \times 140}{200} = 70, \quad E_{21} = 30, \quad E_{22} = 70$$

All $E_{ij} \geq 5$ (smallest is 30) — assumption of adequate expected frequencies is satisfied.

Step 3 — Chi-Square Statistic:

$$\chi^2 = \frac{(40-30)^2}{30} + \frac{(60-70)^2}{70} + \frac{(20-30)^2}{30} + \frac{(80-70)^2}{70} = 3.33 + 1.43 + 3.33 + 1.43 = 9.52$$

Step 4 — Degrees of Freedom and p-Value:

$$df = (2-1)(2-1) = 1, \qquad p = P(\chi^2_1 \geq 9.52) = .002$$

Step 5 — Effect Sizes:

$$\phi = \sqrt{\frac{9.52}{200}} = .22, \qquad OR = \frac{40 \times 80}{60 \times 20} = 2.67$$

95% CI for $OR$: $[1.38, 5.16]$

Step 6 — Adjusted Standardised Residuals:

$$z_{11} = \frac{40 - 30}{\sqrt{30\,(1 - 100/200)(1 - 60/200)}} = +3.09$$

By symmetry: $z_{12} = -3.09$; $z_{21} = -3.09$; $z_{22} = +3.09$.

All cells show significant deviations ($|z| > 2.58$, significant at $\alpha = .01$).
Summary:
| Statistic | Value | Interpretation |
|---|---|---|
| $\chi^2(1, N = 200)$ | 9.52 ($p = .002$, two-tailed) | Highly significant |
| $\phi$ | .22 | Small-to-medium association |
| $OR$ | 2.67 | Smokers 2.67× more likely to have lung disease |
| 95% CI for $OR$ | [1.38, 5.16] | Excludes 1; confirms significant association |
APA write-up: "A chi-square test of association revealed a significant association between smoking status and lung disease, $\chi^2(1, N = 200) = 9.52$, $p = .002$, $\phi = .22$. The odds of lung disease were 2.67 times higher for smokers than non-smokers (95% CI: [1.38, 5.16]). Adjusted standardised residuals indicated that smokers showed more lung disease ($z = +3.09$) and non-smokers showed less lung disease ($z = -3.09$) than expected under independence."
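Example 1's figures can be reproduced with a short pure-Python sketch (the helper below is illustrative, not DataStatPro code):

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, phi, and odds ratio for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row = [a + b, c + d]                      # row marginals
    col = [a + c, b + d]                      # column marginals
    obs = [[a, b], [c, d]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / n           # expected frequency under independence
            chi2 += (obs[i][j] - e) ** 2 / e
    phi = math.sqrt(chi2 / n)
    odds_ratio = (a * d) / (b * c)
    return chi2, phi, odds_ratio

chi2, phi, or_ = chi_square_2x2(40, 60, 20, 80)   # Example 1 counts
print(f"chi2 = {chi2:.2f}, phi = {phi:.2f}, OR = {or_:.2f}")
# → chi2 = 9.52, phi = 0.22, OR = 2.67
```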
Example 2: Teaching Method and Pass/Fail Outcome (3 × 2)
An education researcher compares three teaching methods (lecture, flipped classroom, online) on student pass/fail outcomes for $N = 210$ students (70 per method).
Contingency Table (Observed):
| | Pass | Fail | Row Total |
|---|---|---|---|
| Lecture | 45 | 25 | 70 |
| Flipped | 55 | 15 | 70 |
| Online | 38 | 32 | 70 |
| Col. Total | 138 | 72 | 210 |
Step 1 — Expected Frequencies:

For every method (row total 70): $E_{\text{Pass}} = \dfrac{70 \times 138}{210} = 46$, $E_{\text{Fail}} = \dfrac{70 \times 72}{210} = 24$.

All $E_{ij} \geq 5$ — assumption satisfied.

Step 2 — Chi-Square Statistic:

$$\chi^2 = \frac{(45-46)^2}{46} + \frac{(25-24)^2}{24} + \frac{(55-46)^2}{46} + \frac{(15-24)^2}{24} + \frac{(38-46)^2}{46} + \frac{(32-24)^2}{24} = 9.26$$

Step 3 — p-Value:

$$df = (3-1)(2-1) = 2, \qquad p = P(\chi^2_2 \geq 9.26) = .010$$

Step 4 — Cramér's $V$:

$$V = \sqrt{\frac{9.26}{210 \times \min(3-1,\, 2-1)}} = \sqrt{\frac{9.26}{210}} = .21$$

Step 5 — Adjusted Standardised Residuals:

Using $z_{ij} = \dfrac{O_{ij} - E_{ij}}{\sqrt{E_{ij}\,(1 - n_{i+}/N)(1 - n_{+j}/N)}}$:
| Cell | $z$ | Significant ($\alpha = .05$)? |
| :--- | :------- | :-------------------------- |
| Lecture/Pass | $-0.31$ | No |
| Lecture/Fail | $+0.31$ | No |
| Flipped/Pass | $+2.78$ | Yes (more than expected) |
| Flipped/Fail | $-2.78$ | Yes (fewer than expected) |
| Online/Pass | $-2.47$ | Yes (fewer than expected) |
| Online/Fail | $+2.47$ | Yes (more than expected) |
Interpretation: Teaching method is significantly associated with pass/fail outcome, $\chi^2(2, N = 210) = 9.26$, $p = .010$, $V = .21$. The flipped classroom exceeds the expected pass rate (more passes, fewer fails than expected), while online learning underperforms (fewer passes, more fails than expected). The lecture method does not significantly deviate from independence.

APA write-up: "A chi-square test of association indicated a significant association between teaching method and pass/fail outcome, $\chi^2(2, N = 210) = 9.26$, $p = .010$, $V = .21$. Adjusted standardised residuals revealed that the flipped classroom had significantly more passes and fewer fails than expected ($z = +2.78$ and $z = -2.78$, respectively), while the online method had significantly fewer passes and more fails than expected ($z = -2.47$ and $z = +2.47$, respectively). The lecture method did not deviate significantly from independence."
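The residuals above can be reproduced with a small pure-Python sketch (the helper name is illustrative, not DataStatPro's API):

```python
import math

def adjusted_residuals(table):
    """Adjusted standardised residuals z_ij = (O - E) / sqrt(E(1 - p_row)(1 - p_col))."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    z = [[0.0] * c for _ in range(r)]
    for i in range(r):
        for j in range(c):
            e = row_tot[i] * col_tot[j] / n   # expected count under independence
            denom = math.sqrt(e * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            z[i][j] = (table[i][j] - e) / denom
    return z

table = [[45, 25], [55, 15], [38, 32]]        # Example 2: lecture, flipped, online
for method, row in zip(["Lecture", "Flipped", "Online"], adjusted_residuals(table)):
    print(method, [round(v, 2) for v in row])
# → Lecture [-0.31, 0.31], Flipped [2.78, -2.78], Online [-2.47, 2.47]
```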
Example 3: Fisher's Exact Test — Rare Side Effect
A clinical trial investigates whether a new drug is associated with a rare side effect. Only $N = 30$ participants are available.
Contingency Table (Observed):
| | Side Effect: Yes | Side Effect: No | Row Total |
|---|---|---|---|
| Treatment | 6 | 9 | 15 |
| Placebo | 2 | 13 | 15 |
| Column Total | 8 | 22 | 30 |
Step 1 — Check Assumptions:

Expected frequencies in the "Side Effect: Yes" column are $E = \dfrac{15 \times 8}{30} = 4$ for both rows. Both cells involving side effects have $E < 5$ → Fisher's exact test is required rather than the standard chi-square approximation.

Step 2 — Fisher's Exact Test p-Value:

The one-tailed p-value (testing $H_1$: treatment increases side effects) is computed as the sum of hypergeometric probabilities for tables as extreme as or more extreme than observed:

$$p_{\text{one-tailed}} = P(X = 6) + P(X = 7) + P(X = 8) = .0898 + .0165 + .0011 = .107$$

Because the margins here are symmetric, the two-tailed p-value is $2 \times .107 = .215$.

Step 3 — Odds Ratio:

$$OR = \frac{6 \times 13}{9 \times 2} = 4.33$$

95% CI for $OR$ (exact Cornfield): $[0.71, 36.85]$

Interpretation: Despite a more than fourfold increase in the odds of a side effect in the treatment group, the small sample size provides insufficient evidence to conclude a statistically significant association, $p = .215$ (Fisher's exact, two-tailed), $\phi = .30$, $OR = 4.33$ [95% CI: 0.71, 36.85]. The wide confidence interval reflects substantial uncertainty due to the small sample. A larger study is warranted.

APA write-up: "Due to small expected cell frequencies ($E = 4 < 5$), Fisher's exact test was used. No statistically significant association was found between treatment condition and side effect occurrence ($p = .215$, two-tailed), though the effect size was small-to-medium ($\phi = .30$, $OR = 4.33$, 95% CI: [0.71, 36.85]). The wide confidence interval and low power indicate that the study was substantially underpowered for this effect size; these findings should be interpreted with caution and a larger replication study is recommended."
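Fisher's exact p-values for this table can be reproduced from the hypergeometric distribution in a few lines of Python (an illustrative sketch, not DataStatPro code):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """One- and two-tailed Fisher's exact p-values for the table [[a, b], [c, d]],
    summing hypergeometric probabilities over tables with the same margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    def p_table(k):                       # P(top-left cell = k) given fixed margins
        return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = {k: p_table(k) for k in range(lo, hi + 1)}
    p_one = sum(p for k, p in probs.items() if k >= a)            # right tail
    p_two = sum(p for p in probs.values() if p <= probs[a] + 1e-12)
    return p_one, p_two

p_one, p_two = fisher_exact_2x2(6, 9, 2, 13)   # Example 3 counts
print(f"one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
# → one-tailed p = 0.107, two-tailed p = 0.215
```

The two-tailed value follows the common convention of summing all tables whose probability does not exceed that of the observed table.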
13. Common Mistakes and How to Avoid Them
Mistake 1: Using Chi-Square with Non-Independent Observations
Problem: Entering repeated measurements, matched pairs, or clustered data into a standard chi-square test. This violates the independence assumption and produces inflated Type I error rates.
Solution: For matched pairs or pre-post binary data, use McNemar's test. For clustered data, use generalised estimating equations (GEE) or mixed-effects logistic regression. For repeated binary measures, use Cochran's Q test.
Mistake 2: Ignoring Small Expected Frequencies
Problem: Running a standard chi-square test when several cells have expected frequencies below 5, leading to an unreliable chi-square approximation with inflated or deflated p-values.
Solution: Always inspect the expected frequency table before interpreting results. Use Fisher's exact test for $2 \times 2$ tables with small expected frequencies. For larger tables, collapse theoretically justifiable categories, or use the exact multinomial test.
Mistake 3: Treating Chi-Square as Directional
Problem: Interpreting a significant chi-square result as indicating a specific direction (e.g., "Group A is higher than Group B"). The chi-square test is omnibus and non-directional — it only indicates that some association exists.
Solution: After a significant omnibus chi-square, examine adjusted standardised residuals to identify which specific cells deviate from independence and in which direction. Report percentage breakdowns and odds ratios to characterise the direction of the association.
Mistake 4: Confusing Association with Causation
Problem: Concluding that $X$ causes $Y$ because a significant association was found. Chi-square only establishes statistical association; causal inference requires experimental design or causal modelling.
Solution: Use appropriate causal language ("is associated with" rather than "causes") unless the study design (randomised experiment) supports causal claims. Consider potential confounders and use stratified analyses (CMH test) or log-linear models to adjust for them.
Mistake 5: Reporting Only the p-Value Without Effect Size
Problem: Reporting only "$\chi^2$ = [value], $p$ = [value]" without an effect size is insufficient. A significant $\chi^2$ driven purely by a large $N$ may correspond to a trivially small $V$ that has no practical importance.

Solution: Always report Cramér's $V$ (or $\phi$ for $2 \times 2$ tables) with its 95% CI. For $2 \times 2$ tables in clinical or epidemiological contexts, also report the odds ratio and its CI.
Mistake 6: Using Chi-Square for Continuous Data
Problem: Dichotomising a continuous variable (e.g., age → young/old) to use chi-square instead of a more appropriate parametric test. Dichotomisation discards information and dramatically reduces statistical power.
Solution: Use the continuous variable in a correlation, regression, or t-test where appropriate. Only categorise variables when the categorical form is theoretically meaningful (e.g., clinical threshold).
Mistake 7: Misinterpreting a Non-Significant Result as Evidence of Independence
Problem: Concluding that $p > .05$ means the variables are independent. As with all hypothesis tests, a non-significant result means insufficient evidence against $H_0$, not that $H_0$ is true. With small $N$, almost no association will reach significance.

Solution: Report power analysis and the 95% CI for $V$. Use the Bayesian chi-square test ($BF_{01}$) or a TOST equivalence procedure to positively support independence.
Mistake 8: Applying Row or Column Percentages Inconsistently
Problem: Reporting column percentages when rows represent the grouping variable (or vice versa) makes patterns hard to interpret. Mixing row and column percentages within the same table causes confusion.
Solution: When rows represent the independent variable (groups), report row percentages (each row sums to 100%). When the marginal distributions are both random (cross-sectional survey), report both row and column percentages and let the research question guide interpretation.
14. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| Chi-square is extremely large (e.g., $\chi^2 > 100$ for a $2 \times 2$ table) | Very large $N$; even negligible associations become significant | Focus on $V$ or $OR$; a large $\chi^2$ may correspond to $V < .10$ |
| $p = 0$ exactly | Software rounds to zero; extremely large $\chi^2$ | Report as $p < .001$ per APA; investigate effect size |
| Expected frequency $< 5$ in one or more cells | Very small cell counts or extreme marginal imbalance | Use Fisher's exact test ($2 \times 2$); collapse categories; collect more data |
| Chi-square statistic equals 0 | Observed frequencies exactly equal expected ($O_{ij} = E_{ij}$ for all cells) | Verify data entry; this would mean perfect independence |
| Cramér's $V > 1$ | Computation error or incorrect $N$ or $\min(r-1, c-1)$ | Verify formula; $V \in [0, 1]$ is bounded by construction |
| Fisher's exact test and chi-square give very different $p$-values | Small sample or extreme marginals | Prefer Fisher's exact; chi-square approximation is unreliable with small $N$ |
| All adjusted residuals are small but $\chi^2$ is significant | Association is diffuse — spread uniformly across all cells, not localised | Report the overall result; diffuse associations may be artefacts of sparse data |
| Large $V$ but $p > .05$ | Small $N$ (low power) | Study is underpowered; $V$ may reflect a real but undetected effect; report power |
| Negative odds ratio or $V$ | Computation error ($OR$ and $V$ cannot be negative) | Verify cell order and formula; $OR \in (0, \infty)$, $V \in [0, 1]$ |
| McNemar result differs substantially from chi-square | Data are paired, not independent | Use McNemar's test; the standard chi-square is incorrect for paired data |
| $G^2$-statistic and $\chi^2$ disagree substantially | Very small expected frequencies or highly asymmetric tables | Use Fisher's exact; both $\chi^2$ and $G^2$ are unreliable for very small samples |
| Structural zero in one cell (count = 0 by design) | Logically impossible cell combination | Use quasi-independence model; do not include structural zeros in standard chi-square |
15. Quick Reference Cheat Sheet
Core Equations
| Formula | Description |
|---|---|
| $E_{ij} = \dfrac{n_{i+}\, n_{+j}}{N}$ | Expected frequency for cell $(i, j)$ |
| $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ | Pearson chi-square statistic |
| $G^2 = 2 \sum_{i,j} O_{ij} \ln\!\left(\dfrac{O_{ij}}{E_{ij}}\right)$ | Likelihood ratio statistic |
| $df = (r - 1)(c - 1)$ | Degrees of freedom |
| $p = P(\chi^2_{df} \geq \chi^2_{\text{obs}})$ | Right-tail p-value |
| $V = \sqrt{\dfrac{\chi^2}{N \min(r-1,\, c-1)}}$ | Cramér's $V$ (all table sizes) |
| $\phi = \sqrt{\dfrac{\chi^2}{N}}$ | Phi coefficient ($2 \times 2$ only) |
| $z_{ij} = \dfrac{O_{ij} - E_{ij}}{\sqrt{E_{ij}(1 - n_{i+}/N)(1 - n_{+j}/N)}}$ | Adjusted standardised residual |
| $OR = \dfrac{ad}{bc}$ | Odds ratio ($2 \times 2$ only) |
| $N \approx \dfrac{7.85}{w^2}$ | Approximate $N$ required for 80% power ($\alpha = .05$, $df = 1$) |
Decision Guide
| Condition | Recommended Test |
|---|---|
| Two categorical variables, independent observations, adequate expected frequencies ($E_{ij} \geq 5$) | Chi-square test of association |
| $2 \times 2$ table with any $E_{ij} < 5$ | Fisher's exact test |
| Paired binary data (pre/post) | McNemar's test |
| Both variables ordinal | Linear-by-linear association test |
| Three or more categorical variables | Log-linear model |
| Establishing independence (not just failing to reject) | Bayesian chi-square ($BF_{01}$) or TOST equivalence |
| One categorical variable vs. known distribution | Chi-square goodness-of-fit test |
Cramér's $V$ Benchmarks (Cohen, 1988)

| Table Size ($\min(r, c)$) | Small | Medium | Large |
|---|---|---|---|
| 2 (includes $\phi$) | .10 | .30 | .50 |
| 3 | .07 | .21 | .35 |
| 4 | .06 | .17 | .29 |
| 5 | .05 | .15 | .25 |
Required Sample Size (2 × 2 Table, $\alpha = .05$)

| $w$ | Power = 0.80 | Power = 0.90 |
|---|---|---|
| 0.10 | 785 | 1046 |
| 0.20 | 197 | 263 |
| 0.30 | 88 | 117 |
| 0.50 | 32 | 42 |
| 0.70 | 17 | 22 |
Adjusted Standardised Residual Thresholds
| Significance Level | Critical value of $\lvert z \rvert$ |
|---|---|
| $\alpha = .05$ | 1.96 |
| $\alpha = .01$ | 2.58 |
| $\alpha = .001$ | 3.29 |
APA 7th Edition Reporting Templates
Standard (all table sizes): "A chi-square test of association revealed a [significant / non-significant] association between [Variable X] and [Variable Y], $\chi^2$([df], $N$ = [N]) = [value], $p$ = [value], $V$ = [value] [95% CI: LB, UB]."

With odds ratio (2 × 2 tables): "... The odds of [outcome] were [OR value] times higher in [group 1] than [group 2] (95% CI: [LB, UB])."

With residuals (larger tables): "... Adjusted standardised residuals indicated that [cell description] occurred significantly more/less frequently than expected ($z$ = [value])."

Fisher's exact test: "Due to small expected cell frequencies, Fisher's exact test was used. [Result statement], $p$ = [value] (Fisher's exact, two-tailed), $\phi$ = [value], $OR$ = [value] [95% CI: LB, UB]."

With Bayesian analysis: "The Bayesian chi-square test yielded $BF_{10}$ = [value], indicating [moderate / strong / extreme] evidence for [an association / independence]."
Reporting Checklist
| Item | Required |
|---|---|
| Chi-square statistic ($\chi^2$) | ✅ Always |
| Degrees of freedom | ✅ Always |
| Sample size (in parentheses with $df$) | ✅ Always |
| Exact p-value | ✅ Always |
| Observed frequency table (or percentage breakdown) | ✅ Always |
| Expected frequency table | ✅ When $N$ is small or any $E_{ij}$ is near the threshold of 5 |
| Cramér's $V$ or phi | ✅ Always |
| 95% CI for effect size | ✅ Always |
| Odds ratio and 95% CI | ✅ For $2 \times 2$ tables in clinical/epi contexts |
| Adjusted standardised residuals | ✅ For tables larger than $2 \times 2$ |
| Fisher's exact test | ✅ When any $E_{ij} < 5$ |
| Assumption check (expected frequencies) | ✅ Always |
| Note on independence of observations | ✅ Always |
| Power analysis | ✅ For non-significant results; underpowered studies |
| Bayes Factor | Recommended for null (non-significant) results |
| TOST equivalence test | ✅ When claiming independence |
This tutorial provides a comprehensive foundation for understanding, conducting, and reporting chi-square tests of association within the DataStatPro application. For further reading, consult Agresti's "An Introduction to Categorical Data Analysis" (3rd ed., 2018), Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988), Everitt's "The Analysis of Contingency Tables" (2nd ed., 1992), and Bishop, Fienberg & Holland's "Discrete Multivariate Analysis" (1975). For feature requests or support, contact the DataStatPro team.