Wilcoxon Signed-Rank Test: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of non-parametric inference all the way through the mathematics, assumptions, variants, effect sizes, interpretation, reporting, and practical usage of the Wilcoxon Signed-Rank Test within the DataStatPro application. Whether you are encountering the Wilcoxon Signed-Rank Test for the first time or seeking a rigorous understanding of rank-based within-subjects comparison, this guide builds your knowledge systematically from the ground up.

Prerequisites and Background Concepts
What is the Wilcoxon Signed-Rank Test?
The Mathematics Behind the Wilcoxon Signed-Rank Test
Assumptions of the Wilcoxon Signed-Rank Test
Variants of the Wilcoxon Signed-Rank Test
Using the Wilcoxon Signed-Rank Test Calculator Component
Full Step-by-Step Procedure
Effect Sizes for the Wilcoxon Signed-Rank Test
Confidence Intervals
Power Analysis and Sample Size Planning
Advanced Topics
Worked Examples
Common Mistakes and How to Avoid Them
Troubleshooting
Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into the Wilcoxon Signed-Rank Test, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 Parametric vs. Non-Parametric Inference

Parametric tests (such as the paired t-test) make specific assumptions about the shape of the population distribution — typically that data are drawn from a normally distributed population. Their test statistics are derived from distributional assumptions, and their validity depends on how well those assumptions are met.

Non-parametric tests (also called distribution-free tests) do not assume a specific parametric form for the population distribution. Many of them, including the Wilcoxon Signed-Rank Test, are based on the ranks of the data rather than the raw values themselves. Because ranks carry less information than raw values, rank-based tests are generally somewhat less powerful than their parametric counterparts when parametric assumptions are met, but they can be more powerful when those assumptions are violated.

The Wilcoxon Signed-Rank Test is the leading non-parametric alternative to the paired t-test for comparing two related conditions when the normality of difference scores cannot be assumed.

1.2 The Concept of Ranks

Ranking transforms raw data values into their relative order positions. Given a set of values $\{x_1, x_2, \ldots, x_n\}$ :

Assign rank 1 to the smallest value, rank 2 to the next smallest, and so on.
For tied values, assign the average rank (midrank) to all tied observations.

Example:

Value	Rank
$3.1$	1
$5.4$	2.5 (midrank of ranks 2 and 3)
$5.4$	2.5 (midrank of ranks 2 and 3)
$7.9$	4
$12.1$	5

Ranking discards information about the precise magnitude of differences between values (e.g., whether the gap between ranks 1 and 2 is 0.1 or 100 units) but preserves the ordinal information (which values are larger or smaller). This makes rank-based tests robust to extreme values and non-normal distributions.
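The midrank rule can be sketched in a few lines of Python. The helper name `midranks` is hypothetical (it is not part of DataStatPro); it uses only the standard library and reproduces the ranking of the five example values.

```python
from bisect import bisect_left, bisect_right

def midranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    ordered = sorted(values)
    # bisect_left + 1 is the first sorted position of v, bisect_right is the last,
    # so their mean is the midrank.
    return [(bisect_left(ordered, v) + bisect_right(ordered, v) + 1) / 2
            for v in values]

example_ranks = midranks([3.1, 5.4, 5.4, 7.9, 12.1])
print(example_ranks)  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

The two tied values of 5.4 occupy positions 2 and 3, so each receives rank 2.5, and the ranks still sum to $5 \times 6 / 2 = 15$ .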

1.3 Ordinal, Interval, and Ratio Scales

The level of measurement determines which statistical tests are appropriate:

Scale	Properties	Examples	Appropriate Summaries
Nominal	Categories only	Gender, blood type	Mode, frequencies
Ordinal	Ordered categories; unequal intervals	Likert items, pain ratings, ranks	Median, percentiles
Interval	Equal intervals; no true zero	Temperature (°C), IQ scores	Mean, SD
Ratio	Equal intervals; true zero	Height, weight, reaction time	Mean, SD, ratios

The Wilcoxon Signed-Rank Test is appropriate for ordinal data and for interval/ratio data that violate the normality assumption of the paired t-test.

1.4 The Median as a Measure of Central Tendency

The median is the value that divides the distribution into two equal halves — 50% of observations fall below it and 50% above it. Unlike the mean, the median is:

Resistant to outliers: A single extreme value does not distort the median.
Appropriate for skewed distributions: The median better represents the "typical" value when distributions are asymmetric.
The natural parameter for non-parametric tests: The Wilcoxon Signed-Rank Test can be interpreted as testing whether the population pseudo-median of difference scores differs from zero (under the symmetry assumption).

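The population pseudo-median is the median of the distribution of the average of two independent difference scores. Its sample estimate, the Hodges-Lehmann estimator, is the median of all pairwise averages $(d_i + d_j)/2$ for $i \leq j$ , including each observation paired with itself.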

1.5 Signed Ranks: Combining Magnitude and Direction

The Wilcoxon Signed-Rank Test uniquely combines two pieces of information from difference scores:

Magnitude: How large is each difference, relative to the others? (Captured by the rank of the absolute difference.)
Direction: Is each difference positive or negative? (Captured by the sign attached to the rank.)

By ranking absolute differences and then restoring the sign, the test gives more weight to large differences than to small ones — unlike the sign test, which ignores magnitude entirely. This is why the Wilcoxon Signed-Rank Test is more powerful than the sign test.

1.6 The Null and Alternative Hypotheses

The Wilcoxon Signed-Rank Test operates under the following hypotheses:

Under the symmetry assumption:

$H_0:$ The population of difference scores is symmetrically distributed about zero.

$H_1:$ The population of difference scores is NOT symmetrically distributed about zero.

Equivalently (under symmetry):

$H_0: \text{pseudo-median of differences} = 0$

Without the symmetry assumption (more general interpretation):

$H_0: P(d_i > 0) = P(d_i < 0) = 0.5$

(The probability of a positive difference equals the probability of a negative difference.)

Directional alternatives:

$H_1: P(d_i > 0) > P(d_i < 0)$ (upper one-tailed)

$H_1: P(d_i > 0) < P(d_i < 0)$ (lower one-tailed)

1.7 The Asymptotic Relative Efficiency

The Asymptotic Relative Efficiency (ARE) of a non-parametric test relative to its parametric counterpart quantifies the relative sample sizes needed to achieve the same power as $n \to \infty$ .

For the Wilcoxon Signed-Rank Test vs. the paired t-test:

$ARE = \frac{3}{\pi} \approx 0.955$ (for normally distributed data)

This means that for normally distributed data, the Wilcoxon test requires approximately $1/0.955 \approx 1.047$ times as many observations as the paired t-test to achieve the same power, a loss of only about 5%. In exchange for this small efficiency cost, the Wilcoxon test no longer depends on normality of the difference scores and is far less sensitive to outliers.

For non-normal distributions, the Wilcoxon test can be substantially more efficient than the t-test:

Distribution	ARE (Wilcoxon vs. t-test)
Normal	$3/\pi \approx 0.955$
Uniform	$1.000$
Double exponential (Laplace)	$1.500$
Logistic	$\pi^2/9 \approx 1.097$
Contaminated normal (10% outliers)	$> 2.000$
Heavy-tailed distributions	Can be very large

💡 For data that are approximately normal, using the Wilcoxon test costs you only 5% efficiency. For data with heavy tails or outliers, the Wilcoxon test can dramatically outperform the t-test. This asymmetry makes the Wilcoxon test a safe default when normality is uncertain.
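The arithmetic behind these efficiency figures is easy to check. The sketch below recomputes the closed-form ARE values from the table and converts an ARE into an approximate sample-size requirement; the function name `wilcoxon_n_for_same_power` is hypothetical, and the conversion is a large-sample approximation only.

```python
import math

are_normal = 3 / math.pi          # ARE of Wilcoxon vs. paired t-test, normal data
inflation = 1 / are_normal        # relative sample size needed by the Wilcoxon test
are_logistic = math.pi ** 2 / 9   # ARE under the logistic distribution

def wilcoxon_n_for_same_power(n_t_test, are):
    """Approximate n for the Wilcoxon test to match a t-test that uses n_t_test pairs."""
    return math.ceil(n_t_test / are)

print(round(are_normal, 3), round(inflation, 3), round(are_logistic, 3))
print(wilcoxon_n_for_same_power(100, are_normal))
```

For example, a paired t-test planned with 100 pairs corresponds to roughly 105 pairs for the Wilcoxon test under normality.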

1.8 Type I Error, Power, and the Role of Sample Size

Type I error ( $\alpha$ ): The probability of incorrectly rejecting $H_0$ when it is true. The Wilcoxon Signed-Rank Test maintains the nominal $\alpha$ for any continuous distribution of differences that is symmetric about the null value; normality is not required.
Type II error ( $\beta$ ): The probability of failing to detect a true effect.
Power ( $1-\beta$ ): The probability of correctly detecting a true effect.

The Wilcoxon test achieves nearly identical power to the paired t-test for normal data and superior power for non-normal data, making it a generally safe and efficient choice for paired comparisons.

2. What is the Wilcoxon Signed-Rank Test?

2.1 The Core Idea

The Wilcoxon Signed-Rank Test (Wilcoxon, 1945) is a non-parametric inferential procedure for testing whether two related conditions (measured on the same participants or matched pairs) have the same distribution. It is the non-parametric alternative to the paired t-test when the assumption of normally distributed difference scores cannot be met.

Rather than working with raw difference scores and computing means and standard deviations (as the paired t-test does), the Wilcoxon test:

Computes the absolute values of the difference scores $|d_i|$ .
Ranks the absolute differences from smallest to largest.
Restores the sign of each difference to its rank.
Computes the sum of the positive ranks $W^+$ and the sum of the negative ranks $W^-$ as the test statistics.
Evaluates whether $W^+$ and $W^-$ are sufficiently different from what would be expected by chance if $H_0$ were true.

Under $H_0$ , positive and negative differences should be roughly equally common and roughly equally large — so $W^+$ and $W^-$ should be approximately equal (each approximately $n(n+1)/4$ ). Large discrepancies between $W^+$ and $W^-$ provide evidence against $H_0$ .

2.2 When to Use the Wilcoxon Signed-Rank Test

The Wilcoxon Signed-Rank Test is the appropriate choice when:

The DV is measured on an ordinal scale (e.g., Likert items, pain ratings, satisfaction scores) where differences may not be meaningful in interval terms.
The DV is continuous (interval/ratio) but the difference scores are non-normally distributed and sample size is small ( $n < 30$ ).
There are extreme outliers in the difference scores that cannot be removed or explained.
The distribution of differences is heavily skewed, making the mean a poor representation of central tendency.
The research question concerns whether one condition tends to produce higher values than the other rather than specifically about the mean difference.

2.3 The Wilcoxon Signed-Rank Test vs. Related Procedures

Situation	Appropriate Test
Two related conditions, differences normal	Paired t-test (preferred for power)
Two related conditions, differences non-normal	Wilcoxon Signed-Rank Test
Two related conditions, only direction of difference known	Sign Test (less powerful)
One group vs. known value, non-normal	Wilcoxon Signed-Rank (one-sample version)
Three or more related conditions, non-normal	Friedman Test
Two independent groups, non-normal	Mann-Whitney U Test
Two related conditions, Bayesian non-parametric	Bayesian Signed-Rank Test

2.4 The Wilcoxon Signed-Rank Test vs. the Sign Test

The Wilcoxon Signed-Rank Test and the Sign Test are both non-parametric tests for paired data, but they differ in the information they use:

Property	Wilcoxon Signed-Rank	Sign Test
Information used	Rank of $|d_i|$ and sign of $d_i$	Sign of $d_i$ only
Requires rankable differences	✅ Yes	❌ No
Power	Higher	Lower
Robustness to outliers	High	Very high
ARE vs. t-test (normal data)	0.955	0.637
Suitable when only direction known	❌ No	✅ Yes

The Wilcoxon test is preferred over the sign test in virtually all circumstances where the absolute magnitude of differences can be ranked, because it makes better use of the available information.

2.5 Two Versions: Paired and One-Sample

The Wilcoxon Signed-Rank Test has two closely related applications:

Paired version: Compare two related conditions. Compute $d_i = x_{1i} - x_{2i}$ for each pair, then apply the test to the difference scores.

One-sample version: Test whether a single sample's population median (or pseudo-median) equals a hypothesised value $\theta_0$ . Compute $d_i = x_i - \theta_0$ and apply the test to these adjusted values.

Both versions are mathematically identical — they differ only in how the difference scores are constructed.

3. The Mathematics Behind the Wilcoxon Signed-Rank Test

3.1 Computing Difference Scores

Paired version: For $n$ pairs $(x_{1i}, x_{2i})$ , $i = 1, \ldots, n$ :

$d_i = x_{1i} - x_{2i}$

One-sample version: For $n$ observations $x_i$ tested against $\theta_0$ :

$d_i = x_i - \theta_0$

3.2 Handling Zero Differences

Pairs where $d_i = 0$ (exactly) are excluded from the analysis because they carry no information about the direction of an effect. Let $n'$ denote the number of non-zero differences remaining after exclusion. All subsequent steps use $n'$ .

⚠️ A large number of zero differences substantially reduces the effective sample size and thus statistical power. This is most common with coarsely measured ordinal scales (e.g., 5-point Likert items). If more than 20% of differences are zero, interpret results with particular caution and consider reporting the number of zero differences explicitly.

3.3 Ranking the Absolute Differences

Rank the absolute values $|d_i|$ from smallest (rank 1) to largest (rank $n'$ ).

For tied absolute values, assign the average (midrank) of the ranks they would have occupied:

If three observations are tied at the 4th, 5th, and 6th positions, each receives rank $(4+5+6)/3 = 5$ .

Notation: Let $R_i$ denote the rank assigned to $|d_i|$ .

3.4 Computing the Test Statistics $W^+$ and $W^-$

Restore the original sign to each rank:

Sum of positive ranks (ranks corresponding to $d_i > 0$ ):

$W^+ = \sum_{\{i:\;d_i > 0\}} R_i$

Sum of negative ranks (ranks corresponding to $d_i < 0$ ):

$W^- = \sum_{\{i:\;d_i < 0\}} R_i$

Verification check:

$W^+ + W^- = \frac{n'(n'+1)}{2}$

This provides an arithmetic check: if $W^+ + W^-$ does not equal $n'(n'+1)/2$ , there is a computational error.

Under $H_0$ , the expected values are:

$E[W^+] = E[W^-] = \frac{n'(n'+1)}{4}$
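Sections 3.1 to 3.4 can be sketched end to end as follows. The data are hypothetical, and `midranks` and `signed_rank_sums` are illustrative helper names rather than DataStatPro functions.

```python
from bisect import bisect_left, bisect_right

def midranks(values):
    """1-based ranks with average ranks for ties."""
    ordered = sorted(values)
    return [(bisect_left(ordered, v) + bisect_right(ordered, v) + 1) / 2
            for v in values]

def signed_rank_sums(x1, x2):
    """Return (W+, W-, n') for paired samples, excluding zero differences."""
    d = [a - b for a, b in zip(x1, x2)]
    nonzero = [v for v in d if v != 0]            # Section 3.2: drop d_i = 0
    ranks = midranks([abs(v) for v in nonzero])   # Section 3.3: rank |d_i|
    w_plus = sum(r for v, r in zip(nonzero, ranks) if v > 0)
    w_minus = sum(r for v, r in zip(nonzero, ranks) if v < 0)
    return w_plus, w_minus, len(nonzero)

# Hypothetical scores for 8 participants under two conditions.
cond1 = [12, 15, 20, 9, 14, 11, 18, 16]
cond2 = [8, 17, 13, 9, 11, 12, 13, 10]
w_plus, w_minus, n_eff = signed_rank_sums(cond1, cond2)
print(w_plus, w_minus, n_eff)  # 25.0 3.0 7

# Arithmetic check from Section 3.4.
assert w_plus + w_minus == n_eff * (n_eff + 1) / 2
```

Here one pair has $d_i = 0$ and is dropped, leaving $n' = 7$ ; the expected value of each rank sum under $H_0$ is $7 \times 8 / 4 = 14$ .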

3.5 The Test Statistic $W$

The conventional test statistic is:

$W = \min(W^+, W^-)$

Small values of $W$ (far from $n'(n'+1)/4$ ) provide evidence against $H_0$ .

Alternatively, many software implementations report $W^+$ directly (or $T^+$ ), with the p-value computed from the appropriate tail of the sampling distribution.

DataStatPro reports both $W^+$ and $W^-$ , highlights the minimum, and computes exact and asymptotic p-values.

3.6 Exact Distribution (Small Samples, $n' \leq 25$ )

For small samples without ties, the exact null distribution of $W^+$ can be enumerated: under $H_0$ , each of the $2^{n'}$ possible sign assignments is equally likely, giving $W^+$ a discrete distribution that can be tabulated exactly.

Exact p-value (two-tailed):

$p = 2 \times \min\!\left[P(W^+ \leq W_{obs}^+), P(W^+ \geq W_{obs}^+)\right]$

DataStatPro always computes the exact p-value when $n' \leq 25$ and there are no (or few) ties, and automatically switches to the normal approximation for larger samples.
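A minimal sketch of the exact calculation, assuming no ties and no zero differences: instead of listing all $2^{n'}$ sign assignments, it counts how many subsets of the ranks $1, \ldots, n'$ produce each possible value of $W^+$ . The function names are hypothetical, and this is not necessarily how DataStatPro implements the computation.

```python
def signed_rank_counts(n):
    """counts[s] = number of sign assignments of ranks 1..n with W+ = s."""
    total = n * (n + 1) // 2
    counts = [0] * (total + 1)
    counts[0] = 1                      # all signs negative
    for r in range(1, n + 1):          # add rank r as an optional positive rank
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return counts

def exact_two_sided_p(w_plus, n):
    """Two-tailed exact p-value for an integer W+ (no ties)."""
    counts = signed_rank_counts(n)
    denom = 2 ** n
    lower = sum(counts[: w_plus + 1]) / denom   # P(W+ <= observed)
    upper = sum(counts[w_plus:]) / denom        # P(W+ >= observed)
    return min(1.0, 2 * min(lower, upper))      # cap at 1 as a guard

p_exact = exact_two_sided_p(25, 7)
print(p_exact)  # 0.078125
```

With $n' = 7$ and $W^+ = 25$ , five of the 128 sign assignments give $W^+ \geq 25$ , so the two-tailed p-value is $2 \times 5/128$ .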

3.7 Normal Approximation (Large Samples, $n' > 25$ )

For larger samples, $W^+$ is approximately normally distributed:

$E[W^+] = \frac{n'(n'+1)}{4}$

$\text{Var}[W^+] = \frac{n'(n'+1)(2n'+1)}{24}$

z-statistic (without continuity correction):

$z = \frac{W^+ - E[W^+]}{\sqrt{\text{Var}[W^+]}} = \frac{W^+ - n'(n'+1)/4}{\sqrt{n'(n'+1)(2n'+1)/24}}$

z-statistic (with continuity correction, more accurate for discrete distributions):

$z_{cc} = \frac{|W^+ - E[W^+]| - 0.5}{\sqrt{\text{Var}[W^+]}}$

Two-tailed p-value:

$p = 2 \times [1 - \Phi(|z|)]$

Where $\Phi$ is the standard normal CDF.
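The normal approximation can be sketched with the standard library alone, using `math.erf` for $\Phi$ . The function names are hypothetical, and the guard that keeps the continuity-corrected numerator from going below zero is an added assumption, not part of the formula above.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def wilcoxon_z(w_plus, n, continuity=True):
    """z for W+ without tie correction; z_cc is non-negative by construction."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    diff = w_plus - mean
    if continuity:
        diff = max(abs(diff) - 0.5, 0.0)   # assumed guard for |diff| < 0.5
    return diff / math.sqrt(var)

def two_sided_p(z):
    return 2 * (1 - normal_cdf(abs(z)))

# Hypothetical result: n' = 30 non-zero differences with W+ = 340.
z_plain = wilcoxon_z(340, 30, continuity=False)
z_cc = wilcoxon_z(340, 30)
print(round(z_plain, 3), round(z_cc, 3), round(two_sided_p(z_cc), 4))
```

For these numbers $E[W^+] = 232.5$ and $\text{Var}[W^+] = 2363.75$ , so the continuity correction lowers $z$ only slightly.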

3.8 Tie Correction for the Variance

When there are tied absolute difference values, the variance formula must be corrected:

$\text{Var}_{corrected}[W^+] = \frac{n'(n'+1)(2n'+1)}{24} - \frac{\sum_{k=1}^{g}(t_k^3 - t_k)}{48}$

Where:

$g$ = number of distinct tied groups among the ranked absolute differences.
$t_k$ = number of observations in the $k$ -th tied group.

Because the correction reduces the variance, it slightly increases the magnitude of the z-statistic and gives a more accurate p-value when ties are present.

Corrected z-statistic:

$z_{tie} = \frac{W^+ - n'(n'+1)/4}{\sqrt{\text{Var}_{corrected}[W^+]}}$
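As a sketch of the tie correction, the helper below (hypothetical name `tie_corrected_variance`) groups equal absolute differences, computes $\sum (t_k^3 - t_k)$ , and subtracts one forty-eighth of it from the untied variance.

```python
from collections import Counter

def tie_corrected_variance(abs_diffs):
    """Variance of W+ under H0 with the tie correction from Section 3.8."""
    n = len(abs_diffs)
    base = n * (n + 1) * (2 * n + 1) / 24
    tie_sizes = [t for t in Counter(abs_diffs).values() if t > 1]
    correction = sum(t ** 3 - t for t in tie_sizes) / 48
    return base - correction

# Hypothetical |d_i| values: one tied pair (2, 2) and one tied triple (3, 3, 3).
var_ties = tie_corrected_variance([1, 2, 2, 3, 3, 3, 4, 5])
print(var_ties)  # 50.375
```

With $n' = 8$ the untied variance is 51; the tied groups contribute $(8 - 2) + (27 - 3) = 30$ , giving a correction of $30/48 = 0.625$ .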

3.9 The Exact Probability Under $H_0$ : Deriving the Null Distribution

Under $H_0$ , each non-zero difference score $d_i$ is equally likely to be positive or negative, independently of its magnitude. This means each of the $2^{n'}$ possible sign assignments to the $n'$ ranks is equally probable.

$W^+$ takes integer values (when there are no ties) from $0$ (all differences negative) to $n'(n'+1)/2$ (all differences positive). The probability of any specific value of $W^+$ is the number of sign assignments producing that value divided by $2^{n'}$ .

Example for $n' = 4$ (ranks 1, 2, 3, 4; total $= 10$ ):

$W^+$ can range from 0 to 10. $P(W^+ = 0) = 1/16$ (all negative). $P(W^+ = 10) = 1/16$ (all positive). $P(W^+ = 5) = 2/16 = 0.125$ (two sign assignments give $W^+ = 5$ : positive ranks $\{1, 4\}$ or $\{2, 3\}$ ).
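For a sample this small, the null distribution can be verified by brute force. The sketch below lists all $2^4 = 16$ sign assignments and tallies the resulting values of $W^+$ ; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

ranks = [1, 2, 3, 4]

# Each tuple of booleans marks which ranks carry a positive sign.
dist = Counter(
    sum(r for r, positive in zip(ranks, signs) if positive)
    for signs in product([False, True], repeat=len(ranks))
)

for w in range(0, 11):
    print(w, dist[w], dist[w] / 16)
```

The enumeration shows that $W^+ = 0, 1, 2, 8, 9, 10$ each arise from one sign assignment and $W^+ = 3, \ldots, 7$ each arise from two, so the distribution is symmetric about $n'(n'+1)/4 = 5$ .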

3.10 Relationship Between Wilcoxon $W$ and the Mann-Whitney $U$

The Wilcoxon Signed-Rank statistic $W^+$ has a counting form analogous to the Mann-Whitney $U$ statistic. Specifically, for the one-sample or paired case with no zero differences and no tied absolute differences, $W^+$ equals the number of Walsh averages $(d_i + d_j)/2$ (for $i \leq j$ ) that are positive:

$W^+ = \#\left\{(i,j): i \leq j \text{ and } (d_i+d_j)/2 > 0\right\}$

This connection to Walsh averages is the foundation of the Hodges-Lehmann estimator of the pseudo-median, which serves as the point estimate associated with the Wilcoxon test.
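The counting identity can be checked numerically. The sketch below uses hypothetical difference scores with no zeros and no tied absolute values, computes $W^+$ from ranks, and compares it with the number of positive Walsh averages.

```python
def walsh_averages(d):
    """All pairwise averages (d_i + d_j) / 2 with i <= j."""
    return [(d[i] + d[j]) / 2 for i in range(len(d)) for j in range(i, len(d))]

d = [4, -2, 7, 3, -1, 5, 6]            # hypothetical non-zero differences

# Without ties, the rank of |d_i| is its 1-based position among the sorted |d|.
sorted_abs = sorted(abs(v) for v in d)
w_plus = sum(sorted_abs.index(abs(v)) + 1 for v in d if v > 0)

averages = walsh_averages(d)
positive_count = sum(1 for a in averages if a > 0)
print(w_plus, positive_count, len(averages))  # 25 25 28
```

Each positive difference with rank $R_i$ contributes exactly $R_i$ positive Walsh averages (itself plus every difference of smaller magnitude), which is why the two counts agree.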

4. Assumptions of the Wilcoxon Signed-Rank Test

4.1 Symmetry of the Difference Score Distribution

The Wilcoxon Signed-Rank Test's primary assumption is that the population distribution of difference scores is symmetric about its median (pseudo-median). This is weaker than the normality assumption of the paired t-test but is still a meaningful constraint.

Why symmetry matters: The test is designed so that, under $H_0$ , positive and negative ranks of equal magnitude are equally likely. If the difference distribution is asymmetric, the test is not testing only the location of the median — it may also respond to the shape of the distribution. In that case, $H_0$ conflates "no location shift" with "symmetric distribution."

How to check:

Histogram of difference scores: look for approximate left-right symmetry about zero.
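Q-Q plot of difference scores: under symmetry, departures from the normal reference line mirror each other in the two tails (e.g., an S-shape for heavy tails); curvature in one direction only indicates skewness.
Skewness statistic: $|z_{skew}| < 2$ (sample skewness divided by its standard error) suggests no severe asymmetry.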
Density plots: visual inspection of the distribution of $d_i$ .
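The skewness check can be sketched as follows. The tutorial does not spell out how $z_{skew}$ is computed, so this example assumes the common definition: the adjusted sample skewness $G_1$ divided by its standard error $\sqrt{6n(n-1)/[(n-2)(n+1)(n+3)]}$ . The function name `skewness_z` is hypothetical.

```python
import math

def skewness_z(d):
    """Adjusted sample skewness divided by its standard error (requires n >= 3)."""
    n = len(d)
    mean = sum(d) / n
    m2 = sum((v - mean) ** 2 for v in d) / n      # second central moment
    m3 = sum((v - mean) ** 3 for v in d) / n      # third central moment
    g1 = m3 / m2 ** 1.5
    adjusted = math.sqrt(n * (n - 1)) / (n - 2) * g1
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return adjusted / se

z_symmetric = skewness_z([-3, -1, 0, 1, 3])   # hypothetical symmetric differences
z_skewed = skewness_z([1, 1, 1, 2, 10])       # hypothetical right-skewed differences
print(round(z_symmetric, 3), round(z_skewed, 3))
```

In very small samples the standard error is large, so this statistic has little ability to detect asymmetry; it is best read alongside the histogram rather than on its own.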

When violated: If difference scores are severely asymmetric (heavily skewed in one direction), the Wilcoxon test's p-value may not correctly reflect only a location shift. In this case:

Use the Sign Test (which only requires that the median exists, with no symmetry assumption).
Consider a data transformation (log, square root) to reduce skewness.
Report the results with an explicit caveat about the asymmetry.

⚠️ The symmetry assumption is often overlooked. A common error is applying the Wilcoxon Signed-Rank Test to heavily right-skewed difference scores (e.g., when data represent counts or reaction times with occasional very long responses) without checking symmetry. In such cases, the Sign Test or bootstrap methods are more appropriate.

4.2 Independence of Pairs

All pairs $(x_{1i}, x_{2i})$ must be independent of each other. That is, knowing the difference score for pair $i$ gives no information about the difference score for pair $j$ ( $j \neq i$ ). Within each pair, the two measurements are of course dependent — this is the point of the paired design.

Common violations:

Multiple measurements from the same participant treated as separate pairs.
Pairs sampled from the same cluster (classroom, family, ward).
Longitudinal data with autocorrelated measurements.

When violated: Use multilevel models or time-series methods.

4.3 Continuous (or At Least Ordinal and Rankable) Differences

The test requires that the absolute differences can be meaningfully ranked — there must be a natural ordering of the magnitudes. This is satisfied whenever:

The DV is measured on a ratio or interval scale.
The DV is ordinal and differences can be ranked (e.g., a 10-point pain scale where a difference of 3 is consistently larger than a difference of 1).

When violated: If differences cannot be ranked (e.g., nominal categories), use the McNemar test (for binary outcomes) or other categorical tests.

4.4 Exchangeability Under $H_0$

Under $H_0$ , the distribution of $d_i$ must be exchangeable with respect to sign: $d_i$ and $-d_i$ must have the same distribution. This is satisfied when the difference distribution is symmetric about zero.

This condition is equivalent to stating that the probability of a positive difference equals the probability of a negative difference of the same magnitude.

4.5 Absence of Excessive Ties

The Wilcoxon Signed-Rank Test is designed for continuous data where ties in absolute differences are rare. Excessive ties (especially many zero differences) can affect the accuracy of the p-value.

Types of ties:

Zero differences ( $d_i = 0$ ): excluded from the analysis, reducing $n'$ .
Tied absolute differences ( $|d_i| = |d_j|$ for $i \neq j$ ): handled by midranks; the tie correction adjusts the variance.

How to check: Count the number of zero differences and the number of tied absolute differences. If more than 20–25% of differences are zero, the effective sample size is substantially reduced.

When excessive ties present: Use the exact permutation test version of the Wilcoxon test, which handles ties exactly. DataStatPro automatically applies the exact test with ties when $n' \leq 25$ and the standard tie-corrected approximation for larger samples.

4.6 Assumption Summary Table

Assumption	Description	How to Check	Remedy if Violated
Symmetry of differences	$d_i$ distribution is symmetric about $\theta_0$	Histogram, Q-Q, skewness of $d_i$	Sign Test; transform data
Independence of pairs	Pairs are independent across observations	Design review	Multilevel model
Rankable differences	$|d_i|$ can be meaningfully ordered	Review the measurement scale	McNemar test (binary outcomes); other categorical tests
Exchangeability	$d_i$ and $-d_i$ have same distribution	Symmetry check	Sign Test; bootstrap
No excessive ties	Few zero or tied absolute differences	Count zeros and ties	Exact permutation test; sign test

5. Variants of the Wilcoxon Signed-Rank Test

5.1 Paired Version (Two-Condition Comparison)

The paired version compares two related conditions. Difference scores are computed as $d_i = x_{1i} - x_{2i}$ and the test evaluates whether the pseudo-median of the differences equals zero.

This is the most common application of the Wilcoxon Signed-Rank Test and is the primary focus of this tutorial.

5.2 One-Sample Version (Against a Hypothesised Median)

The one-sample version tests whether the population pseudo-median of a single sample equals a specified value $\theta_0$ :

$H_0: \theta = \theta_0$

Compute adjusted differences: $d_i = x_i - \theta_0$

Then apply the standard Wilcoxon procedure to these adjusted values.

Common applications:

Testing whether a sample's median IQ differs from the population norm of 100.
Testing whether median response time differs from a published normative value.
Quality control: testing whether median product weight differs from a target.

5.3 Exact vs. Approximate (Asymptotic) p-values

Exact p-value: Computes the p-value from the complete enumeration of all $2^{n'}$ possible sign assignments to the ranks under $H_0$ . Appropriate for small samples ( $n' \leq 25$ ) and when ties are absent or few. DataStatPro always provides the exact p-value when feasible.

Asymptotic p-value: Uses the normal approximation to the distribution of $W^+$ . Appropriate for $n' > 25$ . The tie-corrected version is more accurate when ties are present.

With continuity correction: The continuity correction ( $\pm 0.5$ adjustment to $W^+$ ) improves the accuracy of the normal approximation for moderate sample sizes by accounting for the discrete nature of $W^+$ .

Recommendation: Use the exact p-value whenever possible ( $n' \leq 25$ , few ties). For larger samples, the tie-corrected asymptotic p-value with continuity correction is generally accurate.

5.4 Permutation Version

The permutation (randomisation) version of the Wilcoxon test generates the null distribution by randomly reassigning the signs of the absolute differences $B$ times (e.g., $B = 10{,}000$ ) and computing $W^+$ for each permutation. The p-value is the proportion of permuted statistics at least as extreme as the observed $W^+$ .

This approach:

Is valid regardless of ties (handles them exactly).
Does not rely on any distributional approximation.
Requires more computation but is exact in principle.
Is particularly useful for small samples with many ties.

DataStatPro offers the permutation version under the "Exact / Permutation" option.
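A minimal sketch of the sign-flipping procedure, assuming a two-tailed test in which "at least as extreme" means at least as far from $E[W^+]$ as the observed value. The function names, the seed argument, and the data are hypothetical; DataStatPro's own implementation may differ.

```python
import random
from bisect import bisect_left, bisect_right

def midranks(values):
    ordered = sorted(values)
    return [(bisect_left(ordered, v) + bisect_right(ordered, v) + 1) / 2
            for v in values]

def sign_flip_p_value(d, n_resamples=10_000, seed=0):
    """Monte Carlo two-tailed p-value for W+ from random sign reassignment."""
    rng = random.Random(seed)
    nonzero = [v for v in d if v != 0]
    ranks = midranks([abs(v) for v in nonzero])   # ties keep their midranks
    center = sum(ranks) / 2                       # E[W+] under H0
    observed = sum(r for v, r in zip(nonzero, ranks) if v > 0)
    extreme = 0
    for _ in range(n_resamples):
        # Each rank is positive with probability 0.5, independently.
        w = sum(r for r in ranks if rng.random() < 0.5)
        if abs(w - center) >= abs(observed - center):
            extreme += 1
    return extreme / n_resamples

p_mc = sign_flip_p_value([4, -2, 7, 3, -1, 5, 6])
print(p_mc)
```

For these seven differences the exact two-tailed p-value is $10/128 \approx 0.078$ , and the Monte Carlo estimate should land close to it; the remaining gap is simulation error that shrinks as $B$ grows.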

5.5 Pratt's Method for Zero Differences

Two conventions exist for handling zero differences ( $d_i = 0$ ):

Wilcoxon's original method (default): Exclude all zero differences; analyse only the $n'$ non-zero differences.

Pratt's method (1959): Include zero differences in the ranking, but exclude them from the sum of signed ranks. This method:

Retains the information that zero differences exist (they count toward the ranking).
Can give slightly different p-values from the standard method.
May be preferred when zeros are informative (e.g., zero change is substantively meaningful).

DataStatPro provides both methods when zero differences are present.
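The difference between the two conventions shows up in the ranks themselves. The sketch below (hypothetical helper names and data) computes the rank sums under each method for the same difference scores.

```python
from bisect import bisect_left, bisect_right

def midranks(values):
    ordered = sorted(values)
    return [(bisect_left(ordered, v) + bisect_right(ordered, v) + 1) / 2
            for v in values]

def wilcoxon_rank_sums(d):
    """Wilcoxon's method: drop zeros first, then rank the remaining |d_i|."""
    nonzero = [v for v in d if v != 0]
    ranks = midranks([abs(v) for v in nonzero])
    return (sum(r for v, r in zip(nonzero, ranks) if v > 0),
            sum(r for v, r in zip(nonzero, ranks) if v < 0))

def pratt_rank_sums(d):
    """Pratt's method: rank all |d_i| including zeros, then ignore the zeros' ranks."""
    ranks = midranks([abs(v) for v in d])
    return (sum(r for v, r in zip(d, ranks) if v > 0),
            sum(r for v, r in zip(d, ranks) if v < 0))

d = [0, 0, 3, -1, 2, 5]                # hypothetical differences with two zeros
print(wilcoxon_rank_sums(d))           # (9.0, 1.0)
print(pratt_rank_sums(d))              # (15.0, 3.0)
```

Under Pratt's method the non-zero differences receive ranks 3 to 6 rather than 1 to 4, so $W^+ + W^-$ no longer equals $n'(n'+1)/2$ and the null distribution has to be derived for those shifted ranks.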

6. Using the Wilcoxon Signed-Rank Test Calculator Component

The Wilcoxon Signed-Rank Test Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, visualising, and reporting the test and associated effect sizes.

Step-by-Step Guide

Step 1 — Select "Wilcoxon Signed-Rank Test"

From the "Test Type" dropdown, select:

Wilcoxon Signed-Rank Test (Paired): For comparing two related conditions.
Wilcoxon Signed-Rank Test (One-Sample): For testing a single sample against a hypothesised median $\theta_0$ .

💡 DataStatPro automatically suggests the Wilcoxon Signed-Rank Test when the normality check on difference scores is significant in the Paired t-Test component. A yellow warning banner will appear with a direct link to the Wilcoxon component.

Step 2 — Input Method

Choose how to provide the data:

Raw data (paired columns): Upload or paste two columns — Condition 1 and Condition 2 — with one row per participant. DataStatPro computes all difference scores, performs symmetry checks, counts zeros and ties, and generates all statistics.
Raw data (difference scores): Upload a single column of pre-computed difference scores. Useful for the one-sample version (enter $d_i = x_i - \theta_0$ ).
Summary data (counts): Enter $n^+$ (number of positive differences), $n^-$ (number of negative differences), $n^0$ (ties at zero), and group summary statistics. Only the sign test and approximate statistics are available in this mode.
Published results: Enter the reported $W^+$ (or $T$ ), $n'$ , and any available tie information to compute p-values and effect sizes from a published result.

Step 3 — Specify the Null Hypothesis Value $\theta_0$

Paired version: Default $\theta_0 = 0$ (testing whether the median difference is zero). Enter a non-zero value for one-sample-style comparisons against a reference.
One-sample version: Enter the hypothesised population pseudo-median $\theta_0$ .

Step 4 — Select the Alternative Hypothesis

Two-tailed (default): $H_1:$ The pseudo-median differs from $\theta_0$ .
Upper one-tailed: $H_1:$ The pseudo-median is greater than $\theta_0$ .
Lower one-tailed: $H_1:$ The pseudo-median is less than $\theta_0$ .

Step 5 — Select p-value Method

Exact (recommended for $n' \leq 25$ ): Uses the complete enumeration of the null distribution. Automatically selected by DataStatPro for small samples.
Asymptotic + Continuity Correction (recommended for $n' > 25$ ): Normal approximation with tie correction and continuity correction.
Permutation ( $B$ resamples): Specify $B$ (default: $10{,}000$ ). Appropriate for any sample size, handles ties exactly.

Step 6 — Handle Zero Differences

Wilcoxon method (default): Exclude zero differences; analyse $n'$ non-zero pairs.
Pratt's method: Include zeros in ranking but not in rank sums.

DataStatPro reports $n$ (total pairs), $n^0$ (zero differences excluded), $n^+$ (positive differences), $n^-$ (negative differences), and $n' = n - n^0$ (effective sample size).

Step 7 — Select Display Options

✅ $W^+$ , $W^-$ , $W$ (minimum), $z$ , and p-value (exact and/or asymptotic).
✅ Descriptive statistics: $n'$ , $n^+$ , $n^-$ , $n^0$ , median per condition, median difference, Hodges-Lehmann pseudo-median estimate.
✅ Hodges-Lehmann estimator with 95% CI.
✅ Rank table: individual $d_i$ , $|d_i|$ , $R_i$ , signed rank.
✅ Effect size $r_W$ ( $z/\sqrt{n'}$ ) with 95% CI.
✅ Matched-pairs rank-biserial correlation $r_{rb}$ .
✅ Common Language Effect Size (CL%).
✅ Assumption check panel: histogram of $d_i$ , Q-Q plot, skewness, zero count, tie count.
✅ Distribution visualisation: overlapping density plots per condition; histogram of signed ranks.
✅ Dot plot with connecting lines showing individual participant change.
✅ Comparison with paired t-test results (runs both; flags discrepancies).
✅ Power curve: power vs. $n$ for observed effect size.
✅ APA 7th edition-compliant results paragraph (auto-generated).

Step 8 — Run the Analysis

Click "Run Wilcoxon Test". DataStatPro will:

Compute all difference scores and rank them.
Apply zero-exclusion (or Pratt) and tie-correction.
Compute $W^+$ , $W^-$ , $z$ , exact p-value, and asymptotic p-value.
Estimate the Hodges-Lehmann pseudo-median with exact 95% CI.
Compute effect sizes $r_W$ and $r_{rb}$ with CIs.
Run assumption checks and display symmetry diagnostics.
Auto-generate the APA-compliant results paragraph.

7. Full Step-by-Step Procedure

7.1 Complete Computational Procedure

This section walks through every computational step for the Wilcoxon Signed-Rank Test, from raw data to a full APA-style conclusion.

Given: $n$ pairs of observations $(x_{1i}, x_{2i})$ for $i = 1, 2, \ldots, n$ .

Step 1 — Establish Sign Convention and Compute Difference Scores

Define $d_i = x_{1i} - x_{2i}$ consistently for all pairs. A positive $d_i$ means Condition 1 yields a higher value than Condition 2 for participant $i$ .

State the sign convention explicitly: "Positive differences indicate higher scores in Condition 1 than Condition 2."

Step 2 — Identify and Exclude Zero Differences

Identify all pairs where $d_i = 0$ exactly. Remove these from further analysis.

$n' = n - n^0$ (effective sample size after exclusion)

Record $n^0$ for reporting. If $n^0 > 0$ , state explicitly that $n^0$ pairs with $d_i = 0$ were excluded.

Step 3 — Compute Absolute Differences and Check Symmetry

Compute $|d_i|$ for all $n'$ non-zero differences.

Symmetry check:

Plot a histogram of the $n'$ difference scores $d_i$ .
Compute skewness: if $|z_{skew}| < 2$ , symmetry is not severely violated.
Inspect whether the distribution appears approximately symmetric about zero.

Step 4 — Rank the Absolute Differences

Rank $|d_1|, |d_2|, \ldots, |d_{n'}|$ from smallest (rank 1) to largest (rank $n'$ ).

For tied absolute values, assign the average rank to all tied observations.

Notation: $R_i$ = rank of $|d_i|$ .

Verification: $\sum_{i=1}^{n'} R_i = n'(n'+1)/2$

Step 5 — Assign Signed Ranks

Restore the sign of each difference to its rank:

$R_i^{+} = R_i$ if $d_i > 0$ ; $R_i^{-} = R_i$ if $d_i < 0$

Create a table with columns: $i$ , $d_i$ , $|d_i|$ , $R_i$ (rank of $|d_i|$ ), and the signed rank ( $+R_i$ if positive, $-R_i$ if negative).

Step 6 — Compute the Rank Sums

$W^+ = \sum_{\{i:\; d_i > 0\}} R_i$ (sum of ranks where $d_i > 0$ )

$W^- = \sum_{\{i:\; d_i < 0\}} R_i$ (sum of ranks where $d_i < 0$ )

Verification: $W^+ + W^- = n'(n'+1)/2$

Test statistic: $W = \min(W^+, W^-)$

Count: $n^+ =$ number of positive differences; $n^- =$ number of negative differences; $n^+ + n^- = n'$ .

Step 7 — Compute the p-value

If $n' \leq 25$ and few ties: Use the exact null distribution (from tables or software enumeration).

If $n' > 25$ or many ties: Use the normal approximation with tie correction:

$E[W^+] = \frac{n'(n'+1)}{4}$

$\text{Var}_{corrected}[W^+] = \frac{n'(n'+1)(2n'+1)}{24} - \frac{\sum_k(t_k^3-t_k)}{48}$

$z = \frac{W^+ - E[W^+]}{\sqrt{\text{Var}_{corrected}[W^+]}}$

With continuity correction:

$z_{cc} = \frac{|W^+ - E[W^+]| - 0.5}{\sqrt{\text{Var}_{corrected}[W^+]}}$

Two-tailed p-value:

$p = 2 \times [1 - \Phi(|z_{cc}|)]$

Compare $p$ to $\alpha$ . Reject $H_0$ if $p \leq \alpha$ .
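The normal approximation of Step 7 can be sketched as follows. The function name `wilcoxon_normal_approx` and its return layout are assumptions made for illustration; the two-tailed p-value uses the identity $2[1 - \Phi(z)] = \text{erfc}(z/\sqrt{2})$ so that only the standard library is needed. The inputs are the absolute differences and $W^+$ from Example 3 in Section 12.

```python
import math
from collections import Counter


def wilcoxon_normal_approx(w_plus, abs_diffs, continuity=True):
    """Return (z, z_used, p_two_tailed) for the tie-corrected normal approximation."""
    n = len(abs_diffs)                               # effective sample size n'
    mean = n * (n + 1) / 4                           # E[W+]
    tie_sizes = Counter(abs_diffs).values()          # t_k for each distinct |d|
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    sd = math.sqrt(var)
    z = (w_plus - mean) / sd                         # signed, uncorrected z
    numerator = abs(w_plus - mean) - (0.5 if continuity else 0.0)
    z_used = max(numerator, 0.0) / sd                # |z| actually used for the p-value
    p = math.erfc(z_used / math.sqrt(2))             # equals 2 * (1 - Phi(z_used))
    return z, z_used, p


# Absolute differences and W+ from Example 3
abs_d = [0.1, 0.2, 0.6, 0.8, 1.1, 1.3, 1.6, 1.8, 1.9,
         2.4, 2.4, 2.7, 2.9, 3.2, 3.2, 3.6, 3.8, 4.4]
z, z_cc, p = wilcoxon_normal_approx(40.0, abs_d)
print(round(z, 3), round(z_cc, 3), round(p, 3))
```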

Step 8 — Compute the Hodges-Lehmann Point Estimate

The Hodges-Lehmann estimator $\hat{\theta}$ is the point estimate of the pseudo-median associated with the Wilcoxon test. It is the median of all pairwise averages of the non-zero differences:

$\hat{\theta} = \text{Median}\left\{\frac{d_i + d_j}{2} : 1 \leq i \leq j \leq n'\right\}$

There are $n'(n'+1)/2$ such averages (including each difference paired with itself).

This estimator is:

Robust to outliers (like the median).
More efficient than the median for symmetric distributions.
The natural point estimate associated with the Wilcoxon test.
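A minimal sketch of the Hodges-Lehmann computation, following the definition above (zero differences excluded, each difference also paired with itself). The helper names `walsh_averages` and `hodges_lehmann` are assumptions introduced here. The brute-force enumeration is quadratic in $n'$ , which is fine for the sample sizes discussed in this tutorial.

```python
from statistics import median


def walsh_averages(diffs):
    """Sorted list of (d_i + d_j) / 2 for all i <= j over the non-zero differences."""
    d = [x for x in diffs if x != 0]
    return sorted((d[i] + d[j]) / 2
                  for i in range(len(d))
                  for j in range(i, len(d)))


def hodges_lehmann(diffs):
    """Median of the Walsh averages (the pseudo-median estimate)."""
    return median(walsh_averages(diffs))


# Differences from Example 4 (silent reading minus guided discussion)
d = [-3, -3, 3, -3, 4, -2, 0, 3, -5, -2]
print(len(walsh_averages(d)), hodges_lehmann(d))
```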

Step 9 — Compute the 95% CI for the Pseudo-Median

The exact 95% CI for the pseudo-median uses the order statistics of the Walsh averages (all $n'(n'+1)/2$ pairwise averages). The CI bounds are determined by the critical values of the Wilcoxon null distribution.

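Let $W^+_{\alpha/2}$ be the lower critical value from the exact Wilcoxon table, i.e., the largest value with $P(W^+ \leq W^+_{\alpha/2}) \leq \alpha/2$ under $H_0$ , and let $M = n'(n'+1)/2$ :

$\text{CI bounds} =$ the $(W^+_{\alpha/2} + 1)$ -th smallest and the $(M - W^+_{\alpha/2})$ -th smallest Walsh average (equivalently, the $(W^+_{\alpha/2} + 1)$ -th smallest and the $(W^+_{\alpha/2} + 1)$ -th largest).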

DataStatPro computes these exact CI bounds numerically.

Approximate 95% CI (for large $n'$ ):

Approximate the lower critical value by $C_\alpha \approx E[W^+] - z_{\alpha/2}\sqrt{\text{Var}[W^+]}$ , rounded down to an integer. The CI then runs from the $(C_\alpha + 1)$ -th smallest to the $(C_\alpha + 1)$ -th largest ordered Walsh average.

Step 10 — Compute Effect Sizes

Effect size $r_W$ (from z-statistic):

$r_W = \frac{z}{\sqrt{n'}}$

Matched-pairs rank-biserial correlation $r_{rb}$ (from Kerby, 2014):

$r_{rb} = \frac{W^+ - W^-}{W^+ + W^-} = \frac{W^+ - W^-}{n'(n'+1)/2}$

Or equivalently:

$r_{rb} = 1 - \frac{4W^-}{n'(n'+1)} = \frac{4W^+}{n'(n'+1)} - 1$

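$r_{rb}$ ranges from $-1$ to $+1$ . $r_W$ has the same sign but a narrower attainable range: without ties its maximum absolute value is $\sqrt{3(n'+1)/(2(2n'+1))}$ , about $0.87$ for large $n'$ .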

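Common Language Effect Size (CL):

$CL = \frac{W^+}{n'(n'+1)/2} \times 100\%$

Equivalently: $CL = \frac{\text{number of Walsh averages} > 0}{n'(n'+1)/2}$ , because $W^+$ equals the number of positive Walsh averages (a Walsh average of exactly zero, which arises from tied $|d_i|$ of opposite sign, counts as one half).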

Step 11 — Interpret and Report

Combine all results into a complete APA-compliant report:

State the test used and the reason (non-normality, ordinal data).
Report group/condition medians.
Report $W^+$ (or $W$ ), $z$ , and $p$ .
Report the Hodges-Lehmann estimate with 95% CI.
Report the effect size $r_{rb}$ (or $r_W$ ) with its 95% CI.
State the practical conclusion.

8. Effect Sizes for the Wilcoxon Signed-Rank Test

8.1 The Rank-Biserial Correlation $r_{rb}$ — Primary Effect Size

The matched-pairs rank-biserial correlation (Kerby, 2014) is the recommended primary effect size for the Wilcoxon Signed-Rank Test. It has several equivalent formulations:

From rank sums:

$r_{rb} = \frac{W^+ - W^-}{n'(n'+1)/2}$

From positive and negative rank proportions:

$r_{rb} = \frac{\text{sum of positive ranks} - \text{sum of negative ranks}}{\text{total rank sum}}$

Interpretation: $r_{rb}$ represents the difference between the proportion of favourable and unfavourable evidence in the data.

$r_{rb} = +1.0$ : All differences are positive (every participant scores higher in Condition 1).
$r_{rb} = -1.0$ : All differences are negative (every participant scores higher in Condition 2).
$r_{rb} = 0.0$ : Equal evidence for positive and negative effects ( $W^+ = W^-$ ).
$r_{rb} = +0.50$ : 75% of the evidence favours Condition 1 over Condition 2.

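This last property is related to the probability of superiority interpretation:

$\frac{1 + r_{rb}}{2} = \frac{W^+}{n'(n'+1)/2}$

This is the share of the total rank sum carried by positive differences. It estimates $P(d_i + d_j > 0)$ for two independently drawn pairs and approximates $P(d_i > 0)$ only loosely.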

8.2 The $r_W$ Effect Size — From the z-Statistic

$r_W$ (sometimes written $r$ or $r_{Wilcoxon}$ ) is the effect size computed directly from the standardised test statistic:

$r_W = \frac{z}{\sqrt{n'}}$

Where $z$ is the z-approximation to the Wilcoxon statistic and $n'$ is the effective sample size (excluding zero differences).

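$r_W$ is conventionally interpreted with the same verbal benchmarks as Pearson $r$ . Its attainable range is narrower than $-1$ to $+1$ : without ties, the maximum of $|r_W|$ is $\sqrt{3(n'+1)/(2(2n'+1))}$ , which tends to $\sqrt{3}/2 \approx 0.87$ for large $n'$ . It is sometimes described as analogous to a point-biserial correlation between a binary indicator of condition and the ranked differences; treat that as a heuristic rather than an exact identity.

Relationship between $r_W$ and $r_{rb}$ :

Without ties the two are proportional: $r_W = r_{rb}\sqrt{3(n'+1)/(2(2n'+1))}$ , so $r_W \approx 0.87\, r_{rb}$ for large $n'$ . With ties, the tie-corrected variance changes the factor slightly.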

💡 DataStatPro reports both $r_W$ and $r_{rb}$ . For primary reporting, $r_{rb}$ is recommended because it is interpretable without reference to the z-approximation and has a direct probability-of-superiority interpretation. Use $r_W$ when comparing to literature that reports this variant.

8.3 Cohen's Benchmarks for $r_W$ and $r_{rb}$

Since $r_W$ and $r_{rb}$ behave like correlation coefficients, Cohen's (1988) benchmarks for Pearson $r$ are applied:

$\vert r_W \vert$ or $\vert r_{rb} \vert$	Verbal Label	Equivalent $d$	Approx. $n'$ pairs for 80% power
$0.10$	Small	$0.20$	$\approx 264$
$0.30$	Medium	$0.63$	$\approx 52$
$0.50$	Large	$1.15$	$\approx 20$
$0.70$	Very large	$1.96$	$\approx 9$
$0.90$	Huge	$4.13$	$\approx 5$

Power estimates for two-tailed $\alpha = .05$ , 80% power, Wilcoxon test.

⚠️ These benchmarks from Cohen (1988) are rough guidelines. Always contextualise effect sizes against domain-specific norms. An $r_{rb} = 0.30$ may be large in some fields (e.g., large-scale educational interventions) and small in others (e.g., lab-controlled cognitive experiments).

8.4 Converting Between Effect Size Metrics

From	To	Formula
$r_{rb}$	$d$ (approx)	$d = \frac{2r_{rb}}{\sqrt{1-r_{rb}^2}}$
$d$	$r_{rb}$ (approx)	$r_{rb} = \frac{d}{\sqrt{d^2+4}}$
$r_W$	$d$	$d = \frac{2r_W}{\sqrt{1-r_W^2}}$
$W^+$ , $n'$	$r_{rb}$	$r_{rb} = (2W^+ - n'(n'+1)/2)/(n'(n'+1)/2)$
$z$ , $n'$	$r_W$	$r_W = z/\sqrt{n'}$

⚠️ The conversions between $r$ and $d$ above use the equal-groups formula and are only approximations. Do not use these conversions for meta-analytic aggregation without accounting for the design structure.
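The conversion formulas in the table can be sketched as small helper functions. The names `r_to_d`, `d_to_r`, and `rank_biserial` are assumptions chosen for illustration, and the same caveat applies: the $r$ to $d$ conversions are approximations.

```python
import math


def r_to_d(r):
    """Approximate Cohen's d from r_rb or r_W (equal-groups formula)."""
    return 2 * r / math.sqrt(1 - r**2)


def d_to_r(d):
    """Approximate r_rb from Cohen's d (inverse of r_to_d)."""
    return d / math.sqrt(d**2 + 4)


def rank_biserial(w_plus, n_eff):
    """Matched-pairs rank-biserial correlation from W+ and the effective n'."""
    total = n_eff * (n_eff + 1) / 2
    return (2 * w_plus - total) / total


print(round(r_to_d(0.50), 3))           # r = 0.50 expressed as d
print(round(d_to_r(r_to_d(0.30)), 3))   # round trip recovers r
print(round(rank_biserial(2.0, 12), 3)) # Example 2: W+ = 2, n' = 12
```

Note that `r_to_d` is undefined at $|r| = 1$ (division by zero), which is the situation in Example 1 where every difference is positive.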

8.5 The Hodges-Lehmann Estimator as an Effect Size

The Hodges-Lehmann pseudo-median $\hat{\theta}$ is the point estimate in original measurement units associated with the Wilcoxon test. It is:

Reported alongside $r_{rb}$ to provide both a standardised and an unstandardised effect.
More interpretable than $r_{rb}$ for practitioners who think in original scale units.
More robust than the mean difference $\bar{d}$ to outliers and skewness.
The natural "what is the effect size in original units?" companion to the Wilcoxon test.

Reporting recommendation: Always report $\hat{\theta}$ with its 95% CI alongside $r_{rb}$ . This parallels the paired t-test practice of reporting both the mean difference (in original units) and Cohen's $d$ .

8.6 The Common Language Effect Size for the Wilcoxon Test

The Common Language Effect Size (CL) for the Wilcoxon context is:

$CL = P(\text{randomly selected pair has } d_i > 0)$

Estimated from the data:

$\widehat{CL} = \frac{n^+}{n'} \times 100\%$ (simple version based on counts)

Or, more precisely using Walsh averages:

$\widehat{CL} = \frac{\text{number of Walsh averages } (d_i+d_j)/2 > 0}{n'(n'+1)/2} \times 100\%$

This is the probability that a randomly selected participant scores higher in Condition 1 than in Condition 2, estimated non-parametrically from the data.

9. Confidence Intervals

9.1 Exact CI for the Hodges-Lehmann Pseudo-Median

The natural CI to report with the Wilcoxon Signed-Rank Test is the exact confidence interval for the pseudo-median (Hodges-Lehmann CI), expressed in the original measurement units.

Algorithm:

Compute all $M = n'(n'+1)/2$ Walsh averages: $(d_i + d_j)/2$ for $1 \leq i \leq j \leq n'$ .
Sort the Walsh averages in ascending order: $A_{(1)} \leq A_{(2)} \leq \cdots \leq A_{(M)}$ .
Find the lower critical value $C_L$ from the exact Wilcoxon null distribution at the chosen $\alpha$ level.
The 95% CI is $\left[A_{(C_L+1)}, A_{(M-C_L)}\right]$ .

Where $C_L$ is the largest value of $W^+$ for which $P(W^+ \leq C_L) \leq \alpha/2$ under $H_0$ .

DataStatPro computes this exact CI automatically.
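The algorithm above can be sketched directly. The function names are assumptions introduced for illustration, not DataStatPro's API. The exact null distribution is built for untied ranks $1, \ldots, n'$ by counting, for each possible $W^+$ , how many of the $2^{n'}$ sign patterns produce it; with tied data the resulting interval is therefore only approximately exact.

```python
def signed_rank_null_counts(n):
    """counts[w] = number of sign patterns with W+ = w, for untied ranks 1..n."""
    total = n * (n + 1) // 2
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Walk downwards so each rank is used at most once per pattern
        for w in range(total, rank - 1, -1):
            counts[w] += counts[w - rank]
    return counts


def lower_critical_value(n, alpha=0.05):
    """Largest C_L with P(W+ <= C_L) <= alpha/2 under H0, or -1 if none exists."""
    limit = alpha / 2 * 2**n
    cumulative, c_l = 0, -1
    for w, count in enumerate(signed_rank_null_counts(n)):
        cumulative += count
        if cumulative > limit:
            break
        c_l = w
    return c_l


def hl_confidence_interval(diffs, alpha=0.05):
    """Return (A_(C_L+1), A_(M-C_L)) from the sorted Walsh averages."""
    d = [x for x in diffs if x != 0]
    walsh = sorted((d[i] + d[j]) / 2
                   for i in range(len(d)) for j in range(i, len(d)))
    c_l = lower_critical_value(len(d), alpha)
    if c_l < 0:
        raise ValueError("n' is too small for a CI at this confidence level")
    m = len(walsh)
    return walsh[c_l], walsh[m - c_l - 1]   # 0-based indices of the two bounds


# Differences from Example 4
print(lower_critical_value(9), hl_confidence_interval([-3, -3, 3, -3, 4, -2, 0, 3, -5, -2]))
```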

9.2 Number of Walsh Averages for Common Sample Sizes

$n'$ pairs	$M = n'(n'+1)/2$ Walsh averages
5	15
10	55
15	120
20	210
30	465
50	1275
100	5050

9.3 Interpreting the Hodges-Lehmann CI

The Hodges-Lehmann CI has the same interpretation as any confidence interval: if the study were repeated many times, approximately 95% of the resulting intervals would contain the true population pseudo-median.

CI interpretation rules:

CI Property	Interpretation
Entirely above zero	Pseudo-median is significantly positive; Condition 1 tends to produce higher values
Entirely below zero	Pseudo-median is significantly negative; Condition 2 tends to produce higher values
Contains zero	Result is not statistically significant at level $\alpha$
Narrow CI	Precise estimate (large $n'$ )
Wide CI	Imprecise estimate (small $n'$ ); interpret cautiously

9.4 CI for the Effect Size $r_{rb}$

A bootstrap 95% CI for $r_{rb}$ is available in DataStatPro when raw data are provided:

Resample $n'$ pairs with replacement $B = 10{,}000$ times.
Compute $r_{rb}^{(b)}$ for each bootstrap sample.
The 95% CI is the 2.5th and 97.5th percentile of the bootstrap distribution.
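The three bootstrap steps can be sketched as below. This is a plain percentile bootstrap under stated assumptions: the helper names, the fixed `seed`, and the choice to resample the non-zero differences are illustrative, and DataStatPro's internal procedure may differ in detail.

```python
import random
from bisect import bisect_left, bisect_right


def rank_biserial(diffs):
    """r_rb = (W+ - W-) / (n'(n'+1)/2), using midranks for tied |d|."""
    d = [x for x in diffs if x != 0]
    sorted_abs = sorted(abs(x) for x in d)

    def midrank(value):
        lo = bisect_left(sorted_abs, value)      # 0-based first position of value
        hi = bisect_right(sorted_abs, value)     # 0-based position after the last
        return (lo + 1 + hi) / 2                 # average of 1-based positions

    signed_sum = sum(midrank(abs(x)) if x > 0 else -midrank(abs(x)) for x in d)
    return signed_sum / (len(d) * (len(d) + 1) / 2)


def bootstrap_ci(diffs, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for r_rb, resampling the non-zero differences."""
    rng = random.Random(seed)
    d = [x for x in diffs if x != 0]
    stats = sorted(rank_biserial(rng.choices(d, k=len(d))) for _ in range(n_boot))
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


# Differences from Example 4
d = [-3, -3, 3, -3, 4, -2, 0, 3, -5, -2]
print(rank_biserial(d), bootstrap_ci(d))
```

The interval depends on the random seed and on $B$ , so small differences from software output are expected.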

An asymptotic CI can also be computed using Fisher's $z$ -transformation:

$z_{r_{rb}} = \text{arctanh}(r_{rb}), \quad SE_{z_{r_{rb}}} = \frac{1}{\sqrt{n'-3}}$

$95\%\text{ CI for } z_{r_{rb}}: z_{r_{rb}} \pm 1.96/\sqrt{n'-3}$

Back-transform: $r_{rb} = \tanh(z_{r_{rb}})$
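The Fisher $z$ interval is a three-line calculation. The function name `fisher_ci` is an assumption made for illustration, and the interval inherits the approximate nature of applying a Pearson-type standard error to $r_{rb}$ .

```python
import math


def fisher_ci(r, n_eff, z_crit=1.96):
    """Approximate CI for r_rb via Fisher's z with SE = 1 / sqrt(n' - 3)."""
    if n_eff <= 3:
        raise ValueError("n' must exceed 3")
    if abs(r) >= 1:
        raise ValueError("arctanh is undefined at |r| = 1")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n_eff - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


lower, upper = fisher_ci(0.50, 10)
print(round(lower, 2), round(upper, 2))
```

The guard for $|r| = 1$ matters in practice: when every difference has the same sign (as in Example 1), the Fisher interval cannot be computed and a bootstrap interval is the fallback.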

9.5 Width of the CI as a Function of Sample Size

For $r_{rb} = 0.30$ using the Fisher $z$ approximation:

$n'$	$SE_{z_r}$	Approx. CI Width ( $r$ )	Precision
10	0.378	1.19	Very low
20	0.243	0.82	Low
30	0.192	0.66	Moderate
50	0.146	0.51	Moderate
100	0.102	0.36	Good
200	0.071	0.25	High

⚠️ The CI for $r_{rb}$ is very wide for small samples. Always report the CI to convey the uncertainty in the effect size estimate. A precise-looking point estimate of $r_{rb} = 0.50$ from $n' = 10$ pairs has a Fisher $z$ CI of approximately $[-0.19, 0.86]$ , which is nearly uninformative about the true effect magnitude.

10. Power Analysis and Sample Size Planning

10.1 Power of the Wilcoxon Signed-Rank Test

Power analysis for the Wilcoxon Signed-Rank Test is more complex than for parametric tests because the power depends on the entire distribution of difference scores, not just the mean and variance. Three approaches are used:

Approach 1 — Use the ARE relative to the paired t-test:

Since $ARE = 3/\pi \approx 0.955$ for normal data, the required $n'$ for the Wilcoxon test is approximately $1/0.955 \approx 1.047$ times the $n$ required for the paired t-test at the same power.

$n'_{Wilcoxon} \approx n_{paired\;t} \times \frac{\pi}{3} \approx 1.047 \times n_{paired\;t}$

This is the most practical planning approach when $d_z$ is known or estimated.

Approach 2 — Use the effect size $r_{rb}$ directly (simulation-based):

DataStatPro uses Monte Carlo simulation to estimate power for specified $r_{rb}$ (or $d_z$ ), $n'$ , $\alpha$ , and distributional shape (normal, logistic, exponential).

Approach 3 — Use the normal approximation (large samples):

For large $n'$ , power is approximately:

$\text{Power} \approx \Phi\!\left(|z_{\lambda}| - z_{\alpha/2}\right)$

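Where $z_\lambda = r_{rb} \cdot \frac{n'(n'+1)/4}{\sqrt{\text{Var}[W^+]}}$ is the non-centrality parameter: the expected shift of $W^+$ away from $E[W^+] = n'(n'+1)/4$ , expressed in null standard-deviation units.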

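10.2 Required Sample Size for 90% Power ( $\alpha = .05$ , Two-Tailed)

Based on converting $d_z$ to Wilcoxon $n'$ via ARE (normal data), with $(z_{\alpha/2} + z_{\beta})^2 = (1.960 + 1.282)^2 = 10.507$ for 90% power:

$n'_{Wilcoxon} \approx \frac{10.507}{d_z^2} \times \frac{\pi}{3}$

The tabulated values include a small-sample adjustment for the t distribution, so they run about two to three pairs above the raw formula. For 80% power, replace 10.507 with $(1.960 + 0.842)^2 = 7.849$ .

$d_z$ equivalent	$r_{rb}$ (approx)	$n'$ Wilcoxon (90% power)	$n$ Paired t (90% power)	Overhead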
0.20	0.099	277	264	+5%
0.30	0.148	125	119	+5%
0.50	0.243	46	44	+5%
0.80	0.372	19	18	+6%
1.00	0.447	14	13	+8%
1.20	0.514	10	9	+11%
1.50	0.600	7	7	$\approx 0\%$

Note: For non-normal distributions (heavy tails, skewed), the Wilcoxon test may require fewer observations than the paired t-test.

10.3 Sensitivity Analysis

The minimum detectable effect size $r_{rb,min}$ for a given $n'$ and power (80%):

Using the ARE-based approximation:

$d_{z,min} \approx \sqrt{\frac{7.849 \times \pi/3}{n'}} = \sqrt{\frac{8.219}{n'}}$

$r_{rb,min} \approx \frac{d_{z,min}}{\sqrt{d_{z,min}^2 + 4}}$

$n'$ pairs	Min. detectable $d_z$	Min. detectable $r_{rb}$
10	0.907	0.413
20	0.641	0.305
30	0.523	0.253
50	0.405	0.199
100	0.287	0.142
200	0.203	0.101
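The sensitivity calculation can be reproduced with a short function. This is a sketch of the ARE-based approximation only; the name `min_detectable` and the default normal quantiles (two-tailed $\alpha = .05$ , 80% power) are assumptions stated here, and the normal approximation becomes rough for very small $n'$ .

```python
import math


def min_detectable(n_eff, z_alpha=1.959964, z_beta=0.841621):
    """Return (d_z_min, r_rb_min) for n' pairs under the ARE-based approximation."""
    # (z_alpha + z_beta)^2 is about 7.849; pi/3 is the inverse of the ARE for normal data
    d_min = math.sqrt((z_alpha + z_beta) ** 2 * (math.pi / 3) / n_eff)
    r_min = d_min / math.sqrt(d_min**2 + 4)
    return d_min, r_min


for n_eff in (10, 20, 30, 50, 100, 200):
    d_min, r_min = min_detectable(n_eff)
    print(n_eff, round(d_min, 3), round(r_min, 3))
```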

10.4 Power Advantage Under Non-Normality

For heavy-tailed or contaminated distributions the ARE exceeds 1, so the Wilcoxon test needs fewer pairs than the t-test for the same power:

Distribution of $d_i$	ARE	Implication
Normal	0.955	Wilcoxon needs 5% more pairs
Contaminated normal (5% outliers)	1.34	Wilcoxon needs 25% fewer pairs
Laplace (double exponential)	1.50	Wilcoxon needs 33% fewer pairs
Logistic	1.10	Wilcoxon needs 9% fewer pairs
Heavy Cauchy tails	$\gg 1$	Wilcoxon dramatically more powerful

💡 When the distribution of difference scores is expected to be non-normal (e.g., for Likert-type scales, skewed physiological data, or time-to-event measures), plan sample size using the Wilcoxon test directly via DataStatPro's Monte Carlo power module rather than the ARE-based approximation.

11. Advanced Topics

11.1 Comparing the Wilcoxon Signed-Rank Test and the Paired t-Test

A common question is: given that both tests are available, which should be reported?

Decision criteria:

Condition	Recommendation
Difference scores clearly normal, no outliers, $n \geq 15$	Paired t-test (slightly more powerful)
Difference scores non-normal, $n < 30$	Wilcoxon Signed-Rank Test
Difference scores ordinal or near-ordinal	Wilcoxon Signed-Rank Test
Severe outliers in differences that cannot be removed	Wilcoxon Signed-Rank Test
Uncertain normality, small $n$	Wilcoxon Signed-Rank Test (safer)
$n \geq 30$ , differences mildly non-normal	Either test (CLT protects t-test)
Pre-registered choice, normality assumed	Paired t-test with Wilcoxon as sensitivity

Best practice: When normality is uncertain, run both tests. If they agree (both significant or both non-significant), report the parametric result as primary with the non-parametric as a sensitivity check. If they disagree, investigate the distribution of differences and report the Wilcoxon as the primary test with an explanation.

11.2 The Sign Test as a Simpler Alternative

The Sign Test is an even simpler non-parametric test that uses only the sign of each difference (ignoring magnitude). It tests $H_0: P(d_i > 0) = 0.5$ using the binomial distribution:

$B = n^+ \sim \text{Binomial}(n', 0.5)$ under $H_0$

When to use the Sign Test over Wilcoxon:

Only the direction of change is known (not the magnitude).
Data are binary or nominal (e.g., improved vs. not improved).
The distribution of differences is so severely non-symmetric that even the Wilcoxon test's symmetry assumption is implausible.

Efficiency comparison: The Sign Test has ARE $= 2/\pi \approx 0.637$ relative to the paired t-test — substantially less efficient than the Wilcoxon test's ARE of 0.955. Use the Sign Test only when the Wilcoxon test's symmetry assumption cannot be justified.
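For comparison with the Wilcoxon result, the exact two-tailed Sign Test p-value follows directly from the binomial distribution above. The function name `sign_test_p` is an assumption introduced for this sketch; the counts are those of Example 2 in Section 12 ( $n^+ = 1$ , $n^- = 11$ ).

```python
from math import comb


def sign_test_p(n_pos, n_neg):
    """Exact two-tailed Sign Test p-value under Binomial(n', 0.5)."""
    n = n_pos + n_neg                      # zero differences are already excluded
    k = min(n_pos, n_neg)                  # the rarer sign defines the tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)              # double the tail, capped at 1


print(round(sign_test_p(1, 11), 4))
```

Because the Sign Test discards the magnitudes, its p-value for the same data is typically larger than the Wilcoxon p-value, which illustrates the efficiency loss described above.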

11.3 Bootstrap Wilcoxon Test

The randomisation (sign-flip) version of the Wilcoxon test, often loosely called a bootstrap test, generates the null distribution by resampling signs:

For each iteration $b = 1, \ldots, B$ :
a. Randomly flip the sign of each $d_i$ with probability 0.5 (sign randomisation under $H_0$ ).
b. Compute $W^{+,(b)}$ from the sign-randomised differences.
The p-value is the proportion of iterations in which $|W^{+,(b)} - E[W^+]| \geq |W^+_{obs} - E[W^+]|$ .

This approach:

Remains valid with ties, because the null distribution is built from the observed midranks.
Still relies on symmetry of the differences under $H_0$ , since that is what justifies flipping signs.
Approaches the exact p-value as $B \to \infty$ .
Is equivalent to the permutation test described in Section 5.4.
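For small $n'$ the sign-flip distribution can be enumerated completely instead of sampled, which gives the limiting p-value directly. The sketch below assumes the midranks have already been computed; the function name `sign_flip_p_value` is hypothetical. Full enumeration visits $2^{n'}$ sign patterns, so it is only practical up to roughly $n' = 20$ ; beyond that, random sign flips as described above are the usual substitute.

```python
from itertools import product


def sign_flip_p_value(ranks, w_plus_obs):
    """Two-tailed p-value from all 2^n sign assignments of the given midranks."""
    n = len(ranks)
    expected = sum(ranks) / 2                    # E[W+] = n'(n'+1)/4
    observed_dev = abs(w_plus_obs - expected)
    extreme = 0
    for signs in product((0, 1), repeat=n):      # 1 marks a positive difference
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w_plus - expected) >= observed_dev:
            extreme += 1
    return extreme / 2**n


# Midranks from Example 4 (|d| = 2, 2, 3, 3, 3, 3, 3, 4, 5) with W+ = 18
ranks = [1.5, 1.5, 5.0, 5.0, 5.0, 5.0, 5.0, 8.0, 9.0]
print(sign_flip_p_value(ranks, 18.0))
```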

11.4 Bayesian Non-Parametric Paired Test

The Bayesian Signed-Rank Test (van Doorn et al., 2018; Ly et al., 2016) extends the Bayesian framework to the Wilcoxon setting. It computes a Bayes Factor $BF_{10}$ quantifying evidence for $H_1$ (pseudo-median $\neq 0$ ) vs. $H_0$ (pseudo-median $= 0$ ) without assuming normality.

The prior on the scaled pseudo-median under $H_1$ is a Cauchy distribution (as in the Bayesian t-test), but the likelihood is based on a normal approximation to the sampling distribution of the Wilcoxon statistic.

$BF_{10}^{Wilcoxon} \approx BF_{10}^{t}$ evaluated at $t = z_{Wilcoxon}$ with $\nu = n'-1$

This approximation is valid for $n' \geq 20$ . DataStatPro computes the Bayesian Signed-Rank Test using this approximation.

Interpretation of $BF_{10}$ : Same benchmarks as the Bayesian t-test (see Section 11.4 of the Paired t-Test tutorial).

11.5 Multiple Wilcoxon Tests and Familywise Error Control

When multiple Wilcoxon Signed-Rank Tests are conducted simultaneously (e.g., testing the same intervention on five different outcomes), the familywise error rate (FWER) inflates exactly as with multiple t-tests:

$FWER = 1 - (1-\alpha)^k$ (for $k$ independent tests, each run at level $\alpha$ )

Correction methods applicable to multiple Wilcoxon tests:

Method	Adjusted $\alpha$	Properties
Bonferroni	$\alpha/k$	Conservative; controls FWER
Holm	Sequential	Less conservative than Bonferroni
Benjamini-Hochberg	FDR control	Exploratory analyses

Apply the same correction logic as for multiple parametric tests.
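As an illustration of the sequential method in the table, Holm-adjusted p-values can be computed as follows. The function name `holm_adjust` and the five p-values are hypothetical, chosen only to show the mechanics; they are not results from this tutorial.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for position, idx in enumerate(order):
        candidate = min(1.0, (m - position) * p_values[idx])
        running_max = max(running_max, candidate)          # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values from five Wilcoxon tests on five outcomes
raw = [0.004, 0.030, 0.020, 0.250, 0.047]
print([round(p, 3) for p in holm_adjust(raw)])
```

An outcome is declared significant when its adjusted p-value is at or below $\alpha$ . Bonferroni corresponds to multiplying every p-value by $k$ , which is never less conservative than Holm.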

11.6 The Wilcoxon Test for Ordinal Likert Scale Data

A common application of the Wilcoxon Signed-Rank Test is to paired Likert scale responses. Consider a satisfaction survey where participants rate two products on a 5-point scale (1 = very dissatisfied, 5 = very satisfied).

Key considerations:

Single Likert items should be treated as ordinal; the Wilcoxon test is appropriate.
Composite Likert scales (sum or average of multiple items) can often be treated as approximately continuous; the paired t-test may be appropriate if the composite is approximately normally distributed.
Floor and ceiling effects are common with Likert data and create many zero differences and ties — check carefully and consider Pratt's method.
The Wilcoxon test cannot distinguish between a systematic shift of 1 point (each participant rates Product 1 exactly 1 point higher) and a mixed pattern (some rate it 2 points higher, others 1 point lower). The Hodges-Lehmann estimate helps clarify the typical magnitude of change.

11.7 Reporting the Wilcoxon Signed-Rank Test According to APA 7th Edition

Minimum reporting requirements (APA 7th ed.):

State that the Wilcoxon Signed-Rank Test was used and why (e.g., non-normal differences, ordinal data).
Report medians for each condition (or the Hodges-Lehmann pseudo-median estimate).
Report the test statistic: $T$ or $W^+$ (or $W$ , the minimum), and the z-approximation if $n' > 25$ .
Report the exact or asymptotic p-value.
Report the effect size $r_{rb}$ (or $r_W$ ) with 95% CI.
Report the Hodges-Lehmann estimate with 95% CI (in original units).
Report $n'$ , $n^+$ , $n^-$ , and $n^0$ (number of zeros excluded).

12. Worked Examples

Example 1: Pre-Post Anxiety Scores (Non-Normal Differences)

A clinical psychologist evaluates an 8-week acceptance and commitment therapy (ACT) programme for anxiety. Generalised Anxiety Disorder 7-item scale (GAD-7; range 0–21; higher = more anxiety) scores are recorded for $n = 12$ participants before and after the programme.

Shapiro-Wilk test on difference scores: Differences are right-skewed ( $W = 0.821$ , $p = .017$ ), so normality is violated. The Wilcoxon Signed-Rank Test is used.

Raw data:

$i$	Pre-ACT ( $x_{1i}$ )	Post-ACT ( $x_{2i}$ )	$d_i = x_{1i}-x_{2i}$
1	16	9	7
2	12	8	4
3	18	6	12
4	14	11	3
5	20	8	12
6	11	9	2
7	17	14	3
8	15	5	10
9	13	11	2
10	19	10	9
11	16	13	3
12	14	12	2

Step 1 — Zero differences: No $d_i = 0$ , so $n' = 12$ .

Step 2 — Absolute differences and symmetry check:

$|d_i|$ : 7, 4, 12, 3, 12, 2, 3, 10, 2, 9, 3, 2

Symmetry check: all differences are positive (no negative differences), indicating a strong shift. The distribution of $d_i$ is right-skewed (all positive, with some large values of 12), which is consistent with the Shapiro-Wilk violation.

Step 3 — Rank the absolute differences:

Sorted $|d_i|$ values and their ranks (with midranks for ties):

$\vert d_i \vert$ value	Count	Rank positions	Avg rank
2	3	1, 2, 3	2.0
3	3	4, 5, 6	5.0
4	1	7	7.0
7	1	8	8.0
9	1	9	9.0
10	1	10	10.0
12	2	11, 12	11.5

Rank assignment:

$i$	$d_i$	$\vert d_i \vert$	Rank $R_i$	Signed Rank
1	7	7	8.0	$+8.0$
2	4	4	7.0	$+7.0$
3	12	12	11.5	$+11.5$
4	3	3	5.0	$+5.0$
5	12	12	11.5	$+11.5$
6	2	2	2.0	$+2.0$
7	3	3	5.0	$+5.0$
8	10	10	10.0	$+10.0$
9	2	2	2.0	$+2.0$
10	9	9	9.0	$+9.0$
11	3	3	5.0	$+5.0$
12	2	2	2.0	$+2.0$

Step 4 — Rank sums:

$W^+ = 8.0+7.0+11.5+5.0+11.5+2.0+5.0+10.0+2.0+9.0+5.0+2.0 = 78.0$

$W^- = 0.0$ (no negative differences)

Check: $W^+ + W^- = 78.0 = 12 \times 13/2 = 78$ ✅

$W = \min(78.0, 0.0) = 0.0$

$n^+ = 12$ , $n^- = 0$ , $n^0 = 0$

Step 5 — Exact p-value ( $n' = 12$ ):

With $W = 0$ (all differences positive), the exact two-tailed p-value is:

$p = 2 \times P(W^+ \leq 0) = 2 \times (1/2^{12}) = 2/4096 = 0.000488$

$p < .001$

Step 6 — Hodges-Lehmann estimator:

All $n'(n'+1)/2 = 12 \times 13/2 = 78$ Walsh averages $(d_i+d_j)/2$ are computed and sorted. The median of 78 values is the average of the 39th and 40th sorted Walsh averages.

Given all differences are positive (2, 2, 2, 3, 3, 3, 4, 7, 9, 10, 12, 12), the Walsh averages range from 2 (minimum) to 12 (maximum), all positive.

$\hat{\theta} = 6.0$ GAD-7 points (38 Walsh averages fall below 6.0, so the 39th and 40th sorted values are both 6.0)

95% CI for pseudo-median (exact, $C_L = 13$ for $n' = 12$ ): $[A_{(14)}, A_{(65)}] = [2.5, 8.0]$ GAD-7 points

Step 7 — Effect sizes:

Rank-biserial correlation:

$r_{rb} = \frac{W^+ - W^-}{n'(n'+1)/2} = \frac{78-0}{78} = 1.000$

$r_{rb} = 1.0$ (perfect: every participant improved)

z-based effect size ( $n' = 12$ , asymptotic approximation):

$E[W^+] = 12 \times 13/4 = 39$

Tie correction: $\sum_k(t_k^3-t_k) = (3^3-3)+(3^3-3)+(2^3-2) = 24+24+6 = 54$

$\text{Var}_{corrected}[W^+] = \frac{12 \times 13 \times 25}{24} - \frac{54}{48} = 162.5 - 1.125 = 161.375$

$z = (78-39)/\sqrt{161.375} = 39/12.703 = 3.070$

$r_W = 3.070/\sqrt{12} = 3.070/3.464 = 0.886$

Common Language Effect Size:

$\widehat{CL} = n^+/n' = 12/12 = 100\%$ (all participants improved)

Step 8 — Summary:

Statistic	Value	Interpretation
Pre-ACT median	$15.5$ GAD-7 pts	Moderate-severe anxiety
Post-ACT median	$9.5$ GAD-7 pts	Mild anxiety
$n'$ (non-zero diff.)	$12$	All participants showed positive change
$n^+$ / $n^-$ / $n^0$	$12$ / $0$ / $0$
$W^+$	$78.0$	Maximum possible
$W^-$	$0.0$	Zero negative ranks
$W$ (minimum)	$0.0$
$p$ (exact, two-tailed)	$< .001$
HL pseudo-median $\hat{\theta}$	$6.0$ GAD-7 pts
95% CI for $\hat{\theta}$	$[2.5, 8.0]$ pts	Excludes 0; significant
$r_{rb}$	$1.000$	Maximum possible effect
$r_W$	$0.886$	Very large
CL	$100\%$	Every participant improved

APA write-up: "Due to non-normal distribution of difference scores (Shapiro-Wilk $W = 0.82$ , $p = .017$ ), a Wilcoxon Signed-Rank Test was conducted. ACT produced a statistically significant reduction in anxiety (pre-ACT: Mdn = 15.5 GAD-7 points; post-ACT: Mdn = 9.5), $W^+ = 78$ , $p < .001$ (exact). The Hodges-Lehmann estimate of the median reduction was 6.0 GAD-7 points [95% CI: 2.5, 8.0], $r_{rb} = 1.00$ , indicating a very large treatment effect. All 12 participants showed improvement following ACT."

Example 2: Pain Ratings — Two Physiotherapy Protocols (Ordinal DV)

A physiotherapist compares pain relief (0–10 NRS, ordinal) under two physiotherapy protocols in $n = 15$ patients with chronic lower back pain. Each patient receives both protocols in randomised order with a 1-week washout. Lower scores indicate less pain. $d_i = \text{Protocol A} - \text{Protocol B}$ (negative = A produces less pain).

Raw data:

$i$	Protocol A	Protocol B	$d_i$
1	4	6	−2
2	7	7	0
3	3	5	−2
4	6	8	−2
5	5	4	1
6	4	7	−3
7	6	6	0
8	3	6	−3
9	5	5	0
10	7	9	−2
11	4	6	−2
12	6	7	−1
13	5	8	−3
14	3	5	−2
15	6	7	−1

Step 1 — Exclude zeros:

$d_i = 0$ for participants 2, 7, 9 → $n^0 = 3$ ; $n' = 15-3 = 12$ .

Non-zero differences: $-2, -2, -2, 1, -3, -3, -2, -2, -1, -3, -2, -1$

$n^+ = 1$ (participant 5: $d_5 = +1$ ); $n^- = 11$ (all others).

Step 2 — Absolute differences and ranks:

$\vert d_i \vert$ value	Count	Rank positions	Avg rank
1	3	1, 2, 3	2.0
2	6	4, 5, 6, 7, 8, 9	6.5
3	3	10, 11, 12	11.0

Rank table (non-zero differences only):

$i$	$d_i$	$\vert d_i \vert$	$R_i$	Signed Rank
1	−2	2	6.5	−6.5
3	−2	2	6.5	−6.5
4	−2	2	6.5	−6.5
5	+1	1	2.0	+2.0
6	−3	3	11.0	−11.0
8	−3	3	11.0	−11.0
10	−2	2	6.5	−6.5
11	−2	2	6.5	−6.5
12	−1	1	2.0	−2.0
13	−3	3	11.0	−11.0
14	−2	2	6.5	−6.5
15	−1	1	2.0	−2.0

Step 3 — Rank sums:

$W^+ = 2.0$

$W^- = 6.5+6.5+6.5+11.0+11.0+6.5+6.5+2.0+11.0+6.5+2.0 = 76.0$

Check: $W^+ + W^- = 78 = 12 \times 13/2$ ✅

$W = \min(2.0, 76.0) = 2.0$

Step 4 — Exact p-value ( $n' = 12$ ):

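Standard exact tables assume no ties. With the midranks used here, enumerating the $2^{12} = 4096$ sign assignments gives $P(W^+ \leq 2) = 4/4096 = 0.00098$ (one-tail): the empty set plus each of the three rank-2.0 observations taken alone.

Two-tailed: $p = 2 \times 0.00098 = .002$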

Step 5 — z-approximation (with tie correction):

$\sum_k(t_k^3-t_k) = (3^3-3)+(6^3-6)+(3^3-3) = 24+210+24 = 258$

$\text{Var}_{corrected} = 162.5 - 258/48 = 162.5 - 5.375 = 157.125$

$z = (2.0 - 39)/\sqrt{157.125} = -37/12.535 = -2.952$

$p = 2 \times \Phi(-2.952) = 2 \times 0.00158 = .003$

Step 6 — Hodges-Lehmann estimate:

$\hat{\theta} = -2.0$ NRS points (median of Walsh averages)

95% CI for $\hat{\theta}$ (exact, $C_L = 13$ for $n' = 12$ ): $[A_{(14)}, A_{(65)}] = [-2.5, -1.0]$ NRS points

Step 7 — Effect sizes:

$r_{rb} = (W^+ - W^-)/(n'(n'+1)/2) = (2-76)/78 = -74/78 = -0.949$

$|r_{rb}| = 0.949$ — very large effect (Protocol A produces substantially less pain)

$r_W = z/\sqrt{n'} = -2.952/\sqrt{12} = -2.952/3.464 = -0.852$

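CL (proportion of positive differences, i.e., Protocol A rated as more painful than Protocol B):

$\widehat{CL} = n^+/n' = 1/12 = 8.3\%$

Because negative differences favour Protocol A, Protocol A produced lower pain in $n^-/n' = 11/12 = 91.7\%$ of patients with non-zero differences.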

Summary:

Statistic	Value	Interpretation
Protocol A median pain	$5.0$ NRS
Protocol B median pain	$6.0$ NRS
$n' = 12$ (3 zeros excluded)
$n^+$ / $n^-$ / $n^0$	$1$ / $11$ / $3$
$W^+$	$2.0$
$W^-$	$76.0$
$W$ (minimum)	$2.0$
Exact $p$ (two-tailed)	$.002$	Significant
HL estimate $\hat{\theta}$	$-2.0$ NRS	Protocol A lowers pain by 2 pts
95% CI	$[-2.5, -1.0]$	Excludes 0
$r_{rb}$	$-0.949$	Very large
$r_W$	$-0.852$	Very large
CL	$91.7\%$ favour A	Protocol A clearly superior

APA write-up: "Due to the ordinal nature of the NRS pain scale and the presence of tied differences, a Wilcoxon Signed-Rank Test was conducted. Three pairs with equal ratings were excluded, leaving $n' = 12$ pairs. Protocol A (Mdn = 5.0) produced significantly lower pain ratings than Protocol B (Mdn = 6.0), $W^+ = 2$ , $p = .002$ (exact), $r_{rb} = -0.95$ [95% CI: −0.99, −0.73]. The Hodges-Lehmann estimate indicated that Protocol A reduced pain by a median of 2.0 NRS points compared to Protocol B [95% CI: 1.0, 2.5]. This represents a very large effect, with Protocol A producing lower pain in 11 of 12 patients with non-zero differences."

Example 3: One-Sample Wilcoxon — Daily Step Counts vs. Health Guideline

A public health researcher tests whether median daily step counts in a sample of $n = 18$ office workers differ from the recommended health guideline of 10,000 steps per day. The distribution of step counts is right-skewed (Shapiro-Wilk $p = .031$ ); the one-sample Wilcoxon Signed-Rank Test is used.

Data (daily steps, thousands):

$x_i$ : 6.2, 8.4, 11.3, 7.1, 9.8, 5.6, 12.4, 8.9, 7.3, 10.1, 6.8, 9.4, 13.2, 7.6, 8.1, 11.8, 6.4, 9.2

Null hypothesis: $\theta_0 = 10.0$ thousand steps (health guideline)

$H_0: \theta = 10.0$ vs. $H_1: \theta \neq 10.0$

Differences from guideline: $d_i = x_i - 10.0$ :

$-3.8, -1.6, 1.3, -2.9, -0.2, -4.4, 2.4, -1.1, -2.7, 0.1, -3.2, -0.6, 3.2, -2.4, -1.9, 1.8, -3.6, -0.8$

Step 1 — No zero differences: $n' = 18$ ; $n^0 = 0$ .

$n^+ = 5$ (values: 1.3, 2.4, 0.1, 3.2, 1.8); $n^- = 13$ .

Step 2 — Rank absolute differences:

Sorted $|d_i|$ : 0.1, 0.2, 0.6, 0.8, 1.1, 1.3, 1.6, 1.8, 1.9, 2.4, 2.4, 2.7, 2.9, 3.2, 3.2, 3.6, 3.8, 4.4

Ranks 1–18 assigned (midranks for tied values 2.4 and 3.2):

$\vert d_i \vert$	$R_i$	Sign	Signed Rank
0.1	1	+	+1
0.2	2	−	−2
0.6	3	−	−3
0.8	4	−	−4
1.1	5	−	−5
1.3	6	+	+6
1.6	7	−	−7
1.8	8	+	+8
1.9	9	−	−9
2.4	10.5	+	+10.5
2.4	10.5	−	−10.5
2.7	12	−	−12
2.9	13	−	−13
3.2	14.5	+	+14.5
3.2	14.5	−	−14.5
3.6	16	−	−16
3.8	17	−	−17
4.4	18	−	−18

Step 3 — Rank sums:

$W^+ = 1+6+8+10.5+14.5 = 40.0$

$W^- = 2+3+4+5+7+9+10.5+12+13+14.5+16+17+18 = 131.0$

Check: $40.0+131.0 = 171 = 18 \times 19/2$ ✅

$W = \min(40.0, 131.0) = 40.0$

Step 4 — Normal approximation (with tie correction; $n' = 18$ is below the $n' > 25$ guideline of Step 7, so it is shown here for comparison with the exact p-value):

$E[W^+] = 18 \times 19/4 = 85.5$

Ties: two pairs of ties (2.4 twice, 3.2 twice): $\sum_k(t_k^3-t_k) = (2^3-2)+(2^3-2) = 6+6 = 12$

$\text{Var}_{corrected} = \frac{18 \times 19 \times 37}{24} - \frac{12}{48} = 527.25 - 0.25 = 527.00$

$z = (40.0 - 85.5)/\sqrt{527.00} = -45.5/22.956 = -1.982$

With continuity correction: $z_{cc} = (|40.0-85.5|-0.5)/22.956 = 45.0/22.956 = 1.960$

$p = 2 \times [1-\Phi(1.960)] = 2 \times 0.025 = .050$

(Marginal; exact p-value from DataStatPro: $p = .047$ )

Step 5 — Hodges-Lehmann estimate and CI:

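$\hat{\theta} = -1.25$ thousand steps (median of the 171 Walsh averages; estimated median difference from 10,000)

Estimated pseudo-median of daily steps: $10.00 - 1.25 = 8.75$ thousand steps/day

95% CI for pseudo-median (exact, $C_L = 40$ for $n' = 18$ ): $[A_{(41)}, A_{(131)}] = [-2.40, 0.00]$ thousand steps from guideline. The upper bound touches zero because two Walsh averages equal exactly 0 (the tied pairs $\pm 2.4$ and $\pm 3.2$ ), in line with the marginal p-value.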

Step 6 — Effect sizes:

$r_{rb} = (40-131)/171 = -91/171 = -0.532$ (large effect — below guideline)

$r_W = -1.982/\sqrt{18} = -1.982/4.243 = -0.467$

Summary:

Statistic	Value
Sample median steps	$8.65$ k
Guideline	$10.0$ k
$n'$	$18$
$n^+$ / $n^-$	$5$ / $13$
$W^+$	$40.0$
$W^-$	$131.0$
$z$	$-1.982$
$p$ (exact, two-tailed)	$.047$
HL estimate (from guideline)	$-1.25$ k steps
95% CI	$[-2.40, 0.00]$ k steps
$r_{rb}$	$-0.532$ (Large)

APA write-up: "A one-sample Wilcoxon Signed-Rank Test was used to examine whether median daily step counts differed from the 10,000-step health guideline, as step counts were right-skewed (Shapiro-Wilk $W = 0.87$ , $p = .031$ ). The sample median of 8,650 steps was significantly below the guideline, $W^+ = 40$ , $p = .047$ (exact), $r_{rb} = -0.53$ [95% CI: −0.78, −0.09]. The Hodges-Lehmann estimate indicated that office workers fell short of the guideline by a median of 1,250 steps/day [95% CI: 0 to 2,400 steps below], a large but imprecisely estimated effect."

Example 4: Comparing Two Teaching Methods — Non-Significant Result

A teacher compares student performance on matched reading comprehension tests under two instructional methods: silent reading vs. guided discussion, in $n = 10$ students. Test scores range 0–100.

Data:

$i$	Silent ( $x_{1i}$ )	Discussion ( $x_{2i}$ )	$d_i$
1	72	75	−3
2	68	71	−3
3	81	78	3
4	65	68	−3
5	77	73	4
6	70	72	−2
7	75	75	0
8	83	80	3
9	69	74	−5
10	74	76	−2

Step 1 — Zero differences: Participant 7: $d_7 = 0$ → $n^0 = 1$ ; $n' = 9$ .

Non-zero $d_i$ : $-3, -3, 3, -3, 4, -2, 3, -5, -2$

$n^+ = 3$ ( $+3, +4, +3$ ); $n^- = 6$ ( $-3, -3, -3, -2, -5, -2$ ).

Step 2 — Rank absolute differences:

Sorted $|d_i|$ : 2, 2, 3, 3, 3, 3, 3, 4, 5

$\vert d_i \vert$	Count	Avg Rank	Sign assignments
2	2	1.5	Both −: −1.5, −1.5
3	5	5.0	Three −, two +: −5.0 (×3), +5.0 (×2)
4	1	8.0	One +: +8.0
5	1	9.0	One −: −9.0

Step 3 — Rank sums:

$W^+ = 5.0+8.0+5.0 = 18.0$

$W^- = 1.5+1.5+5.0+5.0+5.0+9.0 = 27.0$

Check: $18.0+27.0 = 45 = 9 \times 10/2$ ✅

$W = \min(18.0, 27.0) = 18.0$

Step 4 — Exact p-value ( $n' = 9$ ):

Because the midranks are tied, the exact null distribution is obtained by enumerating all $2^9 = 512$ sign assignments rather than from standard tables: $P(W^+ \leq 18) = 163/512 = 0.318$ (one-tail)

The null distribution is symmetric about $E[W^+] = 22.5$ , so $P(W^+ \geq 27) = P(W^+ \leq 18)$ and:

Two-tailed: $p = 2 \times \min[P(W^+ \leq 18), P(W^+ \geq 27)] = 2 \times 0.318 = .637$

Step 5 — Effect sizes:

$r_{rb} = (18-27)/45 = -9/45 = -0.200$ (small effect, discussion slightly better)

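Tie correction: $\sum_k(t_k^3-t_k) = (2^3-2)+(5^3-5) = 6+120 = 126$

$\text{Var}_{corrected} = 9\times10\times19/24 - 126/48 = 71.25 - 2.625 = 68.625$

$z = (18-22.5)/\sqrt{68.625} = -4.5/8.284 = -0.543$

$r_W = -0.543/\sqrt{9} = -0.543/3 = -0.181$

With continuity correction, $z_{cc} = (|18-22.5|-0.5)/8.284 = 4.0/8.284 = 0.483$ , giving $p \approx .63$ , close to the exact value.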

Hodges-Lehmann estimate: $\hat{\theta} = -1.0$ points

95% CI for $\hat{\theta}$ (exact, $C_L = 5$ for $n' = 9$ ): $[A_{(6)}, A_{(40)}] = [-3.5, 3.0]$ points (includes 0)

Summary:

Statistic	Value	Interpretation
Silent median	$71.0$ pts
Discussion median	$73.0$ pts
$n' = 9$ (1 zero excluded)
$n^+$ / $n^-$ / $n^0$	$3$ / $6$ / $1$
$W^+$	$18.0$
$W^-$	$27.0$
$p$ (exact, two-tailed)	$.490$	Not significant
HL estimate	$-1.0$ pts	Discussion slightly higher
95% CI for $\hat{\theta}$	$[-4.0, 2.5]$ pts	Includes 0
$r_{rb}$	$-0.200$	Small effect

APA write-up: "A Wilcoxon Signed-Rank Test was conducted to compare comprehension scores under silent reading and guided discussion. One pair with identical scores was excluded ( $n' = 9$ ). There was no significant difference between silent reading (Mdn = 71.0) and guided discussion (Mdn = 73.0), $W^+ = 18$ , $p = .490$ (exact), $r_{rb} = -0.20$ [95% CI: −0.69, 0.41]. The Hodges-Lehmann estimate of the median difference was −1.0 points [95% CI: −4.0, 2.5], indicating a small, non-significant advantage for guided discussion. Given the small sample size ( $n' = 9$ ), this study had limited power to detect small effects (minimum detectable $r_{rb} \approx 0.61$ at 80% power)."
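
The exact p-value for Example 4 can be checked by brute force. The sketch below enumerates every one of the $2^{9}$ sign assignments of the observed midranks, so ties are handled automatically. It assumes the common convention of doubling the smaller tail probability; other software may define the two-sided p-value slightly differently, and the function name is illustrative.

```python
from itertools import product

# Midranks of |d_i| and signs of d_i for Example 4 (zero difference dropped),
# listed in participant order.
ranks = [5.0, 5.0, 5.0, 5.0, 8.0, 1.5, 5.0, 9.0, 1.5]
signs = [-1, -1, +1, -1, +1, -1, +1, -1, -1]

def exact_two_sided_p(ranks, signs):
    """Exact p-value: every rank is positive or negative with probability 1/2."""
    observed = sum(r for r, s in zip(ranks, signs) if s > 0)
    lower = upper = 0
    for assignment in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, positive in zip(ranks, assignment) if positive)
        lower += w <= observed   # counts toward P(W+ <= observed)
        upper += w >= observed   # counts toward P(W+ >= observed)
    total = 2 ** len(ranks)
    return observed, lower, min(1.0, 2 * min(lower, upper) / total)

w_plus, n_lower, p = exact_two_sided_p(ranks, signs)
print(w_plus, n_lower, round(p, 3))   # 18.0 163 0.637
```

Enumeration is only practical for small $n'$ because the number of assignments doubles with every added pair.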

13. Common Mistakes and How to Avoid Them

Mistake 1: Using the Wilcoxon Signed-Rank Test When the Sign Test is More Appropriate

Problem: Applying the Wilcoxon Signed-Rank Test to data where differences can be assessed for direction but not for meaningful magnitude — for example, nominal categories coded as 0/1, or extremely coarse ordinal data with only a few categories. The Wilcoxon test requires that ranking the absolute differences is meaningful; if it is not, the test is invalid.

Solution: When only the direction of change is known (positive or negative), use the Sign Test. When differences can be meaningfully ranked, use the Wilcoxon test. Examine whether the concept of "a difference of 4 being larger than a difference of 2" makes sense for your measurement scale.

Mistake 2: Ignoring the Symmetry Assumption

Problem: Applying the Wilcoxon Signed-Rank Test without checking whether the difference scores are approximately symmetrically distributed about their median. The test assumes symmetry; without it, the p-value conflates location and shape effects. For instance, when the differences are right-skewed, a true null hypothesis of zero median can be rejected because the few large positive differences receive the highest ranks and inflate $W^+$ .

Solution: Always plot a histogram of the difference scores $d_i$ and assess symmetry visually. Compute the standardised skewness and check that $|z_{skew}| < 2$ . If differences are severely asymmetric, use the Sign Test or a bootstrap-based test instead.
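
A skewness screen of this kind might be sketched as follows. The code assumes $z_{skew}$ means the adjusted sample skewness divided by its usual standard error; that definition, the function name, and the data are assumptions made for illustration and may differ from what DataStatPro computes.

```python
import math

def skewness_z(d):
    """Adjusted sample skewness G1 and z = G1 / SE(G1); requires n >= 3."""
    n = len(d)
    mean = sum(d) / n
    m2 = sum((x - mean) ** 2 for x in d) / n
    m3 = sum((x - mean) ** 3 for x in d) / n
    g1 = m3 / m2 ** 1.5
    G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return G1, G1 / se

# Hypothetical difference scores with one large positive value.
d = [1, 1, 2, 2, 3, 15]
G1, z = skewness_z(d)
print(round(G1, 2), round(z, 2))
if abs(z) >= 2:
    print("Differences look asymmetric; consider the Sign Test or a bootstrap test.")
```

With very small samples the z-score is itself imprecise, so it should support the histogram rather than replace it.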

Mistake 3: Not Reporting the Effective Sample Size $n'$ and the Number of Zeros

Problem: Reporting $n = 20$ pairs but not mentioning that 5 pairs had $d_i = 0$ and were excluded, leaving $n' = 15$ for the analysis. Readers cannot evaluate the precision of the estimate or compare it to power requirements without knowing $n'$ .

Solution: Always report $n$ (total pairs), $n^0$ (zero differences excluded), $n^+$ , $n^-$ , and $n'$ (effective sample size). State explicitly: " $n^0 = 5$ pairs with zero difference were excluded from the analysis, leaving $n' = 15$ ."

Mistake 4: Reporting Only the p-value Without an Effect Size

Problem: Reporting $W^+ = 45$ , $p = .032$ without any effect size measure. The Wilcoxon test statistic $W^+$ is not interpretable without knowing $n'$ , and the p-value conveys nothing about the magnitude of the effect.

Solution: Always report the rank-biserial correlation $r_{rb}$ (or $r_W$ ) with its 95% CI, and the Hodges-Lehmann estimate $\hat{\theta}$ with its 95% CI. These together convey effect magnitude in both standardised and original-units terms.

Mistake 5: Using the Paired t-Test When the Wilcoxon Test is Clearly Needed

Problem: Observing highly non-normal difference scores with extreme outliers in a small sample ( $n < 20$ ) and proceeding with the paired t-test because "it's the standard test." The t-test's p-value may be seriously distorted by even a single extreme outlier in small samples.

Solution: Implement a pre-analysis normality check on difference scores (Shapiro-Wilk). If $p_{SW} < .05$ and $n < 30$ , use the Wilcoxon test as the primary analysis. Run the paired t-test as a sensitivity check and report both results with an explanation of why they may differ.

Mistake 6: Treating a Non-Significant Wilcoxon Result as Evidence of No Difference

Problem: Reporting $W^+ = 32$ , $p = .12$ and concluding "the two conditions do not differ." As with all hypothesis tests, a non-significant result only indicates insufficient evidence to reject $H_0$ — it does not establish equivalence or absence of an effect.

Solution: Report the Hodges-Lehmann estimate and its 95% CI. If the CI is wide, note that the study is underpowered and a meaningful effect may exist but be undetected. For claims of equivalence, use a formal equivalence test (e.g., TOST on the Hodges-Lehmann estimator) with pre-specified bounds.

Mistake 7: Mis-Reporting the Test Statistic

Problem: Confusion about which statistic to report. Different software uses different conventions: some report $W^+$ (sum of positive ranks), some report $W^-$ , some report $W = \min(W^+, W^-)$ , some report a $z$ -statistic, and some report $T$ (an older notation equivalent to $W$ ). Reporting " $W = 12$ " without specifying that this is the minimum (vs. $W^+ = 12$ ) is ambiguous.

Solution: Clearly specify what was reported. DataStatPro reports $W^+$ , $W^-$ , and $W = \min(W^+, W^-)$ separately, and the auto-generated APA paragraph uses the convention $W^+$ = [value] to avoid ambiguity. When reporting, specify: " $W^+ = 45$ (sum of positive ranks)" or " $T = 12$ (Wilcoxon signed-rank statistic, minimum of $W^+$ and $W^-$ )".

Mistake 8: Applying the Test to Data With Too Many Ties Without Addressing Them

Problem: Using Likert scale data where many participants show no change ( $d_i = 0$ ) and many show changes of exactly 1 point (massive ties in $|d_i|$ ). Running the standard Wilcoxon test without the tie correction produces inaccurate p-values, and excluding a large proportion of zeros severely reduces power.

Solution: Report the proportion of zero differences ( $n^0/n$ ). Apply the tie correction to the variance (DataStatPro does this automatically). Consider using Pratt's method when zero differences are informative. If more than 30% of differences are zero or tied, acknowledge the limitation and consider whether the Sign Test or a permutation test is more appropriate.
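
A quick diagnostic along these lines is sketched below: it reports the proportion of zeros, the tie-group sizes $t_k$ among the non-zero $|d_i|$ , and the variance of $W^+$ before and after the tie correction. The function name and the Likert-style data are hypothetical.

```python
from collections import Counter

def tie_report(d):
    """Summarise zeros and ties in the differences and their effect on Var[W+]."""
    n = len(d)
    nonzero = [abs(v) for v in d if v != 0]
    n_eff = len(nonzero)
    # Tie groups: absolute differences that occur more than once.
    groups = {value: t for value, t in Counter(nonzero).items() if t > 1}
    correction = sum(t ** 3 - t for t in groups.values()) / 48
    var_plain = n_eff * (n_eff + 1) * (2 * n_eff + 1) / 24
    return {
        "prop_zero": (n - n_eff) / n,
        "n_eff": n_eff,
        "tie_groups": groups,
        "var_uncorrected": var_plain,
        "var_corrected": var_plain - correction,
    }

# Hypothetical post-minus-pre changes on a 5-point Likert item.
changes = [0, 0, 1, 1, 1, -1, 2, 0, 1, -1, 2, 0]
report = tie_report(changes)
print(report)
```

Here a third of the pairs are zeros and the remaining eight differences fall into only two tie groups, which is the situation in which the Sign Test or a permutation test deserves consideration.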

Mistake 9: Comparing $r_{rb}$ from the Wilcoxon Test Directly with Cohen's $d$ from a t-Test

Problem: Reporting $r_{rb} = 0.40$ from a Wilcoxon test alongside Cohen's $d = 0.60$ from a paired t-test on different but related data, and treating them as equivalent effect sizes. $r_{rb}$ and $d$ are on different scales and are only approximately related through conversion formulas.

Solution: Use the conversion formula $d \approx 2r_{rb}/\sqrt{1-r_{rb}^2}$ for rough comparison, and clearly note the approximation. For direct comparison, use the same effect size metric (e.g., convert both to $r$ ) and acknowledge that the Wilcoxon-based $r_{rb}$ and the t-test-based $d$ measure slightly different aspects of the effect (rank-based vs. mean-based).
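
The conversion and its inverse are easy to sketch. The function names are illustrative, and both directions are approximations rather than exact identities.

```python
import math

def rrb_to_d(r_rb):
    """Approximate Cohen's d from r_rb: d = 2r / sqrt(1 - r^2)."""
    return 2 * r_rb / math.sqrt(1 - r_rb ** 2)

def d_to_rrb(d):
    """Approximate r_rb from Cohen's d: r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d ** 2 + 4)

print(round(rrb_to_d(0.40), 3))   # 0.873
print(round(d_to_rrb(0.60), 3))   # 0.287
```

On this approximate scale, $r_{rb} = 0.40$ corresponds to $d \approx 0.87$ , so it is not a smaller effect than $d = 0.60$ even though the raw number is lower.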

Mistake 10: Using the Asymptotic Test When the Exact Test is Available

Problem: With $n' = 12$ non-zero differences and few ties, using the normal approximation to get $p = .048$ when the exact test gives $p = .067$ — and reporting the asymptotic result to achieve significance.

Solution: Always use the exact p-value when $n' \leq 25$ and ties are few. DataStatPro automatically selects the exact test for small samples. Never choose a p-value method post-hoc based on which gives a more favourable result. Pre-specify the method (exact vs. asymptotic) before analysis.

14. Troubleshooting

Problem	Likely Cause	Solution
$W^+ + W^- \neq n'(n'+1)/2$	Arithmetic error in ranking or rank sum computation	Recheck ranks, midranks, and sums; verify $n'$ (zeros excluded)
$W^+ = n'(n'+1)/2$ and $W^- = 0$	All non-zero differences are positive (all negative gives $W^+ = 0$ instead)	Verify data; if genuine, the exact two-tailed p-value is $p = 2 \times (0.5)^{n'}$ , the smallest value attainable for that $n'$
Exact and asymptotic p-values diverge substantially	Small $n'$ (asymptotic unreliable) or many ties	Use exact p-value for $n' \leq 25$ ; use permutation test if many ties
Many zero differences ( $n^0 > n/4$ )	Coarse measurement scale; many participants show no change	Report $n^0$ explicitly; consider Pratt's method; consider Sign Test; note reduced power
Wilcoxon significant but paired t-test not significant	Outliers in differences inflating $s_d$ (t-test); Wilcoxon more robust	Inspect difference distribution; if outliers present, Wilcoxon result is more reliable
Paired t-test significant but Wilcoxon not significant	Small $n'$ after zero exclusion, or a mean difference driven by a few extreme values	Inspect the difference distribution; if differences are symmetric and approximately normal, the t-test is appropriate; otherwise rely on the Wilcoxon result
$r_{rb} = \pm 1.0$	All non-zero differences have the same sign	Perfect effect in the data; report with note that all participants changed in the same direction
95% CI for $\hat{\theta}$ is very wide	Small $n'$	Report wide CI; increase sample size; note low precision
Hodges-Lehmann estimate differs substantially from mean difference	Presence of outliers or skewness in $d_i$	Both are valid but measure different things; HL is the natural companion to the Wilcoxon test
Skewness check suggests asymmetric differences	Data do not meet symmetry assumption	Use Sign Test; report both tests; use bootstrap p-value
Software reports negative $W^+$ or $W^-$	Software error or sign convention confusion	Check software documentation; both $W^+$ and $W^-$ are non-negative by definition
Tie correction produces a negative variance	Computation or coding error: for valid tie-group sizes the corrected variance is always positive	Recompute the tie-group sizes $t_k$ and check the data for coding errors; with excessive ties, use a permutation test
One-sample version gives different result from paired version for same data	Differences were defined inconsistently between the two analyses	One-sample tests $x_i$ against $\theta_0$ ; paired tests $d_i = x_{1i}-x_{2i}$ against 0; the results should be equivalent if $d_i = x_i - \theta_0$
Power is very low despite significant result	Sample size is small; significance is due to extreme effect size, not adequate power	Report sensitivity analysis; note that future replications need larger samples
Cannot determine $\hat{\theta}$ without raw data	Only summary statistics available	$\hat{\theta}$ requires all $d_i$ values; request data or report only $r_{rb}$

15. Quick Reference Cheat Sheet

Core Formulas

Formula	Description
$d_i = x_{1i} - x_{2i}$	Difference score for pair $i$
$n' = n - n^0$	Effective sample size (excluding zeros)
$W^+ = \sum_{\{d_i>0\}} R_i$	Sum of positive ranks
$W^- = \sum_{\{d_i<0\}} R_i$	Sum of negative ranks
$W^+ + W^- = n'(n'+1)/2$	Verification check
$W = \min(W^+, W^-)$	Wilcoxon test statistic
$E[W^+] = n'(n'+1)/4$	Expected $W^+$ under $H_0$
$\text{Var}[W^+] = n'(n'+1)(2n'+1)/24$	Variance of $W^+$ (no ties)
$\text{Var}_{corrected}[W^+] = n'(n'+1)(2n'+1)/24 - \sum_k(t_k^3-t_k)/48$	Variance with tie correction
$z = (W^+ - E[W^+])/\sqrt{\text{Var}_{corrected}[W^+]}$	z-statistic
$z_{cc} = (\lvert W^+ - E[W^+]\rvert - 0.5)/\sqrt{\text{Var}_{corrected}}$	z with continuity correction
$p = 2\times[1-\Phi(\lvert z\rvert)]$	Two-tailed p-value
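
The asymptotic formulas above can be combined into one helper. This is a sketch using only the standard library; unlike the table's $z_{cc}$ , which uses an absolute value, the function keeps the sign of $W^+ - E[W^+]$ (an illustrative choice), and its name and parameters are hypothetical.

```python
import math
from statistics import NormalDist

def wilcoxon_z(w_plus, n_eff, tie_sizes=(), continuity=True):
    """Normal approximation for W+ with tie and continuity corrections."""
    mean = n_eff * (n_eff + 1) / 4
    var = n_eff * (n_eff + 1) * (2 * n_eff + 1) / 24
    var -= sum(t ** 3 - t for t in tie_sizes) / 48   # tie correction
    diff = w_plus - mean
    if continuity:
        # Shrink |diff| by 0.5 toward zero while keeping its sign.
        diff = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    z = diff / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Example 4: W+ = 18, n' = 9, tie groups of size 2 and 5.
z, p = wilcoxon_z(18.0, 9, tie_sizes=(2, 5))
print(round(z, 3), round(p, 2))   # -0.483 0.63
```

For a sample as small as $n' = 9$ the exact p-value remains preferable; the approximation is shown only to make the formulas concrete.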

Effect Size Formulas

Formula	Description
$r_{rb} = (W^+ - W^-)/(n'(n'+1)/2)$	Matched-pairs rank-biserial correlation
$r_{rb} = 1 - 4W^-/(n'(n'+1))$	Alternative formula for $r_{rb}$
$r_W = z/\sqrt{n'}$	Effect size from z-statistic
$\hat{\theta} = \text{Median}\{(d_i+d_j)/2: i\leq j\}$	Hodges-Lehmann pseudo-median
$\widehat{CL} = n^+/n'$	Common Language Effect Size (simple)
$r_{rb} \to d: d \approx 2r_{rb}/\sqrt{1-r_{rb}^2}$	Convert $r_{rb}$ to Cohen's $d$ (approx.)
$d \to r_{rb}: r_{rb} \approx d/\sqrt{d^2+4}$	Convert Cohen's $d$ to $r_{rb}$ (approx.)
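
The two $r_{rb}$ formulas in the table agree because of the verification identity $W^+ + W^- = n'(n'+1)/2$ . Substituting $W^+ = n'(n'+1)/2 - W^-$ into the first formula gives:

$r_{rb} = \dfrac{W^+ - W^-}{n'(n'+1)/2} = \dfrac{n'(n'+1)/2 - 2W^-}{n'(n'+1)/2} = 1 - \dfrac{4W^-}{n'(n'+1)}$

As a check with $W^+ = 18$ , $W^- = 27$ , and $n' = 9$ : $(18-27)/45 = -0.200$ and $1 - (4 \times 27)/90 = 1 - 1.2 = -0.200$ .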

Walsh Averages for Hodges-Lehmann CI

$n'$ pairs	$M = n'(n'+1)/2$ Walsh averages
5	15
10	55
15	120
20	210
25	325
30	465
50	1,275
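
The Walsh averages and the Hodges-Lehmann estimate can be computed directly from the non-zero differences. In this sketch the interval is taken as the $k$ -th smallest and $k$ -th largest Walsh average; the function names are illustrative, and the value $k = 6$ for $n' = 9$ is an assumption based on the standard no-ties critical value ( $P(W^+ \leq 5) \approx .0195$ per tail).

```python
from itertools import combinations_with_replacement
from statistics import median

def walsh_averages(d):
    """All n'(n'+1)/2 pairwise averages (d_i + d_j)/2 with i <= j, sorted."""
    return sorted((a + b) / 2 for a, b in combinations_with_replacement(d, 2))

def hodges_lehmann(d):
    """Pseudo-median: the median of the Walsh averages."""
    return median(walsh_averages(d))

def hl_interval(d, k):
    """Interval from the k-th smallest and k-th largest Walsh averages."""
    w = walsh_averages(d)
    return w[k - 1], w[-k]

# Non-zero differences from Example 4 (silent minus discussion).
d = [-3, -3, 3, -3, 4, -2, 3, -5, -2]
print(len(walsh_averages(d)))   # 45
print(hodges_lehmann(d))        # -1.0
print(hl_interval(d, k=6))      # (-3.5, 3.0)
```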

Cohen's Benchmarks for $r_{rb}$ and $r_W$

$\vert r_{rb} \vert$ or $\vert r_W \vert$	Label	Approx. $\vert d_z \vert$ equiv.
$< 0.10$	Negligible	$< 0.20$
$0.10 - 0.29$	Small	$0.20 - 0.61$
$0.30 - 0.49$	Medium	$0.62 - 1.13$
$\geq 0.50$	Large	$\geq 1.15$

ARE Comparison: Wilcoxon vs. Paired t-Test

Data Distribution	ARE	Interpretation
Normal	$3/\pi \approx 0.955$	Wilcoxon needs $\approx$ 5% more pairs
Uniform	$1.000$	Identical efficiency
Logistic	$\pi^2/9 \approx 1.097$	Wilcoxon needs $\approx$ 9% fewer pairs
Laplace	$1.500$	Wilcoxon needs 33% fewer pairs
Contaminated normal	$> 1.500$	Wilcoxon substantially more powerful

Required $n'$ for 90% Power (Two-Tailed $\alpha = .05$ , Normal Data)

$d_z$ equivalent	$r_{rb}$ (approx.)	$n'$ Wilcoxon	$n$ Paired t	Overhead
0.20	0.10	277	264	+5%
0.30	0.15	125	119	+5%
0.50	0.24	46	44	+5%
0.80	0.37	19	18	+6%
1.00	0.45	14	13	+8%
1.20	0.51	10	9	+11%
1.50	0.60	7	7	$\approx$ 0%

Zero and Tie Handling Reference

Situation	Method	Notes
$d_i = 0$ (default)	Wilcoxon: exclude	$n' = n - n^0$ ; report $n^0$
$d_i = 0$ (alternative)	Pratt: include in ranking	Can affect p-value; use when zeros are informative
Tied $\vert d_i \vert$	Assign average ranks (midranks)	Apply the tie correction to the variance
Many ties	Permutation test	Exact handling regardless of tie structure
$n' \leq 25$ , few ties	Exact p-value	Always preferred
$n' > 25$	Asymptotic + continuity correction	Accurate for most situations

Test Selection Guide

Two related conditions, continuous or ordinal DV?
├── Are difference scores normally distributed?
│   (Check: Shapiro-Wilk on d_i, Q-Q plot)
│   ├── YES and n ≥ 15 → Paired t-test (more power)
│   │   (Report Wilcoxon as sensitivity check if desired)
│   └── NO, or n < 30 and Shapiro-Wilk p < .05
│       └── Are differences rankable (magnitudes meaningful)?
│           ├── YES → Wilcoxon Signed-Rank Test ✅
│           │   ├── Are differences symmetric?
│           │   │   ├── YES → Standard Wilcoxon ✅
│           │   │   └── NO → Sign Test or Bootstrap
│           │   └── Many zeros? → Consider Pratt's method
│           └── NO (only direction known) → Sign Test
└── Three or more conditions → Friedman Test

Comparison: Wilcoxon vs. Sign Test vs. Paired t-Test

Property	Paired t-Test	Wilcoxon Signed-Rank	Sign Test
Uses magnitude of differences	✅ Full	✅ Ranks	❌ No
Assumes normality	✅ Yes	❌ No	❌ No
Assumes symmetry	(via normality)	✅ Yes	❌ No
ARE vs. t-test	1.000	0.955	0.637
Robust to outliers	❌ Low	✅ High	✅ Very high
Handles ordinal DV	❌ No	✅ Yes	✅ Yes
Effect size	Cohen's $d_z$	$r_{rb}$ , $r_W$	$p^+$ , $P(d>0)$
Point estimate	$\bar{d}$	Hodges-Lehmann $\hat{\theta}$	Median

APA 7th Edition Reporting Templates

Standard significant result:

"Due to [non-normal difference scores / ordinal measurement scale] (Shapiro-Wilk $W =$ [value], $p =$ [value]), a Wilcoxon Signed-Rank Test was conducted. [Condition 1] (Mdn = [value]) [was / was not] significantly [higher / lower] than [Condition 2] (Mdn = [value]), $W^+ =$ [value], $z =$ [value], $p =$ [value] [(exact)/(asymptotic)]. The Hodges-Lehmann estimate of the median difference was [value] [units] [95% CI: LB, UB], $r_{rb} =$ [value] [95% CI: LB, UB], indicating a [small / medium / large] effect. [ $n^0 =$ [value] pairs with zero difference were excluded, leaving $n' =$ [value] pairs for analysis.]"

Non-significant result:

"A Wilcoxon Signed-Rank Test revealed no significant difference between [Condition 1] (Mdn = [value]) and [Condition 2] (Mdn = [value]), $W^+ =$ [value], $p =$ [value], $r_{rb} =$ [value] [95% CI: LB, UB]. The Hodges-Lehmann estimate was [value] [95% CI: LB, UB], indicating a [small / negligible] effect that the study was insufficiently powered to detect (minimum detectable $r_{rb} \approx$ [value] at 80% power)."

One-sample version:

"A one-sample Wilcoxon Signed-Rank Test was conducted to examine whether the population pseudo-median of [DV] differed from [θ₀]. The sample median of [value] [was / was not] significantly different from [θ₀], $W^+ =$ [value], $p =$ [value], $r_{rb} =$ [value]. The Hodges-Lehmann estimate was [value] [units] from the null value [95% CI: LB, UB]."

Wilcoxon Signed-Rank Test Reporting Checklist

Item	Required
Statement of why Wilcoxon was used (non-normality, ordinal, outliers)	✅ Always
Median for each condition	✅ Always
$W^+$ (and/or $W^-$ or $W = \min$ ) — specify which	✅ Always
z-statistic (if asymptotic)	✅ For $n' > 25$
p-value (exact or asymptotic — specify which)	✅ Always
$n$ (total pairs), $n^0$ (zeros excluded), $n'$ (effective $n$ )	✅ Always
$n^+$ and $n^-$ (positive and negative differences)	✅ Recommended
Whether Wilcoxon or Pratt method used for zeros	✅ When $n^0 > 0$
Whether exact, asymptotic, or permutation p-value used	✅ Always
Tie correction applied	✅ When ties present
$r_{rb}$ (primary effect size) with 95% CI	✅ Always
$r_W$ alongside $r_{rb}$	✅ Recommended
Hodges-Lehmann estimate $\hat{\theta}$ with 95% CI	✅ Always
Symmetry check on difference scores	✅ When $n < 50$
Comparison with paired t-test result (sensitivity)	✅ Recommended
Power or sensitivity analysis	✅ For null results
Domain-specific benchmark context for $r_{rb}$	✅ Recommended

Conversion Formulas: Wilcoxon $\leftrightarrow$ Other Metrics

From	To	Formula
$W^+$ , $n'$	$r_{rb}$	$r_{rb} = (2W^+ - n'(n'+1)/2) / (n'(n'+1)/2)$
$z$ , $n'$	$r_W$	$r_W = z/\sqrt{n'}$
$r_{rb}$	Cohen's $d$ (approx.)	$d \approx 2r_{rb}/\sqrt{1-r_{rb}^2}$
Cohen's $d$	$r_{rb}$ (approx.)	$r_{rb} \approx d/\sqrt{d^2+4}$
$r_W$	Cohen's $d$	$d \approx 2r_W/\sqrt{1-r_W^2}$
$r_{rb}$	$P(d_i > 0)$ (approx.)	$P = (1+r_{rb})/2$
$n_{t\text{-test}}$	$n'_{Wilcoxon}$ (normal data)	$n'_{W} \approx n_t \times \pi/3 \approx 1.047 \times n_t$
$d_z$	Required $n'$ (80% power)	$n' \approx 8.211/d_z^2$
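
The last two rows of the table can be combined into a rough planning helper. This sketch uses the normal-approximation sample size for the paired t-test, inflated by $\pi/3$ for the Wilcoxon test under normal data; the function name is illustrative, and the result should be treated as a first approximation because exact small-sample calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def required_n_wilcoxon(d_z, power=0.80, alpha=0.05):
    """Approximate n' for a two-tailed Wilcoxon test, assuming normal differences."""
    z = NormalDist().inv_cdf
    n_t = (z(1 - alpha / 2) + z(power)) ** 2 / d_z ** 2   # paired t, normal approx.
    return math.ceil(n_t * math.pi / 3)                   # inflate by 1/ARE = pi/3

for d_z in (0.3, 0.5, 0.8):
    print(d_z, required_n_wilcoxon(d_z))
```

With the default settings the numerator $(z_{.975} + z_{.80})^2 \times \pi/3 \approx 8.22$ , which matches the $8.211/d_z^2$ rule up to rounding of the normal quantiles.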

This tutorial provides a comprehensive foundation for understanding, conducting, and reporting the Wilcoxon Signed-Rank Test within the DataStatPro application. For further reading, consult Wilcoxon's original paper "Individual Comparisons by Ranking Methods" (Biometrics Bulletin, 1945); Hollander, Wolfe & Chicken's "Nonparametric Statistical Methods" (3rd ed., 2014) for rigorous mathematical treatment; Conover's "Practical Nonparametric Statistics" (3rd ed., 1999) for applied guidance; Kerby's "The Simple Difference Formula: An Approach to Teaching Nonparametric Correlation" (Comprehensive Psychology, 2014) for the matched-pairs rank-biserial correlation; Field's "Discovering Statistics Using IBM SPSS Statistics" (5th ed., 2018) for accessible applied coverage; and van Doorn et al.'s "Bayesian Inference for Kendall's Rank Correlation Coefficient" (The American Statistician, 2018) for the Bayesian extension. For the Hodges-Lehmann estimator and its confidence interval, see Hodges & Lehmann's "Estimates of Location Based on Rank Tests" (Annals of Mathematical Statistics, 1963). For feature requests or support, contact the DataStatPro team.
