
One-Sample t-Test

Step-by-step guide to conducting one-sample t-tests using DataStatPro.

One-Sample t-Test: Zero to Hero Tutorial

This comprehensive tutorial takes you from the foundational concepts of single-group inference all the way through advanced interpretation, reporting, assumption checking, and practical usage within the DataStatPro application. Whether you are encountering the one-sample t-test for the first time or deepening your understanding of comparing a sample to a known standard, this guide builds your knowledge systematically from the ground up.


Table of Contents

  1. Prerequisites and Background Concepts
  2. What is a One-Sample t-Test?
  3. The Mathematics Behind the One-Sample t-Test
  4. Assumptions of the One-Sample t-Test
  5. Variants of the One-Sample t-Test
  6. Using the One-Sample t-Test Calculator Component
  7. Step-by-Step Procedure
  8. Interpreting the Output
  9. Effect Sizes for the One-Sample t-Test
  10. Confidence Intervals
  11. Advanced Topics
  12. Worked Examples
  13. Common Mistakes and How to Avoid Them
  14. Troubleshooting
  15. Quick Reference Cheat Sheet

1. Prerequisites and Background Concepts

Before diving into the one-sample t-test, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.

1.1 The Concept of Statistical Inference

Statistical inference is the process of drawing conclusions about a population from a sample. In the one-sample t-test context:

- The population is the full group of interest, with unknown mean $\mu$.
- The sample is the set of $n$ observations actually collected.
- The question is whether the population mean $\mu$ equals a pre-specified reference value $\mu_0$.

1.2 The Null and Alternative Hypotheses

Every t-test operates within the hypothesis testing framework:

- Null hypothesis $H_0: \mu = \mu_0$, the population mean equals the reference value.
- Alternative hypothesis $H_1: \mu \neq \mu_0$ (two-tailed), or $H_1: \mu > \mu_0$ / $H_1: \mu < \mu_0$ (one-tailed).

1.3 The Standard Error of the Mean

When we draw a sample of size $n$ from a population with standard deviation $\sigma$, the sample mean $\bar{x}$ varies from sample to sample. The standard error of the mean (SEM) quantifies this variability:

$$SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

In practice, $\sigma$ is unknown and estimated by the sample standard deviation $s$:

$$\widehat{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}$$

A larger $n$ produces a smaller SEM, meaning our estimate of $\mu$ becomes more precise as sample size increases. This is why large samples detect even trivially small deviations from $\mu_0$.
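To make the SEM concrete, here is a minimal standard-library sketch (the data and function name are illustrative, not part of DataStatPro):

```python
import statistics
from math import sqrt

def sem(data):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(data) / sqrt(len(data))

# Eight illustrative scores; stdev uses the n-1 denominator
print(round(sem([98, 104, 95, 101, 99, 103, 97, 100]), 3))
```

Because $\sqrt{n}$ sits in the denominator, quadrupling the sample size halves the SEM.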

1.4 The t-Distribution

When the population $\sigma$ is unknown and estimated from the data, the test statistic does not follow the standard normal distribution — it follows the t-distribution with $n - 1$ degrees of freedom (df).

The t-distribution is:

- Symmetric and bell-shaped, centred at zero
- Heavier-tailed than the standard normal, especially at small df
- Convergent to the standard normal as df grows large

The heavier tails reflect additional uncertainty from estimating $\sigma$ with $s$ — particularly important in small samples.

1.5 The p-Value and Significance Level

The p-value is the probability of obtaining a test statistic at least as extreme as the observed value, assuming $H_0$ is true. It answers: "How surprising is this sample result if the null hypothesis were correct?"

The significance level $\alpha$ (conventionally $.05$) is the threshold below which we consider the result sufficiently surprising to reject $H_0$.

⚠️ A small p-value does NOT mean the null is false, the effect is large, or the finding is important. It only means the data are inconsistent with $H_0$ at level $\alpha$. Always accompany p-values with effect sizes and confidence intervals.

1.6 The Central Limit Theorem

The Central Limit Theorem (CLT) states that for sufficiently large $n$, the sampling distribution of $\bar{x}$ is approximately normal regardless of the shape of the population distribution:

$$\bar{x} \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)$$

This guarantees that the one-sample t-test is robust to non-normality for large samples (generally $n \geq 30$). For smaller samples, the normality of the population itself is important.

1.7 Point Estimates and Interval Estimates

A 95% CI means: if we repeated the study many times, 95% of the resulting intervals would contain the true $\mu$. CIs communicate both the location and precision of the estimate — always report them alongside the t-test result.


2. What is a One-Sample t-Test?

2.1 The Core Question

The one-sample t-test is a parametric inferential test that determines whether the mean of a single sample differs significantly from a known, hypothesised, or theoretically meaningful population value $\mu_0$.

Unlike two-sample tests that compare two groups, the one-sample t-test compares one group to a fixed reference point. The reference point $\mu_0$ is not estimated from the data — it is specified in advance based on:

- Published norms or normative data (e.g., test manuals)
- Theoretical or chance-level values
- Regulatory, clinical, or quality-control targets
- Historical baselines

2.2 The General Logic

The test measures how far the sample mean $\bar{x}$ is from $\mu_0$, standardised by the estimated standard error of the mean:

$$t = \frac{\text{Observed departure from } H_0}{\text{Expected random departure under } H_0} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

A large $|t|$ indicates that the sample mean is many standard error units away from $\mu_0$ — unlikely to occur by chance if $H_0$ is true.

2.3 When to Use the One-Sample t-Test

The one-sample t-test is appropriate when:

| Condition | Requirement |
|---|---|
| Research design | Single sample compared to a fixed standard |
| Outcome variable | Continuous (interval or ratio scale) |
| Distribution | Approximately normal (or $n \geq 30$ via CLT) |
| Reference value | Known $\mu_0$, specified before data collection |
| Observations | Independent of each other |

2.4 Real-World Applications

| Field | Research Question | $\mu_0$ |
|---|---|---|
| Clinical Psychology | Does a clinical sample's depression score differ from the population norm? | Published PHQ-9 norm |
| Cognitive Neuroscience | Is the reaction time of ADHD patients different from the normative 250 ms? | $250$ ms |
| Education | Does the class mean exam score differ from the national average of 70%? | $70$ |
| Quality Control | Does the mean tablet weight differ from the target of 500 mg? | $500$ mg |
| Sport Science | Does the team's mean VO$_2$ max differ from elite athlete norms? | Published norm |
| Nutrition | Does average daily caloric intake differ from the recommended 2000 kcal? | $2000$ kcal |
| Finance | Does the mean return of a fund differ from the benchmark return of 8%? | $8\%$ |
| Public Health | Is mean blood pressure in a community sample different from the clinical threshold? | $120$ mmHg |

2.5 Distinguishing from Related Tests

| Situation | Correct Test |
|---|---|
| One sample vs. known value | One-sample t-test |
| Two independent groups | Independent samples t-test |
| Two related measurements | Paired samples t-test |
| One sample, non-normal, small $n$ | Wilcoxon signed-rank test (one-sample) |
| Proportion vs. known value | One-proportion z-test |
| Variance vs. known value | Chi-squared test for variance |

3. The Mathematics Behind the One-Sample t-Test

3.1 The t-Statistic

Given a sample of $n$ observations $x_1, x_2, \ldots, x_n$ drawn from a population with unknown mean $\mu$ and unknown standard deviation $\sigma$, the one-sample t-statistic is:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

Where:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \quad \text{(sample mean)}$$

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2} \quad \text{(sample standard deviation)}$$

$$SE = \frac{s}{\sqrt{n}} \quad \text{(estimated standard error of the mean)}$$

$$\mu_0 \quad \text{(hypothesised population mean)}$$

Under $H_0: \mu = \mu_0$, the statistic $t$ follows a t-distribution with $\nu = n-1$ degrees of freedom.
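As a sanity check on the formula, a short standard-library sketch (data and names are illustrative):

```python
import statistics
from math import sqrt

def one_sample_t(data, mu0):
    """Return (t, df) for the one-sample t-statistic (mean - mu0)/(s/sqrt(n))."""
    n = len(data)
    se = statistics.stdev(data) / sqrt(n)        # s / sqrt(n), n-1 denominator
    return (statistics.mean(data) - mu0) / se, n - 1

t, df = one_sample_t([5, 6, 7, 8, 9], mu0=5)     # mean = 7, s ~ 1.581
```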

3.2 Degrees of Freedom

The degrees of freedom $\nu = n - 1$ represent the number of independent pieces of information available to estimate the standard deviation. We lose 1 degree of freedom because computing $s$ requires first estimating $\bar{x}$ from the same data.

Smaller $\nu$ → heavier tails → more conservative critical values → harder to achieve significance. This appropriately penalises small samples for the additional uncertainty in estimating $\sigma$.

3.3 Computing the p-Value

The p-value is computed from the cumulative distribution function (CDF) $F_{t,\nu}$ of the t-distribution:

Two-tailed (default):

$$p = 2 \times P(T_\nu \geq |t_{obs}|) = 2 \times [1 - F_{t,\nu}(|t_{obs}|)]$$

Upper one-tailed ($H_1: \mu > \mu_0$):

$$p = P(T_\nu \geq t_{obs}) = 1 - F_{t,\nu}(t_{obs})$$

Lower one-tailed ($H_1: \mu < \mu_0$):

$$p = P(T_\nu \leq t_{obs}) = F_{t,\nu}(t_{obs})$$
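The three tail formulas map directly onto the t-distribution's CDF and survival function. A minimal sketch assuming SciPy is available (not necessarily how DataStatPro computes it internally):

```python
from scipy import stats

def p_value(t_obs, df, tail="two-sided"):
    """p-value for the one-sample t-test, matching the formulas above."""
    if tail == "two-sided":
        return 2 * stats.t.sf(abs(t_obs), df)   # 2 * P(T >= |t|)
    if tail == "greater":
        return stats.t.sf(t_obs, df)            # P(T >= t), H1: mu > mu0
    return stats.t.cdf(t_obs, df)               # P(T <= t), H1: mu < mu0
```

`sf` (the survival function, $1 - F$) is preferred over `1 - cdf` because it stays accurate in the far tails.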

3.4 Critical Values

The decision rule compares the observed $|t|$ to the critical value $t_{crit}$:

$$t_{crit} = t_{\alpha/2,\; n-1} \quad \text{(two-tailed)}$$

Reject $H_0$ if $|t_{obs}| \geq t_{crit}$.

Common critical values ($t_{\alpha/2,\; \nu}$, two-tailed $\alpha = .05$):

| $n$ | $\nu$ | $t_{crit}$ |
|---|---|---|
| 5 | 4 | 2.776 |
| 10 | 9 | 2.262 |
| 15 | 14 | 2.145 |
| 20 | 19 | 2.093 |
| 30 | 29 | 2.045 |
| 50 | 49 | 2.010 |
| 100 | 99 | 1.984 |
| $\infty$ | $\infty$ | 1.960 |

3.5 The 95% Confidence Interval for $\mu$

The $(1-\alpha) \times 100\%$ CI for the population mean $\mu$ is:

$$\bar{x} \pm t_{\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}$$

This interval is dual to the hypothesis test: $H_0: \mu = \mu_0$ is rejected at level $\alpha$ if and only if $\mu_0$ falls outside the $(1-\alpha) \times 100\%$ CI.
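A hedged sketch of the interval from summary statistics, assuming SciPy for the t quantile (the function name is illustrative):

```python
from math import sqrt
from scipy import stats

def mean_ci(xbar, s, n, conf=0.95):
    """CI for mu: xbar +/- t_{alpha/2, n-1} * s / sqrt(n)."""
    half = stats.t.ppf(1 - (1 - conf) / 2, n - 1) * s / sqrt(n)
    return xbar - half, xbar + half

lo, hi = mean_ci(100, 15, 36)   # e.g. IQ-scaled data, n = 36
```

The duality with the test follows immediately: any $\mu_0$ outside `(lo, hi)` would be rejected at $\alpha = .05$.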

3.6 Cohen's $d$ — Effect Size for the One-Sample t-Test

The standardised effect size expresses how many standard deviation units the sample mean departs from $\mu_0$:

$$d = \frac{\bar{x} - \mu_0}{s}$$

This is directly analogous to a z-score, but standardised by the sample SD rather than the population SD.

Note that $d$ and $t$ are related:

$$t = d\sqrt{n} \implies d = \frac{t}{\sqrt{n}}$$

This relationship reveals that $t$ is a joint function of effect size ($d$) and sample size ($n$). A small $d$ can produce a large $t$ with a large enough $n$.

3.7 Hedges' $g$ — Bias-Corrected Effect Size

Cohen's $d$ is slightly positively biased (overestimates the true effect) in small samples. Hedges' $g$ applies a correction factor $J$:

$$g = d \times J, \qquad J = 1 - \frac{3}{4(n-1) - 1}$$

More precisely:

$$J = \frac{\Gamma((n-1)/2)}{\sqrt{(n-1)/2} \cdot \Gamma((n-2)/2)}$$

The correction is negligible for $n > 20$ but can be substantial ($> 5\%$) for $n < 10$. Hedges' $g$ is the preferred effect size for small samples and meta-analysis.
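Both effect sizes are one-liners; a sketch with illustrative input values (function names are not DataStatPro's API):

```python
def cohens_d(xbar, s, mu0):
    """One-sample Cohen's d: (mean - mu0) / s."""
    return (xbar - mu0) / s

def hedges_g(d, n):
    """Apply the small-sample correction J = 1 - 3/(4(n-1) - 1)."""
    return d * (1 - 3 / (4 * (n - 1) - 1))

d = cohens_d(88.4, 14.2, 100)   # illustrative summary statistics
g = hedges_g(d, 25)
```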

3.8 Exact Confidence Interval for $d$ via the Non-Central t-Distribution

Under $H_1$, the t-statistic follows a non-central t-distribution with non-centrality parameter:

$$\lambda = d\sqrt{n}$$

The exact 95% CI for $d$ inverts this relationship numerically: find $\lambda_L$ and $\lambda_U$ such that:

$$P(T_{n-1}(\lambda_L) \geq t_{obs}) = 0.025 \quad \text{and} \quad P(T_{n-1}(\lambda_U) \leq t_{obs}) = 0.025$$

Then:

$$d_L = \frac{\lambda_L}{\sqrt{n}}, \qquad d_U = \frac{\lambda_U}{\sqrt{n}}$$

An approximate 95% CI (adequate for $n \geq 20$):

$$d \pm 1.96 \times SE_d, \qquad SE_d \approx \sqrt{\frac{1}{n} + \frac{d^2}{2(n-1)}}$$
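The exact inversion can be sketched with a root-finder on the non-central t CDF, assuming SciPy (the bracket of $\pm 40$ for $\lambda$ is an assumption that covers any realistic observed $t$):

```python
from math import sqrt
from scipy.stats import nct
from scipy.optimize import brentq

def d_ci_exact(t_obs, n, conf=0.95):
    """Exact CI for d by inverting the non-central t-distribution."""
    df, a = n - 1, (1 - conf) / 2
    # lambda_L solves P(T_df(lambda) >= t_obs) = alpha/2; sf rises with the ncp
    lam_lo = brentq(lambda lam: nct.sf(t_obs, df, lam) - a, -40, 40)
    # lambda_U solves P(T_df(lambda) <= t_obs) = alpha/2; cdf falls with the ncp
    lam_hi = brentq(lambda lam: nct.cdf(t_obs, df, lam) - a, -40, 40)
    return lam_lo / sqrt(n), lam_hi / sqrt(n)
```

Each root exists and is unique because the non-central t family is stochastically increasing in $\lambda$.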

3.9 Statistical Power

Power is the probability of correctly rejecting $H_0$ when a true effect of size $d$ exists:

$$\text{Power} = P\!\left(T_{n-1}(\lambda) > t_{crit} \mid \lambda = d\sqrt{n}\right)$$

Required sample size for desired power $1-\beta$ at two-sided $\alpha$:

$$n \approx \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2} + \frac{z_{1-\alpha/2}^2}{2}$$

For $\alpha = .05$ and power $= 0.80$: $(1.96 + 0.84)^2 = 7.84$, so $n \approx 7.84/d^2$.

Required $n$ for common effect sizes:

| Cohen's $d$ | Power = 0.80 | Power = 0.90 | Power = 0.95 |
|---|---|---|---|
| 0.20 (small) | 198 | 264 | 327 |
| 0.50 (medium) | 33 | 44 | 54 |
| 0.80 (large) | 14 | 18 | 22 |
| 1.00 | 9 | 12 | 15 |
| 1.50 | 5 | 7 | 9 |
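The approximation behind the table can be sketched in a few lines of standard-library Python; because this version rounds up with `ceil`, it may land one participant above the table's rounded entries:

```python
from math import ceil
from statistics import NormalDist

def approx_n(d, power=0.80, alpha=0.05):
    """Normal-approximation sample size for a two-sided one-sample t-test."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    # (z_a + z_b)^2 / d^2 plus the small-sample correction term z_a^2 / 2
    return ceil((za + zb) ** 2 / d ** 2 + za ** 2 / 2)
```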

4. Assumptions of the One-Sample t-Test

4.1 Normality of the Data (or Sampling Distribution)

The one-sample t-test assumes that either:

- the population distribution of the outcome is approximately normal, or
- the sample is large enough ($n \geq 30$ as a rule of thumb) for the CLT to make the sampling distribution of $\bar{x}$ approximately normal.

How to check:

| Method | Details |
|---|---|
| Shapiro-Wilk test | Most powerful normality test for $n < 50$. $H_0$: data are normal. $p > .05$ → no evidence of non-normality |
| Kolmogorov-Smirnov | For $n \geq 50$, use the Lilliefors correction |
| Q-Q plot | Plot sample quantiles vs. theoretical normal quantiles; points should fall on the diagonal |
| Histogram + density | Should be approximately bell-shaped |
| Skewness | Should be close to 0; marked skew signals non-normality |
| Kurtosis | Excess kurtosis close to 0 indicates normal-like tails |

Robustness: The t-test is remarkably robust to mild-to-moderate non-normality, especially for $n \geq 15$. Symmetric non-normal distributions cause few problems even for small $n$. Severe skewness with small $n$ is the primary concern.

When violated: Use the Wilcoxon signed-rank test (one-sample version) as the non-parametric alternative. Consider log or square-root transformation for right-skewed data.

4.2 Independence of Observations

Each observation must be independent — the value of one participant's score must not influence another's. This is a design assumption, not testable statistically.

Common violations:

- Repeated measurements from the same participant treated as independent observations
- Clustered data (e.g., students within classrooms, patients within clinics)
- Serial dependence in time-ordered measurements

When violated: Use mixed models or multilevel approaches that explicitly model the dependency structure.

4.3 Interval or Ratio Scale of Measurement

The dependent variable must be measured on at least an interval scale — that is, the numerical differences between values must be meaningful and equal.

When violated: If the data are ordinal (ranks, Likert items treated as ordinal), use the Wilcoxon signed-rank test. Continuous but severely non-normal data may also warrant a non-parametric approach.

4.4 The Reference Value $\mu_0$ Must Be Pre-Specified

The hypothesised value $\mu_0$ must be specified before examining the data. Choosing $\mu_0$ based on the observed $\bar{x}$ (e.g., setting $\mu_0 = \bar{x}$ from a pilot study and then testing the same data) is circular and invalidates the test.

4.5 No Extreme Outliers

Extreme outliers distort both the mean ($\bar{x}$) and the standard deviation ($s$), potentially inflating or deflating the t-statistic.

How to check:

- Boxplots: flag points beyond $1.5 \times IQR$ from the quartiles
- Standardised scores: inspect observations with $|z| > 3$

When outliers present: Investigate the cause (data entry error? valid extreme value?). Report analyses with and without the outlier. Consider the trimmed mean t-test or Wilcoxon signed-rank as robust alternatives.

4.6 Assumption Summary

| Assumption | How to Check | Remedy if Violated |
|---|---|---|
| Normality | Shapiro-Wilk, Q-Q plot | Wilcoxon signed-rank; transform data |
| Independence | Design review | Mixed models |
| Interval scale | Measurement theory | Wilcoxon signed-rank |
| Pre-specified $\mu_0$ | Research protocol | Re-specify with new data |
| No severe outliers | Boxplots, $z$-scores | Investigate; trimmed mean t-test |

5. Variants of the One-Sample t-Test

5.1 Standard One-Sample t-Test

The classic form described throughout this tutorial: compare $\bar{x}$ to a fixed value $\mu_0$ assuming approximately normal data.

5.2 One-Sample Wilcoxon Signed-Rank Test

The non-parametric alternative when normality cannot be assumed. Tests whether the population median (not mean) equals $\theta_0$. Procedure:

  1. Compute $d_i = x_i - \theta_0$ for each observation.
  2. Remove $d_i = 0$; let $n'$ = number of non-zero differences.
  3. Rank $|d_i|$ from 1 to $n'$.
  4. Compute $W^+ = \sum_{d_i > 0} R_i$ and $W^- = \sum_{d_i < 0} R_i$.
  5. Test statistic: $W = \min(W^+, W^-)$ (or use $W^+$ with the $z$-approximation).

Effect size: $r_W = z/\sqrt{n'}$
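Steps 1-4 can be sketched directly, using SciPy's `rankdata` for tie-aware ranks (the function name is illustrative):

```python
from scipy.stats import rankdata

def signed_rank_sums(x, theta0):
    """Rank-sum pair (W+, W-) for the one-sample Wilcoxon procedure above."""
    d = [xi - theta0 for xi in x if xi != theta0]    # step 2: drop zero differences
    ranks = rankdata([abs(di) for di in d])          # step 3: rank |d_i|, ties averaged
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    return w_plus, sum(ranks) - w_plus               # W+ + W- = n'(n'+1)/2
```

For the full test with a p-value, `scipy.stats.wilcoxon(x - theta0)` performs the same procedure, including the normal approximation.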

5.3 z-Test for the Mean (Known $\sigma$)

When the population standard deviation $\sigma$ is known (rare in practice), use the one-sample z-test instead:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

This follows the standard normal distribution exactly, without the need for the t-distribution. This situation arises in standardised testing (where $\sigma$ is known from large normative samples) or simulation studies.
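Because the reference distribution is the standard normal, the z-test needs only the standard library (illustrative sketch):

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """z-test for a mean when the population sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
    return z, p

z, p = one_sample_z(103, 100, sigma=15, n=100)   # illustrative IQ-style values
```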

5.4 Equivalence Testing (One-Sample TOST)

Rather than testing $H_0: \mu = \mu_0$, equivalence testing asks whether $\mu$ is close enough to $\mu_0$ to be considered practically equivalent. The Two One-Sided Tests (TOST) procedure:

Specify equivalence bounds $[\mu_0 - \Delta, \mu_0 + \Delta]$ (e.g., $\Delta = 5$ units).

Test both:

- $H_{01}: \mu \leq \mu_0 - \Delta$ with an upper one-tailed t-test
- $H_{02}: \mu \geq \mu_0 + \Delta$ with a lower one-tailed t-test

Equivalence is concluded when both null hypotheses are rejected — equivalently, when the 90% CI for $\bar{x}$ falls entirely within $(\mu_0 - \Delta, \mu_0 + \Delta)$.

5.5 Bayesian One-Sample t-Test

The Bayesian one-sample t-test computes a Bayes Factor $BF_{10}$ quantifying evidence for $H_1$ (effect exists) vs. $H_0$ (no effect):

$$BF_{10} = \frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}$$

Under the Rouder et al. (2009) default prior ($\delta \sim \text{Cauchy}(0, r = \sqrt{2}/2)$):

$BF_{10}$ can be computed from $t$ and $n$ numerically. $BF_{10} > 3$ indicates moderate evidence for $H_1$; $BF_{10} < 1/3$ indicates moderate evidence for $H_0$.

5.6 Trimmed Mean t-Test (Robust Variant)

When outliers are present, the trimmed mean t-test uses the $\alpha$-trimmed mean (removing the top and bottom $\alpha$ proportion of observations) rather than the arithmetic mean. With 20% trimming:

$$\bar{x}_t = \frac{1}{n-2h}\sum_{i=h+1}^{n-h} x_{(i)}, \quad h = \lfloor 0.20n \rfloor$$

The test statistic uses the Winsorised standard deviation. This is substantially more robust to outliers and heavy tails while retaining reasonable power.


6. Using the One-Sample t-Test Calculator Component

The One-Sample t-Test Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, and reporting one-sample tests.

Step-by-Step Guide

Step 1 — Select the Test

Navigate to Statistical Tests → t-Tests → One-Sample t-Test.

Step 2 — Input Method

Choose how to provide data:

Step 3 — Specify the Hypothesised Value

Enter $\mu_0$ — the value you are testing against. The default is $0$. Common values include a published population norm, a regulatory or clinical target, a scale midpoint, or zero (for change or difference scores).

Step 4 — Select the Alternative Hypothesis

Choose the direction of $H_1$:

- Two-tailed (default): $H_1: \mu \neq \mu_0$
- Upper one-tailed: $H_1: \mu > \mu_0$
- Lower one-tailed: $H_1: \mu < \mu_0$

⚠️ One-tailed tests require a strong, pre-registered directional prediction. Selecting one-tailed post-hoc to achieve significance is p-hacking.

Step 5 — Set Significance Level and Confidence Level

Default: $\alpha = .05$, 95% CI. DataStatPro also displays results for $\alpha = .01$ and $\alpha = .001$ simultaneously.

Step 6 — Select Display Options

Step 7 — Run the Analysis

Click "Run One-Sample t-Test". DataStatPro will:

  1. Compute $\bar{x}$, $s$, $SE$, $t$, $\nu$, and the exact p-value.
  2. Construct the 95% CI for $\mu$.
  3. Compute Cohen's $d$ and Hedges' $g$ with exact CIs.
  4. Run Shapiro-Wilk normality test and flag outliers.
  5. Generate all selected visualisations.
  6. Output an APA-compliant results paragraph.

7. Step-by-Step Procedure

7.1 Full Manual Procedure

Step 1 — State the Hypotheses

$$H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0 \quad \text{(two-tailed)}$$

Specify $\mu_0$ based on theory, norms, or a substantive threshold.

Step 2 — Check Assumptions

Confirm approximate normality (Shapiro-Wilk, Q-Q plot), independence of observations, and the absence of extreme outliers before proceeding.

Step 3 — Compute Summary Statistics

$n$ = sample size

$$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$$

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i - \bar{x})^2}$$

$$SE = \frac{s}{\sqrt{n}}$$

Step 4 — Compute the t-Statistic

$$t = \frac{\bar{x} - \mu_0}{SE} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Step 5 — Determine Degrees of Freedom

$$\nu = n - 1$$

Step 6 — Compute the p-Value

$$p = 2 \times P(T_\nu \geq |t|) \quad \text{(two-tailed)}$$

Compare to $\alpha$. Reject $H_0$ if $p \leq \alpha$.

Step 7 — Construct the 95% CI for $\mu$

Find $t_{\alpha/2,\; n-1}$ (e.g., $t_{.025,\; n-1}$ for a 95% CI):

$$\bar{x} \pm t_{\alpha/2,\; n-1} \times SE$$

Step 8 — Compute Effect Size

$$d = \frac{\bar{x} - \mu_0}{s}$$

Hedges' $g$:

$$g = d \times \left(1 - \frac{3}{4(n-1) - 1}\right)$$

Step 9 — Compute Approximate 95% CI for $d$

$$SE_d = \sqrt{\frac{1}{n} + \frac{d^2}{2(n-1)}}$$

$$d \pm 1.96 \times SE_d$$

Step 10 — Interpret and Report

Use the APA reporting template in Section 15. Always report $t$, $\nu$, $p$, $\bar{x}$, $s$, the 95% CI for $\mu$, Cohen's $d$ or Hedges' $g$, and the 95% CI for the effect size.
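Steps 3-9 can be chained into a single summary-statistics routine. The sketch below assumes SciPy and uses illustrative names; it is not DataStatPro's actual implementation:

```python
from math import sqrt
from scipy import stats

def one_sample_t_report(n, xbar, s, mu0, conf=0.95):
    """Run Steps 3-9 of the manual procedure from summary statistics alone."""
    df, se = n - 1, s / sqrt(n)
    t = (xbar - mu0) / se
    tcrit = stats.t.ppf(1 - (1 - conf) / 2, df)
    d = (xbar - mu0) / s
    se_d = sqrt(1 / n + d ** 2 / (2 * df))           # approximate SE of d
    return {
        "t": t, "df": df,
        "p": 2 * stats.t.sf(abs(t), df),             # two-tailed p-value
        "ci": (xbar - tcrit * se, xbar + tcrit * se),
        "d": d,
        "g": d * (1 - 3 / (4 * df - 1)),             # Hedges' correction
        "d_ci": (d - 1.96 * se_d, d + 1.96 * se_d),
    }
```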


8. Interpreting the Output

8.1 The t-Statistic

| $\vert t \vert$ relative to $t_{crit}$ | Interpretation |
|---|---|
| $\vert t \vert < t_{crit}$ | Fail to reject $H_0$; result not significant at $\alpha$ |
| $\vert t \vert \geq t_{crit}$ | Reject $H_0$; result significant at $\alpha$ |
| Large $\vert t \vert$ with large $n$ | Can be significant even for tiny $d$ |
| Small $\vert t \vert$ with small $n$ | May be non-significant even for large $d$ (low power) |

8.2 The p-Value

| p-Value | Conventional Interpretation |
|---|---|
| $p > .10$ | No evidence against $H_0$ |
| $.05 < p \leq .10$ | Marginal evidence (trend) |
| $.01 < p \leq .05$ | Significant at $\alpha = .05$ |
| $.001 < p \leq .01$ | Significant at $\alpha = .01$ |
| $p \leq .001$ | Significant at $\alpha = .001$ |

⚠️ These thresholds are arbitrary conventions, not natural boundaries. A result with $p = .049$ is not meaningfully more "significant" than one with $p = .051$. Focus on effect sizes and CIs, not arbitrary thresholds.

8.3 The 95% Confidence Interval

| CI Outcome | Interpretation |
|---|---|
| CI excludes $\mu_0$ | Reject $H_0$; $\mu$ plausibly differs from $\mu_0$ |
| CI includes $\mu_0$ | Fail to reject $H_0$ |
| Narrow CI | Precise estimate of $\mu$; adequate sample size |
| Wide CI | Imprecise estimate; consider increasing $n$ |

8.4 Cohen's $d$ — Magnitude Interpretation

Cohen's (1988) benchmarks:

| $\vert d \vert$ | Verbal Label | Overlap Between Distributions |
|---|---|---|
| 0.00 | No effect | $100\%$ |
| 0.20 | Small | $85\%$ |
| 0.50 | Medium | $67\%$ |
| 0.80 | Large | $53\%$ |
| 1.20 | Very large | $40\%$ |
| 2.00 | Huge | $18\%$ |

Sawilowsky's (2009) extended benchmarks:

| $\vert d \vert$ | Label |
|---|---|
| $< 0.10$ | Tiny |
| $0.10 - 0.19$ | Very small |
| $0.20 - 0.49$ | Small |
| $0.50 - 0.79$ | Medium |
| $0.80 - 1.19$ | Large |
| $1.20 - 1.99$ | Very large |
| $\geq 2.00$ | Huge |

⚠️ Cohen himself warned against mechanical application of these benchmarks. They were "offered as conventions of last resort." Always contextualise effect sizes within your specific research domain and compare to typical effect sizes in your field.


9. Effect Sizes for the One-Sample t-Test

9.1 Cohen's $d$ (One-Sample)

$$d = \frac{\bar{x} - \mu_0}{s}$$

Interpretation: The sample mean is $d$ standard deviations above (positive) or below (negative) the hypothesised value $\mu_0$.

9.2 Hedges' $g$

$$g = d \times \left(1 - \frac{3}{4(n-1)-1}\right)$$

Preferred over $d$ for small samples ($n < 20$) and meta-analysis.

9.3 Point-Biserial Correlation $r$

$$r = \sqrt{\frac{t^2}{t^2 + \nu}} = \sqrt{\frac{t^2}{t^2 + n - 1}}$$

Equivalent to the correlation between a binary "sample vs. norm" indicator and the continuous outcome. Ranges from 0 to 1; no directionality.

Convert to $d$: $d = 2r/\sqrt{1-r^2}$ (assuming an equal split)

9.4 Common Language Effect Size (CL)

The probability that a randomly selected individual from the population has a score above $\mu_0$:

$$CL = \Phi\!\left(\frac{d}{\sqrt{1 + 1/n}}\right) \approx \Phi(d) \quad \text{for large } n$$

Where $\Phi$ is the standard normal CDF.

$CL = 0.50$ → no effect; $CL = 0.84$ → $d \approx 1.00$; $CL = 0.69$ → $d \approx 0.50$.
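Because $\Phi$ is available in the standard library, CL needs no extra dependencies (illustrative sketch):

```python
from math import sqrt
from statistics import NormalDist

def common_language(d, n=None):
    """CL = Phi(d / sqrt(1 + 1/n)); Phi(d) when n is large or omitted."""
    phi = NormalDist().cdf
    return phi(d) if n is None else phi(d / sqrt(1 + 1 / n))
```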

9.5 Effect Size Summary Table

| Effect Size | Formula | Range | Interpretation |
|---|---|---|---|
| Cohen's $d$ | $(\bar{x}-\mu_0)/s$ | $(-\infty, \infty)$ | SD units above/below $\mu_0$ |
| Hedges' $g$ | $d \times J$ | $(-\infty, \infty)$ | Bias-corrected; preferred for small $n$ |
| $r$ | $\sqrt{t^2/(t^2+\nu)}$ | $[0, 1]$ | Correlation-like; no direction |
| CL | $\Phi(d)$ (approx.) | $[0, 1]$ | Prob. of exceeding $\mu_0$ |

10. Confidence Intervals

10.1 CI for the Population Mean $\mu$

$$\text{CI}_\mu = \bar{x} \pm t_{\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}$$

This directly addresses the primary research question by providing a range of plausible values for the true population mean.

10.2 CI Width as a Function of Sample Size

$$\text{CI Width} = 2 \times t_{\alpha/2,\; n-1} \times \frac{s}{\sqrt{n}} \approx \frac{2 \times 1.96 \times s}{\sqrt{n}}$$

For $s = 10$:

| $n$ | Approx. CI Width | Interpretation |
|---|---|---|
| 5 | 17.5 | Very imprecise |
| 10 | 12.4 | Imprecise |
| 20 | 8.8 | Moderate |
| 50 | 5.5 | Good |
| 100 | 3.9 | High precision |
| 200 | 2.8 | Very high precision |
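The table above uses the large-sample $1.96$ shortcut; a sketch with the exact t quantile (assuming SciPy) gives noticeably wider intervals at small $n$:

```python
from math import sqrt
from scipy import stats

def ci_width(s, n, conf=0.95):
    """Full CI width 2 * t_crit * s / sqrt(n), using the exact t quantile."""
    return 2 * stats.t.ppf(1 - (1 - conf) / 2, n - 1) * s / sqrt(n)
```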

10.3 CI for Cohen's $d$

Approximate (adequate for $n \geq 20$):

$$d \pm 1.96 \times \sqrt{\frac{1}{n} + \frac{d^2}{2(n-1)}}$$

Exact: Uses the non-central t-distribution (computed automatically by DataStatPro).

10.4 Relationship Between CI and Hypothesis Test

The CI and two-tailed hypothesis test are algebraically equivalent:

- $\mu_0$ outside the $(1-\alpha) \times 100\%$ CI $\iff$ $p < \alpha$ (reject $H_0$)
- $\mu_0$ inside the CI $\iff$ $p \geq \alpha$ (fail to reject $H_0$)


11. Advanced Topics

11.1 Multiple One-Sample Tests on the Same Dataset

When several one-sample t-tests are conducted on data from the same participants (e.g., testing each of 10 subscales against their respective norms), the familywise error rate inflates:

$$FWER = 1 - (1-\alpha)^k$$

For $k = 10$ tests: $FWER = 1 - (0.95)^{10} = .401$.

Correction strategies:

- Bonferroni: test each hypothesis at $\alpha/k$
- Holm: step-down procedure, uniformly more powerful than Bonferroni
- Benjamini-Hochberg: controls the false discovery rate instead of the FWER

Report all tests with both original and adjusted p-values.
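A standard-library sketch of the FWER formula and a Holm adjustment (function names are illustrative):

```python
def fwer(alpha, k):
    """Familywise error rate for k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def holm_adjust(pvals):
    """Holm step-down adjusted p-values (kept monotone, capped at 1)."""
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    m, adj, running = len(pvals), [0.0] * len(pvals), 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvals[i])   # step-down multiplier
        adj[i] = min(1.0, running)
    return adj
```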

11.2 Sensitivity Analysis: Minimum Detectable Effect

Given a fixed sample size $n$, the minimum detectable effect (MDE) at 80% power and $\alpha = .05$ is:

$$d_{min} = \frac{z_{1-\alpha/2} + z_{1-\beta}}{\sqrt{n}} = \frac{1.96 + 0.84}{\sqrt{n}} = \frac{2.80}{\sqrt{n}}$$

| $n$ | $d_{min}$ (80% power) |
|---|---|
| 10 | 0.885 |
| 20 | 0.626 |
| 30 | 0.511 |
| 50 | 0.396 |
| 100 | 0.280 |
| 200 | 0.198 |

If $d_{min}$ is at or below the smallest effect of practical interest, the study is adequately powered. If not, acknowledge that the study may miss practically important effects.
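The MDE formula is a one-liner; a standard-library sketch reproducing the table:

```python
from math import sqrt

def mde(n, z_alpha=1.96, z_beta=0.84):
    """Minimum detectable effect: (z_{1-alpha/2} + z_{1-beta}) / sqrt(n)."""
    return (z_alpha + z_beta) / sqrt(n)
```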

11.3 Comparing the One-Sample t-Test to the Paired t-Test

The paired t-test (Section on Paired t-Test) is mathematically equivalent to a one-sample t-test applied to the difference scores $d_i = x_{1i} - x_{2i}$, testing $H_0: \mu_d = 0$. Understanding this equivalence clarifies when each is appropriate:

- Use the paired t-test when each unit contributes two related measurements (e.g., pre/post scores).
- Use the one-sample t-test when a single measurement per unit is compared to an external standard $\mu_0$.

11.4 Bayesian One-Sample t-Test

The Bayes Factor $BF_{10}$ under the Rouder et al. (2009) default prior ($r = \sqrt{2}/2$) averages the likelihood of the observed $t$ over the Cauchy prior on the standardised effect $\delta$:

$$BF_{10}(t, n) = \frac{\int_{-\infty}^{\infty} f_{t_\nu,\,\delta\sqrt{n}}(t)\, f_{\text{Cauchy}}(\delta; r)\, d\delta}{f_{t_\nu}(t)}$$

where $f_{t_\nu,\,\lambda}$ is the non-central t density with $\nu = n-1$ df and non-centrality $\lambda$, and $f_{t_\nu}$ is the central t density (the likelihood under $H_0$).

This integral has no closed form but is computed numerically by DataStatPro.

Interpreting $BF_{10}$:

| $BF_{10}$ | Evidence for $H_1$ over $H_0$ |
|---|---|
| $> 100$ | Extreme |
| $30 - 100$ | Very strong |
| $10 - 30$ | Strong |
| $3 - 10$ | Moderate |
| $1 - 3$ | Anecdotal |
| $1$ | No evidence |
| $< 1$ | Evidence for $H_0$ |

Key advantage: $BF_{10} < 1/3$ provides positive evidence for the null hypothesis — something p-values cannot do.

11.5 TOST Equivalence Testing for the One-Sample t-Test

To establish that $\mu$ is practically equivalent to $\mu_0$ (e.g., that a new scale yields scores equivalent to an established norm):

  1. Specify $\Delta$ (the equivalence margin — the maximum acceptable deviation from $\mu_0$).
  2. Test $H_{01}: \mu \leq \mu_0 - \Delta$ with an upper one-tailed t-test.
  3. Test $H_{02}: \mu \geq \mu_0 + \Delta$ with a lower one-tailed t-test.
  4. Equivalence is concluded if both tests are significant (i.e., the 90% CI for $\bar{x}$ falls within $[\mu_0 - \Delta, \mu_0 + \Delta]$).

12. Worked Examples

Example 1: IQ in a Clinical Sample

A neuropsychologist measures IQ scores ($n = 25$) in adults diagnosed with early-stage Alzheimer's disease. The population mean for neurotypical adults is $\mu_0 = 100$.

Data summary: $n = 25$, $\bar{x} = 88.4$, $s = 14.2$

Step 1 — Hypotheses: $H_0: \mu = 100$ vs. $H_1: \mu \neq 100$

Step 2 — Standard error:

$$SE = 14.2/\sqrt{25} = 14.2/5 = 2.84$$

Step 3 — t-statistic:

$$t = (88.4 - 100)/2.84 = -11.6/2.84 = -4.085$$

Step 4 — df and p-value:

$\nu = 24$; $p = 2 \times P(T_{24} \geq 4.085) = 2 \times .00022 = .0004$

Step 5 — 95% CI for $\mu$:

$t_{.025,\,24} = 2.064$

$$88.4 \pm 2.064 \times 2.84 = 88.4 \pm 5.86 = [82.5, 94.3]$$

Step 6 — Cohen's $d$:

$d = (88.4 - 100)/14.2 = -11.6/14.2 = -0.817$; $|d| = 0.817$

Hedges' $g$:

$$g = 0.817 \times (1 - 3/(4 \times 24 - 1)) = 0.817 \times (1 - 3/95) = 0.817 \times 0.968 = 0.791$$

95% CI for $d$ (approximate):

$$SE_d = \sqrt{1/25 + 0.817^2/48} = \sqrt{0.040 + 0.0139} = \sqrt{0.0539} = 0.232$$

$|d|$ 95% CI: $0.817 \pm 1.96(0.232) = [0.362, 1.272]$

Summary:

| Statistic | Value | Interpretation |
|---|---|---|
| $t(24)$ | $-4.085$ | Well beyond the critical value of 2.064 |
| $p$ (two-tailed) | $.0004$ | Highly significant |
| $\bar{x}$ | $88.4$ | $11.6$ points below the norm |
| 95% CI for $\mu$ | $[82.5, 94.3]$ | Excludes 100 |
| Cohen's $d$ | $-0.817$ | Large effect |
| 95% CI for $d$ | $[-1.272, -0.362]$ | Entirely below zero |
| Hedges' $g$ | $-0.791$ | Large (bias-corrected) |

APA write-up: "A one-sample t-test revealed that the Alzheimer's group ($M = 88.4$, $SD = 14.2$, $n = 25$) had significantly lower IQ scores than the normative mean of 100, $t(24) = -4.09$, $p = .0004$, $d = -0.82$ [95% CI: $-1.27$, $-0.36$]. The 95% CI for the mean, $[82.5, 94.3]$, excluded the normative value. This represents a large deviation from the normative standard."


Example 2: Quality Control — Tablet Weight

A pharmaceutical quality control analyst measures the weight (mg) of $n = 40$ tablets from a production batch. The target weight is $\mu_0 = 500$ mg.

Data summary: $n = 40$, $\bar{x} = 497.3$ mg, $s = 6.8$ mg

Step 1 — Hypotheses: $H_0: \mu = 500$ vs. $H_1: \mu \neq 500$

Step 2 — SE and t:

$$SE = 6.8/\sqrt{40} = 6.8/6.325 = 1.075$$

$$t = (497.3 - 500)/1.075 = -2.7/1.075 = -2.512$$

Step 3 — df and p-value:

$\nu = 39$; $p = 2 \times P(T_{39} \geq 2.512) = .016$

Step 4 — 95% CI:

$t_{.025,\,39} = 2.023$

$$497.3 \pm 2.023 \times 1.075 = 497.3 \pm 2.175 = [495.1, 499.5]$$

Step 5 — Effect size:

$d = (497.3 - 500)/6.8 = -0.397$ (small-to-medium effect)

Step 6 — Equivalence test:

Regulatory limit: $\pm 5$ mg acceptable. Test whether $\mu \in [495, 505]$.

90% CI for $\bar{x}$: $497.3 \pm 1.685 \times 1.075 = [495.5, 499.1]$

Since $[495.5, 499.1] \subset [495, 505]$: the batch is within equivalence bounds even though it differs significantly from exactly 500 mg.

Interpretation: The batch mean is statistically significantly below 500 mg, but the deviation is within the acceptable regulatory range — the batch meets quality standards. This illustrates how statistical significance (the mean is not exactly 500 mg) and practical significance (the deviation is within tolerance) can diverge.

APA write-up: "A one-sample t-test revealed that the mean tablet weight ($M = 497.3$ mg, $SD = 6.8$ mg, $n = 40$) was significantly lower than the target of 500 mg, $t(39) = -2.51$, $p = .016$, $d = -0.40$ [95% CI: $-0.71$, $-0.08$]. However, an equivalence test (TOST, $\Delta = 5$ mg) demonstrated that the batch mean was within the acceptable regulatory range: the 90% CI $[495.5, 499.1]$ lies inside $[495, 505]$, indicating the batch meets quality specifications despite the statistically significant deviation from the nominal target."


Example 3: Exam Scores vs. National Average

A teacher believes their class performs above the national average of $\mu_0 = 68\%$. They measure $n = 30$ students.

Data summary: $n = 30$, $\bar{x} = 71.8\%$, $s = 9.4\%$

Directional hypothesis: $H_0: \mu \leq 68$ vs. $H_1: \mu > 68$ (one-tailed, pre-registered)

t-statistic:

SE=9.4/30=9.4/5.477=1.716SE = 9.4/\sqrt{30} = 9.4/5.477 = 1.716

t=(71.868)/1.716=3.8/1.716=2.215t = (71.8 - 68)/1.716 = 3.8/1.716 = 2.215

p-value (upper one-tailed):

p=P(T292.215)=.017p = P(T_{29} \geq 2.215) = .017

95% CI for μ\mu (two-tailed, for reference): [68.3,75.3][68.3, 75.3]

Cohen's dd: d=3.8/9.4=0.404d = 3.8/9.4 = 0.404 (small-medium)

APA write-up: "A one-tailed one-sample t-test (pre-registered directional hypothesis) indicated that the class mean exam score ($M = 71.8\%$, $SD = 9.4\%$, $n = 30$) was significantly above the national average of 68%, $t(29) = 2.21$, $p = .017$, $d = 0.40$ [95% CI: $0.02$, $0.78$]. The class outperformed the national average by a small-to-medium margin."
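The one-tailed computation can likewise be checked from the summary statistics. A sketch with SciPy (illustrative only; `stats.t.sf` gives the upper tail directly):

```python
import math
from scipy import stats

# Summary statistics from Example 3 (exam scores)
n, xbar, s, mu0 = 30, 71.8, 9.4, 68.0

se = s / math.sqrt(n)
t = (xbar - mu0) / se
p_one = stats.t.sf(t, df=n - 1)    # upper one-tailed p-value
d = (xbar - mu0) / s

# Two-tailed 95% CI for mu, reported for reference
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(f"t({n - 1}) = {t:.2f}, one-tailed p = {p_one:.3f}, d = {d:.2f}")
print(f"95% CI for mu: [{ci[0]:.1f}, {ci[1]:.1f}]")
```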


13. Common Mistakes and How to Avoid Them

Mistake 1: Choosing $\mu_0$ Based on the Sample Mean

Problem: Examining the data, seeing $\bar{x} = 73$, and then testing $H_0: \mu = 70$ because it seems close. Selecting $\mu_0$ based on the observed data is circular and inflates Type I error.

Solution: Always specify $\mu_0$ before data collection, based on theory, published norms, or a substantive threshold. Pre-register the hypothesis if possible.


Mistake 2: Interpreting a Non-Significant Result as "No Difference from $\mu_0$"

Problem: Concluding that $p = .38$ means $\mu = \mu_0$. A non-significant result means insufficient evidence against $H_0$, not evidence that $H_0$ is true. With $n = 5$, almost no test will be significant, regardless of the true effect.

Solution: Report the 95% CI for $\mu$ alongside the p-value. A wide CI spanning $\mu_0$ reflects uncertainty, not zero effect. Use equivalence testing (TOST) to positively establish equivalence.


Mistake 3: Using a One-Tailed Test Post-Hoc

Problem: Running a two-tailed test, getting $p = .07$, and switching to a one-tailed test to achieve $p = .035$. This doubles the effective Type I error rate.

Solution: Directional hypotheses must be pre-registered. If the research question genuinely allows only one direction (and this was decided before data collection), the one-tailed test is appropriate. Otherwise, use the two-tailed test.


Mistake 4: Reporting Only the p-Value Without Effect Size

Problem: "$t(24) = 4.09$, $p = .0004$" tells the reader nothing about how large the deviation from $\mu_0$ is in meaningful units. A study with $n = 10{,}000$ could produce this result for $d = 0.04$ — a trivially small effect.

Solution: Always report Cohen's $d$ (or Hedges' $g$) with its 95% CI and interpret the magnitude relative to the research context.


Mistake 5: Using the One-Sample t-Test for Paired Data

Problem: Computing the mean of pre-scores and the mean of post-scores and testing each separately against $\mu_0 = 0$. This ignores the within-person correlation and misses the point — the interest is in the change, not the absolute level.

Solution: For pre-post designs, use the paired t-test with difference scores $d_i = x_{post} - x_{pre}$, testing $H_0: \mu_d = 0$ (which is itself a one-sample t-test on the differences).
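The equivalence of the two procedures can be seen directly in code: SciPy's paired test and a one-sample test on the difference scores produce identical results. A sketch with made-up pre/post data (the values are hypothetical, generated only for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical pre/post scores for 12 participants
pre = rng.normal(50, 10, size=12)
post = pre + rng.normal(3, 4, size=12)   # post is correlated with pre within person

# The paired t-test and a one-sample t-test on the differences are the same test
paired = stats.ttest_rel(post, pre)
one_sample = stats.ttest_1samp(post - pre, popmean=0.0)

print(paired.statistic, one_sample.statistic)   # identical t
print(paired.pvalue, one_sample.pvalue)         # identical p
```

Testing `pre` and `post` separately against some $\mu_0$ would discard the within-person pairing that makes this design powerful.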


Mistake 6: Ignoring Outliers in Small Samples

Problem: With $n = 10$, a single outlier can shift $\bar{x}$ substantially and either inflate or deflate the t-statistic dramatically.

Solution: Always inspect data with boxplots and $z$-scores before running the test. Report the analysis with and without outliers, and consider the Wilcoxon signed-rank test or trimmed mean t-test as robust alternatives.
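A minimal screening sketch, using hypothetical data with one planted outlier. The $\vert z \vert > 2$ cut-off is a common rule of thumb for small samples, not a DataStatPro default:

```python
import numpy as np
from scipy import stats

# Hypothetical small sample with one outlier (9.6), illustrative values only
x = np.array([4.8, 5.1, 5.3, 4.9, 4.95, 5.2, 4.7, 5.1, 5.05, 9.6])
mu0 = 5.0

# z-scores flag the outlier
z = (x - x.mean()) / x.std(ddof=1)
print("suspect points:", x[np.abs(z) > 2])

# Report the t-test with and without the flagged point
print("with outlier:   ", stats.ttest_1samp(x, mu0))
print("without outlier:", stats.ttest_1samp(x[np.abs(z) <= 2], mu0))

# Wilcoxon signed-rank on (x - mu0) as a robust alternative
print("wilcoxon:", stats.wilcoxon(x - mu0))
```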


Mistake 7: Not Reporting the 95% CI for $\mu$

Problem: The p-value and $d$ alone give an incomplete picture. The CI for $\mu$ directly answers the question: "What values of the true population mean are plausible given these data?" Without it, readers cannot assess the precision of the estimate.

Solution: Always report the 95% CI for $\mu$ alongside the hypothesis test results.


14. Troubleshooting

| Symptom | Likely Cause | Solution |
|---|---|---|
| $\vert t \vert$ is extremely large ($\vert t \vert > 10$) | Data-entry error, wrong units, or near-zero $s$ | Verify the raw data, units, and the specified $\mu_0$ |
| $p = 1.000$ exactly | $\bar{x} = \mu_0$ exactly, or floating-point issue | Check data; add more decimal places |
| Shapiro-Wilk $p < .05$ with large $n$ | Small, inconsequential non-normality detected (the test has high power for large $n$) | Inspect the Q-Q plot; with $n \geq 30$, minor non-normality rarely affects t-test validity |
| CI for $d$ is very wide | Small $n$ | Report the wide CI — it conveys genuine uncertainty; conduct a power analysis for the next study |
| Cohen's $d$ is large but $p > .05$ | Small $n$ (low power) | The study is underpowered; $d$ may reflect a real but undetected effect |
| $d$ from $t/\sqrt{n}$ differs from $(\bar{x}-\mu_0)/s$ | Rounding in intermediate steps | Compute $d$ directly from summary statistics for accuracy |
| One-tailed $p$ is larger than two-tailed $p$ | Effect is in the wrong direction | Report the result as going against the directional hypothesis; consider reporting two-tailed results |
| Negative $d$ when a positive effect is expected | Sign convention: $d$ uses $\bar{x} - \mu_0$, not $\mu_0 - \bar{x}$ | Verify the formula; report the sign with its direction (e.g., sample mean is below/above $\mu_0$) |
| Equivalence test fails despite small $d$ | Equivalence bounds are too narrow for the sample size | Increase $n$ or use wider, substantively justified equivalence bounds |
| Hedges' $g$ and Cohen's $d$ differ substantially | Very small $n$ ($< 10$) — the bias correction is large | Report $g$ (preferred); note that estimates are unstable for very small samples |

15. Quick Reference Cheat Sheet

Core Equations

| Formula | Description |
|---|---|
| $t = (\bar{x}-\mu_0)/(s/\sqrt{n})$ | One-sample t-statistic |
| $\nu = n-1$ | Degrees of freedom |
| $p = 2\times P(T_\nu \geq \vert t \vert)$ | Two-tailed p-value |
| $\bar{x} \pm t_{\alpha/2,\;\nu}\cdot s/\sqrt{n}$ | 95% CI for $\mu$ |
| $d = (\bar{x}-\mu_0)/s$ | Cohen's $d$ |
| $d = t/\sqrt{n}$ | Cohen's $d$ from the $t$-statistic |
| $g = d\times(1-3/(4(n-1)-1))$ | Hedges' $g$ (bias-corrected) |
| $r = \sqrt{t^2/(t^2+\nu)}$ | Point-biserial $r$ from $t$ |
| $SE_d = \sqrt{1/n + d^2/(2(n-1))}$ | SE of Cohen's $d$ |
| $n \approx 7.84/d^2$ | Required $n$ for 80% power, $\alpha=.05$ |
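The cheat-sheet formulas map one-to-one onto code. A minimal sketch from summary statistics (illustrative, not DataStatPro's internals; the function name is ours):

```python
import math
from scipy import stats

def one_sample_summary(n, xbar, s, mu0, alpha=0.05):
    """Compute the cheat-sheet quantities from summary statistics."""
    nu = n - 1                                   # degrees of freedom
    se = s / math.sqrt(n)
    t = (xbar - mu0) / se                        # one-sample t-statistic
    p = 2 * stats.t.sf(abs(t), nu)               # two-tailed p-value
    t_crit = stats.t.ppf(1 - alpha / 2, nu)
    ci = (xbar - t_crit * se, xbar + t_crit * se)
    d = (xbar - mu0) / s                         # Cohen's d (equals t / sqrt(n))
    g = d * (1 - 3 / (4 * nu - 1))               # Hedges' g (bias-corrected)
    r = math.sqrt(t**2 / (t**2 + nu))            # point-biserial r from t
    se_d = math.sqrt(1 / n + d**2 / (2 * nu))    # SE of Cohen's d
    return {"t": t, "df": nu, "p": p, "ci": ci, "d": d, "g": g, "r": r, "se_d": se_d}

# Example 2's summary statistics as a quick check
res = one_sample_summary(40, 497.3, 6.8, 500.0)
print({k: round(v, 3) if isinstance(v, float) else v for k, v in res.items()})
```

Note that `d` computed as $(\bar{x}-\mu_0)/s$ and as $t/\sqrt{n}$ agree to floating-point precision when no intermediate rounding is introduced, which is the point made in the troubleshooting table.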

Decision Guide

| Condition | Recommended Test |
|---|---|
| Normal data, known $\mu_0$ | One-sample t-test |
| Non-normal or ordinal data | Wilcoxon signed-rank (one-sample) |
| Testing equivalence to $\mu_0$ | TOST equivalence test |
| Known population $\sigma$ | One-sample z-test |
| Quantifying evidence for $H_0$ | Bayesian t-test (Bayes factor) |

Cohen's Benchmarks

| $\vert d \vert$ | Label |
|---|---|
| 0.20 | Small |
| 0.50 | Medium |
| 0.80 | Large |

Required Sample Size

| $d$ | Power = 0.80 | Power = 0.90 |
|---|---|---|
| 0.20 | 198 | 264 |
| 0.50 | 33 | 44 |
| 0.80 | 14 | 18 |
| 1.00 | 9 | 12 |

Assumes $\alpha = .05$, two-tailed.
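Sample sizes like these can be reproduced by iterating over $n$ with the noncentral t distribution; results may differ from the table by a unit or so depending on the approximation and rounding conventions used. A sketch using SciPy's `nct` (illustrative only):

```python
import math
from scipy import stats

def required_n(d, power=0.80, alpha=0.05):
    """Smallest n for a two-tailed one-sample t-test, via the noncentral t."""
    n = 3
    while True:
        nu = n - 1
        nc = abs(d) * math.sqrt(n)                  # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, nu)     # two-tailed critical value
        achieved = (stats.nct.sf(t_crit, nu, nc)
                    + stats.nct.cdf(-t_crit, nu, nc))
        if achieved >= power:
            return n
        n += 1

for d in (0.20, 0.50, 0.80, 1.00):
    print(d, required_n(d), required_n(d, power=0.90))
```

The quick rule $n \approx 7.84/d^2$ from the cheat sheet is the large-sample (z-based) version of this calculation and slightly underestimates the exact t-based values for small $n$.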

APA 7th Edition Reporting Templates

Standard: "A one-sample t-test indicated that [sample description] ($M =$ [value], $SD =$ [value], $n =$ [value]) [significantly / did not significantly] differ from [the reference value / normative mean of $\mu_0$], $t(\nu) =$ [value], $p =$ [value], $d =$ [value] [95% CI: LB, UB]. The 95% CI for the mean, $[\text{LB}, \text{UB}]$, [excluded / included] $\mu_0 =$ [value]."

With Hedges' $g$: "... $g =$ [value] [95% CI: LB, UB] (bias-corrected for small sample)."

With equivalence test: "A TOST equivalence test with bounds $\pm\Delta$ demonstrated [equivalence / non-equivalence] to [reference value], 90% CI for $\mu$: $[\text{LB}, \text{UB}]$ $[\subset / \not\subset]$ $[\mu_0-\Delta, \mu_0+\Delta]$."

With Bayesian t-test: "The Bayesian t-test yielded $BF_{10} =$ [value], indicating [moderate / strong / extreme] evidence for [the alternative / the null] hypothesis."

Reporting Checklist

| Item | Required |
|---|---|
| t-statistic with sign | ✅ Always |
| Degrees of freedom $\nu = n-1$ | ✅ Always |
| Exact p-value | ✅ Always |
| Sample mean, SD, and $n$ | ✅ Always |
| 95% CI for $\mu$ | ✅ Always |
| Cohen's $d$ or Hedges' $g$ | ✅ Always |
| 95% CI for the effect size | ✅ Always |
| Hypothesised value $\mu_0$ (stated explicitly) | ✅ Always |
| Alternative hypothesis direction | ✅ Always |
| Normality check result | ✅ When $n < 30$ |
| Outlier check result | ✅ When $n < 30$ |
| Hedges' $g$ instead of $d$ | ✅ When $n < 20$ |
| TOST equivalence test | ✅ When claiming a null result |
| Bayes factor | Recommended for null results |
| Power analysis | ✅ For underpowered or null results |

This tutorial provides a comprehensive foundation for understanding, conducting, and reporting one-sample t-tests within the DataStatPro application. For further reading, consult Gravetter & Wallnau's "Statistics for the Behavioral Sciences" (10th ed.), Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988), Lakens's "Equivalence Tests: A Practical Primer" (Social Psychological and Personality Science, 2017), and Rouder et al.'s "Bayesian t-Tests" (Psychonomic Bulletin & Review, 2009). For feature requests or support, contact the DataStatPro team.