One-Sample t-Test: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of single-group inference all the way through advanced interpretation, reporting, assumption checking, and practical usage within the DataStatPro application. Whether you are encountering the one-sample t-test for the first time or deepening your understanding of comparing a sample to a known standard, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is a One-Sample t-Test?
- The Mathematics Behind the One-Sample t-Test
- Assumptions of the One-Sample t-Test
- Variants of the One-Sample t-Test
- Using the One-Sample t-Test Calculator Component
- Step-by-Step Procedure
- Interpreting the Output
- Effect Sizes for the One-Sample t-Test
- Confidence Intervals
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into the one-sample t-test, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.
1.1 The Concept of Statistical Inference
Statistical inference is the process of drawing conclusions about a population from a sample. In the one-sample t-test context:
- Population parameter of interest: $\mu$ (the true population mean).
- Sample statistic: $\bar{x}$ (the sample mean, our best estimate of $\mu$).
- Question asked: "Is it plausible that the true population mean equals some specific, theoretically or practically meaningful value $\mu_0$?"
1.2 The Null and Alternative Hypotheses
Every t-test operates within the hypothesis testing framework:
- $H_0$ (null hypothesis): There is no difference between the population mean and the hypothesised value: $\mu = \mu_0$.
- $H_1$ (alternative hypothesis): The population mean differs from the hypothesised value. The alternative can be:
  - Two-tailed: $\mu \neq \mu_0$ (no directional prediction)
  - Upper one-tailed: $\mu > \mu_0$ (predict the mean is higher)
  - Lower one-tailed: $\mu < \mu_0$ (predict the mean is lower)
1.3 The Standard Error of the Mean
When we draw a sample of size $n$ from a population with standard deviation $\sigma$, the sample mean $\bar{x}$ varies from sample to sample. The standard error of the mean (SEM) quantifies this variability:
$$SE = \frac{\sigma}{\sqrt{n}}$$
In practice, $\sigma$ is unknown and estimated by the sample standard deviation $s$:
$$SE = \frac{s}{\sqrt{n}}$$
A larger $n$ produces a smaller SEM, meaning our estimate of $\mu$ becomes more precise as sample size increases. This is why large samples detect even trivially small deviations from $\mu_0$.
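As a quick numerical sketch (plain Python, no external libraries), the SEM calculation and its dependence on $n$:

```python
import math

def sem(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# Same SD, four times the sample size -> half the SEM
print(sem(15.0, 25))   # 3.0
print(sem(15.0, 100))  # 1.5
```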
1.4 The t-Distribution
When the population $\sigma$ is unknown and estimated from the data, the test statistic does not follow the standard normal distribution; it follows the t-distribution with $df = n - 1$ degrees of freedom.
The t-distribution is:
- Symmetric and bell-shaped, centred at zero.
- Heavier-tailed than the standard normal (more probability in the extremes).
- Parameterised by degrees of freedom $df = n - 1$.
- Approaches the standard normal as $df \to \infty$.
The heavier tails reflect additional uncertainty from estimating $\sigma$ with $s$, which is particularly important in small samples.
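The convergence toward the normal is easy to verify numerically; a minimal sketch using scipy (illustrative, not DataStatPro's own code) that reproduces the two-tailed 5% critical values:

```python
from scipy.stats import t, norm

# Two-tailed 5% critical values shrink toward the normal's 1.96 as df grows
for df in (4, 29, 99, 10_000):
    print(df, round(t.ppf(0.975, df), 3))

print("normal", round(norm.ppf(0.975), 3))
```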
1.5 The p-Value and Significance Level
The p-value is the probability of obtaining a test statistic at least as extreme as the observed value, assuming $H_0$ is true. It answers: "How surprising is this sample result if the null hypothesis were correct?"
The significance level (conventionally $\alpha = .05$) is the threshold below which we consider the result sufficiently surprising to reject $H_0$.
⚠️ A small p-value does NOT mean the null is false, the effect is large, or the finding is important. It only means the data are inconsistent with $H_0$ at level $\alpha$. Always accompany p-values with effect sizes and confidence intervals.
1.6 The Central Limit Theorem
The Central Limit Theorem (CLT) states that for sufficiently large $n$, the sampling distribution of $\bar{x}$ is approximately normal regardless of the shape of the population distribution:
$$\bar{x} \;\dot\sim\; N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
This guarantees that the one-sample t-test is robust to non-normality for large samples (generally $n \geq 30$). For smaller samples, the normality of the population itself is important.
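A small simulation illustrates the CLT in action; this is an illustrative numpy sketch (the heavily right-skewed exponential population is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)

# 10,000 sample means of n = 50 draws each from a skewed exponential population
means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

# The sampling distribution of the mean is centred on mu = 1
# with SD close to sigma / sqrt(n) = 1 / sqrt(50) ~ 0.141
print(means.mean(), means.std())
```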
1.7 Point Estimates and Interval Estimates
- A point estimate ($\bar{x}$) is a single best guess for the population parameter.
- A confidence interval (CI) provides a range of plausible values for the parameter.
A 95% CI means: if we repeated the study many times, 95% of the resulting intervals would contain the true $\mu$. CIs communicate both the location and precision of the estimate — always report them alongside the t-test result.
2. What is a One-Sample t-Test?
2.1 The Core Question
The one-sample t-test is a parametric inferential test that determines whether the mean of a single sample differs significantly from a known, hypothesised, or theoretically meaningful population value $\mu_0$.
Unlike two-sample tests that compare two groups, the one-sample t-test compares one group to a fixed reference point. The reference point is not estimated from the data — it is specified in advance based on:
- A published population norm (e.g., IQ = 100).
- A theoretical prediction (e.g., reaction time = 250 ms).
- A clinical threshold (e.g., PHQ-9 score = 10 for moderate depression).
- A quality control standard (e.g., tablet weight = 500 mg).
- A chance level (e.g., proportion correct = 0.50 in a binary task).
2.2 The General Logic
The test measures how far the sample mean $\bar{x}$ is from $\mu_0$, standardised by the estimated standard error of the mean:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
A large $\lvert t \rvert$ indicates that the sample mean is many standard error units away from $\mu_0$ — unlikely to occur by chance if $H_0$ is true.
2.3 When to Use the One-Sample t-Test
The one-sample t-test is appropriate when:
| Condition | Requirement |
|---|---|
| Research design | Single sample compared to a fixed standard |
| Outcome variable | Continuous (interval or ratio scale) |
| Distribution | Approximately normal (or $n \geq 30$ via CLT) |
| Reference value | Known $\mu_0$, specified before data collection |
| Observations | Independent of each other |
2.4 Real-World Applications
| Field | Research Question | $\mu_0$ |
|---|---|---|
| Clinical Psychology | Does a clinical sample's depression score differ from the population norm? | Published PHQ-9 norm |
| Cognitive Neuroscience | Is the reaction time of ADHD patients different from the normative 250 ms? | $\mu_0 = 250$ ms |
| Education | Does the class mean exam score differ from the national average of 70%? | $\mu_0 = 70\%$ |
| Quality Control | Does the mean tablet weight differ from the target of 500 mg? | $\mu_0 = 500$ mg |
| Sport Science | Does the team's mean VO₂ max differ from elite athlete norms? | Published norm |
| Nutrition | Does average daily caloric intake differ from the recommended 2000 kcal? | $\mu_0 = 2000$ kcal |
| Finance | Does the mean return of a fund differ from the benchmark return of 8%? | $\mu_0 = 8\%$ |
| Public Health | Is mean blood pressure in a community sample different from the clinical threshold? | Clinical threshold (mmHg) |
2.5 Distinguishing from Related Tests
| Situation | Correct Test |
|---|---|
| One sample vs. known value | One-sample t-test |
| Two independent groups | Independent samples t-test |
| Two related measurements | Paired samples t-test |
| One sample, non-normal, small $n$ | Wilcoxon signed-rank test (one-sample) |
| Proportion vs. known value | One-proportion z-test |
| Variance vs. known value | Chi-squared test for variance |
3. The Mathematics Behind the One-Sample t-Test
3.1 The t-Statistic
Given a sample of $n$ observations $x_1, \dots, x_n$ drawn from a population with unknown mean $\mu$ and unknown standard deviation $\sigma$, the one-sample t-statistic is:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Where:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
Under $H_0$, the statistic follows a t-distribution with $df = n - 1$ degrees of freedom.
3.2 Degrees of Freedom
The degrees of freedom $df = n - 1$ represent the number of independent pieces of information available to estimate the standard deviation. We lose 1 degree of freedom because computing $s$ requires first estimating $\mu$ with $\bar{x}$ from the same data.
Smaller $df$ → heavier tails → more conservative critical values → harder to achieve significance. This appropriately penalises small samples for the additional uncertainty in estimating $\sigma$.
3.3 Computing the p-Value
The p-value is computed from the cumulative distribution function (CDF) of the t-distribution with $df = n - 1$:
Two-tailed (default): $p = 2 \cdot P(T_{df} \geq \lvert t \rvert)$
Upper one-tailed ($H_1: \mu > \mu_0$): $p = P(T_{df} \geq t)$
Lower one-tailed ($H_1: \mu < \mu_0$): $p = P(T_{df} \leq t)$
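In practice these formulas are a one-liner; a sketch using scipy's `ttest_1samp` (illustrative, not DataStatPro's internal code), cross-checked against the manual computation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=105, scale=15, size=30)   # simulated sample

# Two-tailed one-sample t-test against mu0 = 100
res = stats.ttest_1samp(data, popmean=100)

# Manual cross-check of the same statistic and p-value
n = len(data)
t_manual = (data.mean() - 100) / (data.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)
```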
3.4 Critical Values
The decision rule compares the observed $\lvert t \rvert$ to the critical value $t_{crit}$:
$$t_{crit} = t_{1-\alpha/2,\; n-1} \quad \text{(two-tailed)}$$
Reject $H_0$ if $\lvert t \rvert \geq t_{crit}$.
Common critical values ($\alpha = .05$, two-tailed):
| $n$ | $df$ | $t_{crit}$ |
|---|---|---|
| 5 | 4 | 2.776 |
| 10 | 9 | 2.262 |
| 15 | 14 | 2.145 |
| 20 | 19 | 2.093 |
| 30 | 29 | 2.045 |
| 50 | 49 | 2.010 |
| 100 | 99 | 1.984 |
| $\infty$ | $\infty$ | 1.960 |
3.5 The 95% Confidence Interval for $\mu$
The CI for the population mean is:
$$\bar{x} \pm t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}$$
This interval is dual to the hypothesis test: $H_0$ is rejected at level $\alpha$ if and only if $\mu_0$ falls outside the CI.
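A minimal helper (illustrative scipy sketch) that builds this CI from summary statistics:

```python
import numpy as np
from scipy import stats

def mean_ci(xbar, s, n, conf=0.95):
    """CI for mu from summary statistics: xbar +/- t_crit * s/sqrt(n)."""
    tcrit = stats.t.ppf(0.5 + conf / 2, df=n - 1)
    half = tcrit * s / np.sqrt(n)
    return xbar - half, xbar + half

lo, hi = mean_ci(xbar=100.0, s=15.0, n=25)
# mu0 outside (lo, hi)  <=>  the two-tailed test rejects at alpha = 1 - conf
print(lo, hi)
```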
3.6 Cohen's $d$ — Effect Size for the One-Sample t-Test
The standardised effect size $d$ expresses how many standard deviation units the sample mean departs from $\mu_0$:
$$d = \frac{\bar{x} - \mu_0}{s}$$
This is directly analogous to a z-score, but standardised by the sample SD $s$ rather than the population SD $\sigma$.
Note that $d$ and $t$ are related:
$$d = \frac{t}{\sqrt{n}}, \qquad t = d\sqrt{n}$$
This relationship reveals that $t$ is a joint function of effect size ($d$) and sample size ($n$). A small $d$ can produce a large $t$ with a large enough $n$.
3.7 Hedges' $g$ — Bias-Corrected Effect Size
Cohen's $d$ is slightly positively biased (overestimates the true effect) in small samples. Hedges' $g$ applies a correction factor $J$:
$$g = J \cdot d, \qquad J \approx 1 - \frac{3}{4\,df - 1}$$
More precisely:
$$J = \frac{\Gamma(df/2)}{\sqrt{df/2}\;\Gamma\!\left((df-1)/2\right)}$$
The correction is negligible in large samples but can be substantial for small ones (roughly $n < 20$). Hedges' $g$ is the preferred effect size for small samples and meta-analysis.
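Both effect sizes can be sketched in a few lines of plain Python (using the approximate correction factor rather than the exact gamma-function form):

```python
def cohens_d(xbar, s, mu0):
    """One-sample Cohen's d: standardised deviation of the mean from mu0."""
    return (xbar - mu0) / s

def hedges_g(d, n):
    """Hedges' g using the approximate correction J = 1 - 3/(4*df - 1)."""
    df = n - 1
    return d * (1 - 3 / (4 * df - 1))

# The correction matters at small n and vanishes at large n
print(hedges_g(0.8, 10))    # noticeably below 0.8
print(hedges_g(0.8, 200))   # essentially 0.8
```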
3.8 Exact Confidence Interval for $d$ via the Non-Central t-Distribution
Under $H_1$, the t-statistic follows a non-central t-distribution with non-centrality parameter:
$$\delta = d\sqrt{n}$$
The exact 95% CI for $d$ inverts this relationship numerically: find $\delta_L$ and $\delta_U$ such that:
$$P(T_{df,\,\delta_L} \geq t_{obs}) = .025, \qquad P(T_{df,\,\delta_U} \leq t_{obs}) = .025$$
Then:
$$d_L = \frac{\delta_L}{\sqrt{n}}, \qquad d_U = \frac{\delta_U}{\sqrt{n}}$$
An approximate 95% CI (adequate for $n \geq 30$):
$$d \pm 1.96 \cdot \sqrt{\frac{1}{n} + \frac{d^2}{2n}}$$
3.9 Statistical Power
Power is the probability of correctly rejecting $H_0$ when a true effect of size $d$ exists:
$$\text{Power} = P\!\left(\lvert T_{df,\,\delta} \rvert \geq t_{crit}\right), \qquad \delta = d\sqrt{n}$$
Required sample size for desired power at two-sided $\alpha$ (normal approximation):
$$n \approx \left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{d}\right)^2$$
For $d = 0.5$ and power $= 0.80$: $n \approx \left(\frac{1.96 + 0.84}{0.5}\right)^2 \approx 31.4$; the exact t-based calculation raises this slightly, giving $n = 33$.
Required $n$ for common effect sizes:
| Cohen's $d$ | Power = 0.80 | Power = 0.90 | Power = 0.95 |
|---|---|---|---|
| 0.20 (small) | 198 | 264 | 327 |
| 0.50 (medium) | 33 | 44 | 54 |
| 0.80 (large) | 14 | 18 | 22 |
| 1.00 | 9 | 12 | 15 |
| 1.50 | 5 | 7 | 9 |
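Exact power and required $n$ can be computed from the non-central t-distribution; a scipy sketch (results may differ by ±1 from tabulated values depending on rounding conventions):

```python
from scipy.stats import nct, t as t_dist

def power_one_sample(d, n, alpha=0.05):
    """Exact two-tailed power of the one-sample t-test via the non-central t."""
    df = n - 1
    tcrit = t_dist.ppf(1 - alpha / 2, df)
    ncp = d * n ** 0.5                       # non-centrality parameter
    return nct.sf(tcrit, df, ncp) + nct.cdf(-tcrit, df, ncp)

def n_for_power(d, target=0.80, alpha=0.05):
    """Smallest n reaching the target power (simple linear search)."""
    n = 3
    while power_one_sample(d, n, alpha) < target:
        n += 1
    return n
```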
4. Assumptions of the One-Sample t-Test
4.1 Normality of the Data (or Sampling Distribution)
The one-sample t-test assumes that either:
- The population from which the sample is drawn is normally distributed, OR
- The sample size is sufficiently large (generally $n \geq 30$) for the CLT to ensure an approximately normal sampling distribution of $\bar{x}$.
How to check:
| Method | Details |
|---|---|
| Shapiro-Wilk test | Most powerful normality test for small-to-moderate $n$. $H_0$: data are normal; $p > .05$ → no evidence of non-normality |
| Kolmogorov-Smirnov | Alternative for larger samples; use the Lilliefors correction when parameters are estimated from the data |
| Q-Q plot | Plot sample quantiles vs. theoretical normal quantiles; points should fall on the diagonal |
| Histogram + density | Should be approximately bell-shaped |
| Skewness | $\lvert \text{skewness} \rvert < 2$ is generally acceptable |
| Kurtosis | $\lvert \text{excess kurtosis} \rvert < 7$ is generally acceptable |
Robustness: The t-test is remarkably robust to mild-to-moderate non-normality, especially for $n \geq 30$. Symmetric non-normal distributions cause few problems even for small $n$. Severe skewness with small $n$ is the primary concern.
When violated: Use the Wilcoxon signed-rank test (one-sample version) as the non-parametric alternative. Consider log or square-root transformation for right-skewed data.
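A typical check-then-decide workflow might look like this scipy sketch (the exponential data and $\mu_0 = 2.0$ are illustrative placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=100)   # right-skewed illustrative data
mu0 = 2.0

W, p_norm = stats.shapiro(sample)   # H0: the sample comes from a normal population
skew = stats.skew(sample)

if p_norm < 0.05:
    # Normality rejected: fall back to the one-sample Wilcoxon signed-rank,
    # which tests the median against mu0
    res = stats.wilcoxon(sample - mu0)
else:
    res = stats.ttest_1samp(sample, popmean=mu0)
```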
4.2 Independence of Observations
Each observation must be independent — the value of one participant's score must not influence another's. This is a design assumption, not testable statistically.
Common violations:
- Scores from participants who discussed the task with each other.
- Multiple measurements from the same participant treated as independent.
- Cluster sampling without accounting for cluster structure.
When violated: Use mixed models or multilevel approaches that explicitly model the dependency structure.
4.3 Interval or Ratio Scale of Measurement
The dependent variable must be measured on at least an interval scale — that is, the numerical differences between values must be meaningful and equal.
When violated: If the data are ordinal (ranks, Likert items treated as ordinal), use the Wilcoxon signed-rank test. Continuous but severely non-normal data may also warrant a non-parametric approach.
4.4 The Reference Value Must Be Pre-Specified
The hypothesised value $\mu_0$ must be specified before examining the data. Choosing $\mu_0$ based on the observed $\bar{x}$ (e.g., setting $\mu_0$ from a pilot study and then testing the same data) is circular and invalidates the test.
4.5 No Extreme Outliers
Extreme outliers distort both the mean ($\bar{x}$) and the standard deviation ($s$), potentially inflating or deflating the t-statistic.
How to check:
- Boxplots: values beyond $3 \times \text{IQR}$ from the quartiles are extreme.
- Standardised scores: $\lvert z \rvert > 3.29$ are statistical outliers at $\alpha = .001$.
- Grubbs' test for a single outlier in normally distributed data.
When outliers present: Investigate the cause (data entry error? valid extreme value?). Report analyses with and without the outlier. Consider the trimmed mean t-test or Wilcoxon signed-rank as robust alternatives.
4.6 Assumption Summary
| Assumption | How to Check | Remedy if Violated |
|---|---|---|
| Normality | Shapiro-Wilk, Q-Q plot | Wilcoxon signed-rank; transform data |
| Independence | Design review | Mixed models |
| Interval scale | Measurement theory | Wilcoxon signed-rank |
| Pre-specified | Research protocol | Re-specify with new data |
| No severe outliers | Boxplots, $z$-scores | Investigate; trimmed mean t-test |
5. Variants of the One-Sample t-Test
5.1 Standard One-Sample t-Test
The classic form described throughout this tutorial: compare $\bar{x}$ to a fixed value $\mu_0$ assuming approximately normal data.
5.2 One-Sample Wilcoxon Signed-Rank Test
The non-parametric alternative when normality cannot be assumed. Tests whether the population median (not mean) equals $\mu_0$. Procedure:
- Compute $d_i = x_i - \mu_0$ for each observation.
- Remove $d_i = 0$; let $n'$ = number of non-zero differences.
- Rank $\lvert d_i \rvert$ from 1 to $n'$.
- Compute $W^+$ (sum of ranks for positive $d_i$) and $W^-$ (sum of ranks for negative $d_i$).
- Test statistic: $W = \min(W^+, W^-)$ (or use $z$ with the normal approximation).
Effect size:
$$r = \frac{\lvert z \rvert}{\sqrt{n'}}$$
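A worked sketch of this procedure with scipy (the data are illustrative; the $z$ below uses the simple normal approximation without tie correction, so it can differ slightly from software that corrects for ties):

```python
import numpy as np
from scipy import stats

mu0 = 50
x = np.array([44, 47, 49, 50, 52, 53, 55, 41, 46, 48])

d = x - mu0
d = d[d != 0]                  # drop zero differences; n' = 9 remain
res = stats.wilcoxon(d)        # statistic = W = min(W+, W-)

# Effect size r = |z| / sqrt(n') via the normal approximation
n_prime = len(d)
mean_w = n_prime * (n_prime + 1) / 4
sd_w = np.sqrt(n_prime * (n_prime + 1) * (2 * n_prime + 1) / 24)
z = (res.statistic - mean_w) / sd_w
r = abs(z) / np.sqrt(n_prime)
```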
5.3 z-Test for the Mean (Known $\sigma$)
When the population standard deviation $\sigma$ is known (rare in practice), use the one-sample z-test instead:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
This follows the standard normal distribution exactly, without the need for the t-distribution. This situation arises in standardised testing (where $\sigma$ is known from large normative samples) or simulation studies.
5.4 Equivalence Testing (One-Sample TOST)
Rather than testing $\mu \neq \mu_0$, equivalence testing asks whether $\mu$ is close enough to $\mu_0$ to be considered practically equivalent. The Two One-Sided Tests (TOST) procedure:
Specify equivalence bounds $[\mu_0 - \Delta,\; \mu_0 + \Delta]$.
Test both:
- $H_{01}: \mu \leq \mu_0 - \Delta$ (lower bound)
- $H_{02}: \mu \geq \mu_0 + \Delta$ (upper bound)
Equivalence is concluded when both null hypotheses are rejected — equivalently, when the 90% CI for $\mu$ falls entirely within $[\mu_0 - \Delta,\; \mu_0 + \Delta]$.
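The two one-sided tests reduce to a few lines; a sketch (the helper name and the example data are illustrative) that returns the larger of the two one-sided p-values, so equivalence is concluded when it falls below $\alpha$:

```python
import numpy as np
from scipy import stats

def tost_one_sample(x, mu0, delta, alpha=0.05):
    """TOST: test H0: mu <= mu0 - delta and H0: mu >= mu0 + delta."""
    x = np.asarray(x, float)
    n = len(x)
    df = n - 1
    se = x.std(ddof=1) / np.sqrt(n)
    p_lower = stats.t.sf((x.mean() - (mu0 - delta)) / se, df)   # H0: mu <= mu0 - delta
    p_upper = stats.t.cdf((x.mean() - (mu0 + delta)) / se, df)  # H0: mu >= mu0 + delta
    return max(p_lower, p_upper)    # equivalence if this is < alpha

x = np.array([99.8, 100.1, 100.0, 99.9, 100.2, 100.0, 99.7, 100.3, 100.1, 99.9])
p_tost = tost_one_sample(x, mu0=100.0, delta=1.0)
```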
5.5 Bayesian One-Sample t-Test
The Bayesian one-sample t-test computes a Bayes Factor $BF_{10}$ quantifying evidence for $H_1$ (effect exists) vs. $H_0$ (no effect):
$$BF_{10} = \frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}$$
Under the Rouder et al. (2009) default prior (a Cauchy prior on the standardised effect, commonly with scale $r = \sqrt{2}/2$), $BF_{10}$ can be computed from $t$ and $n$ numerically. $BF_{10} > 3$ indicates moderate evidence for $H_1$; $BF_{10} < 1/3$ indicates moderate evidence for $H_0$.
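The default-prior Bayes factor can be sketched as a numerical integral over the JZS prior (the scale `r` used here is an assumption; implementations differ in their default):

```python
import numpy as np
from scipy import integrate
from math import gamma, sqrt

def jzs_bf10(t, n, r=sqrt(2) / 2):
    """JZS Bayes factor for a one-sample t-statistic (after Rouder et al., 2009),
    integrating numerically over the g prior."""
    v = n - 1
    def integrand(g):
        prior = (r ** 2 / 2) ** 0.5 / gamma(0.5) * g ** -1.5 * np.exp(-r ** 2 / (2 * g))
        marginal = (1 + n * g) ** -0.5 * (1 + t ** 2 / ((1 + n * g) * v)) ** (-(v + 1) / 2)
        return marginal * prior
    numerator, _ = integrate.quad(integrand, 0, np.inf)
    denominator = (1 + t ** 2 / v) ** (-(v + 1) / 2)
    return numerator / denominator
```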
5.6 Trimmed Mean t-Test (Robust Variant)
When outliers are present, the trimmed mean t-test uses the $\gamma$-trimmed mean (removing the top and bottom proportion $\gamma$ of observations) rather than the arithmetic mean. With 20% trimming:
$$\bar{x}_t = \frac{1}{n - 2g}\sum_{i=g+1}^{n-g} x_{(i)}, \qquad g = \lfloor 0.2\,n \rfloor$$
The test statistic uses the Winsorised standard deviation. This is substantially more robust to outliers and heavy tails while retaining reasonable power.
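scipy provides the trimmed mean directly; a small sketch of why it helps (a full Yuen-style trimmed t-test would additionally use the Winsorised SD, which is omitted here):

```python
import numpy as np
from scipy import stats

x = np.array([10, 12, 11, 13, 12, 11, 14, 95.0])   # one extreme outlier

print(x.mean())                    # ordinary mean is dragged toward 95
print(stats.trim_mean(x, 0.2))     # 20% trimmed mean resists the outlier
```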
6. Using the One-Sample t-Test Calculator Component
The One-Sample t-Test Calculator in DataStatPro provides a comprehensive tool for running, diagnosing, and reporting one-sample tests.
Step-by-Step Guide
Step 1 — Select the Test
Navigate to Statistical Tests → t-Tests → One-Sample t-Test.
Step 2 — Input Method
Choose how to provide data:
- Raw data: Paste or upload a column of values. DataStatPro computes all summary statistics and runs assumption checks automatically.
- Summary statistics: Enter $n$, $\bar{x}$, and $s$ directly.
- t-statistic + df: Enter a published $t$ and $df$ to compute p-values and effect sizes from a reported result.
Step 3 — Specify the Hypothesised Value
Enter $\mu_0$ — the value you are testing against. Default is $\mu_0 = 0$. Common values:
- $\mu_0 = 100$ for IQ or standardised test score comparisons.
- $\mu_0 = 0$ for testing whether a mean change differs from zero.
- $\mu_0 = 0.5$ for testing whether a proportion differs from chance.
Step 4 — Select the Alternative Hypothesis
- Two-tailed (default): $\mu \neq \mu_0$.
- Upper one-tailed: $\mu > \mu_0$.
- Lower one-tailed: $\mu < \mu_0$.
⚠️ One-tailed tests require a strong, pre-registered directional prediction. Selecting one-tailed post-hoc to achieve significance is p-hacking.
Step 5 — Set Significance Level and Confidence Level
Default: $\alpha = .05$, 95% CI. DataStatPro also displays results for other common significance levels simultaneously.
Step 6 — Select Display Options
- ✅ t-statistic, df, p-value (exact), and decision.
- ✅ Sample mean, SD, SEM, and 95% CI for $\mu$.
- ✅ Cohen's $d$ and Hedges' $g$ with exact 95% CI.
- ✅ Common Language Effect Size.
- ✅ Assumption check results (Shapiro-Wilk, outlier detection).
- ✅ Sampling distribution diagram showing the observed $t$ and critical region.
- ✅ Effect size diagram (distribution of sample vs. $\mu_0$).
- ✅ Power analysis: current power and required $n$ for 80%, 90%, 95% power.
- ✅ Equivalence test (TOST) panel.
- ✅ Bayesian t-test (Bayes Factor $BF_{10}$).
- ✅ APA 7th edition results paragraph (auto-generated).
Step 7 — Run the Analysis
Click "Run One-Sample t-Test". DataStatPro will:
- Compute $\bar{x}$, $s$, $SE$, $t$, $df$, and the exact p-value.
- Construct the 95% CI for $\mu$.
- Compute Cohen's $d$ and Hedges' $g$ with exact CIs.
- Run Shapiro-Wilk normality test and flag outliers.
- Generate all selected visualisations.
- Output an APA-compliant results paragraph.
7. Step-by-Step Procedure
7.1 Full Manual Procedure
Step 1 — State the Hypotheses
$H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ (two-tailed)
Specify $\mu_0$ based on theory, norms, or a substantive threshold.
Step 2 — Check Assumptions
- Verify interval/ratio scale.
- Run Shapiro-Wilk test or inspect Q-Q plot.
- Identify and investigate outliers with boxplots and $z$-scores.
- Confirm independence of observations by reviewing study design.
Step 3 — Compute Summary Statistics
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}, \qquad SE = \frac{s}{\sqrt{n}}$$
Step 4 — Compute the t-Statistic
$$t = \frac{\bar{x} - \mu_0}{SE}$$
Step 5 — Determine Degrees of Freedom
$$df = n - 1$$
Step 6 — Compute the p-Value
$p = 2 \cdot P(T_{df} \geq \lvert t \rvert)$ (two-tailed)
Compare to $\alpha$. Reject $H_0$ if $p < \alpha$.
Step 7 — Construct the 95% CI for $\mu$
Find $t_{crit}$ (e.g., $t_{.975,\,df}$ for a 95% CI):
$$\bar{x} \pm t_{crit} \cdot SE$$
Step 8 — Compute Effect Size
$$d = \frac{\bar{x} - \mu_0}{s}$$
Hedges' $g$:
$$g = d\left(1 - \frac{3}{4\,df - 1}\right)$$
Step 9 — Compute Approximate 95% CI for $d$
$$d \pm 1.96\sqrt{\frac{1}{n} + \frac{d^2}{2n}}$$
Step 10 — Interpret and Report
Use the APA reporting template in Section 15. Always report $n$, $\bar{x}$, $s$, $t$, $df$, the exact $p$, the 95% CI for $\mu$, Cohen's $d$ or Hedges' $g$, and the 95% CI for the effect size.
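The whole manual procedure (Steps 3–9) can be collected into one helper; an illustrative Python/scipy sketch, cross-checked against scipy's own `ttest_1samp`:

```python
import numpy as np
from scipy import stats

def one_sample_report(x, mu0, conf=0.95):
    """Steps 3-9 of the manual procedure, from raw data to effect sizes."""
    x = np.asarray(x, float)
    n = len(x)
    df = n - 1
    xbar, s = x.mean(), x.std(ddof=1)
    se = s / np.sqrt(n)
    t = (xbar - mu0) / se
    p = 2 * stats.t.sf(abs(t), df)                    # two-tailed p-value
    tcrit = stats.t.ppf(0.5 + conf / 2, df)
    d = (xbar - mu0) / s                              # Cohen's d
    g = d * (1 - 3 / (4 * df - 1))                    # Hedges' g (approx.)
    se_d = np.sqrt(1 / n + d ** 2 / (2 * n))          # approximate SE of d
    return {"t": t, "df": df, "p": p,
            "ci_mu": (xbar - tcrit * se, xbar + tcrit * se),
            "d": d, "g": g,
            "ci_d": (d - 1.96 * se_d, d + 1.96 * se_d)}

report = one_sample_report([1, 2, 3, 4, 5], mu0=0)
```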
8. Interpreting the Output
8.1 The t-Statistic
| $t$ vs. $t_{crit}$ | Interpretation |
|---|---|
| $\lvert t \rvert < t_{crit}$ | Fail to reject $H_0$; result not significant at $\alpha$ |
| $\lvert t \rvert \geq t_{crit}$ | Reject $H_0$; result significant at $\alpha$ |
| Large $t$ with large $n$ | Can be significant even for tiny $d$ |
| Small $t$ with small $n$ | May be non-significant even for large $d$ (low power) |
8.2 The p-Value
| p-Value | Conventional Interpretation |
|---|---|
| $p \geq .10$ | No evidence against $H_0$ |
| $.05 \leq p < .10$ | Marginal evidence (trend) |
| $p < .05$ | Significant at $\alpha = .05$ |
| $p < .01$ | Significant at $\alpha = .01$ |
| $p < .001$ | Significant at $\alpha = .001$ |
⚠️ These thresholds are arbitrary conventions, not natural boundaries. A result with $p = .049$ is not meaningfully more "significant" than one with $p = .051$. Focus on effect sizes and CIs, not arbitrary thresholds.
8.3 The 95% Confidence Interval
| CI Outcome | Interpretation |
|---|---|
| CI excludes $\mu_0$ | Reject $H_0$; $\mu$ plausibly differs from $\mu_0$ |
| CI includes $\mu_0$ | Fail to reject $H_0$ |
| Narrow CI | Precise estimate of $\mu$; adequate sample size |
| Wide CI | Imprecise estimate; consider increasing $n$ |
8.4 Cohen's $d$ — Magnitude Interpretation
Cohen's (1988) benchmarks:
| $\lvert d \rvert$ | Verbal Label | Overlap Between Distributions |
|---|---|---|
| 0.0 | No effect | 100% |
| 0.2 | Small | ~85% |
| 0.5 | Medium | ~67% |
| 0.8 | Large | ~53% |
| 1.2 | Very large | ~38% |
| 2.0 | Huge | ~19% |
Sawilowsky's (2009) extended benchmarks:
| Label | $\lvert d \rvert$ |
|---|---|
| Tiny | 0.01 |
| Very small | 0.1 |
| Small | 0.2 |
| Medium | 0.5 |
| Large | 0.8 |
| Very large | 1.2 |
| Huge | 2.0 |
⚠️ Cohen himself warned against mechanical application of these benchmarks. They were "offered as conventions of last resort." Always contextualise effect sizes within your specific research domain and compare to typical effect sizes in your field.
9. Effect Sizes for the One-Sample t-Test
9.1 Cohen's $d$ (One-Sample)
$$d = \frac{\bar{x} - \mu_0}{s}$$
Interpretation: The sample mean is $\lvert d \rvert$ standard deviations above (positive $d$) or below (negative $d$) the hypothesised value $\mu_0$.
9.2 Hedges' $g$
$$g = d\left(1 - \frac{3}{4\,df - 1}\right)$$
Preferred over $d$ for small samples (roughly $n < 20$) and for meta-analysis.
9.3 Point-Biserial Correlation $r$
$$r = \sqrt{\frac{t^2}{t^2 + df}}$$
Equivalent to the correlation between a binary "is vs. norm" variable and the continuous outcome. Ranges from 0 to 1; no directionality.
Convert $r$ to $d$: $d = \dfrac{2r}{\sqrt{1 - r^2}}$ (assuming equal split)
9.4 Common Language Effect Size (CL)
The probability that a randomly selected individual from the population has a score above $\mu_0$:
$$CL = \Phi(d) \quad \text{for large } n$$
Where $\Phi$ is the standard normal CDF.
$CL = .50$ → no effect; $d = 0.5$ → $CL \approx .69$; $d = 1.0$ → $CL \approx .84$.
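As a sketch, converting $d$ to the common language effect size is a single call to the standard normal CDF:

```python
from scipy.stats import norm

def common_language(d):
    """CL = Phi(d): probability a random score exceeds mu0 (large-n approximation)."""
    return norm.cdf(d)

print(common_language(0.0))   # 0.5 -> no effect
print(common_language(0.5))   # ~0.69
print(common_language(1.0))   # ~0.84
```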
9.5 Effect Size Summary Table
| Effect Size | Formula | Range | Interpretation |
|---|---|---|---|
| Cohen's $d$ | $(\bar{x} - \mu_0)/s$ | $(-\infty, \infty)$ | SD units above/below $\mu_0$ |
| Hedges' $g$ | $J \cdot d$ | $(-\infty, \infty)$ | Bias-corrected; preferred for small $n$ |
| $r$ | $\sqrt{t^2/(t^2 + df)}$ | $[0, 1]$ | Correlation-like; no direction |
| CL | $\Phi(d)$ (approx) | $[0, 1]$ | Prob. of exceeding $\mu_0$ |
10. Confidence Intervals
10.1 CI for the Population Mean $\mu$
$$\bar{x} \pm t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}$$
This directly addresses the primary research question by providing a range of plausible values for the true population mean.
10.2 CI Width as a Function of Sample Size
For data with $s = 1$ (so half-widths are in SD units):
| $n$ | Approx 95% CI Half-Width | Interpretation |
|---|---|---|
| 5 | $\pm 1.24$ | Very imprecise |
| 10 | $\pm 0.72$ | Imprecise |
| 20 | $\pm 0.47$ | Moderate |
| 50 | $\pm 0.28$ | Good |
| 100 | $\pm 0.20$ | High precision |
| 200 | $\pm 0.14$ | Very high precision |
10.3 CI for Cohen's $d$
Approximate (adequate for $n \geq 30$):
$$d \pm 1.96 \cdot SE(d), \qquad SE(d) = \sqrt{\frac{1}{n} + \frac{d^2}{2n}}$$
Exact: Uses the non-central t-distribution (computed automatically by DataStatPro).
10.4 Relationship Between CI and Hypothesis Test
The CI and two-tailed hypothesis test are algebraically equivalent:
- $H_0: \mu = \mu_0$ is rejected at $\alpha$ if and only if $\mu_0$ lies outside the $(1 - \alpha)$ CI.
- The CI provides more information: it shows not just whether $\mu_0$ is excluded but also the entire range of plausible values given the data.
11. Advanced Topics
11.1 Multiple One-Sample Tests on the Same Dataset
When several one-sample t-tests are conducted on data from the same participants (e.g., testing each of 10 subscales against their respective norms), the familywise error rate inflates:
For $k$ independent tests: $FWER = 1 - (1 - \alpha)^k$. With $k = 10$ and $\alpha = .05$, $FWER \approx .40$.
Correction strategies:
- Bonferroni: Adjust $\alpha_{adj} = \alpha / k$. Simple but conservative.
- Holm-Bonferroni: Sequential Bonferroni — less conservative than Bonferroni.
- Benjamini-Hochberg: Controls the False Discovery Rate (FDR) — appropriate for exploratory research.
Report all tests with both original and adjusted p-values.
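Both the inflation and the Holm correction are easy to compute; a self-contained numpy sketch (the example p-values are illustrative):

```python
import numpy as np

def holm_adjust(pvals):
    """Holm step-down adjusted p-values (compare directly to alpha)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p by (m - k + 1), enforcing monotonicity
        running_max = max(running_max, min((m - rank) * p[idx], 1.0))
        adjusted[idx] = running_max
    return adjusted

# Familywise error rate for k = 10 uncorrected independent tests at alpha = .05
fwer = 1 - 0.95 ** 10
print(fwer, holm_adjust([0.01, 0.04, 0.03, 0.005]))
```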
11.2 Sensitivity Analysis: Minimum Detectable Effect
Given a fixed sample size $n$, the minimum detectable effect (MDE) at power 80% and $\alpha = .05$:
$$d_{min} \approx \frac{z_{.975} + z_{.80}}{\sqrt{n}} = \frac{2.80}{\sqrt{n}}$$
| $n$ | $d_{min}$ (80% power) |
|---|---|
| 10 | 0.885 |
| 20 | 0.626 |
| 30 | 0.511 |
| 50 | 0.396 |
| 100 | 0.280 |
| 200 | 0.198 |
If the smallest effect of practical interest exceeds $d_{min}$, the study is adequately powered. If not, acknowledge that the study may miss practically important effects.
11.3 Comparing the One-Sample t-Test to the Paired t-Test
The paired t-test (Section on Paired t-Test) is mathematically equivalent to a one-sample t-test applied to the difference scores $d_i = x_{2i} - x_{1i}$, testing $H_0: \mu_d = 0$. Understanding this equivalence clarifies when each is appropriate:
- One-sample t-test: Compare a sample mean to an externally specified value $\mu_0$.
- Paired t-test: Compare the mean of difference scores to zero (or a specified value).
11.4 Bayesian One-Sample t-Test
The Bayes Factor under the Rouder et al. (2009) default prior:
$$BF_{10} = \frac{\displaystyle\int_0^\infty (1 + ng)^{-1/2}\left(1 + \frac{t^2}{(1 + ng)\,df}\right)^{-(df+1)/2}\pi(g)\,dg}{\left(1 + \dfrac{t^2}{df}\right)^{-(df+1)/2}}$$
This integral has no closed form but is computed numerically by DataStatPro.
Interpreting $BF_{10}$:
| $BF_{10}$ | Evidence for $H_1$ over $H_0$ |
|---|---|
| $> 100$ | Extreme |
| $30 - 100$ | Very strong |
| $10 - 30$ | Strong |
| $3 - 10$ | Moderate |
| $1 - 3$ | Anecdotal |
| $= 1$ | No evidence |
| $< 1$ | Evidence for $H_0$ (interpret $1/BF_{10}$ on the same scale) |
Key advantage: the Bayes Factor can provide positive evidence for the null hypothesis — something p-values cannot do.
11.5 TOST Equivalence Testing for the One-Sample t-Test
To establish that $\mu$ is practically equivalent to $\mu_0$ (e.g., that a new scale yields scores equivalent to an established norm):
- Specify $\Delta$ (the equivalence margin — the maximum acceptable deviation from $\mu_0$).
- Test $H_{01}: \mu \leq \mu_0 - \Delta$ with an upper one-tailed t-test.
- Test $H_{02}: \mu \geq \mu_0 + \Delta$ with a lower one-tailed t-test.
- Equivalence is concluded if both tests are significant (i.e., the 90% CI for $\mu$ falls within $[\mu_0 - \Delta,\; \mu_0 + \Delta]$).
12. Worked Examples
Example 1: IQ in a Clinical Sample
A neuropsychologist measures IQ scores in a sample of adults diagnosed with early-stage Alzheimer's disease. The population mean for neurotypical adults is $\mu_0 = 100$.
Data summary: sample size $n$, sample mean $\bar{x}$, and sample SD $s$
Step 1 — Hypotheses: $H_0: \mu = 100$ vs. $H_1: \mu \neq 100$
Step 2 — Standard error: $SE = s/\sqrt{n}$
Step 3 — t-statistic: $t = (\bar{x} - 100)/SE$
Step 4 — df and p-value:
$df = n - 1$; $p = 2 \cdot P(T_{df} \geq \lvert t \rvert)$
Step 5 — 95% CI for $\mu$: $\bar{x} \pm t_{.975,\,df} \cdot SE$
Step 6 — Cohen's $d$:
$d = (\bar{x} - 100)/s$;
Hedges' $g$: $g = d\left(1 - \frac{3}{4\,df - 1}\right)$
95% CI for $d$ (approximate):
$d \pm 1.96\sqrt{1/n + d^2/(2n)}$
Summary:
| Statistic | Interpretation |
|---|---|
| $p$ (two-tailed) | Highly significant |
| $\bar{x} - 100$ | Points below the norm |
| 95% CI for $\mu$ | Excludes 100 |
| Cohen's $d$ | Large effect |
| 95% CI for $d$ | Entirely below zero |
| Hedges' $g$ | Large (bias-corrected) |
APA write-up: "A one-sample t-test revealed that the Alzheimer's group ($M =$ [value], $SD =$ [value], $n =$ [value]) had significantly lower IQ scores than the normative mean of 100, $t(df) =$ [value], $p =$ [value], $g =$ [value] [95% CI: LB, UB]. The 95% CI for the mean, [LB, UB], excluded the normative value. This represents a large deviation from the normative standard."
Example 2: Quality Control — Tablet Weight
A pharmaceutical quality control analyst measures the weight (mg) of tablets from a production batch. The target weight is $\mu_0 = 500$ mg.
Data summary: sample size $n$, mean weight $\bar{x}$ (mg), and SD $s$ (mg)
Step 1 — Hypotheses: $H_0: \mu = 500$ vs. $H_1: \mu \neq 500$
Step 2 — SE and t:
$SE = s/\sqrt{n}$; $t = (\bar{x} - 500)/SE$
Step 3 — df and p-value:
$df = n - 1$; $p = 2 \cdot P(T_{df} \geq \lvert t \rvert)$
Step 4 — 95% CI:
$\bar{x} \pm t_{.975,\,df} \cdot SE$
Step 5 — Effect size:
$d = (\bar{x} - 500)/s$ (small-medium effect)
Step 6 — Equivalence test:
Regulatory limit: deviations of $\pm 5$ mg are acceptable. Test whether $495 \leq \mu \leq 505$.
90% CI for $\mu$: $[495.5,\; 499.1]$
Since $[495.5,\; 499.1] \subset [495,\; 505]$: the batch is within equivalence bounds even though it differs significantly from exactly 500 mg.
Interpretation: The batch mean is statistically significantly below 500 mg, but the deviation is within the acceptable regulatory range — the batch meets quality standards. This illustrates how statistical significance (the mean is not exactly 500 mg) and practical significance (the deviation is within tolerance) can diverge.
APA write-up: "A one-sample t-test revealed that the mean tablet weight ($M =$ [value] mg, $SD =$ [value] mg, $n =$ [value]) was significantly lower than the target of 500 mg, $t(df) =$ [value], $p =$ [value], $d =$ [value] [95% CI: LB, UB]. However, an equivalence test (TOST, $\Delta = 5$ mg) demonstrated that the batch mean was within the acceptable regulatory range, 90% CI [495.5, 499.1] $\subset$ [495, 505], indicating the batch meets quality specifications despite the statistically significant deviation from the nominal target."
Example 3: Exam Scores vs. National Average
A teacher believes their class performs above the national average of $\mu_0 = 68\%$. They measure the exam scores of their students.
Data summary: sample size $n$, class mean $\bar{x}$, and SD $s$
Directional hypothesis: $H_0: \mu = 68$ vs. $H_1: \mu > 68$ (one-tailed, pre-registered)
t-statistic: $t = (\bar{x} - 68)/(s/\sqrt{n})$
p-value (upper one-tailed): $p = P(T_{df} \geq t)$
95% CI for $\mu$ (two-tailed, for reference): $\bar{x} \pm t_{.975,\,df} \cdot s/\sqrt{n}$
Cohen's $d$: $d = (\bar{x} - 68)/s$ (small-medium)
APA write-up: "A one-tailed one-sample t-test (pre-registered directional hypothesis) indicated that the class mean exam score ($M =$ [value], $SD =$ [value], $n =$ [value]) was significantly above the national average of 68%, $t(df) =$ [value], $p =$ [value], $d =$ [value] [95% CI: LB, UB]. The class outperformed the national average by a small-to-medium margin."
13. Common Mistakes and How to Avoid Them
Mistake 1: Choosing $\mu_0$ Based on the Sample Mean
Problem: Examining the data, noting the observed $\bar{x}$, and then testing against a $\mu_0$ chosen because it seems close. Selecting $\mu_0$ based on the observed data is circular and inflates the Type I error rate.
Solution: Always specify $\mu_0$ before data collection, based on theory, published norms, or a substantive threshold. Pre-register the hypothesis if possible.
Mistake 2: Interpreting a Non-Significant Result as "No Difference from $\mu_0$"
Problem: Concluding that $p > .05$ means $\mu = \mu_0$. A non-significant result means insufficient evidence against $H_0$, not evidence that $H_0$ is true. With a very small $n$, almost no test will be significant, regardless of the true effect.
Solution: Report the 95% CI for $\mu$ alongside the p-value. A wide CI spanning across $\mu_0$ reflects uncertainty, not zero effect. Use equivalence testing (TOST) to positively establish equivalence.
Mistake 3: Using a One-Tailed Test Post-Hoc
Problem: Running a two-tailed test, obtaining a p-value just above .05, and switching to a one-tailed test to push it below .05. This doubles the effective Type I error rate.
Solution: Directional hypotheses must be pre-registered. If the research question genuinely allows only one direction (and this was decided before data collection), the one-tailed test is appropriate. Otherwise, use the two-tailed test.
Mistake 4: Reporting Only the p-Value Without Effect Size
Problem: Reporting only "$p < .05$" tells the reader nothing about how large the deviation from $\mu_0$ is in meaningful units. A study with a very large $n$ can produce a significant result for a trivially small effect.
Solution: Always report Cohen's $d$ (or Hedges' $g$) with its 95% CI and interpret the magnitude relative to the research context.
Mistake 5: Using the One-Sample t-Test for Paired Data
Problem: Computing the mean of pre-scores and the mean of post-scores and testing each separately against $\mu_0$. This ignores the within-person correlation and misses the point — the interest is in the change, not the absolute level.
Solution: For pre-post designs, use the paired t-test with difference scores $d_i = \text{post}_i - \text{pre}_i$, testing $H_0: \mu_d = 0$ (which is itself a one-sample t-test on the differences).
Mistake 6: Ignoring Outliers in Small Samples
Problem: With a small $n$, a single outlier can shift $\bar{x}$ substantially and either inflate or deflate the t-statistic dramatically.
Solution: Always inspect data with boxplots and $z$-scores before running the test. Report the analysis with and without outliers, and consider the Wilcoxon signed-rank test or trimmed mean t-test as robust alternatives.
Mistake 7: Not Reporting the 95% CI for $\mu$
Problem: The p-value and $t$ alone give an incomplete picture. The CI for $\mu$ directly answers the question: "What values of the true population mean are plausible given these data?" Without it, readers cannot assess the precision of the estimate.
Solution: Always report the 95% CI for $\mu$ alongside the hypothesis test results.
14. Troubleshooting
| Symptom | Likely Cause | Solution |
|---|---|---|
| $t$-statistic is extremely large ($\lvert t \rvert > 10$) | Data entry error, wrong units, or a misspecified $\mu_0$ | Check the data and the specified $\mu_0$ |
| $p = 0$ exactly | $p$ is smaller than the displayed precision, or a floating point issue | Report as $p < .001$; check data; add more decimal places |
| Shapiro-Wilk $p < .05$ with large $n$ | Small, inconsequential non-normality detected (the test has high power for large $n$) | Inspect Q-Q plot; with $n \geq 30$, minor non-normality rarely affects t-test validity |
| CI for $\mu$ is very wide | Small $n$ | Report the wide CI, which conveys genuine uncertainty; conduct a power analysis for a future study |
| Cohen's $d$ is large but $p > .05$ | Small $n$ (low power) | Study is underpowered; may reflect a real but undetected effect |
| $p$ recomputed from a rounded $t$ differs from the software's $p$ | Rounding in intermediate steps | Compute $t$ directly from summary statistics for accuracy |
| One-tailed $p$ is larger than two-tailed $p$ | Effect is in the wrong direction | Report the result as going against the directional hypothesis; consider reporting two-tailed results |
| Negative $t$ when a positive effect is expected | Check the direction of $\bar{x}$ vs. $\mu_0$ | Verify the formula; report the sign with direction (e.g., sample mean is below/above $\mu_0$) |
| Equivalence test fails despite small $d$ | Equivalence bounds are too narrow for the sample size | Increase $n$ or use wider, substantively justified equivalence bounds |
| Hedges' $g$ and Cohen's $d$ differ substantially | Very small $n$, where the bias correction is large | Report $g$ (preferred); note that estimates are unstable for very small samples |
15. Quick Reference Cheat Sheet
Core Equations
| Formula | Description |
|---|---|
| $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | One-sample t-statistic |
| $df = n - 1$ | Degrees of freedom |
| $p = 2 \cdot P(T_{df} \geq \lvert t \rvert)$ | Two-tailed p-value |
| $\bar{x} \pm t_{.975,\,df} \cdot s/\sqrt{n}$ | 95% CI for $\mu$ |
| $d = (\bar{x} - \mu_0)/s$ | Cohen's $d$ |
| $d = t/\sqrt{n}$ | Cohen's $d$ from $t$-statistic |
| $g = d\,(1 - \frac{3}{4\,df - 1})$ | Hedges' $g$ (bias-corrected) |
| $r = \sqrt{t^2/(t^2 + df)}$ | Point-biserial $r$ from $t$ |
| $SE(d) = \sqrt{1/n + d^2/(2n)}$ | SE of Cohen's $d$ |
| $n \approx (2.80/d)^2$ | Required $n$ for 80% power, $\alpha = .05$ |
Decision Guide
| Condition | Recommended Test |
|---|---|
| Normal data, $\mu_0$ known | One-sample t-test |
| Non-normal data or ordinal | Wilcoxon signed-rank (one-sample) |
| Testing equivalence to $\mu_0$ | TOST equivalence test |
| Known population $\sigma$ | One-sample z-test |
| Quantifying evidence for $H_0$ | Bayesian t-test (Bayes Factor) |
Cohen's $d$ Benchmarks
| Label | $\lvert d \rvert$ |
|---|---|
| Small | 0.2 |
| Medium | 0.5 |
| Large | 0.8 |
Required Sample Size
| $d$ | Power = 0.80 | Power = 0.90 |
|---|---|---|
| 0.20 | 198 | 264 |
| 0.50 | 33 | 44 |
| 0.80 | 14 | 18 |
| 1.00 | 9 | 12 |
Assumes $\alpha = .05$, two-tailed.
APA 7th Edition Reporting Templates
Standard: "A one-sample t-test indicated that [sample description] ($M =$ [value], $SD =$ [value], $n =$ [value]) [significantly / did not significantly] differ from the [reference value / normative mean] of $\mu_0 =$ [value], $t(df) =$ [value], $p =$ [value], $d =$ [value] [95% CI: LB, UB]. The 95% CI for the mean, [LB, UB], [excluded / included] $\mu_0$."
With Hedges' $g$: "... $g =$ [value] [95% CI: LB, UB] (bias-corrected for small sample)."
With equivalence test: "A TOST equivalence test with bounds $\pm\Delta$ demonstrated [equivalence / non-equivalence] to the [reference value], 90% CI for $\mu$: [LB, UB]."
With Bayesian t-test: "The Bayesian t-test yielded $BF_{10} =$ [value], indicating [moderate / strong / extreme] evidence for [the alternative / the null] hypothesis."
Reporting Checklist
| Item | Required |
|---|---|
| t-statistic with sign | ✅ Always |
| Degrees of freedom | ✅ Always |
| Exact p-value | ✅ Always |
| Sample mean, SD, and $n$ | ✅ Always |
| 95% CI for $\mu$ | ✅ Always |
| Cohen's $d$ or Hedges' $g$ | ✅ Always |
| 95% CI for effect size | ✅ Always |
| Hypothesised value $\mu_0$ (stated explicitly) | ✅ Always |
| Alternative hypothesis direction | ✅ Always |
| Normality check result | ✅ When $n < 30$ |
| Outlier check result | ✅ When outliers are present or suspected |
| Hedges' $g$ instead of $d$ | ✅ When $n$ is small |
| TOST equivalence test | ✅ When claiming null result |
| Bayes Factor | Recommended for null results |
| Power analysis | ✅ For underpowered or null results |
This tutorial provides a comprehensive foundation for understanding, conducting, and reporting one-sample t-tests within the DataStatPro application. For further reading, consult Gravetter & Wallnau's "Statistics for the Behavioral Sciences" (10th ed.), Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988), Lakens's "Equivalence Tests: A Practical Primer" (Social Psychological and Personality Science, 2017), and Rouder et al.'s "Bayesian t-Tests" (Psychonomic Bulletin & Review, 2009). For feature requests or support, contact the DataStatPro team.