Independent Samples t-Test: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of two-group comparison all the way through advanced implementation, Welch's correction, effect size estimation, reporting, and practical usage within the DataStatPro application. Whether you are encountering the independent samples t-test for the first time or deepening your understanding of between-group inference, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is the Independent Samples t-Test?
- The Mathematics Behind the Independent Samples t-Test
- Assumptions of the Independent Samples t-Test
- Student's vs. Welch's t-Test
- Using the Independent Samples t-Test Calculator Component
- Step-by-Step Procedure
- Interpreting the Output
- Effect Sizes
- Confidence Intervals
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
1.1 Between-Subjects vs. Within-Subjects Designs
A between-subjects design assigns different participants to different conditions. Each participant contributes exactly one score to the analysis. This is contrasted with within-subjects (repeated measures) designs where participants appear in multiple conditions.
The independent samples t-test is the appropriate test for comparing two independent groups in a between-subjects design.
1.2 The Standard Error of the Difference Between Means
When we compare two independent sample means $\bar{X}_1$ and $\bar{X}_2$, we are interested in the difference $\bar{X}_1 - \bar{X}_2$. The sampling variability of this difference has a standard error:

$$SE_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

When $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (equal variances), this simplifies to:

$$SE_{\bar{X}_1 - \bar{X}_2} = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Since $\sigma$ is unknown, we estimate it from the data using the pooled standard deviation, yielding the estimated standard error.
1.3 The Pooled Variance
When the two populations share a common variance $\sigma^2$, the pooled variance combines the within-group variance estimates from both groups:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

This is a weighted average of $s_1^2$ and $s_2^2$, where larger groups receive more weight. The pooled estimate is more stable than either group's individual estimate.
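As a quick sketch, the pooled estimate can be computed directly from group summary statistics (the function name and example values here are illustrative, not part of DataStatPro):

```python
import math

def pooled_sd(s1, s2, n1, n2):
    """Pooled SD: square root of the weighted average of the two group variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2)

# With equal SDs, the pooled SD equals that common SD:
pooled_sd(10.0, 10.0, 25, 25)        # 10.0
# The larger group (n = 90, SD = 10) dominates the smaller (n = 10, SD = 20):
sp = pooled_sd(10.0, 20.0, 90, 10)   # ≈ 11.3, much closer to 10 than to 20
```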
1.4 Variance Homogeneity and its Consequences
The assumption of equal population variances ($\sigma_1^2 = \sigma_2^2$) is crucial for the pooled t-test. When this assumption is violated:
- With equal group sizes: the t-test is robust (Type I error stays near $\alpha$).
- With unequal group sizes AND unequal variances: the t-test can have severely inflated or deflated Type I error rates.
This motivates Welch's t-test (Section 5), which does not assume equal variances.
1.5 Effect Sizes for Group Comparisons
A statistically significant result from an independent t-test tells you that the means differ beyond chance. Effect sizes quantify how much they differ in standardised units:
- Cohen's $d$: Mean difference in pooled standard deviation units.
- Hedges' $g$: Bias-corrected Cohen's $d$.
- Glass's $\Delta$: Uses only one group's SD (usually the control group's).
1.6 The Relationship Between t and F
For exactly two groups, the independent samples t-test and one-way ANOVA yield identical p-values: $F = t^2$. The t-test is simpler and preferred for two-group comparisons; ANOVA generalises to three or more groups.
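The $F = t^2$ identity is easy to verify numerically. A small sketch using SciPy (the data are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(100, 15, 40)   # simulated Group 1 scores
g2 = rng.normal(108, 15, 40)   # simulated Group 2 scores

t, p_t = stats.ttest_ind(g1, g2, equal_var=True)  # Student's t-test
F, p_F = stats.f_oneway(g1, g2)                   # one-way ANOVA on two groups

# F equals t squared, and the two-tailed p-values coincide
```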
2. What is the Independent Samples t-Test?
2.1 The Core Question
The independent samples t-test answers: "Do two independent, unrelated groups have the same population mean?" or equivalently, "Is the observed mean difference between two groups larger than we would expect from random sampling variability alone?"
2.2 The Two Versions
| Version | Assumption | Preferred When |
|---|---|---|
| Student's t-test | Equal population variances ($\sigma_1^2 = \sigma_2^2$) | Confirmed equal variances; historical compatibility |
| Welch's t-test | Unequal population variances allowed | Default recommendation; any situation |
The modern consensus: Use Welch's t-test as the default. When variances are truly equal, Welch's loses negligible power. When variances are unequal, Welch's maintains correct Type I error whereas Student's does not.
2.3 When to Use the Independent Samples t-Test
| Condition | Requirement |
|---|---|
| Number of groups | Exactly two |
| Relationship between groups | Independent (different participants) |
| Outcome variable | Continuous (interval or ratio scale) |
| Distribution | Approximately normal in each group (or $n \ge 30$ per group) |
| Variances | Equal (Student's) or potentially unequal (Welch's) |
2.4 Real-World Applications
| Field | Research Question |
|---|---|
| Clinical | Does CBT reduce anxiety more than a control condition? |
| Education | Do students taught by Method A score higher than those taught by Method B? |
| Marketing | Do customers rate Brand A higher than Brand B? |
| Medicine | Does Drug A lower blood pressure more than a placebo? |
| Organisational | Do remote workers report higher job satisfaction than office workers? |
| Neuroscience | Do patients with depression have different cortisol levels than healthy controls? |
| Sport | Do athletes trained with Method X have faster sprint times than those trained with Method Y? |
3. The Mathematics Behind the Independent Samples t-Test
3.1 Student's t-Statistic (Equal Variances)

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Where the pooled standard deviation is:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Degrees of freedom:

$$df = n_1 + n_2 - 2$$
3.2 Welch's t-Statistic (Unequal Variances)

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Welch-Satterthwaite degrees of freedom:

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

$\nu$ is generally non-integer and always $\nu \le n_1 + n_2 - 2$ (fewer or equal df than Student's, making Welch's more conservative when variances are equal).
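The two formulas above translate directly into code. A minimal sketch from summary statistics (the function name is illustrative):

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # per-group variance of the mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t, df = welch_t(10.0, 2.0, 20, 8.0, 5.0, 30)   # t ≈ 1.97, df ≈ 41.0
# Note df stays below the Student's value n1 + n2 - 2 = 48
```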
3.3 The p-Value
Two-tailed:

$$p = 2P(T_{\nu} \ge |t|)$$

One-tailed (upper):

$$p = P(T_{\nu} \ge t)$$
3.4 Confidence Intervals
Student's 95% CI for $\mu_1 - \mu_2$:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{0.975,\, n_1+n_2-2} \cdot s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Welch's 95% CI for $\mu_1 - \mu_2$:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{0.975,\, \nu} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
3.5 Cohen's $d$ — Standardised Mean Difference

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}$$

Hedges' $g$ (bias-corrected):

$$g = d\left(1 - \frac{3}{4(n_1 + n_2) - 9}\right)$$

Glass's $\Delta$ (control group SD as standardiser):

$$\Delta = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{control}}}$$

Average SD standardiser (when neither group is a natural reference):

$$d_{avg} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2 + s_2^2)/2}}$$
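A sketch computing all four standardisers from summary statistics, treating Group 2 as the control (names and values are illustrative):

```python
import math

def effect_sizes(m1, s1, n1, m2, s2, n2):
    """Cohen's d (pooled SD), Hedges' g, Glass's Delta (Group 2 = control), d_avg."""
    diff = m1 - m2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = diff / sp                                    # Cohen's d
    g = d * (1 - 3 / (4 * (n1 + n2) - 9))            # small-sample bias correction
    delta = diff / s2                                # Glass's Delta
    d_avg = diff / math.sqrt((s1**2 + s2**2) / 2)    # average-SD standardiser
    return d, g, delta, d_avg

d, g, delta, d_avg = effect_sizes(12.0, 4.0, 25, 10.0, 4.0, 25)
# With equal SDs, all standardisers agree (0.5) except g, which is slightly smaller
```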
3.6 Computing $d$ from the t-Statistic

$$d = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

For equal group sizes ($n_1 = n_2 = n$):

$$d = t\sqrt{\frac{2}{n}}$$
3.7 Exact CI for via Non-Central t-Distribution
The t-statistic follows a non-central t-distribution under $H_1$ with non-centrality parameter:

$$\delta = d\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

Exact 95% CI for $d$: invert this relationship numerically (computed automatically by DataStatPro).

Approximate CI (adequate for moderate-to-large samples):

$$d \pm 1.96\sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}}$$
3.8 Common Language Effect Size

$$CL = \Phi\!\left(\frac{d}{\sqrt{2}}\right)$$

Interpretation: the probability that a randomly selected person from Group 1 scores higher than a randomly selected person from Group 2.
| $d$ | CL | Interpretation |
|---|---|---|
| 0.00 | 50.0% | No difference |
| 0.20 | 55.6% | Small |
| 0.50 | 63.8% | Medium |
| 0.80 | 71.4% | Large |
| 1.00 | 76.0% | |
| 1.50 | 85.6% | Very large |
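The table values can be reproduced from the standard normal CDF. A dependency-free sketch using the error function:

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def common_language(d):
    """CL = Phi(d / sqrt(2)): P(random Group 1 score > random Group 2 score)."""
    return phi(d / math.sqrt(2.0))

common_language(0.80)   # ≈ 0.714, matching the table row for d = 0.80
```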
3.9 Statistical Power
For equal group sizes ($n_1 = n_2 = n$), the non-centrality parameter is:

$$\delta = d\sqrt{\frac{n}{2}}$$

Required $n$ per group for power $1 - \beta$, two-sided $\alpha$:

$$n \approx \frac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}$$

For $\alpha = .05$, power $= .80$:

$$n \approx \frac{2(1.96 + 0.84)^2}{d^2} = \frac{15.68}{d^2}$$

Required $n$ per group (exact, t-based):
| $d$ | Power = 0.80 | Power = 0.90 | Power = 0.95 |
|---|---|---|---|
| 0.20 | 394 | 527 | 651 |
| 0.35 | 130 | 174 | 215 |
| 0.50 | 64 | 85 | 105 |
| 0.80 | 26 | 34 | 42 |
| 1.00 | 17 | 22 | 27 |
| 1.50 | 8 | 11 | 13 |
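The table values come from exact non-central-t power calculations; the normal approximation above runs a participant or two lower but captures the $1/d^2$ scaling. A sketch (function name illustrative):

```python
import math

def n_per_group(d, z_alpha=1.96, z_beta=0.8416):
    """Normal-approximation n per group (two-sided alpha = .05, power = .80).
    Slightly underestimates the exact t-based values in the table above."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

n_per_group(0.5)   # 63 (exact t-based value: 64)
```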
4. Assumptions of the Independent Samples t-Test
4.1 Normality
Data in each group should be approximately normally distributed. The test is robust to mild non-normality, especially when $n \ge 30$ per group.
How to check: Shapiro-Wilk (per group), Q-Q plots, histograms, skewness/kurtosis.
When violated: Use Mann-Whitney U test (non-parametric alternative).
4.2 Homogeneity of Variance (for Student's t)
Student's t-test requires $\sigma_1^2 = \sigma_2^2$.
How to check:
- Levene's test ($H_0$: equal variances): preferred, robust to non-normality.
- Brown-Forsythe test: more robust for non-normal data.
- Variance ratio: if $s_{\max}^2 / s_{\min}^2 > 4$, heterogeneity is substantial.
- $F$-test ($F = s_1^2 / s_2^2$): sensitive to non-normality — use with caution.
When violated: Use Welch's t-test — the recommended default for all independent samples comparisons regardless of Levene's test result.
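In Python, the check-then-default workflow looks like this sketch (SciPy's `levene` with `center='median'` is the Brown-Forsythe variant; the data are simulated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
g1 = rng.normal(50, 5, 40)    # SD 5
g2 = rng.normal(50, 15, 40)   # SD 15: strong variance heterogeneity

W, p_levene = stats.levene(g1, g2, center='median')   # Brown-Forsythe variant
t, p = stats.ttest_ind(g1, g2, equal_var=False)       # Welch's, used regardless
```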
4.3 Independence of Observations
All observations within and across groups must be independent. No participant should contribute scores to both groups.
Common violations:
- Using the same participant in both "groups" (should use paired t-test).
- Clustered data (e.g., students in the same classroom).
- Family or sibling pairs.
When violated: Use paired t-test (if within-subjects) or multilevel models (if clustered).
4.4 Independence Between Groups
The two groups themselves must be independent. Their scores should not be systematically related (e.g., no matching, no family relationships between groups).
4.5 Interval Scale of Measurement
The DV must be measured on at least an interval scale.
When violated: Use Mann-Whitney U test.
4.6 Absence of Severe Outliers
Outliers distort both $\bar{X}$ and $s$, biasing the t-statistic.
How to check: Boxplots and standardised ($z$) scores per group.
When outliers present: Investigate; report with and without; consider Welch's t-test (more robust) or Mann-Whitney U.
4.7 Assumption Summary
| Assumption | Student's | Welch's | How to Check | Remedy |
|---|---|---|---|---|
| Normality per group | ✅ | ✅ | Shapiro-Wilk, Q-Q | Mann-Whitney U |
| Equal variances | ✅ | ❌ | Levene's | Use Welch's |
| Independence within groups | ✅ | ✅ | Design review | Multilevel model |
| Independence between groups | ✅ | ✅ | Design review | Paired t-test |
| Interval scale | ✅ | ✅ | Measurement theory | Mann-Whitney U |
| No severe outliers | ✅ | ✅ | Boxplots | Investigate; robust test |
5. Student's vs. Welch's t-Test
5.1 Performance Under Different Conditions
Simulation studies (Ruxton, 2006; Delacre et al., 2017) consistently show:
| Condition | Student's Type I Error | Welch's Type I Error |
|---|---|---|
| Equal $n$, equal $\sigma^2$ | ≈ $\alpha$ | ≈ $\alpha$ |
| Equal $n$, unequal $\sigma^2$ | ≈ $\alpha$ (robust) | ≈ $\alpha$ |
| Unequal $n$, equal $\sigma^2$ | ≈ $\alpha$ | ≈ $\alpha$ (slightly conservative) |
| Unequal $n$, unequal $\sigma^2$ (larger $\sigma^2$ in larger group) | $< \alpha$ (conservative) | ≈ $\alpha$ |
| Unequal $n$, unequal $\sigma^2$ (larger $\sigma^2$ in smaller group) | $> \alpha$ (liberal) | ≈ $\alpha$ |
5.2 Power Comparison
When variances are truly equal:
- Student's has slightly higher power than Welch's (by approximately 0.2–1%).
- This tiny advantage does not justify the risk of inflated Type I error when variances are unequal.
When variances are unequal:
- Welch's maintains appropriate Type I error; Student's does not.
- Welch's has higher valid power.
Recommendation: Always use Welch's t-test as the default. DataStatPro reports both but highlights Welch's results.
5.3 The Decision Framework
For an independent samples comparison:
├── Default: Use Welch's t-test (regardless of Levene's result)
└── If comparability with historical Student's results is needed:
├── Levene's p > .05: Either test is acceptable
└── Levene's p ≤ .05: Use Welch's (do NOT use Student's)
💡 The practice of running Levene's test first and then "choosing" Student's vs. Welch's based on the result (the "pre-test" approach) leads to inflated Type I error because the selection itself is data-driven. Simply using Welch's universally avoids this problem.
6. Using the Independent Samples t-Test Calculator Component
Step-by-Step Guide
Step 1 — Select the Test
Navigate to Statistical Tests → t-Tests → Independent Samples t-Test.
Step 2 — Input Method
- Raw data: Upload data with a group indicator variable and outcome variable. DataStatPro auto-identifies groups and runs all assumption checks.
- Summary statistics: Enter $n$, $\bar{X}$, and $s$ for each group.
- t-statistic + df: Enter $t$, $df$, $n_1$, $n_2$ from a published result.
Step 3 — Specify the Comparison
- Designate which group is "Group 1" (the reference or treatment group).
- The sign of the mean difference (and of $t$) will be positive when Group 1 > Group 2.
- Label groups clearly for interpretable output.
Step 4 — Select Variance Assumption
- Welch's (recommended): No equal variance assumption.
- Student's (equal variances): Uses pooled SD.
- Both (default): Computes both and flags discrepancies.
Step 5 — Select Effect Size Standardiser
When variances are unequal, DataStatPro offers:
- Pooled SD (Cohen's $d$) — standard but potentially misleading.
- Control group SD (Glass's $\Delta$) — recommended for treatment-control designs.
- Average SD ($d_{avg}$) — recommended when neither group is a reference.
- All three — displayed together for full reporting.
Step 6 — Select Display Options
- ✅ Full results table (both Student's and Welch's).
- ✅ Group descriptive statistics (mean, SD, SE, 95% CI).
- ✅ Levene's and Brown-Forsythe test results.
- ✅ Shapiro-Wilk normality test per group.
- ✅ Cohen's $d$, Hedges' $g$, Glass's $\Delta$ with exact 95% CIs.
- ✅ Common Language Effect Size (CL) and $U_3$ statistic.
- ✅ Two overlapping distribution curves with shaded difference region.
- ✅ Power analysis and required $n$ for 80/90/95% power.
- ✅ Equivalence test (TOST) for demonstrating practical equivalence.
- ✅ APA 7th edition results paragraph.
Step 7 — Run the Analysis
Click "Run Independent t-Test". All results, plots, and the APA paragraph are generated automatically.
7. Step-by-Step Procedure
7.1 Full Manual Procedure (Welch's t-Test)
Step 1 — State Hypotheses
$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$ (two-tailed)

Or equivalently: $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \neq 0$
Step 2 — Check Assumptions
- Shapiro-Wilk per group (or Q-Q plots).
- Levene's test for variance homogeneity.
- Boxplots for outliers.
- Confirm design independence.
Step 3 — Compute Summary Statistics

For each group, compute $n$, $\bar{X}$, and $s$.

Step 4 — Compute Standard Error Components

$$\frac{s_1^2}{n_1}, \quad \frac{s_2^2}{n_2}, \quad SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Step 5 — Compute t-Statistic

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$

Step 6 — Compute Welch-Satterthwaite df

$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Round down to the nearest integer.

Step 7 — Compute p-Value

$$p = 2P(T_{\nu} \ge |t|)$$

Reject $H_0$ if $p \le .05$.

Step 8 — Compute 95% CI for $\mu_1 - \mu_2$

$$(\bar{X}_1 - \bar{X}_2) \pm t_{0.975,\,\nu} \cdot SE$$

Step 9 — Compute Effect Sizes

$$d_{avg} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2 + s_2^2)/2}}, \quad g = d\left(1 - \frac{3}{4(n_1 + n_2) - 9}\right)$$
Step 10 — Interpret and Report
Use APA template from Section 15.
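The manual procedure can be sketched end-to-end from summary statistics. This is an illustrative implementation, not DataStatPro's own code (SciPy supplies the t-distribution):

```python
import math
from scipy import stats

def welch_from_summary(m1, s1, n1, m2, s2, n2, alpha=0.05):
    """Welch's test, CI, and d_avg from group summary statistics (Steps 4-9)."""
    v1, v2 = s1**2 / n1, s2**2 / n2                        # Step 4: SE components
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se                                     # Step 5: t-statistic
    df = (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1))      # Step 6: Welch df
    p = 2 * stats.t.sf(abs(t), df)                         # Step 7: two-tailed p
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    ci = ((m1 - m2) - tcrit * se, (m1 - m2) + tcrit * se)  # Step 8: 95% CI
    d_avg = (m1 - m2) / math.sqrt((s1**2 + s2**2) / 2)     # Step 9: effect size
    return t, df, p, ci, d_avg

t, df, p, ci, d_avg = welch_from_summary(24.5, 6.0, 40, 20.0, 9.0, 45)
```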
8. Interpreting the Output
8.1 Reading the Results Table
| Output | What It Tells You |
|---|---|
| $t$-statistic | How many SEs the mean difference is from zero |
| df (Welch) | Effective degrees of freedom (accounts for unequal variances) |
| p-value | Probability of this or more extreme difference under |
| Mean difference | Raw unstandardised difference |
| 95% CI for difference | Range of plausible values for |
| Cohen's $d$ | Standardised effect in SD units |
| 95% CI for $d$ | Precision of the effect size estimate |
| Levene's $p$ | Evidence against equal variances |
8.2 The Direction of the Effect
The sign of $t$ and of the mean difference $\bar{X}_1 - \bar{X}_2$ indicates direction:
- Positive $t$: Group 1 mean is higher than Group 2 mean.
- Negative $t$: Group 1 mean is lower than Group 2 mean.
Always state which group is higher in words — signs alone can be misinterpreted.
8.3 When Student's and Welch's Give Different Conclusions
Disagreement between the two tests signals that variances are unequal AND sample sizes differ. In this case:
- Trust Welch's result — it maintains correct Type I error.
- Note the discrepancy in the results section.
- Report only Welch's in the primary results.
8.4 Cohen's Benchmarks
| $d$ | Cohen Label | CL (%) | $U_3$ (%) |
|---|---|---|---|
| 0.20 | Small | 55.6 | 57.9 |
| 0.50 | Medium | 63.8 | 69.1 |
| 0.80 | Large | 71.4 | 78.8 |
| 1.00 | | 76.0 | 84.1 |
| 1.20 | Very large | 80.2 | 88.5 |
| 2.00 | Huge | 92.1 | 97.7 |
9. Effect Sizes
9.1 Choosing the Right Standardiser
| Scenario | Recommended Effect Size | Standardiser |
|---|---|---|
| Equal variances, no reference group | Cohen's $d$ | Pooled SD |
| Unequal variances, no reference group | $d_{avg}$ | Average SD |
| Treatment vs. control design | Glass's $\Delta$ | Control group SD |
| Small samples ($n < 20$ per group) | Hedges' $g$ | Pooled SD (bias-corrected) |
| Meta-analysis or cross-study comparison | Hedges' $g$ | Pooled SD (bias-corrected) |
9.2 Variance Overlap Statistics
| Statistic | Formula | Interpretation |
|---|---|---|
| $U_1$ | $\dfrac{2U_2 - 1}{U_2}$ | Proportion of the two distributions NOT overlapping |
| $U_2$ | $\Phi(\lvert d\rvert / 2)$ | Proportion of Group 2 exceeded by the Group 1 median |
| $U_3$ | $\Phi(d)$ | Proportion of Group 2 below the Group 1 mean |
Example for $d = 0.80$:

$$U_3 = \Phi(0.80) = .788$$

Interpretation: 78.8% of Group 2 participants score below the mean of Group 1.
9.3 Effect Sizes for Unequal Variances
When Levene's test is significant ($p \le .05$), the choice of standardiser matters:

Glass's $\Delta$ standardises by the control/reference group SD:

$$\Delta = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{control}}}$$

Interpretation: The treatment group mean is $\Delta$ standard deviation units above the control group distribution — directly interpretable in terms of how many control-group SDs the treatment group has moved.

When the variance ratio is large (e.g., $s_{\max}^2 / s_{\min}^2 > 4$): Strongly prefer Glass's $\Delta$ or $d_{avg}$ over Cohen's $d$ (which uses the pooled SD and is misleading when variances differ substantially).
10. Confidence Intervals
10.1 CI for the Mean Difference (Unstandardised)
The 95% CI for $\mu_1 - \mu_2$ provides the most directly interpretable estimate in the original measurement units:

$$(\bar{X}_1 - \bar{X}_2) \pm t_{0.975,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Interpretation rules:
| CI Outcome | Conclusion |
|---|---|
| Entirely positive | Group 1 is significantly higher than Group 2 |
| Entirely negative | Group 1 is significantly lower than Group 2 |
| Contains zero | Not significant at $\alpha = .05$ |
| Narrow CI | Precise estimate of the mean difference |
| Wide CI | Imprecise; a larger $n$ is needed for better precision |
10.2 CI for Cohen's
Approximate 95% CI:

$$d \pm 1.96\sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}}$$
Exact CI: Uses non-central t-distribution (DataStatPro default).
10.3 Precision as a Function of
For equal group sizes and $d = 0.5$:

| $n$ per group | Approx 95% CI Width for $d$ |
|---|---|
| 10 | 1.80 |
| 20 | 1.28 |
| 50 | 0.81 |
| 100 | 0.57 |
| 200 | 0.40 |
| 500 | 0.25 |
11. Advanced Topics
11.1 Equivalence Testing for Independent Groups
When claiming two groups are practically equivalent (e.g., two interventions are equally effective), use the TOST procedure:
Specify equivalence bounds $(-\Delta_{EQ}, +\Delta_{EQ})$ in raw mean difference units.

The 90% CI for $\mu_1 - \mu_2$ must fall entirely within $(-\Delta_{EQ}, +\Delta_{EQ})$.

Or equivalently, specify $d_{EQ}$ (the standardised equivalence margin) and test whether the 90% CI for $d$ falls within $(-d_{EQ}, +d_{EQ})$.
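A minimal TOST sketch on the Welch standard error, assuming symmetric raw-unit bounds (the function name and example values are illustrative):

```python
import math
from scipy import stats

def tost_welch(m1, s1, n1, m2, s2, n2, bound):
    """Welch-based TOST with equivalence bounds (-bound, +bound) in raw units."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1))
    diff = m1 - m2
    t_lower = (diff + bound) / se                # H0: diff <= -bound
    t_upper = (diff - bound) / se                # H0: diff >= +bound
    return max(stats.t.sf(t_lower, df), stats.t.cdf(t_upper, df))

p_eq = tost_welch(50.2, 8.0, 60, 49.8, 8.0, 60, bound=3.0)
```

Equivalence is claimed when the larger of the two one-sided p-values falls below $\alpha$.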
11.2 Bootstrap Confidence Intervals
When normality is violated and samples are small, bootstrap CIs for the mean difference and Cohen's are more trustworthy than t-distribution-based CIs:
- Draw a large number $B$ of bootstrap samples (with replacement) from each group.
- Compute the statistic of interest (mean difference or $d$) for each bootstrap sample.
- 95% CI: 2.5th and 97.5th percentiles of the bootstrap distribution.
DataStatPro computes bootstrap CIs automatically when raw data are provided.
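The three steps above as a dependency-free sketch (the data are invented for illustration; in practice use a much larger $B$):

```python
import random
import statistics

def bootstrap_ci_diff(g1, g2, n_boot=5000, seed=1):
    """Percentile bootstrap 95% CI for the difference in group means."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        b1 = rng.choices(g1, k=len(g1))   # resample each group with replacement
        b2 = rng.choices(g2, k=len(g2))
        diffs.append(statistics.mean(b1) - statistics.mean(b2))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

treatment = [12, 15, 9, 18, 14, 16, 11, 13, 17, 10]   # toy data
control = [8, 11, 7, 12, 9, 10, 6, 13, 8, 9]
lo, hi = bootstrap_ci_diff(treatment, control)        # brackets the observed 4.2
```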
11.3 Bayesian Independent Samples t-Test
The Bayes factor $BF_{10}$ quantifies evidence for $H_1$ vs. $H_0$, computed from $t$, $n_1$, and $n_2$ using the Rouder et al. (2009) default prior. Particularly valuable for null results — $BF_{01}$ can provide positive evidence that the two groups are equivalent.
11.4 Unequal Sample Sizes and Optimal Allocation
When one group is cheaper or easier to sample, unequal allocation can improve statistical power for a fixed total cost. For two groups with costs $c_1$ and $c_2$ per participant, the optimal allocation is:

$$\frac{n_1}{n_2} = \frac{\sigma_1}{\sigma_2}\sqrt{\frac{c_2}{c_1}}$$

When costs are equal: $n_1 / n_2 = \sigma_1 / \sigma_2$ — allocate more participants to the more variable group.
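A sketch of the allocation rule (the function name is illustrative):

```python
import math

def allocation_ratio(sd1, sd2, cost1=1.0, cost2=1.0):
    """Optimal n1/n2: proportional to the SD ratio, inverse to the sqrt cost ratio."""
    return (sd1 / sd2) * math.sqrt(cost2 / cost1)

allocation_ratio(10.0, 5.0)                          # 2.0: twice as many in group 1
allocation_ratio(10.0, 10.0, cost1=4.0, cost2=1.0)   # 0.5: group 1 is 4x costlier
```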
11.5 Heterogeneity of Variance: When It Matters Substantively
Beyond the technical issue of test validity, unequal variances have substantive implications: if a treatment not only changes the mean but also changes the variability (e.g., a drug works for some patients but not others), the variance difference is itself a scientifically important finding. Always report and discuss unequal variances when they are substantial.
12. Worked Examples
Example 1: CBT vs. Waitlist — Anxiety Scores
A clinical trial randomises participants to CBT ($n = 35$) and to a waitlist control ($n = 35$). Anxiety is measured post-treatment (GAD-7; range 0–21).
| Group | $n$ | Mean | SD |
|---|---|---|---|
| CBT | 35 | | |
| Waitlist | 35 | | |
Levene's test: $p < .05$ → unequal variances → use Welch's.
Welch's t-statistic:
Welch-Satterthwaite df:
Rounded: .
p-value:
95% CI for mean difference:
Effect sizes:
(Large)
Glass's (standardised by waitlist SD):
(Large)
Hedges' :
→ CBT participants have lower anxiety than 81.6% of waitlist participants.
APA write-up: "Due to significant variance heterogeneity (Levene's , ), Welch's t-test was applied. CBT participants (, ) showed significantly lower post-treatment anxiety than waitlist controls (, ), , , [95% CI: , ]. This represents a large treatment effect. CBT participants scored lower than 81.6% of waitlist participants (CL = 81.6%). The mean difference of 5.3 GAD-7 points [95% CI: 3.31, 7.29] exceeds the clinically meaningful threshold of 4 points."
Example 2: Reaction Times — Experimental vs. Control
An experimental psychologist compares reaction times (ms) between two attention conditions: focused ($n = 25$) and divided ($n = 30$).
| Group | $n$ | Mean (ms) | SD |
|---|---|---|---|
| Focused | 25 | | |
| Divided | 30 | | |
Levene's test: $p > .05$ → variances not significantly different. Use Welch's anyway (the recommended default regardless):
95% CI:
Cohen's :
(Large)
APA write-up: "Welch's independent samples t-test revealed that focused attention participants ( ms, ms) had significantly faster reaction times than divided attention participants ( ms, ms), , , [95% CI: , ]. The mean difference of 52.4 ms [95% CI: 30.5, 74.3 ms] represents a large effect of attention condition."
13. Common Mistakes and How to Avoid Them
Mistake 1: Using Student's Instead of Welch's as the Default
Problem: Defaulting to Student's t-test without considering whether the equal-variance assumption holds. When groups differ in both size and variance, Student's t-test produces invalid p-values.
Solution: Use Welch's t-test as the universal default for independent samples comparisons. The power cost when variances are truly equal is negligible.
Mistake 2: Running the Independent t-Test on Paired Data
Problem: Treating matched pairs or pre-post measurements as independent groups. This inflates the error term (ignores within-person correlation) and substantially reduces power.
Solution: Before choosing a test, ask: "Did the same participants contribute to both groups?" If yes, use the paired t-test.
Mistake 3: Not Reporting Glass's $\Delta$ When Variances Are Unequal
Problem: Reporting Cohen's $d$ (using the pooled SD) when variances are clearly unequal. The pooled SD is a blend of two different distributions — not an appropriate standardiser for either group.
Solution: When Levene's is significant, report Glass's $\Delta$ (using the control group SD) or $d_{avg}$ (average of both SDs) alongside Cohen's $d$.
Mistake 4: Conflating Statistical Significance with Practical Importance
Problem: Reporting $p < .001$ and concluding the effect is "large." With very large samples, even a trivial difference (e.g., 0.5 points on a 100-point scale) can produce $p < .001$ while $d$ remains negligibly small.
Solution: Always report Cohen's with its 95% CI. Interpret the magnitude in the context of the measurement scale and the research domain.
Mistake 5: Ignoring the CI for the Mean Difference
Problem: Reporting only $t$ and $p$ without the 95% CI for $\mu_1 - \mu_2$. The CI provides the most directly actionable information — the range of plausible values for the true mean difference in the original units.
Solution: Always report the 95% CI for the mean difference in the abstract or results section. In clinical research, compare this CI to established minimal clinically important differences (MCIDs).
Mistake 6: Running Multiple Independent t-Tests Instead of ANOVA
Problem: Comparing three or more groups with all possible pairwise t-tests, inflating the familywise error rate.
Solution: Use one-way ANOVA (or Welch's ANOVA) followed by appropriate post-hoc tests when comparing three or more groups.
Mistake 7: Not Checking Outliers Before Running the Test
Problem: A single extreme value can drastically shift the mean and inflate the SD within a small group, producing either a falsely significant or falsely non-significant result.
Solution: Always inspect boxplots per group. Investigate outliers and report analyses with and without them. Welch's t-test is more robust to outliers than Student's when outliers affect variance.
14. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| Student's and Welch's give very different $p$-values | Unequal variances with unequal $n$ | Trust Welch's; report Levene's result |
| Welch's df is very small | One group has very small or near-zero variance | Check data; use exact permutation test |
| $t$ is negative but a positive difference was expected | Group labelling: Group 2 > Group 1 | Relabel or state direction explicitly |
| Levene's is significant but $n$s are equal | Even with equal $n$, a very large variance difference affects the choice of standardiser | Report both $d$ and $\Delta$; note variance heterogeneity |
| $p$-value is significant but CI for $d$ includes zero | Rounding error or very wide CI | Use exact CI from non-central $t$; check calculations |
| Bootstrap CI disagrees with $t$-distribution CI | Non-normality in small sample | Trust bootstrap CI; note non-normality |
| Large $d$ but non-significant $p$ | Underpowered study | Report power; conduct sensitivity analysis; plan larger replication |
| Very wide CI for $d$ | Small $n$ per group | Report as genuine uncertainty; plan adequately powered study |
| Effect size changes substantially with vs. without outlier | Outlier has large leverage | Report both analyses; consider robust test |
15. Quick Reference Cheat Sheet
Core Equations
| Description | Formula |
|---|---|
| Student's t-statistic | $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$ |
| Pooled SD | $s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$ |
| Welch's t-statistic | $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ |
| Welch-Satterthwaite df | $\nu = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$ |
| Student's df | $df = n_1 + n_2 - 2$ |
| Cohen's $d$ | $d = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p}$ |
| $d$ from $t$-statistic | $d = t\sqrt{1/n_1 + 1/n_2}$ |
| Hedges' $g$ | $g = d\left(1 - \dfrac{3}{4(n_1+n_2)-9}\right)$ |
| Glass's $\Delta$ | $\Delta = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\text{control}}}$ |
| Common Language Effect Size | $CL = \Phi(d/\sqrt{2})$ |
| $U_3$ | $U_3 = \Phi(\lvert d\rvert)$ |
| Required $n$/group (80% power, $\alpha = .05$) | $n \approx 15.68/d^2$ |
Variance Standardiser Selection
| Condition | Use |
|---|---|
| Equal variances, no reference | Cohen's $d$ (pooled SD) |
| Unequal variances, treatment vs. control | Glass's $\Delta$ (control SD) |
| Unequal variances, no reference | $d_{avg}$ (average SD) |
| Small $n$ (any design) | Hedges' $g$ |
| Meta-analysis | Hedges' $g$ |
APA 7th Edition Reporting Templates
Welch's (recommended): "[Group 1] ($n =$ [value], $M =$ [value], $SD =$ [value]) and [Group 2] ($n =$ [value], $M =$ [value], $SD =$ [value]) were compared using Welch's independent samples t-test. [Levene's test result here if relevant.] The test revealed [a significant / no significant] difference, $t($[df]$) =$ [value], $p =$ [value], $d =$ [value] [95% CI: LB, UB]. The mean difference was [value] [original units] [95% CI: LB, UB]."
Student's (when variances confirmed equal): "... $t($[df]$) =$ [value], $p =$ [value], $d =$ [value] [95% CI: LB, UB]."
With Glass's $\Delta$: "... Glass's $\Delta =$ [value] [95% CI: LB, UB] (standardised by the control group SD)."
Reporting Checklist
| Item | Required |
|---|---|
| t-statistic with sign | ✅ Always |
| Degrees of freedom (specify Welch or Student) | ✅ Always |
| Exact p-value | ✅ Always |
| Means and SDs for both groups | ✅ Always |
| Sample sizes for both groups | ✅ Always |
| 95% CI for mean difference | ✅ Always |
| Cohen's or Hedges' with 95% CI | ✅ Always |
| Which test used (Student's vs. Welch's) | ✅ Always |
| Levene's test result | ✅ Always for independent designs |
| Normality check per group | ✅ When $n$ is small (e.g., < 30 per group) |
| Glass's $\Delta$ | ✅ When variances are unequal |
| CL effect size | Recommended |
| Power analysis | ✅ For null or underpowered results |
| Equivalence test | ✅ When claiming equivalence |
This tutorial provides a comprehensive foundation for understanding, conducting, and reporting independent samples t-tests within the DataStatPro application. For further reading, see Ruxton (2006) "The unequal variance t-test is an underused alternative" (Behavioral Ecology), Delacre, Lakens & Leys (2017) "Why Psychologists Should by Default Use Welch's t-Test" (International Review of Social Psychology), and Lakens (2013) "Calculating and Reporting Effect Sizes" (Frontiers in Psychology). For feature requests or support, contact the DataStatPro team.