Effect Size Calculator: Zero to Hero Tutorial
This comprehensive tutorial takes you from the foundational concepts of Effect Sizes all the way through advanced estimation, interpretation, reporting, and practical usage within the DataStatPro application. Whether you are encountering effect sizes for the first time or looking to deepen your understanding of practical significance in research, this guide builds your knowledge systematically from the ground up.
Table of Contents
- Prerequisites and Background Concepts
- What is an Effect Size?
- The Mathematics Behind Effect Sizes
- Assumptions of Effect Size Estimation
- Types of Effect Sizes
- Using the Effect Size Calculator Component
- Effect Sizes for Mean Differences
- Effect Sizes for Variance Explained
- Effect Sizes for Associations and Categorical Data
- Model Fit and Evaluation
- Advanced Topics
- Worked Examples
- Common Mistakes and How to Avoid Them
- Troubleshooting
- Quick Reference Cheat Sheet
1. Prerequisites and Background Concepts
Before diving into effect sizes, it is essential to be comfortable with the following foundational statistical concepts. Each is briefly reviewed below.
1.1 Statistical Significance vs. Practical Significance
A p-value answers the question: "If the null hypothesis were true, how likely is it that we would observe data at least as extreme as what we actually observed?"
A small p-value tells us the result is unlikely under the null hypothesis — but it does not tell us how large the effect is or whether it matters in practice.
Consider two studies, both statistically significant at the same p-value:
- Study A: very large sample, mean difference = 0.2 points on a 100-point scale.
- Study B: moderate sample, mean difference = 15 points on a 100-point scale.
Study A has a highly significant but trivially small effect. Study B has a large, practically meaningful effect. Effect sizes quantify the magnitude of an effect independently of sample size — they answer the question: "How big is the effect?"
1.2 Standard Deviation and Variance
The standard deviation σ (population) or s (sample) measures the spread of a distribution:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
Most effect sizes for mean differences are standardised by dividing the raw difference by a standard deviation. This makes the effect size unit-free and comparable across studies using different measurement scales.
1.3 The Normal Distribution
Many effect size formulas assume that data come from normally distributed populations. The standard normal distribution is used to convert effect sizes into probabilities such as the common language effect size and probability of superiority.
The relationship between an effect size and the area of non-overlap between two normal distributions is fundamental to interpreting effect sizes in terms of real-world probabilities.
1.4 Correlation and Covariance
The Pearson correlation coefficient r is a standardised measure of linear association:

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}} $$

It ranges from −1 to +1 and is itself an effect size for the strength of a linear relationship between two continuous variables.
1.5 Variance Decomposition
Many effect sizes for ANOVA and regression are ratios of variances:

$$ \text{effect size} = \frac{\text{variance explained by the effect}}{\text{total variance}} $$
Understanding the decomposition of variance into between-group (explained) and within-group (unexplained) components is essential for interpreting these effect sizes.
1.6 Confidence Intervals
A confidence interval (CI) for an effect size gives a range of plausible values for the true population effect, given the sample data. A 95% CI means that if we repeated the study 100 times, approximately 95 of the resulting intervals would contain the true population effect size.
Always report effect sizes with confidence intervals — a point estimate alone is insufficient because it conveys no information about precision or uncertainty.
1.7 The Non-Central Distributions
Effect sizes such as Cohen's d and η² follow non-central distributions in finite samples — their sampling distributions are not symmetric, especially when the true population effect is non-zero.
- Cohen's d follows a non-central t-distribution.
- η² and ω² follow non-central F-distributions.
Confidence intervals for these effect sizes must account for the non-centrality of the sampling distribution, which is why exact CIs require iterative numerical methods rather than simple formulas.
2. What is an Effect Size?
2.1 The Core Idea
An effect size is a standardised, scale-free numerical index that quantifies the magnitude of a phenomenon — how large a difference is, how strong an association is, or how much variance is explained. Effect sizes are:
- Unitless: Not expressed in the original measurement units, enabling comparison across studies.
- Sample-size independent: Unlike p-values, effect sizes do not change simply because the sample size changes.
- Interpretable: Translate statistical results into practically meaningful statements.
- Meta-analytically combinable: Effect sizes from multiple studies can be aggregated in a meta-analysis.
2.2 Why Effect Sizes Are Essential
The limitations of p-values alone:
- With large samples, even trivially small effects produce significant p-values.
- With small samples, even large effects may be non-significant.
- A p-value carries no information about the size or direction of an effect.
- p-values cannot be meaningfully compared across studies with different sample sizes.
What effect sizes add:
- Quantify the practical significance of a finding.
- Enable power analysis for planning future studies.
- Facilitate meta-analysis by providing a common metric.
- Allow assessment of clinical significance in applied settings.
- Provide context for evaluating whether a finding is worth acting upon.
2.3 The Effect Size Framework
Every effect size belongs to one of three broad families:
| Family | What It Measures | Examples |
|---|---|---|
| d-family | Standardised mean differences | Cohen's d, Hedges' g, Glass's Δ |
| r-family | Strength of association | Pearson r, R², η², ω², ε² |
| Risk/Odds family | Probability-based contrasts | Odds ratio, Risk ratio, NNT, ARD |
2.4 Real-World Applications
| Field | Effect Size Application | Common Measure |
|---|---|---|
| Clinical Psychology | Effectiveness of CBT vs. control on depression | Cohen's d, Hedges' g |
| Medicine | Drug vs. placebo on blood pressure | Cohen's d, Risk ratio, NNT |
| Education | Effect of tutoring on exam scores | Cohen's d, η² |
| Marketing | Brand A vs. B on purchase intent | Cohen's d, Cramér's V |
| Neuroscience | Brain region activation between groups | Cohen's d, partial η² |
| Genetics | SNP association with disease risk | Odds ratio, R² |
| Organisational Psychology | Leadership training on productivity | Cohen's d, ω² |
| Public Health | Vaccination programme on infection rate | Risk ratio, ARD, NNT |
| Ecology | Species richness across habitats | Cohen's d, η² |
2.5 Statistical Significance vs. Effect Size: A Unified View
The relationship between sample size, effect size, and statistical significance can be summarised by the power equation. For a two-sample t-test with n observations per group:

$$ t = d \sqrt{\frac{n}{2}} $$

This shows that the t-statistic (and therefore the p-value) is a joint function of both the effect size d AND the sample size n. A non-significant result could mean:
- The true effect is zero or negligible, OR
- The sample is too small to detect a real effect (low power).
A significant result could mean:
- There is a genuine, meaningful effect, OR
- The sample is so large that even a trivially small effect becomes significant.
Effect sizes disentangle magnitude from sample size.
3. The Mathematics Behind Effect Sizes
3.1 Cohen's d — The Fundamental Standardised Mean Difference
Cohen's d is the cornerstone effect size for comparing two means. It expresses the difference between two means in standard deviation units.
For two independent groups:

$$ d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} $$

Where the pooled standard deviation is:

$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$

For a one-sample design (comparing a sample mean to a known population value μ₀):

$$ d = \frac{\bar{X} - \mu_0}{s} $$

For a paired/repeated-measures design:

$$ d_z = \frac{\bar{D}}{s_D} $$

Where D̄ is the mean of the difference scores and s_D is the standard deviation of the difference scores.
Interpretation: d = 1.0 means the two group means are 1 standard deviation apart — a group with mean 50 differs by d = 1.0 from a group with mean 60 if both have s = 10.
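As a concrete illustration of the computation above, here is a minimal Python sketch; the function names are illustrative, not DataStatPro's API:

```python
import math

def pooled_sd(s1, s2, n1, n2):
    """Pooled SD: sqrt(((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2))."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def cohens_d(m1, m2, s1, s2, n1, n2):
    """Standardised mean difference (M1 - M2) / s_pooled, independent groups."""
    return (m1 - m2) / pooled_sd(s1, s2, n1, n2)

# Two groups one SD apart (means 60 vs. 50, both SD = 10) give d = 1.0
print(cohens_d(60, 50, 10, 10, 30, 30))  # 1.0
```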
3.2 Hedges' g — Bias-Corrected Cohen's d
Cohen's d is slightly positively biased in small samples — it overestimates the true population effect size. Hedges' g applies a correction factor J to remove this bias:

$$ g = J \times d $$

Where the correction factor is:

$$ J \approx 1 - \frac{3}{4\,df - 1} $$

With degrees of freedom df = n₁ + n₂ − 2 (for independent samples) or df = n − 1 (for one-sample or paired designs).
A more precise version uses the gamma function:

$$ J = \frac{\Gamma(df/2)}{\sqrt{df/2}\;\Gamma\!\left(\frac{df - 1}{2}\right)} $$

The bias is negligible for samples of 20 or more per group but can be substantial for very small samples (fewer than 10 per group).
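The approximate correction can be sketched in a few lines of Python (illustrative names, not DataStatPro's API):

```python
def hedges_j(df):
    """Approximate small-sample correction factor J = 1 - 3/(4*df - 1)."""
    return 1 - 3 / (4 * df - 1)

def hedges_g(d, n1, n2):
    """Bias-corrected effect size g = J * d for independent samples."""
    return hedges_j(n1 + n2 - 2) * d

# With n1 = n2 = 10 (df = 18) the correction shrinks d = 0.50 by roughly 4%
print(hedges_g(0.50, 10, 10))
```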
3.3 Glass's Δ — Using the Control Group SD
When the two groups have different variances (especially in pre-post or treatment-control designs where the treatment may change variability), Glass's Δ standardises by only the control group standard deviation:

$$ \Delta = \frac{\bar{X}_T - \bar{X}_C}{s_C} $$
This makes the effect size interpretable as "how many standard deviation units above (or below) the control group distribution is the average treatment participant?"
3.4 Confidence Intervals for d Using the Non-Central t-Distribution
The exact 95% CI for Cohen's d uses the non-central t-distribution. The observed t-statistic has a non-central t-distribution with non-centrality parameter:

$$ \delta = d \sqrt{\frac{n_1 n_2}{n_1 + n_2}} $$

The confidence limits for δ are found by solving:

$$ P(t \ge t_{\text{obs}} \mid \delta_L) = .025 $$

and

$$ P(t \le t_{\text{obs}} \mid \delta_U) = .025 $$

Then converting back to d:

$$ d_{L,U} = \delta_{L,U} \sqrt{\frac{n_1 + n_2}{n_1 n_2}} $$

This requires numerical iteration (no closed form) and is computed automatically by DataStatPro.
An approximate 95% CI (adequate for roughly 50 or more observations per group) uses:

$$ d \pm 1.96 \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}} $$
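A sketch of how the exact interval can be obtained by inverting the non-central t CDF numerically, assuming SciPy is available; the root-finding brackets (±10 around the observed t) are illustrative choices, not part of any published algorithm:

```python
import math
from scipy import optimize, stats

def d_ci_noncentral_t(d, n1, n2, conf=0.95):
    """Exact CI for Cohen's d by inverting the non-central t CDF."""
    df = n1 + n2 - 2
    scale = math.sqrt(n1 * n2 / (n1 + n2))  # delta = d * scale
    t_obs = d * scale
    alpha = 1 - conf
    # Lower limit: delta_L such that P(t >= t_obs | delta_L) = alpha/2
    nc_lo = optimize.brentq(
        lambda nc: stats.nct.cdf(t_obs, df, nc) - (1 - alpha / 2),
        t_obs - 10, t_obs)
    # Upper limit: delta_U such that P(t <= t_obs | delta_U) = alpha/2
    nc_hi = optimize.brentq(
        lambda nc: stats.nct.cdf(t_obs, df, nc) - alpha / 2,
        t_obs, t_obs + 10)
    return nc_lo / scale, nc_hi / scale

lo, hi = d_ci_noncentral_t(0.8, 30, 30)
print(round(lo, 2), round(hi, 2))
```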
3.5 Eta Squared (η²) — Proportion of Variance Explained
Eta squared is the proportion of total variance in the dependent variable attributable to the independent variable (group membership in ANOVA):
For a one-way ANOVA:

$$ \eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}} $$

Relationship to Cohen's d (two groups only, equal n):

$$ \eta^2 = \frac{d^2}{d^2 + 4} $$

Limitation: η² is biased upward — it overestimates the population effect because it uses the total sum of squares from the sample. It should not be reported for multi-factor ANOVA (use partial η² or ω² instead).
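The calculation is straightforward; this minimal sketch (illustrative function names) also shows how η² can be recovered from a reported F-ratio in a one-way design:

```python
def eta_squared(ss_between, ss_total):
    """Eta squared = SS_between / SS_total for a one-way ANOVA."""
    return ss_between / ss_total

def eta_squared_from_f(f, df1, df2):
    """Recover eta squared from a reported one-way F-ratio."""
    return f * df1 / (f * df1 + df2)

print(eta_squared(40, 200))             # 0.2
print(eta_squared_from_f(3.75, 2, 30))  # same ANOVA reported as F(2, 30) = 3.75
```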
3.6 Partial Eta Squared (partial η²) — Controlling for Other Effects
In factorial ANOVA (multiple IVs), partial eta squared estimates the proportion of variance explained by one effect after removing the variance attributable to other effects:

$$ \eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}} $$

Note that in a one-way ANOVA (single IV), η² = partial η². In multi-factor ANOVA, partial η² ≥ η² for every effect, and the sum of all partial η² values can exceed 1.0.
⚠️ Because partial η² values can sum to more than 1.0 across all effects in a factorial design, they should never be interpreted as proportions of total variance explained — that interpretation applies only to η², not partial η².
3.7 Omega Squared (ω²) — Unbiased Variance-Explained Effect Size
Omega squared (ω²) is a bias-corrected version of η² that better estimates the population proportion of variance explained:
For one-way ANOVA:

$$ \omega^2 = \frac{SS_{\text{between}} - (k - 1)\,MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}} $$

Where:
- k = number of groups.
- MS_within = mean square within groups.
Partial omega squared for factorial designs:

$$ \omega_p^2 = \frac{df_{\text{effect}}\,(MS_{\text{effect}} - MS_{\text{error}})}{df_{\text{effect}}\,MS_{\text{effect}} + (N - df_{\text{effect}})\,MS_{\text{error}}} $$

ω² is generally preferred over η² because it does not inflate with small samples and provides a less biased estimate of the population effect.
3.8 Epsilon Squared (ε²) — Another Unbiased Estimate
Epsilon squared is an alternative to omega squared, computationally simpler:

$$ \epsilon^2 = \frac{SS_{\text{between}} - (k - 1)\,MS_{\text{within}}}{SS_{\text{total}}} $$

Like ω², ε² corrects for positive bias and can be slightly negative in small samples when the true population effect is near zero.
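Both corrections can be sketched directly from the ANOVA table quantities (illustrative names, not DataStatPro's API):

```python
def omega_squared(ss_b, ss_t, k, ms_w):
    """Omega squared = (SS_between - (k-1)*MS_within) / (SS_total + MS_within)."""
    return (ss_b - (k - 1) * ms_w) / (ss_t + ms_w)

def epsilon_squared(ss_b, ss_t, k, ms_w):
    """Epsilon squared = (SS_between - (k-1)*MS_within) / SS_total."""
    return (ss_b - (k - 1) * ms_w) / ss_t

# One-way ANOVA: SS_between = 40, SS_within = 160, k = 3 groups, N = 33
ms_w = 160 / 30                 # MS_within = SS_within / (N - k)
w2 = omega_squared(40, 200, 3, ms_w)
e2 = epsilon_squared(40, 200, 3, ms_w)
print(w2, e2)                   # both below eta squared = 0.20
```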
3.9 Cohen's f and f² — Effect Sizes for ANOVA and Regression
Cohen's f converts variance-explained effect sizes into a ratio suitable for power analysis:

$$ f = \sqrt{\frac{\eta^2}{1 - \eta^2}} $$

Or from ω²:

$$ f = \sqrt{\frac{\omega^2}{1 - \omega^2}} $$

Cohen's f² is used for multiple regression and includes several variants:
Global f² (overall model fit):

$$ f^2 = \frac{R^2}{1 - R^2} $$

Local f² (effect of a specific predictor or set of predictors, controlling for others):

$$ f^2 = \frac{R^2_{\text{full}} - R^2_{\text{reduced}}}{1 - R^2_{\text{full}}} $$
3.10 Pearson's r and R² — Correlation and Coefficient of Determination
Pearson's r is the effect size for the linear relationship between two continuous variables:

$$ r = \frac{\operatorname{cov}(X, Y)}{s_X\, s_Y} $$

R² (the coefficient of determination) is the proportion of variance in Y explained by X:

$$ R^2 = r^2 $$

Confidence interval for r using Fisher's z-transformation:

$$ z' = \frac{1}{2} \ln\!\left(\frac{1 + r}{1 - r}\right) $$

95% CI for z':

$$ z' \pm \frac{1.96}{\sqrt{n - 3}} $$

Converting CI bounds back to r:

$$ r = \frac{e^{2z'} - 1}{e^{2z'} + 1} $$
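The transform-and-back-transform procedure can be sketched in Python (illustrative names, not DataStatPro's API):

```python
import math

def r_ci_fisher(r, n):
    """95% CI for Pearson r via Fisher's z-transformation."""
    z = 0.5 * math.log((1 + r) / (1 - r))  # z' transform (= atanh(r))
    margin = 1.96 / math.sqrt(n - 3)
    back = lambda zz: (math.exp(2 * zz) - 1) / (math.exp(2 * zz) + 1)  # = tanh
    return back(z - margin), back(z + margin)

lo, hi = r_ci_fisher(0.45, 100)
print(round(lo, 2), round(hi, 2))  # 0.28 0.59
```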
3.11 Odds Ratio, Risk Ratio, and Number Needed to Treat
For binary outcomes (event vs. no event) in two groups, the primary effect sizes are:
2×2 Contingency Table:

| | Event | No Event | Total |
|---|---|---|---|
| Group 1 (Treatment) | a | b | a + b |
| Group 2 (Control) | c | d | c + d |

Risk (Probability) in each group:

$$ p_1 = \frac{a}{a + b}, \qquad p_2 = \frac{c}{c + d} $$

Absolute Risk Difference (ARD):

$$ \text{ARD} = p_1 - p_2 $$

Risk Ratio (Relative Risk, RR):

$$ \text{RR} = \frac{p_1}{p_2} $$

Odds Ratio (OR):

$$ \text{OR} = \frac{a\,d}{b\,c} $$

Number Needed to Treat (NNT):

$$ \text{NNT} = \frac{1}{|\text{ARD}|} $$

NNT is the number of patients who must receive the treatment for one additional patient to benefit (or be harmed, if NNT is expressed as NNH — Number Needed to Harm).
95% CI for the log Odds Ratio:

$$ \ln(\text{OR}) \pm 1.96 \times SE_{\ln(\text{OR})} $$

Where:

$$ SE_{\ln(\text{OR})} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} $$

Back-transforming:

$$ \text{CI} = \left( e^{\ln(\text{OR}) - 1.96\,SE},\; e^{\ln(\text{OR}) + 1.96\,SE} \right) $$
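All of these quantities follow directly from the four cell counts; here is a minimal sketch (illustrative names, not DataStatPro's API):

```python
import math

def binary_effect_sizes(a, b, c, d):
    """Effect sizes from a 2x2 table: a, b = treatment events/non-events,
    c, d = control events/non-events."""
    p1, p2 = a / (a + b), c / (c + d)
    ard = p1 - p2                           # absolute risk difference
    rr = p1 / p2                            # risk ratio
    odds_ratio = (a * d) / (b * c)
    nnt = math.inf if ard == 0 else 1 / abs(ard)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of ln(OR)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
          math.exp(math.log(odds_ratio) + 1.96 * se))
    return {"ARD": ard, "RR": rr, "OR": odds_ratio, "NNT": nnt, "OR_95CI": ci}

# Treatment: 30 events / 70 non-events; control: 50 / 50
res = binary_effect_sizes(30, 70, 50, 50)
print(res["ARD"], res["RR"], res["NNT"])  # -0.2 0.6 5.0
```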
3.12 Effect Sizes for Categorical Association
Phi coefficient (φ) for 2×2 tables:

$$ \phi = \sqrt{\frac{\chi^2}{N}} $$

Equivalent to Pearson r for two binary variables. Ranges from −1 to +1.
Cramér's V for r × c contingency tables (r rows, c columns):

$$ V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\, c - 1)}} $$

Ranges from 0 (no association) to 1 (perfect association).
Cohen's w for goodness-of-fit and χ² tests:

$$ w = \sqrt{\sum_{i} \frac{(p_{1i} - p_{0i})^2}{p_{0i}}} $$

Where p₀ᵢ are the null (expected) proportions and p₁ᵢ are the alternative (observed/hypothesised) proportions.
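These three measures can be sketched as one-liners (illustrative names, not DataStatPro's API):

```python
import math

def phi_coefficient(chi2, n):
    """Phi = sqrt(chi2 / N) for a 2x2 table."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    """Cramer's V = sqrt(chi2 / (N * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

def cohens_w(p0, p1):
    """Cohen's w from null proportions p0 and alternative proportions p1."""
    return math.sqrt(sum((q1 - q0) ** 2 / q0 for q0, q1 in zip(p0, p1)))

print(round(cramers_v(12.5, 200, 3, 4), 3))  # 0.177
```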
3.13 Rank-Biserial Correlation
The rank-biserial correlation (r_rb) is the effect size for the Mann-Whitney U test (non-parametric alternative to Cohen's d when normality is not assumed):

$$ r_{rb} = 1 - \frac{2U}{n_1 n_2} $$

Or equivalently:

$$ r_{rb} = \frac{2(\bar{R}_1 - \bar{R}_2)}{N} $$

Where R̄₁ and R̄₂ are the mean ranks of the two groups, and N = n₁ + n₂.
Ranges from −1 to +1. r_rb = 0.50 means that 75% of observations in Group 1 exceed those in Group 2.
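The U-based formula and its probability-of-superiority interpretation can be sketched as (illustrative names, not DataStatPro's API):

```python
def rank_biserial(u, n1, n2):
    """Rank-biserial correlation from the Mann-Whitney U statistic."""
    return 1 - 2 * u / (n1 * n2)

def prob_superiority(r_rb):
    """Probability of superiority implied by r_rb: (r_rb + 1) / 2."""
    return (r_rb + 1) / 2

print(rank_biserial(20, 10, 10))  # 0.6
print(prob_superiority(0.5))      # 0.75
```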
4. Assumptions of Effect Size Estimation
4.1 Correct Scale and Direction of Variables
Effect sizes are only meaningful when variables are measured on an appropriate scale and when the direction of differences is clearly defined.
Why it matters: Reversing the direction of scoring (e.g., higher score = worse outcome vs. higher score = better outcome) changes the sign of the effect size. Ambiguous scoring leads to misinterpretation.
How to check: Before computing any effect size, clearly state:
- What constitutes a positive vs. negative effect.
- Whether higher or lower scores are better.
- What the reference/baseline condition is.
4.2 Normally Distributed Populations (for the d-family)
Cohen's d, Hedges' g, and Glass's Δ assume that the observed scores come from normally distributed populations. Violations of normality can distort the pooled standard deviation and produce misleading effect size estimates.
How to check:
- Shapiro-Wilk test for small samples (n < 50).
- Q-Q plots and histograms for larger samples.
- Skewness and kurtosis statistics.
When violated: Use the rank-biserial correlation (r_rb) for the Mann-Whitney test or the common language effect size (CL) as alternatives that do not assume normality.
4.3 Equal or Known Population Variances
Cohen's d uses the pooled standard deviation, which implicitly assumes homogeneity of variance. When population variances differ substantially:
- Glass's Δ is preferred (uses only the control group SD).
- Alternatively, standardise by a single group's SD chosen on substantive grounds, and state clearly which group's SD is the denominator.
- Report both groups' variances alongside the effect size.
Variance ratio rule of thumb: If the larger group variance is more than about twice the smaller, consider using Glass's Δ rather than Cohen's d.
4.4 Independence of Observations
Effect sizes based on means (Cohen's d), correlations (r), and variance-explained measures (η², ω²) all assume that observations are independent of each other.
When violated:
- Paired designs: Use the paired d_z (based on difference scores) rather than the independent-samples d.
- Clustered data: Use multilevel effect sizes (e.g., the intraclass correlation for between-cluster effects).
- Repeated measures: Report generalised η² or partial η² from the correct ANOVA model.
4.5 Adequate Sample Size for Stable Estimates
Effect size estimates are very unstable in small samples. The sampling variability of d can be enormous with few observations per group:

| n per group | SE of d (for true d = 0.5) | 95% CI width |
|---|---|---|
| 5 | 0.70 | 2.74 |
| 10 | 0.46 | 1.80 |
| 20 | 0.33 | 1.28 |
| 50 | 0.21 | 0.81 |
| 100 | 0.15 | 0.57 |
| 200 | 0.10 | 0.40 |

This table shows that with only 5 observations per group, the 95% CI for d spans nearly 3 standard deviation units — essentially uninformative. Effect sizes require adequate sample sizes to be interpretable.
4.6 No Selective Reporting (Publication Bias)
When effect sizes are extracted from published literature, they are subject to publication bias — studies with larger, significant effects are more likely to be published than those with small, non-significant effects. This means that the average published effect size overestimates the true population effect.
Remedies:
- Pre-register studies to reduce selective reporting.
- In meta-analysis, use funnel plots and Egger's test to detect publication bias.
- Use trim-and-fill methods to correct for publication bias.
5. Types of Effect Sizes
5.1 The Three Families of Effect Sizes
The d-Family (Standardised Mean Differences)
These effect sizes express the difference between means in standard deviation units.

| Effect Size | Formula | Standardiser | Best For |
|---|---|---|---|
| Cohen's d | (M₁ − M₂) / s_pooled | Pooled SD | Independent samples, equal variances |
| Hedges' g | J × d | Pooled SD (bias-corrected) | Small samples (n < 20 per group) |
| Glass's Δ | (M₁ − M₂) / s_control | Control group SD | Unequal variances; treatment-control |
| d_z | M_D / s_D | SD of differences | Paired/repeated measures |
| d_av | (M₁ − M₂) / ((s₁ + s₂)/2) | Average of group SDs | Paired; avoids population assumption |
| d_z from t | t / √n | — | Paired; directly from t-test |
The r-Family (Variance-Explained and Correlation)
These effect sizes express how much of the total variance is explained by the effect.

| Effect Size | Formula | Range | Best For |
|---|---|---|---|
| Pearson r | cov(X, Y) / (s_X s_Y) | −1 to +1 | Linear correlation |
| r² | Squared correlation | 0 to 1 | Simple regression |
| R² (multiple) | 1 − SS_residual/SS_total | 0 to 1 | Multiple regression |
| η² | SS_between / SS_total | 0 to 1 | One-way ANOVA |
| partial η² | SS_effect / (SS_effect + SS_error) | 0 to 1 | Factorial ANOVA |
| ω² | Bias-corrected η² | 0 to 1 | ANOVA (preferred) |
| partial ω² | Bias-corrected partial η² | 0 to 1 | Factorial ANOVA (preferred) |
| ε² | Alternative bias correction | 0 to 1 | One-way ANOVA |
| Cohen's f | √(η² / (1 − η²)) | 0 to ∞ | Power analysis for ANOVA |
| Cohen's f² | R² / (1 − R²) | 0 to ∞ | Power analysis for regression |
| Rank-biserial r_rb | 1 − 2U/(n₁n₂) | −1 to +1 | Mann-Whitney U test |
The Risk/Odds Family
These effect sizes are appropriate for binary outcomes.

| Effect Size | Formula | Range | Best For |
|---|---|---|---|
| Absolute Risk Difference | p₁ − p₂ | −1 to +1 | Clinical decision-making |
| Risk Ratio (RR) | p₁ / p₂ | 0 to ∞ | Prospective / cohort studies |
| Odds Ratio (OR) | ad / bc | 0 to ∞ | Case-control studies |
| Number Needed to Treat | 1 / \|ARD\| | 1 to ∞ | Clinical applicability |
| Phi (φ) | √(χ²/N) | −1 to +1 (for 2×2) | 2×2 tables |
| Cramér's V | √(χ²/(N·min(r−1, c−1))) | 0 to 1 | r × c tables |
| Cohen's w | √(Σ(p₁ᵢ − p₀ᵢ)²/p₀ᵢ) | 0 to ∞ | χ² goodness-of-fit |
5.2 Choosing the Right Effect Size
The table below provides a quick reference for selecting the appropriate effect size measure based on your statistical test:
| Statistical Test | Effect Size to Report | Notes |
|---|---|---|
| t-test (independent samples) | Cohen's d or Hedges' g | Hedges' g preferred for small samples (n < 20 per group) |
| t-test (one sample or paired) | Cohen's d or d_z | Use d when comparing to a known parameter |
| One-way ANOVA | ω² (preferred), η² (common) | ω² is less biased; η² tends to overestimate |
| Factorial ANOVA | partial ω² (preferred), partial η² (common) | Use partial versions for factorial designs |
| Multiple regression | R², adjusted R², Cohen's f² | Cohen's f² for local/effect-size-specific measures |
| Correlation | Pearson r, r² | r² shows variance explained |
| Chi-squared (2×2) | Phi (φ) | Special case of Pearson r for 2×2 tables |
| Chi-squared (larger tables) | Cramér's V | Generalised version of Phi for larger tables |
| Risk comparison (binary, two groups) | Risk Ratio + ARD + NNT | ARD = Absolute Risk Difference; NNT = Number Needed to Treat |
| Case-control study (binary) | Odds Ratio | Standard measure for case-control studies |
| Mann-Whitney U / Wilcoxon | Rank-biserial correlation (r_rb) | Non-parametric alternative to d |
6. Using the Effect Size Calculator Component
The Effect Size Calculator component in DataStatPro provides a comprehensive tool for computing, visualising, and interpreting effect sizes across all major statistical designs.
Step-by-Step Guide
Step 1 — Select the Effect Size Family
Choose from the "Effect Size Type" dropdown:
- Mean Difference (d-family): For comparing means from t-tests or similar.
- Variance Explained (r-family): For ANOVA, regression, and correlation.
- Categorical / Binary Outcomes: For χ² tests, risk ratios, odds ratios.
- Non-Parametric: For Mann-Whitney U, Wilcoxon tests.
Step 2 — Select the Specific Effect Size
Based on your design, select the specific effect size:
- Independent samples: Cohen's d, Hedges' g, Glass's Δ.
- One-sample or paired: Cohen's d, d_z.
- ANOVA: η², partial η², ω², partial ω², ε², Cohen's f.
- Regression: R², adjusted R², Cohen's f² (global or local).
- Correlation: Pearson r, r².
- Binary outcomes: OR, RR, ARD, NNT, φ, Cramér's V.
- Non-parametric: Rank-biserial r_rb.
💡 Recommendation: For ANOVA, always compute and report ω² (or partial ω² for factorial designs) in addition to or instead of η². Omega squared is less biased and is increasingly required by journals.
Step 3 — Input Method
Choose how to provide the data:
- Raw data: Provide the actual dataset; the app computes summary statistics and effect sizes automatically.
- Summary statistics: Enter means, SDs, and values directly.
- Test statistics: Enter the t, F, or χ² statistic with degrees of freedom.
💡 Tip: When computing effect sizes from published papers that only report test statistics, use the "From test statistics" input method. For example, d = 2t/√df and r = √(t²/(t² + df)) recover d and r from an independent-samples t-test.
Step 4 — Specify Design Details
- Number of groups (for ANOVA).
- Factorial structure (for multi-factor designs — specify main effects and interactions).
- Sample sizes per group.
- Degrees of freedom (if computing from test statistics).
Step 5 — Select Confidence Level
Choose the confidence level for intervals (default: 95%). The application computes:
- Exact CIs based on non-central t and F distributions for d, η², ω².
- Fisher z-transformation CIs for Pearson r.
- Wald CIs for OR, RR, ARD (on the log scale, back-transformed).
- Bootstrap CIs when raw data are provided.
Step 6 — Select Benchmarks
Choose the benchmark system for interpreting the magnitude:
- Cohen (1988) — the original benchmarks (small/medium/large).
- Field (2013) — discipline-specific benchmarks.
- Funder & Ozer (2019) — benchmarks based on social science effect size distributions.
- Sawilowsky (2009) — extended benchmarks for very small and very large effects.
- Custom — enter your own domain-specific thresholds.
⚠️ Important: Cohen's benchmarks were intended as rough conventions when no better information is available. Always prioritise domain-specific benchmarks and contextual interpretation over generic small/medium/large labels.
Step 7 — Display Options
Select which outputs and visualisations to display:
- ✅ Effect size point estimate with 95% CI.
- ✅ Benchmark classification (small / medium / large) with the selected system.
- ✅ Visualisation of the two distributions with overlap.
- ✅ Common Language Effect Size (CL) percentage.
- ✅ Probability of Superiority.
- ✅ Number Needed to Treat (for binary outcomes).
- ✅ Variance overlap diagram (U₁, U₂, U₃ statistics).
- ✅ Forest plot (when multiple effect sizes are provided).
- ✅ Conversion to other effect size types.
- ✅ Power analysis based on observed effect size.
Step 8 — Run the Calculation
Click "Calculate Effect Size". The application will:
- Compute the requested effect size(s) from the provided data or statistics.
- Construct confidence intervals using the appropriate method.
- Classify the magnitude using the selected benchmark system.
- Generate all selected visualisations.
- Provide an interpretation paragraph in plain language.
7. Effect Sizes for Mean Differences
7.1 Cohen's d for Independent Samples — Full Procedure
Step 1 — Compute group means and standard deviations
Step 2 — Compute pooled standard deviation

$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$

Step 3 — Compute Cohen's d

$$ d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} $$

Step 4 — Apply Hedges' correction (especially if n < 20 per group)

$$ g = \left(1 - \frac{3}{4\,df - 1}\right) d $$

Step 5 — Compute the 95% CI (exact, via the non-central t-distribution)
The exact CI is computed numerically. The approximate 95% CI is:

$$ d \pm 1.96 \sqrt{\frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)}} $$
Step 6 — Compute the Common Language Effect Size (CL)
The Common Language Effect Size (McGraw & Wong, 1992) is the probability that a randomly selected person from Group 1 scores higher than a randomly selected person from Group 2:

$$ CL = \Phi\!\left(\frac{d}{\sqrt{2}}\right) $$

Where Φ is the standard normal CDF.
CL = .50 means 50% probability of superiority (no effect); CL = .75 means that 75% of the time, a person from Group 1 outscores a person from Group 2.
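The CL computation needs only the standard normal CDF; a minimal sketch using the error function from the standard library (illustrative names, not DataStatPro's API):

```python
import math

def common_language(d):
    """CL = Phi(d / sqrt(2)), with Phi built from the error function."""
    z = d / math.sqrt(2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(common_language(0.0))            # 0.5 (no effect)
print(round(common_language(0.8), 3))  # 0.714
```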
7.2 Cohen's Benchmark Classification for d and g
Cohen (1988) proposed the following conventions, intended as rough guides only:

| Cohen's d | Verbal Label | Equivalent r | Overlap (%) |
|---|---|---|---|
| 0.0 | No effect | .00 | 100% |
| 0.2 | Small | .10 | 85% |
| 0.5 | Medium | .24 | 67% |
| 0.8 | Large | .37 | 53% |
| 1.2 | Very large | .51 | 38% |
| 2.0 | Huge | .71 | 17% |
⚠️ Cohen himself warned against mechanical application of these benchmarks. He stated: "The effect size conventions are offered as conventions of last resort, to be used only when no better basis for setting the ES is available." Always contextualise effect sizes within your specific research domain.
Extended benchmarks (Sawilowsky, 2009):

| Label | d |
|---|---|
| Tiny | 0.01 |
| Very small | 0.1 |
| Small | 0.2 |
| Medium | 0.5 |
| Large | 0.8 |
| Very large | 1.2 |
| Huge | 2.0 |
7.3 Variance Overlap Statistics (Cohen's U₁, U₂, U₃)
To complement Cohen's d, three overlap statistics provide intuitive, probabilistic interpretations of the separation between two normal distributions:
U₁: The proportion of the combined distributions that is NOT overlapping:

$$ U_1 = \frac{2\Phi(d/2) - 1}{\Phi(d/2)} $$

U₂: The proportion of one distribution that exceeds the same proportion in the other distribution (percentage of the non-treatment distribution exceeded by the median of the treatment distribution):

$$ U_2 = \Phi\!\left(\frac{d}{2}\right) $$

U₃ (Cohen's U₃): The proportion of the treatment distribution that exceeds the median of the control distribution:

$$ U_3 = \Phi(d) $$

Example for d = 0.5:

$$ U_3 = \Phi(0.5) = .691 $$

Interpretation: 69.1% of the treatment group scores above the median of the control group.

| d | U₃ (%) | CL (%) | Overlap (%) |
|---|---|---|---|
| 0.20 | 57.9% | 55.6% | 85.3% |
| 0.50 | 69.1% | 63.8% | 66.9% |
| 0.80 | 78.8% | 71.4% | 52.5% |
| 1.00 | 84.1% | 76.0% | 44.8% |
| 1.50 | 93.3% | 85.6% | 28.1% |
| 2.00 | 97.7% | 92.1% | 16.9% |
7.4 Computing d from Common Test Statistics
When raw data are unavailable, d can be computed from reported test statistics:
From an independent samples t-test:

$$ d = t \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

From a one-sample or paired t-test:

$$ d = \frac{t}{\sqrt{n}} $$

From an F-ratio (two-group ANOVA, df₁ = 1):

$$ d = 2\sqrt{\frac{F}{df_2}} \quad \text{(equal group sizes)} $$

From a χ² statistic (for φ or Cramér's V):

$$ \phi = \sqrt{\frac{\chi^2}{N}} $$

Converting between effect size families:

$$ r = \frac{d}{\sqrt{d^2 + 4}} \quad \text{(for equal group sizes)} $$

$$ d = \frac{2r}{\sqrt{1 - r^2}} \quad \text{(for equal group sizes)} $$
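The conversions in this subsection are one-liners; this sketch (illustrative names, not DataStatPro's API) also demonstrates that the r ↔ d conversions invert each other exactly:

```python
import math

def d_from_t_ind(t, n1, n2):
    """d from an independent-samples t-test: t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_from_t_paired(t, n):
    """d from a one-sample or paired t-test: t / sqrt(n)."""
    return t / math.sqrt(n)

def r_from_d(d):
    """r = d / sqrt(d^2 + 4), assuming equal group sizes."""
    return d / math.sqrt(d ** 2 + 4)

def d_from_r(r):
    """d = 2r / sqrt(1 - r^2), assuming equal group sizes."""
    return 2 * r / math.sqrt(1 - r ** 2)

print(round(r_from_d(d_from_r(0.3)), 10))  # 0.3 (exact round trip)
```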
8. Effect Sizes for Variance Explained
8.1 η², ω², and ε² Comparison
For a one-way ANOVA with k groups and N total observations:
Given the ANOVA table:

| Source | SS | df | MS |
|---|---|---|---|
| Between (Effect) | SS_B | k − 1 | MS_B = SS_B / (k − 1) |
| Within (Error) | SS_W | N − k | MS_W = SS_W / (N − k) |
| Total | SS_T | N − 1 | |

Eta squared:

$$ \eta^2 = \frac{SS_B}{SS_T} $$

Omega squared (preferred):

$$ \omega^2 = \frac{SS_B - (k - 1)\,MS_W}{SS_T + MS_W} $$

Epsilon squared:

$$ \epsilon^2 = \frac{SS_B - (k - 1)\,MS_W}{SS_T} $$

Relationship:

$$ \omega^2 \le \epsilon^2 \le \eta^2 $$

All three measure the same thing (proportion of variance explained), but ω² and ε² are corrected for the positive bias of η² in finite samples.
8.2 Benchmark Interpretations for Variance-Explained Effect Sizes
Cohen (1988) benchmarks:

| Label | η² or R² | ω² or ε² | f |
|---|---|---|---|
| Small | .01 | .01 | .10 |
| Medium | .06 | .06 | .25 |
| Large | .14 | .14 | .40 |

Note on benchmarks: These were established when η² was the standard report. Since ω² and ε² are systematically smaller than η², the same verbal benchmarks do not directly transfer. Use the f or f² conversions for power analysis regardless of which variance-explained index you report.
8.3 Generalised Eta Squared (η²_G)
Generalised eta squared (η²_G; Olejnik & Algina, 2003) is designed for comparison across studies with different designs by distinguishing between:
- Manipulated variables (experimental factors set by the researcher, e.g., treatment).
- Measured variables (participant characteristics, e.g., age, sex).

$$ \eta_G^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + \sum SS_{\text{measured}} + SS_{\text{error}}} $$

Where the summation is over all measured (non-manipulated) sources of variance in the design.
η²_G is more comparable across different experimental designs (between-subjects, within-subjects, mixed) than either η² or partial η² and is increasingly recommended for factorial and mixed ANOVA designs.
8.4 R² and Adjusted R² for Regression
The coefficient of determination R² is the proportion of variance in Y explained by the regression model:

$$ R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} $$

Adjusted R² corrects for the number of predictors k in the model:

$$ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} $$

Adjusted R² can be negative when the model fits worse than a horizontal line.
R² change (ΔR²) for evaluating the increment from adding predictors:

$$ \Delta R^2 = R^2_{\text{full}} - R^2_{\text{reduced}} $$

Cohen's f² for the increment:

$$ f^2 = \frac{\Delta R^2}{1 - R^2_{\text{full}}} $$
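The regression effect sizes in this subsection reduce to simple arithmetic; a minimal sketch (illustrative names, not DataStatPro's API):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f2_global(r2):
    """Global Cohen's f^2 = R^2 / (1 - R^2)."""
    return r2 / (1 - r2)

def f2_local(r2_full, r2_reduced):
    """Local f^2 = (R^2_full - R^2_reduced) / (1 - R^2_full)."""
    return (r2_full - r2_reduced) / (1 - r2_full)

print(round(adjusted_r2(0.30, 100, 5), 4))  # 0.2628
print(round(f2_local(0.30, 0.25), 4))       # 0.0714
```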
8.5 Confidence Intervals for η² and ω²
CIs for variance-explained effect sizes use the non-central F-distribution. The observed F-ratio has a non-central F-distribution:

$$ F \sim F(df_1, df_2, \lambda) $$

Where λ is the non-centrality parameter:

$$ \lambda = \frac{\eta^2}{1 - \eta^2} \times N $$

The CI bounds for λ are found numerically, then converted to η²:

$$ \eta^2 = \frac{\lambda}{\lambda + N} $$

For ω² and ε², a transformation approach is used: first compute the CI for η² (or λ), then convert to the desired effect size.
9. Effect Sizes for Associations and Categorical Data
9.1 Pearson r — Correlation Effect Size
The Pearson correlation r is simultaneously a descriptive statistic and an effect size. It requires no additional calculation — the correlation coefficient itself IS the standardised effect size for the strength of a linear relationship.
Benchmarks for r:

| r | Cohen (1988) | Funder & Ozer (2019) |
|---|---|---|
| .05 | Negligible | Very small (potentially negligible) |
| .10 | Small | Small |
| .20–.30 | Medium | Medium / large |
| .40–.50 | Large | Very large |

💡 Funder & Ozer (2019) argued that Cohen's benchmarks are too conservative for social/behavioural science, where r = .30 is actually a large effect in practice. Consider the base rates in your field when applying benchmarks.
9.2 Interpreting the Odds Ratio
The Odds Ratio (OR) is the most common effect size in case-control studies and logistic regression.

| OR | Interpretation |
|---|---|
| OR = 1.0 | No difference in odds between groups |
| OR > 1.0 | Increased odds of event in Group 1 vs. Group 2 |
| OR < 1.0 | Decreased odds of event in Group 1 vs. Group 2 |
| OR = 2.0 | Twice the odds |
| OR = 0.5 | Half the odds (equivalent to OR = 2.0 in the opposite direction) |

Benchmarks (Chen et al., 2010, for medical research):

| OR | Equivalent d | Label |
|---|---|---|
| 1.68 | 0.2 | Small |
| 3.47 | 0.5 | Medium |
| 6.71 | 0.8 | Large |

Converting OR to Cohen's d (for meta-analytic purposes):

$$ d = \frac{\ln(\text{OR}) \times \sqrt{3}}{\pi} $$

Converting OR to r:

$$ r = \frac{d}{\sqrt{d^2 + 4}}, \quad \text{with } d = \frac{\ln(\text{OR})\sqrt{3}}{\pi} $$
9.3 Risk Ratio vs. Odds Ratio — When to Use Which
| Situation | Recommended Effect Size | Why |
|---|---|---|
| Prospective/cohort study | Risk Ratio (RR) | Probabilities are directly estimable |
| Case-control study | Odds Ratio (OR) | Incidence not estimable; OR is invariant |
| Clinical trial (binary outcome) | RR + ARD + NNT | All provide different, complementary information |
| Rare events (< 10%) | Either (OR ≈ RR when rare) | OR approximates RR for rare outcomes |
| Common events (> 10%) | RR preferred | OR exaggerates the effect vs. RR |
| Logistic regression | OR | Natural output of the logistic model |

⚠️ A common mistake is interpreting an Odds Ratio as a Risk Ratio when the event is common (> 10%). The OR always exaggerates the RR when the event is common. For example, OR = 3.0 may correspond to RR = 2.0 when the control event rate is 30%. Always report the ARD and NNT alongside OR or RR for clinical interpretability.
9.4 Number Needed to Treat (NNT)
The NNT is one of the most clinically interpretable effect sizes:

$$ \text{NNT} = \frac{1}{|\text{ARD}|} $$

A treatment with NNT = 5 means that on average, 5 patients must be treated for 1 additional patient to benefit compared to control.

| NNT | Clinical Interpretation |
|---|---|
| 1 | Every treated patient benefits (perfect) |
| 2–4 | Excellent — highly effective treatment |
| 5–10 | Good — meaningful clinical benefit |
| 11–20 | Moderate — benefit for a minority of treated patients |
| > 20 | Small — many patients treated for little benefit |
| ∞ | No treatment benefit (ARD = 0) |

95% CI for NNT (Altman method):

$$ \text{CI}_{\text{NNT}} = \left( \frac{1}{\text{UL}_{\text{ARD}}},\; \frac{1}{\text{LL}_{\text{ARD}}} \right) $$

Where UL and LL are the upper and lower confidence limits of the ARD.
⚠️ NNT CIs can be awkward when the CI for ARD includes 0, producing a CI that spans from a negative NNT (Number Needed to Harm, NNH) to a positive NNT through an infinite discontinuity. In this case, report both sides of the CI as NNT and NNH separately.
9.5 Cramér's V for Multi-Way Tables
Cramér's V is the standard effect size for χ² tests of independence in tables larger than 2×2:

$$ V = \sqrt{\frac{\chi^2}{N \cdot \min(r - 1,\, c - 1)}} $$

Benchmarks (Cohen, 1988, adjusted by df* = min(r − 1, c − 1)):

| df* | Small | Medium | Large |
|---|---|---|---|
| 1 (2×2) | .10 | .30 | .50 |
| 2 (3×2) | .07 | .21 | .35 |
| 3 (4×2) | .06 | .17 | .29 |
| 4 (5×2) | .05 | .15 | .25 |

Bias-corrected Cramér's V (Bergsma, 2013, for small samples and sparse tables):

$$ \tilde{V} = \sqrt{\frac{\tilde{\phi}^2}{\min(\tilde{r} - 1,\, \tilde{c} - 1)}} $$

Where φ̃² = max(0, χ²/N − (r − 1)(c − 1)/(N − 1)), r̃ = r − (r − 1)²/(N − 1), and c̃ = c − (c − 1)²/(N − 1).
The corrected version removes the positive bias of V and is recommended for small samples or sparse tables.
10. Model Fit and Evaluation
10.1 Evaluating Effect Size Precision — The Confidence Interval
The primary evaluation criterion for an effect size is its confidence interval (CI). The CI communicates both the direction and magnitude of the effect AND the uncertainty around the estimate.
Rules for interpreting effect size CIs:
| CI property | Interpretation |
|---|---|
| CI entirely above zero (or positive null) | Effect is significantly positive |
| CI entirely below zero (or negative null) | Effect is significantly negative |
| CI contains zero | Effect not statistically significant |
| Narrow CI | Precise estimate (large n) |
| Wide CI | Imprecise estimate (small n) — interpret point estimate cautiously |
| CI entirely within the "small" range | Effect is definitely small |
| CI spanning from small to large | Effect magnitude is uncertain |
10.2 Precision as a Function of Sample Size
The width of the 95% CI for Cohen's d decreases as n increases:
CI width ≈ 2 × 1.96 × SE(d), with SE(d) = √[(n₁ + n₂)/(n₁n₂) + d²/(2(n₁ + n₂))]
For equal group sizes (n₁ = n₂ = n) and d = 0.5:
| n per group | Approx CI Width | Interpretation |
|---|---|---|
| 10 | 1.86 | Very imprecise |
| 20 | 1.28 | Imprecise |
| 50 | 0.80 | Moderate precision |
| 100 | 0.57 | Good precision |
| 200 | 0.40 | High precision |
| 500 | 0.25 | Very high precision |
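The relationship between n and CI width can be reproduced with the large-sample SE formula. This is an approximation: for very small n the exact non-central-t intervals (as in the table above) come out slightly wider.

```python
import math

def ci_width_d(d, n_per_group, z=1.96):
    """Approximate 95% CI width for Cohen's d with equal group sizes n."""
    se = math.sqrt(2 / n_per_group + d ** 2 / (4 * n_per_group))
    return 2 * z * se

width_50 = ci_width_d(0.5, 50)    # moderate precision
width_200 = ci_width_d(0.5, 200)  # high precision
```

Doubling the total sample size shrinks the CI width by roughly a factor of √2, which is why precision gains flatten out at large n.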
10.3 The Minimal Effect Size of Interest (MESI) — Equivalence Testing
For many applications, researchers are not just interested in whether an effect is non-zero, but whether it exceeds a minimum meaningful threshold. The Minimal Effect Size of Interest (MESI) defines the smallest effect that would be practically or clinically important.
Two One-Sided Tests (TOST) equivalence testing:
Define bounds −Δ and +Δ as the MESI (e.g., Δ = 0.2 on the d scale for a "trivially small" effect). The hypotheses of the equivalence test are:
H₀: δ ≤ −Δ or δ ≥ +Δ (the effect is NOT negligible)
H₁: −Δ < δ < +Δ (the effect IS negligible)
Equivalence is supported when both one-sided tests reject their respective nulls. Practically, the effect is declared equivalent to zero (negligible) when the 90% CI for d falls entirely within [−Δ, +Δ].
💡 Equivalence testing is increasingly important for null results. A study that fails to reject H₀ does not establish that the effect is zero or negligible — only equivalence testing can establish negligibility.
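The 90%-CI shortcut for TOST can be sketched as follows (a simplified z-based version with an illustrative helper name; exact TOST uses the t distribution):

```python
import math

def tost_negligible(d, n_per_group, delta=0.20, z90=1.645):
    """TOST via the 90% CI: negligible if the CI lies entirely inside (-delta, +delta)."""
    se = math.sqrt(2 / n_per_group + d ** 2 / (4 * n_per_group))
    lo, hi = d - z90 * se, d + z90 * se
    return -delta < lo and hi < delta

# A tiny observed effect with a large sample can be declared negligible;
# a medium effect with a moderate sample cannot.
```

Note that a negligible verdict requires both a small point estimate and enough precision (n) to rule out effects beyond ±Δ.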
10.4 Power Analysis Based on Effect Size
Effect sizes are the primary input to a priori power analysis — determining the required sample size before conducting a study.
Required sample size per group for a two-sample t-test at power 1 − β and significance level α:
n = 2(z₁₋α/₂ + z₁₋β)² / d²
For α = .05 and power = .80 or .90 (z₀.₉₇₅ = 1.96, z₀.₈₀ = 0.84, z₀.₉₀ = 1.28):
| Cohen's d | n per group (power = 0.80) | n per group (power = 0.90) |
|---|---|---|
| 0.20 (small) | 394 | 527 |
| 0.50 (medium) | 64 | 85 |
| 0.80 (large) | 26 | 34 |
| 1.00 (large) | 17 | 22 |
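The formula above can be evaluated directly. This is the normal approximation, so it runs one or two participants below the exact t-based values in the table (the illustrative helper name is not a DataStatPro function):

```python
import math

def n_per_group(d, z_alpha=1.96, z_beta=0.84):
    """Normal-approximation per-group sample size for a two-sample t-test
    at alpha = .05 (two-sided) and 80% power."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
```

Because n scales with 1/d², halving the expected effect size quadruples the required sample.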
10.5 Sensitivity Analysis — Detectable Effect for a Given n
A sensitivity analysis asks: given the sample size already collected, what is the smallest effect size that could be detected with 80% power?
d_min = (z₁₋α/₂ + z₁₋β) × √(2/n)
For n = 25 per group:
d_min = (1.96 + 0.84) × √(2/25) ≈ 0.79
This means a study with n = 25 per group can only reliably detect effects of d ≈ 0.79 or larger (close to Cohen's "large" threshold of 0.80). Smaller effects may exist but will often be missed.
⚠️ Sensitivity analysis (post-hoc power) should not be used to "explain" a non-significant result — this is circular reasoning. Sensitivity analysis is valuable for communicating what magnitudes of effects could have been detected, but it does not address whether a true effect exists.
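Inverting the power formula gives the minimum detectable effect for any fixed n (normal approximation; function name is illustrative):

```python
import math

def min_detectable_d(n_per_group, z_alpha=1.96, z_beta=0.84):
    """Smallest Cohen's d detectable with 80% power at alpha = .05 (two-sided)."""
    return (z_alpha + z_beta) * math.sqrt(2 / n_per_group)
```

Reporting this number alongside a null result tells readers which effect magnitudes the study could plausibly have caught.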
10.6 Comparing Effect Sizes Across Studies
When comparing effect sizes across studies, ensure:
- Same family: d-family and r-family measures are different families; convert to a common metric first.
- Same design: Paired d_z and independent-samples d are not directly comparable.
- Same sample type: Clinical vs. community samples may have systematically different effect sizes.
- Bias correction: Use Hedges' g (not Cohen's d) when comparing across studies with different sample sizes.
Converting between families for comparison:
r² = η² (exact, two-group case)
r = d/√(d² + 4) (equal group sizes)
d = 2r/√(1 − r²) (equal group sizes)
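The two equal-n conversions are exact inverses of each other, which a quick round trip confirms:

```python
import math

def d_to_r(d):
    """Cohen's d -> point-biserial r, assuming equal group sizes."""
    return d / math.sqrt(d ** 2 + 4)

def r_to_d(r):
    """Point-biserial r -> Cohen's d, assuming equal group sizes."""
    return 2 * r / math.sqrt(1 - r ** 2)
```

For unequal group sizes, the factor 4 must be replaced by (n₁ + n₂)²/(n₁n₂), as in the conversion table in the cheat sheet.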
11. Advanced Topics
11.1 Meta-Analytic Pooling of Effect Sizes
Meta-analysis combines effect sizes from multiple independent studies using weighted averaging. The weight of each study is the inverse of its variance.
Fixed-effects model (assumes all studies estimate the same true effect θ):
θ̂_FE = Σ wᵢθ̂ᵢ / Σ wᵢ, with wᵢ = 1/vᵢ
Where vᵢ is the variance of the effect size in study i.
Random-effects model (allows true effects to vary across studies, θᵢ ~ N(μ, τ²)): the same weighted average with wᵢ* = 1/(vᵢ + τ²)
Where τ² is the estimated between-study variance (heterogeneity), computed using the DerSimonian-Laird estimator.
Heterogeneity statistics — the I² statistic gives the percentage of variability in effect estimates due to between-study heterogeneity rather than chance:
| I² | Heterogeneity |
|---|---|
| 0–25% | Low |
| 25–50% | Moderate |
| 50–75% | Substantial |
| > 75% | Considerable |
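Both pooling models, the DerSimonian-Laird τ², and I² can be sketched in a few lines (a simplified illustration, not the DataStatPro meta-analysis engine):

```python
def pool_effects(effects, variances):
    """Fixed-effects and DerSimonian-Laird random-effects pooled estimates."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                 # DL between-study variance
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    random = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return fixed, random, tau2, i2
```

With equal study variances the fixed and random pooled estimates coincide; they diverge when precise and imprecise studies disagree, because τ² flattens the weights.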
11.2 Effect Sizes for Multilevel and Longitudinal Designs
In multilevel models (e.g., students within schools, patients within hospitals), effect sizes must account for the nested structure.
ICC-based d for between-cluster effects: d_total = (x̄₁ − x̄₂)/√(σ²_between + σ²_within), where the ICC = σ²_between/(σ²_between + σ²_within) quantifies the share of total variance lying between clusters.
R² for multilevel models (Nakagawa & Schielzeth, 2013):
R²_marginal = σ²_f / (σ²_f + σ²_r + σ²_e) (fixed effects only)
R²_conditional = (σ²_f + σ²_r) / (σ²_f + σ²_r + σ²_e) (fixed + random effects)
Where σ²_f = variance explained by fixed effects, σ²_r = random effects variance, σ²_e = residual variance.
11.3 Standardised vs. Unstandardised Effect Sizes
Not all effect size applications require standardisation. Unstandardised effect sizes (raw mean differences, regression coefficients in original units) are often more informative and actionable than standardised counterparts.
When to use unstandardised effects:
- The original measurement units are widely understood (e.g., blood pressure in mmHg, cognitive test points).
- The audience is practitioners who think in original units.
- The scale is the same across studies being compared.
When to use standardised effects:
- Variables are measured on arbitrary scales (psychological questionnaires).
- Comparing effects across studies using different instruments.
- Meta-analysis.
- Power analysis.
A point of controversy: some methodologists (Lenth, 2001; Tukey, 1991) argue that standardised effect sizes are frequently misinterpreted and that the choice of denominator (which SD is used) is itself a critical and often overlooked decision.
11.4 Effect Size for Interaction Effects in Factorial ANOVA
Interaction effect sizes in factorial designs require special treatment:
Partial omega squared for the interaction:
ω²_p(A×B) = df_A×B × (MS_A×B − MS_error) / [df_A×B × MS_A×B + (N − df_A×B) × MS_error]
Generalised eta squared for the interaction (fully between-subjects design):
η²_G(A×B) = SS_A×B / (SS_A×B + SS_error)
💡 For interaction effects, always compute and report the simple effects (main effects at each level of the other factor) alongside the overall interaction effect size. The interaction effect size alone does not communicate the direction or pattern of the interaction.
11.5 Rank-Based Effect Sizes for Non-Parametric Tests
When parametric assumptions are violated, rank-based effect sizes should be used:
Wilcoxon Signed-Rank test (paired or one-sample):
r = Z / √N
Where Z is the standardised Wilcoxon test statistic and N is the number of observations (pairs).
Kruskal-Wallis test (non-parametric ANOVA equivalent):
ε²_H = (H − k + 1) / (n − k)
Where H is the Kruskal-Wallis statistic, k is the number of groups, and n is the total sample size.
Spearman's ρ (non-parametric correlation, itself an effect size):
ρ = 1 − 6Σdᵢ² / (n(n² − 1))
Where dᵢ is the difference between ranks of the i-th observation on X and Y.
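All three rank-based measures are simple arithmetic once the test statistics or ranks are in hand (helper names are illustrative):

```python
import math

def wilcoxon_r(z, n):
    """Rank-based r for the Wilcoxon signed-rank test: r = Z / sqrt(N)."""
    return z / math.sqrt(n)

def kruskal_epsilon2(h, k, n):
    """Epsilon-squared for Kruskal-Wallis: (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)

def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho from untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(ranks_x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

The shortcut Spearman formula assumes no tied ranks; with ties, compute a Pearson correlation on the rank values instead.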
11.6 Reporting Effect Sizes According to APA and Journal Standards
The APA Publication Manual (7th ed.) and major journals increasingly require reporting effect sizes with confidence intervals for all primary analyses. Best practice:
Minimum reporting requirements:
- Report the effect size point estimate.
- Report the 95% CI for the effect size.
- Specify which effect size was computed (not just "effect size = 0.5").
- State which benchmark system was used for interpretation.
- Report whether Cohen's d or Hedges' g was used (specify which type).
Example APA-compliant report: "The CBT group showed significantly lower depression scores than the control group (p < .001, Hedges' g = 0.77, 95% CI [0.40, 1.14]), indicating a large treatment effect."
12. Worked Examples
Example 1: Cohen's d and Hedges' g — CBT vs. Control for Depression
A clinical trial randomises n = 35 participants to CBT and n = 35 to a waitlist control. Depression is measured on the PHQ-9 (0–27 scale, lower = less depression).
Summary statistics:
| Group | Mean PHQ-9 | SD | n |
|---|---|---|---|
| CBT | 9.1 | 4.5 | 35 |
| Control | 14.1 | 5.1 | 35 |
Step 1 — Pooled SD:
s_p = √[((35 − 1)(4.5)² + (35 − 1)(5.1)²) / (35 + 35 − 2)] = √23.13 = 4.81
Step 2 — Cohen's d:
d = (9.1 − 14.1) / 4.81 = −1.04
The negative sign indicates CBT has lower (better) depression scores. By convention, report the magnitude with the direction stated: d = 1.04 favouring CBT.
Step 3 — Hedges' g (bias correction):
g = d × (1 − 3/(4 × 68 − 1)) = 1.04 × 0.989 = 1.03
(Minimal correction since N is moderate.)
Step 4 — Approximate 95% CI:
SE(d) = √(70/1225 + 1.04²/140) = 0.255; 95% CI = 1.04 ± 1.96 × 0.255 = [0.54, 1.54]
Step 5 — Common Language Effect Size:
CL = Φ(d/√2) = Φ(0.74) = 0.77
There is a 77% chance that a randomly chosen CBT participant scores lower (better) than a randomly chosen control participant.
Step 6 — U₃ Statistic:
U₃ = Φ(d) = Φ(1.04) = 0.85
85% of CBT participants have PHQ-9 scores below the mean of the control group.
Summary:
| Statistic | Value | Interpretation |
|---|---|---|
| Cohen's d | 1.04 | Large effect (Cohen's benchmark: large ≥ 0.80) |
| Hedges' g | 1.03 | Large effect (negligible bias correction) |
| 95% CI for d | [0.54, 1.54] | Entirely above zero — significant effect |
| CL | 0.77 | 77% chance a random CBT patient scores better than a random control patient |
| U₃ | 0.85 | 85% of CBT patients below the control mean |
Conclusion: CBT produced a large, statistically significant reduction in depression compared to waitlist control (g = 1.03, 95% CI [0.54, 1.54]). The U₃ statistic indicates that approximately 85% of CBT participants had depression scores below the control-group mean. This is a clinically meaningful and large treatment effect.
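The whole chain of Example 1 can be reproduced in a few lines. The group means and SDs below are assumed illustrative values chosen to be consistent with d ≈ 1.04; the normal CDF Φ is obtained from `math.erf`.

```python
import math

Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal CDF

n1 = n2 = 35
m1, s1 = 9.1, 4.5    # CBT group (assumed illustrative values)
m2, s2 = 14.1, 5.1   # control group

sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = abs(m1 - m2) / sp                          # Cohen's d
g = d * (1 - 3 / (4 * (n1 + n2 - 2) - 1))      # Hedges' g correction
se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci = (d - 1.96 * se, d + 1.96 * se)            # approximate 95% CI
cl, u3 = Phi(d / math.sqrt(2)), Phi(d)         # CL and U3
```

Running this reproduces d ≈ 1.04, g ≈ 1.03, CI ≈ [0.54, 1.54], CL ≈ 0.77, and U₃ ≈ 0.85.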
Example 2: η² and ω² — Effect of Teaching Method on Exam Scores
An educational researcher tests three teaching methods (Lecture, Flipped Classroom, Project-Based) on exam performance (scored 0–100%). Total sample: N = 90 (30 per group).
ANOVA table:
| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between (Method) | 2840 | 2 | 1420 | 8.94 | < .001 |
| Within (Error) | 13800 | 87 | 158.6 | | |
| Total | 16640 | 89 | | | |
Step 1 — Eta squared:
η² = SS_between / SS_total = 2840/16640 = 0.171
Step 2 — Omega squared (bias-corrected):
ω² = (SS_between − df_between × MS_within) / (SS_total + MS_within) = (2840 − 2 × 158.6)/(16640 + 158.6) = 0.150
Step 3 — Epsilon squared:
ε² = (SS_between − df_between × MS_within) / SS_total = 2522.8/16640 = 0.152
Step 4 — Cohen's f:
f = √(η²/(1 − η²)) = √(0.171/0.829) = 0.45
Step 5 — 95% CI for η² (via non-central F):
Using the non-central F approach with F = 8.94, df₁ = 2, df₂ = 87:
Non-centrality parameter λ̂ = df₁ × F ≈ 17.9
95% CI for λ: [7.2, 32.1] (numerical)
Converting via η² = λ/(λ + N): η²_L = 7.2/97.2 = 0.074, η²_U = 32.1/122.1 = 0.263
η² = 0.171, 95% CI [0.074, 0.263]
Comparison of estimates:
| Measure | Value | Benchmark (Cohen) | Label |
|---|---|---|---|
| η² | 0.171 | Large (≥ 0.14) | Large (biased) |
| ω² | 0.150 | Large (≥ 0.14) | Large (unbiased) |
| ε² | 0.152 | Large (≥ 0.14) | Large (unbiased) |
| Cohen's f | 0.45 | Large (≥ 0.40) | Large |
Conclusion: Teaching method has a large effect on exam performance (η² = 0.171, 95% CI [0.074, 0.263]). The bias-corrected estimate (ω² = 0.150) is slightly smaller than η², as expected. Approximately 15% of the variance in exam scores is attributable to teaching method. Note that η² slightly overestimates the true population effect due to sampling bias, illustrating why ω² is preferred.
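The point estimates in Example 2 follow directly from the ANOVA table entries:

```python
# Sums of squares and degrees of freedom from the ANOVA table above
ss_b, ss_w, ss_t = 2840.0, 13800.0, 16640.0
df_b, df_w = 2, 87
ms_w = ss_w / df_w

eta2 = ss_b / ss_t                                   # eta squared (biased)
omega2 = (ss_b - df_b * ms_w) / (ss_t + ms_w)        # omega squared (corrected)
eps2 = (ss_b - df_b * ms_w) / ss_t                   # epsilon squared
f = (eta2 / (1 - eta2)) ** 0.5                       # Cohen's f
```

As the example notes, the bias-corrected ω² and ε² land a couple of percentage points below η².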
Example 3: Odds Ratio, Risk Ratio, and NNT — Vaccine Effectiveness
A clinical trial evaluates a new vaccine. Among 1000 vaccinated participants, 20 develop the disease. Among 1000 control participants, 80 develop the disease.
2×2 Table:
| Disease | No Disease | Total | |
|---|---|---|---|
| Vaccinated | 20 | 980 | 1000 |
| Control | 80 | 920 | 1000 |
Step 1 — Risks:
p₁ = 20/1000 = 0.02 (vaccinated)
p₂ = 80/1000 = 0.08 (control)
Step 2 — Absolute Risk Difference:
ARD = 0.08 − 0.02 = 0.06
Vaccination reduces disease risk by 6 percentage points.
Step 3 — Risk Ratio:
RR = 0.02/0.08 = 0.25
Vaccinated participants have 25% the risk of unvaccinated (a 75% risk reduction).
Step 4 — Odds Ratio:
OR = (20 × 920)/(980 × 80) = 0.235
95% CI for OR: exp[ln(0.235) ± 1.96 × √(1/20 + 1/980 + 1/80 + 1/920)] = [0.14, 0.39]
Step 5 — NNT:
NNT = 1/ARD = 1/0.06 = 16.7 ≈ 17
Step 6 — Vaccine Effectiveness (VE):
VE = 1 − RR = 1 − 0.25 = 75%
95% CI for ARD: 0.06 ± 1.96 × √(0.02 × 0.98/1000 + 0.08 × 0.92/1000) = [0.041, 0.079]
95% CI for NNT: 1/0.079 to 1/0.041 = [13, 25]
Summary:
| Effect Size | Value | 95% CI | Interpretation |
|---|---|---|---|
| ARD | 0.06 | [0.041, 0.079] | Vaccine reduces risk by 6 percentage points |
| Risk Ratio | 0.25 | [0.15, 0.40] | 75% risk reduction |
| Odds Ratio | 0.23 | [0.14, 0.39] | Significantly protective |
| NNT | 17 | [13, 25] | 17 vaccinated to prevent 1 case |
| Vaccine Effectiveness | 75% | [60%, 85%] | High effectiveness |
Conclusion: The vaccine is highly effective, with a risk ratio of 0.25 (75% risk reduction) and an NNT of 17 (13–25). For every 17 people vaccinated, one additional case of disease is prevented compared to no vaccination. All three complementary effect sizes (ARD, RR, NNT) consistently demonstrate a clinically important and statistically significant protective effect of the vaccine.
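All of the binary-outcome effect sizes in Example 3 come from the four cell counts of the 2×2 table:

```python
import math

a, b = 20, 980    # vaccinated: disease / no disease
c, d = 80, 920    # control: disease / no disease

p1, p2 = a / (a + b), c / (c + d)
ard = p2 - p1                      # absolute risk difference
rr = p1 / p2                       # risk ratio
odds_ratio = (a * d) / (b * c)     # odds ratio
nnt = 1 / ard                      # number needed to treat
ve = 1 - rr                        # vaccine effectiveness

# 95% CI for the OR on the log scale
se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_or = (math.exp(math.log(odds_ratio) - 1.96 * se_ln_or),
         math.exp(math.log(odds_ratio) + 1.96 * se_ln_or))
```

Because the event is rare here (8% in controls), the OR (0.23) and RR (0.25) are close; they diverge for common events, as Section 9 warns.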
Example 4: Cramér's — Association Between Study Method and Grade
A researcher surveys n = 300 students on their primary study method (Flashcards, Practice Tests, Re-Reading) and their final grade (A/B, C, D/F). The chi-squared test yields χ²(4) = 22.8, p < .001.
Cramér's V:
V = √(χ²/(n × min(r − 1, c − 1))) = √(22.8/(300 × 2)) = 0.195
95% CI for V (using the non-central χ² approach):
Non-centrality parameter λ̂ = χ² − df = 18.8
95% CI for λ (numerical iteration), converted back to the V scale: [0.117, 0.236]
Benchmark (for df* = 2): Small = 0.07, Medium = 0.21, Large = 0.35.
V = 0.195 falls just below the medium threshold.
Conclusion: There is a small-to-medium association between study method and grade (V = 0.195, 95% CI [0.117, 0.236], p < .001). Study method explains approximately V² = 0.038 (3.8%) of the variance in grade outcomes, indicating a modest but statistically significant relationship. Practice testing and flashcard use appear to produce better grade distributions than re-reading, consistent with retrieval practice research.
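The point estimate in Example 4 is a one-liner, assuming the χ² = 22.8 and n = 300 used above:

```python
import math

chi2, n = 22.8, 300        # assumed values for this example
k = min(3 - 1, 3 - 1)      # 3x3 table -> df* = 2
v = math.sqrt(chi2 / (n * k))
```

The CI requires numerical inversion of the non-central χ² distribution, which DataStatPro performs internally.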
13. Common Mistakes and How to Avoid Them
Mistake 1: Conflating Statistical Significance with Effect Size
Problem: Concluding that because p < .001, the effect is "large" or "important." Conversely, concluding that because p > .05, the effect is "zero" or "negligible." Statistical significance is entirely about the strength of evidence against H₀, not about the magnitude of the effect.
Solution: Always report BOTH the p-value AND the effect size with its CI. A significant result with d = 0.05 (tiny effect) and a non-significant result with d = 0.80 (large effect, underpowered study) tell very different stories.
Mistake 2: Using η² Instead of ω² for ANOVA Effect Sizes
Problem: η² is systematically biased upward — it always overestimates the true population effect size, especially in small samples with few groups. Many researchers report η² simply because it is the default output of SPSS.
Solution: Always report ω² (or partial ω² for factorial designs) as the primary ANOVA effect size. Report η² only if explicitly required by a journal, and clearly label it as biased. In many cases, the difference is small, but the correct labelling matters.
Mistake 3: Using Cohen's d When Glass's Δ is Appropriate
Problem: When group variances differ substantially (variance ratio > 4), pooling the standard deviations to compute Cohen's d produces a denominator that reflects neither group well and leads to a misleading effect size.
Solution: When s²₁/s²₂ > 4 (or s₁/s₂ > 2), report Glass's Δ (standardising by the control group SD) alongside Cohen's d. Clearly state which SD was used as the standardiser.
Mistake 4: Reporting Effect Sizes Without Confidence Intervals
Problem: A point estimate of d = 0.50 from a study of n = 55 per group has a 95% CI of approximately [0.12, 0.88] — a range spanning from small to large. Reporting only d = 0.50 without the CI gives a false sense of precision.
Solution: Always report the 95% CI alongside every effect size. DataStatPro automatically computes exact CIs for all effect sizes using non-central distributions. This is increasingly required by APA and major journals.
Mistake 5: Applying Cohen's Benchmarks Without Context
Problem: Mechanically classifying d = 0.20 as "small" based on Cohen's benchmarks regardless of the research context. In some fields (e.g., cognitive neuroscience or social psychology in field settings), d = 0.20 is a large, practically important effect.
Solution: Use Cohen's benchmarks only as a last resort. Prioritise domain-specific benchmarks, compare to average effect sizes in your field (e.g., from meta-analyses), and consider the practical or clinical implications of the effect size given the context.
Mistake 6: Interpreting the OR as the RR
Problem: When the event is common (> 10%), the Odds Ratio is numerically larger (more extreme) than the Risk Ratio. For example, if p₁ = 0.40 and p₂ = 0.20, then RR = 2.0 but OR = 2.67. Reporting "the odds of the event are 2.67 times higher" and implying that "the risk is 2.67 times higher" substantially overstates the effect.
Solution: Always report the RR (not OR) when the outcome is common (> 10%) and absolute probabilities are estimable (prospective study). Clearly distinguish between "odds" (OR) and "risk" (RR) in all reporting. Always accompany the OR with the ARD for context.
Mistake 7: Computing Paired d as Independent Samples d
Problem: Using the independent samples formula (with pooled SD) for paired or repeated measures data ignores the correlation between the two measurements, dramatically underestimating the true within-person effect size (because s_pooled includes between-person variability, whereas SD_diff does not).
Solution: For paired designs, always use d_z = M_diff / SD_diff, where M_diff is the mean of the difference scores and SD_diff is the SD of the difference scores. The paired d_z will typically be larger than the independent d for the same data when the pre-post correlation is positive.
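The gap between the two formulas is easy to see on a toy pre/post data set with strongly correlated measurements (illustrative data, not from the source):

```python
import math
import statistics

pre  = [10, 12, 9, 14, 11, 13, 12, 10]
post = [ 8,  9, 7, 11,  9, 10, 10,  8]

# Correct paired effect size: mean difference over SD of differences
diffs = [a - b for a, b in zip(pre, post)]
d_z = statistics.mean(diffs) / statistics.stdev(diffs)

# Incorrect for paired data: pooled-SD formula treats scores as independent
sp = math.sqrt((statistics.stdev(pre) ** 2 + statistics.stdev(post) ** 2) / 2)
d_indep = (statistics.mean(pre) - statistics.mean(post)) / sp
```

Here the within-person change is highly consistent, so d_z comes out several times larger than the pooled-SD d; the pooled denominator is inflated by between-person spread that is irrelevant to the within-person question.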
Mistake 8: Reporting Partial η² Values as Proportions of "Total Variance"
Problem: Partial eta squared (η²_p) in factorial ANOVA is NOT the proportion of total variance. In a 2×2 ANOVA with interaction, the values of η²_p for the two main effects and the interaction can sum to well over 1.0 — clearly impossible if they were proportions of total variance.
Solution: When reporting η²_p, state explicitly that it is "the proportion of variance in the DV attributable to this effect after removing variance associated with other effects." Use η² (not η²_p) if you want to convey what fraction of total variance each effect explains.
Mistake 9: Using the Wrong n When Computing d from t
Problem: When computing d from a reported t-statistic, researchers sometimes use the total N instead of the per-group n in the formula d = t√(2/n), or confuse the sample sizes when groups are unequal.
Solution: For independent samples: d = t√(1/n₁ + 1/n₂). For paired or one-sample designs: d_z = t/√n. Always double-check which t-test formula was used by the original authors.
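The two conversions are worth keeping as separate, clearly named helpers so the wrong one is never applied (names are illustrative):

```python
import math

def d_from_t_independent(t, n1, n2):
    """Cohen's d from an independent-samples t: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_from_t_paired(t, n):
    """Cohen's d_z from a paired or one-sample t: d_z = t / sqrt(n)."""
    return t / math.sqrt(n)
```

Note that for the same t, the independent-samples conversion with n = 50 per group and the paired conversion with n = 25 pairs happen to give the same value — which is exactly why the design must be checked before converting.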
Mistake 10: Reporting NNT Without Specifying the Time Horizon and Base Rate
Problem: An NNT of 20 means very different things depending on whether the outcome
is "prevent a heart attack over 5 years" vs. "cure a headache in 2 hours." Without
specifying the comparison condition (vs. what?), the time horizon, the baseline event rate,
and the population, NNT is not interpretable.
Solution: Always specify the NNT with: (1) the comparison condition (treatment vs.
placebo/control), (2) the outcome, (3) the time horizon, and (4) the baseline event rate
(control group risk). Example: "NNT = 17 (95% CI: 13–25) to prevent one case of disease
in vaccinated vs. unvaccinated adults over 12 months, given a baseline risk of 8%."
14. Troubleshooting
| Problem | Likely Cause | Solution |
|---|---|---|
| d is extremely large (d > 2) | Data entry error; outlier dominating; very small SD | Check raw data for errors; screen for outliers; verify SD calculation |
| ω² or ε² is negative | True effect is near zero; small sample; F < 1 | Negative ω² should be reported as 0 (convention); increase sample size |
| Partial η² values sum to more than 1.0 | This is expected in factorial ANOVA; partial η² is not a proportion of total variance | Switch to η² or generalised η² if total-variance proportions are needed |
| OR and RR give very different conclusions | Common event (base rate > 10%) — OR exaggerates relative to RR | Report RR (or ARD + NNT) for common outcomes; OR is appropriate for case-control |
| 95% CI for NNT includes infinity | ARD CI includes zero (non-significant result) | Report NNT from each bound of the ARD CI separately; report as NNH for lower bound if negative |
| d from test statistic differs from d from summary statistics | Different formula used; unequal group sizes | For unequal n: use d = t√(1/n₁ + 1/n₂); verify which formula applies |
| CL (Common Language) effect size close to 0.50 for large d | Possible calculation error — check the formula | CL = Φ(d/√2), not Φ(d); CL of 0.50 corresponds to d = 0 |
| Fisher's CI for r extends beyond ±1 | Very small n or r close to ±1 | Check that n > 3; for r = ±1 the CI is degenerate; consider a Bayesian credible interval |
| Cramér's V is larger than expected for a sparse table | Small sample bias in χ² | Use bias-corrected Cramér's Ṽ (Bergsma, 2013) |
| Paired d_z is larger than independent samples d for the same data | This is expected — paired d removes between-person variance | Both are correct but measure different things; report paired d_z for paired designs |
| Power calculation requires a larger n than resources allow | Effect size is small or power requirement is high | Accept lower power (state this as a limitation); use a one-tailed test if directional; consider a sequential design |
| d and r conversions give inconsistent results | Unequal group sizes affecting the conversion formula | Use the exact formula r = d/√(d² + (n₁+n₂)²/(n₁n₂)), not the equal-n approximation |
15. Quick Reference Cheat Sheet
Core Equations
| Formula | Description |
|---|---|
| d = (M₁ − M₂)/s_pooled | Cohen's d (independent samples) |
| s_pooled = √[((n₁−1)s₁² + (n₂−1)s₂²)/(n₁+n₂−2)] | Pooled standard deviation |
| g = d × (1 − 3/(4df − 1)) | Hedges' g (bias-corrected d) |
| Δ = (M₁ − M₂)/s_control | Glass's Δ |
| d_z = M_diff/SD_diff | Cohen's d for paired designs |
| SE(d) = √[(n₁+n₂)/(n₁n₂) + d²/(2(n₁+n₂))] | SE of Cohen's d |
| CL = Φ(d/√2) | Common Language Effect Size |
| U₃ = Φ(d) | Cohen's U₃ |
| η² = SS_between/SS_total | Eta squared |
| η²_p = SS_effect/(SS_effect + SS_error) | Partial eta squared |
| ω² = (SS_b − df_b × MS_w)/(SS_t + MS_w) | Omega squared (one-way) |
| ω²_p = df_eff(MS_eff − MS_err)/[df_eff × MS_eff + (N − df_eff)MS_err] | Partial omega squared |
| f = √(η²/(1 − η²)) | Cohen's f |
| f² = R²/(1 − R²) | Cohen's f² (global) |
| f² = (R²_full − R²_reduced)/(1 − R²_full) | Cohen's f² (local/incremental) |
| z = ½ ln[(1 + r)/(1 − r)], SE_z = 1/√(n − 3) | Fisher's z for r CI |
| OR = ad/bc | Odds Ratio |
| RR = [a/(a + b)]/[c/(c + d)] | Risk Ratio |
| NNT = 1/ARD | Number Needed to Treat |
| V = √(χ²/(n × min(r − 1, c − 1))) | Cramér's V |
| r_rb = 1 − 2U/(n₁n₂) | Rank-biserial correlation |
Effect Size Family Selection Guide
| Test | Effect Size | Notes |
|---|---|---|
| One-sample t-test | Cohen's d | Compared to known value |
| Independent t-test | Cohen's d or Hedges' g | g for small n |
| Paired t-test | d_z | Uses difference scores |
| One-way ANOVA | ω² or ε² | NOT η² (biased) |
| Factorial ANOVA | ω²_p or η²_p | Partial versions |
| ANCOVA | η²_p (adjusted) | After covariate removal |
| Simple regression | r, R² | Both informative |
| Multiple regression | R², f² | Report adjusted R² |
| χ² (2×2) | φ | Same as Pearson r for binary variables |
| χ² (r×c) | Cramér's V | Use corrected Ṽ if n small |
| Binary, prospective | ARD, RR, NNT | All three recommended |
| Binary, case-control | OR | RR not estimable |
| Mann-Whitney U | Rank-biserial r | Non-parametric |
| Wilcoxon signed-rank | r = Z/√N | Non-parametric |
| Kruskal-Wallis | ε²_H | Non-parametric ANOVA |
Cohen's Benchmarks (1988) — All Families
| Label | d | r | f | η² | f² | w | OR |
|---|---|---|---|---|---|---|---|
| Small | 0.20 | 0.10 | 0.10 | 0.01 | 0.02 | 0.10 | 1.5 |
| Medium | 0.50 | 0.30 | 0.25 | 0.06 | 0.15 | 0.30 | 2.5 |
| Large | 0.80 | 0.50 | 0.40 | 0.14 | 0.35 | 0.50 | 4.3 |
Conversion Formulas
| From | To | Formula |
|---|---|---|
| d | r | r = d/√(d² + 4) (equal n) |
| r | d | d = 2r/√(1 − r²) (equal n) |
| d | r | r = d/√(d² + (n₁+n₂)²/(n₁n₂)) (unequal n) |
| d | f | f = d/2 (2 groups) |
| d | η² | η² = d²/(d² + 4) (2 groups) |
| d | OR | OR = exp(πd/√3) |
| OR | d | d = √3 × ln(OR)/π |
| t | d | d = t√(1/n₁ + 1/n₂) (independent) |
| t | d_z | d_z = t/√n (paired/one-sample) |
| η² | d | d = 2√(η²/(1 − η²)) (2 groups, equal n) |
NNT Interpretation Guide
| NNT | Clinical Impact |
|---|---|
| 1 | Extraordinary benefit |
| 2–4 | Excellent |
| 5–10 | Good |
| 11–20 | Moderate |
| 21–50 | Small |
| 51–100 | Minimal |
| ∞ | No benefit |
Required Sample Size for 80% Power (Two-Sided )
| d | n per group | r | N total | η² | N total (3 groups) |
|---|---|---|---|---|---|
| 0.20 | 394 | 0.10 | 783 | 0.01 | 969 |
| 0.35 | 130 | 0.20 | 193 | 0.04 | 279 |
| 0.50 | 64 | 0.30 | 84 | 0.06 | 159 |
| 0.65 | 38 | 0.40 | 46 | 0.10 | 90 |
| 0.80 | 26 | 0.50 | 29 | 0.14 | 66 |
| 1.00 | 17 | 0.60 | 19 | 0.25 | 36 |
Effect Size Reporting Checklist
| Item | Required |
|---|---|
| Point estimate of effect size | ✅ Always |
| 95% CI for effect size | ✅ Always |
| Which specific effect size (e.g., not just "effect size") | ✅ Always |
| Which benchmark system used | ✅ Always |
| Sample sizes for each group/condition | ✅ Always |
| Direction of effect (which group is higher) | ✅ Always |
| Whether bias correction was applied (Hedges' g vs. Cohen's d) | ✅ When n < 50 |
| ARD + NNT for binary outcomes | ✅ For clinical/applied |
| Power analysis or sensitivity analysis | ✅ For null results |
| Domain-specific context for benchmark | ✅ Recommended |
This tutorial provides a comprehensive foundation for understanding, computing, and interpreting Effect Sizes using the DataStatPro application. For further reading, consult Cohen's "Statistical Power Analysis for the Behavioral Sciences" (2nd ed., 1988), Ellis's "The Essential Guide to Effect Sizes" (2010), Cumming's "Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis" (2012), and Lakens's "Calculating and Reporting Effect Sizes to Facilitate Cumulative Science: A Practical Primer for t-Tests and ANOVAs" (Frontiers in Psychology, 2013). For feature requests or support, contact the DataStatPro team.