Is the effect small or large?

Stian Lydersen

doi:10.4045/tidsskr.19.0665

Medicine and numbers

E-mail: stian.lydersen@ntnu.no

Stian Lydersen, dr.ing. and professor of medical statistics at the Regional Centre for Child and Youth Mental Health and Child Welfare (RKBU Central Norway) at the Department of Mental Health, Norwegian University of Science and Technology.

The author has completed the ICMJE form and declares no conflicts of interest.

How do we quantify the results of a study? Which is more relevant: the effect measured on the original scale, or a standardised effect size?

Reindal and colleagues (1) studied age for onset of independent walking. For children diagnosed with autism spectrum disorder, the mean age (standard deviation) was 14.74 (4.28) months, and for children without autism spectrum disorder, it was 13.76 (2.88) months. The difference was therefore 14.74 − 13.76 = 0.98 months. This is the effect size measured on the original scale, also called the unstandardised effect size. In addition, the authors report a standardised effect size: this difference divided by the standard deviation in the comparison group, that is, 0.98/2.88 = 0.34 (see figure 1). Which of these measures is more relevant?

Figure 1 The mean (standard deviation) of age for onset of independent walking among 376 children with autism spectrum disorder and 114 children without this diagnosis (1). The difference was 0.98 months, which corresponds to Cohen's d = 0.34.
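The arithmetic behind these two numbers can be reproduced in a few lines. The following Python sketch uses only the summary statistics quoted in the text; the function and variable names are illustrative assumptions and are not taken from Reindal and colleagues.

```python
# Summary statistics quoted in the text (age in months).
MEAN_ASD = 14.74         # children with autism spectrum disorder
MEAN_COMPARISON = 13.76  # children without the diagnosis
SD_COMPARISON = 2.88     # standard deviation in the comparison group


def unstandardised_effect(mean_a: float, mean_b: float) -> float:
    """Difference between two means, expressed on the original scale."""
    return mean_a - mean_b


def standardised_effect(mean_a: float, mean_b: float, sd: float) -> float:
    """Difference between two means divided by a chosen standard deviation."""
    return (mean_a - mean_b) / sd


difference = unstandardised_effect(MEAN_ASD, MEAN_COMPARISON)
standardised = standardised_effect(MEAN_ASD, MEAN_COMPARISON, SD_COMPARISON)

print(f"Unstandardised effect: {difference:.2f} months")  # 0.98 months
print(f"Standardised effect:   {standardised:.2f}")        # 0.34
```

The first number keeps its unit (months); the second is unitless and depends on which standard deviation is used as the divisor.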

What is effect size?

The term effect size is not precise. Some authors use this term for Cohen's d or a related measure such as Glass's delta or Hedges' g (2). These are the difference between two means, divided by a standard deviation, and are examples of standardised effect sizes. Other examples of standardised effect sizes are the Pearson correlation coefficient, the standardised regression coefficient in linear regression, and partial eta squared in analyses of variance (ANOVA).

In the behavioural sciences, it is not uncommon to report standardised effect sizes. But what role do they actually have? Researchers who report standardised effect sizes usually refer to the book Statistical Power Analysis for the Behavioral Sciences by Jacob Cohen (1923–1998) (3, 4). In this book, Cohen introduces standardised effect sizes as the basis for computing power or sample size in a future study, but he does not discuss other applications of standardised effect sizes.
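To illustrate this planning role, the sketch below computes an approximate sample size per group from a standardised effect size. It is a minimal example under stated assumptions: a two-sided comparison of two means with equal group sizes, a significance level of 0.05, power of 0.80, and the usual normal approximation. The function name and defaults are hypothetical, and the result is slightly smaller than what an exact calculation based on the t distribution would give.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardised difference between the means.
    """
    if d == 0:
        raise ValueError("d must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Cohen's conventional values for a small, medium, and large d.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {sample_size_per_group(d)} participants per group")
# d = 0.2: about 393, d = 0.5: about 63, d = 0.8: about 25
```

Because the required sample size depends on the difference only through its ratio to the standard deviation, a standardised effect size is a natural input here.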

After a study has been carried out, the choice of a relevant effect size depends on the context. Examples of unstandardised effect sizes are the difference between two means, the unstandardised regression coefficient, the odds ratio, or the risk difference. Several authors generally recommend reporting unstandardised effect sizes (5, 6). Further discussions on unstandardised and standardised effect sizes are given in (7) and (8).

Cohen classifies Cohen's d as small, medium, and large if it equals 0.2, 0.5, or 0.8, respectively ((4), p. 26). Other authors classify standardised effect sizes in intervals, and in part somewhat differently from Cohen; see for example ((4), p. 79–80) and ((9), p. 123). Classifying standardised effect sizes can be useful when calculating power or sample size for a future study, but several authors find such classifications to have little relevance for the observed effect in a completed study (5, 8).
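As a small illustration of how such labels are attached, the Python sketch below maps a value of d to the closest of Cohen's three benchmark values. The nearest-benchmark rule and the function name are assumptions made for this example only; Cohen gives point values rather than intervals, and other authors draw the boundaries differently.

```python
# Cohen's benchmark values for d, as quoted in the text.
COHEN_BENCHMARKS = {"small": 0.2, "medium": 0.5, "large": 0.8}


def nearest_cohen_label(d: float) -> str:
    """Return the label of the benchmark closest to |d| (illustrative rule)."""
    return min(
        COHEN_BENCHMARKS,
        key=lambda label: abs(abs(d) - COHEN_BENCHMARKS[label]),
    )


print(nearest_cohen_label(0.34))  # small
```

The label says nothing about clinical importance: it only locates d relative to conventional planning values.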

Unstandardised is easy to understand

A difference in age for onset of independent walking of 0.98 months between two groups is easy to understand. Does the standardised effect size Cohen's d = 0.34 provide any additional clinically relevant information? Standardised effect sizes can be useful as a basis for power or sample size calculation for a future study, and they can also be useful input in meta-analyses, but otherwise, standardised effect sizes seem to have little relevance.

Literature

1. Reindal L, Nærland T, Weidle B et al. Age of first walking and associations with symptom severity in children with suspected or diagnosed autism spectrum disorder. J Autism Dev Disord 2019; 49: 1–17.
2. Grissom RJ, Kim JJ. Effect sizes for research. Univariate and multivariate applications. 2nd ed. New York, NY: Routledge, 2012.
3. Cohen J. Statistical power analysis for the behavioral sciences. 1st ed. Hillsdale, NJ: Lawrence Erlbaum Associates, 1977.
4. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.
5. Pek J, Flora DB. Reporting effect sizes in original psychological research: A discussion and tutorial. Psychol Methods 2018; 23: 208–25.
6. Baguley T. Standardized or simple effect size: what should be reported? Br J Psychol 2009; 100: 603–17.
7. Fritz CO, Morris PE, Richler JJ. Effect size estimates: current use, calculations, and interpretation. J Exp Psychol Gen 2012; 141: 2–18.
8. Kelley K, Preacher KJ. On effect size. Psychol Methods 2012; 17: 137–52.
9. Campbell MJ, Swinscow TDV. Statistics at square one. 11th ed. Wiley-Blackwell, 2009.

Comments (2)

03.06.2020:

I respectfully disagree. Standardized effect sizes can be extremely helpful. In the example you present, the effect size does indeed have little relevance. A month is a difference readily understood by anybody, lay person or professional. But often, as a health care professional or researcher, one reads papers in which the outcome measure (and thus the results) is completely unfamiliar, as are its units, e.g. the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire or the Neck Disability Index. In these cases, the effect size gives a clear indication of the difference, or change, in outcome scores. It also permits, for example, a comparison of two or more studies looking at the same intervention but using different outcome measures for which direct comparisons are impossible.

Effect sizes should always be given, and given with 95% (or 99%) confidence intervals. If the arms of the confidence interval cross the 'line of no effect', then the intervention cannot be concluded to be effective, no matter where the central point lies.

04.06.2020:

I thank you for your interest in my article. Your viewpoint is that a standardized effect size can be useful when the scale of the measure may be unfamiliar to many readers. This argument has been raised by several researchers, and it is also discussed in some of the references in my article.

When a scale measure may be unfamiliar to many readers, I think the author ought to aid the readers in interpreting the effect size. But I am not convinced that a standardized effect size is the way to go. Rather, I would report what is regarded as clinically relevant. For example, in (1) we report a randomized controlled trial comparing two treatment pathways for hip fractures. The primary outcome was mobility four months after surgery, measured by the screening test Short Physical Performance Battery (SPPB). This is a scale ranging from 0 to 12 points, where higher values indicate better mobility. An effect size of 1.0 points on this scale is regarded as a substantial meaningful change, and 0.5 is regarded as a small meaningful change (1). The reported effect size of 0.76 points in favor of the new treatment pathway can thus be regarded as clinically relevant.

We did not report the standardized effect size, which would be the effect size divided by the standard deviation in the control group, in this case 0.76/3.12 = 0.24. Such a standardized effect size would be regarded as a small effect. If the same effect size of 0.76 had been found in a study of more homogeneous patients, say with standard deviation 1.55, the standardized effect size would be 0.49, typically regarded as moderate. But the clinical relevance would be exactly the same.

Regarding your last point, I completely agree that effect sizes should be reported with some uncertainty measure, usually confidence intervals. But I generally prefer the effect size on the original scale, rather than a standardized effect size.
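The two standardized values in the reply above (0.24 and 0.49) come from one and the same raw difference. A short Python sketch, with illustrative names that are assumptions of this example, makes the dependence on the standard deviation explicit:

```python
RAW_EFFECT = 0.76  # difference in SPPB points, as quoted in the reply

# Standard deviations quoted in the reply: the observed control group and a
# hypothetical study of more homogeneous patients.
SCENARIOS = {
    "observed control group": 3.12,
    "more homogeneous patients": 1.55,
}


def standardise(raw_effect: float, sd: float) -> float:
    """Divide a raw effect by a standard deviation."""
    return raw_effect / sd


results = {name: standardise(RAW_EFFECT, sd) for name, sd in SCENARIOS.items()}

for name, value in results.items():
    print(f"{name}: raw = {RAW_EFFECT} points, standardized = {value:.2f}")
# observed control group: standardized = 0.24
# more homogeneous patients: standardized = 0.49
```

The raw effect, and with it the comparison against the 0.5 and 1.0 point thresholds for meaningful change, is unchanged across both scenarios; only the divisor differs.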

References:

1. Prestmo A, Hagen G, Sletvold O et al. Comprehensive geriatric care for patients with hip fractures: a prospective, randomised, controlled trial. Lancet 2015; 385: 1623-33.

Published: 17 February 2020

Tidsskr Nor Legeforen 17 February 2020 Vol. 140.
