Statistics and data presentation: Understanding Effect Size

Statistically significant findings may not be practically significant. Statistical significance depends heavily on sample size: the standard error shrinks as the sample size (n) grows, so in a very large sample even a difference that is trivial in practical terms can reach significance. Effect size addresses this by expressing the difference between means in standard deviation units, which also makes it possible to compare results measured on different scales. Effect size is not the same as statistical significance: significance tells you how likely it is that a result at least this large would arise by chance alone, and effect size tells you how large the result is.
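
The link between sample size and significance can be illustrated with a short Python sketch. It assumes two groups of equal size with a known, common standard deviation, so that the test statistic is z = d × √(n/2). The function name two_sided_p_value and the value d = 0.05 are hypothetical choices made for this illustration and do not come from any study.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(d, n_per_group):
    """Approximate two-sided p-value for a two-group comparison.

    Assumes equal group sizes and a known, common standard deviation,
    so the test statistic is z = d * sqrt(n / 2).
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical, very small standardized difference (1/20 of a standard deviation).
d = 0.05
for n in (50, 500, 5000, 50000):
    print(f"n per group = {n:>6}: d = {d}, p = {two_sided_p_value(d, n):.4f}")
```

With the standardized difference held fixed, the p-value falls as n grows, while the effect size itself does not change.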

An effect size is a standardized measure of the size of an effect and as such it is:

·         Comparable across studies

·         Not as reliant on the sample size

·         Useful for objectively evaluating the size of an observed effect

 

Sullivan and Feinn (2012) give an example to illustrate the importance of understanding effect size: ‘A commonly cited example of this problem is the Physicians Health Study of aspirin to prevent myocardial infarction (MI). In more than 22,000 subjects over an average of 5 years, aspirin was associated with a reduction in MI (although not in overall cardiovascular mortality) that was highly statistically significant: P < .00001. The study was terminated early due to the conclusive evidence, and aspirin was recommended for general prevention. However, the effect size was very small: a risk difference of 0.77% with r2 = .001—an extremely small effect size. As a result of that study, many people were advised to take aspirin who would not experience benefit yet were also at risk for adverse effects. Further studies found even smaller effects, and the recommendation to use aspirin has since been modified.’

 

Although effect sizes were not always reported in the past, they are becoming increasingly common in academic papers and reports. To calculate a standardized effect size for a difference in means, we divide the difference between the means by a standard deviation. In the one-sample case, you take the difference between the sample mean and the hypothesized population mean and divide it by the standard deviation. In the two-sample case, you subtract one group mean from the other and divide by the “pooled” (weighted average) standard deviation of the two groups.

 

About 50 to 100 different measures of effect size are known. Some of the most common are:

      Cohen’s d – this is generally used to estimate effect size when comparing two means, for example with t-tests (or pairwise comparisons following ANOVA) where the groups have similar sample sizes and variances. Because Cohen’s d expresses the difference in means relative to the pooled standard deviation, it does not systematically grow or shrink as the sample size increases.
Basically, Cohen's d = (Mean2 − Mean1) / pooled standard deviation.
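
As a minimal sketch of this calculation in Python, the function below pools the two sample variances weighted by their degrees of freedom (n − 1), which is one common way to form the pooled standard deviation. The function names and the example scores are hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(group1, group2):
    """Pooled standard deviation, weighting each sample variance by n - 1."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return sqrt(pooled_var)


def cohens_d(group1, group2):
    """Cohen's d = (Mean2 - Mean1) / pooled standard deviation."""
    return (mean(group2) - mean(group1)) / pooled_sd(group1, group2)


# Hypothetical scores, for illustration only.
control = [10, 12, 14, 16, 18]
treatment = [12, 14, 16, 18, 20]
print(f"Cohen's d = {cohens_d(control, treatment):.2f}")
```

A positive value means the second group has the higher mean; swapping the arguments flips the sign but not the magnitude.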

      Pearson’s r – the value of Pearson’s correlation coefficient r varies between −1 and 1. We use this measure of effect size when we are investigating the strength of the relationship between two variables. A related effect size is r², also referred to as R² or “r-squared”. In the case of paired data, this is a measure of the proportion of variance shared by the two variables.

Although you would typically calculate these using statistical software, the formula is provided below if you are interested in understanding how it is calculated.

 

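                         Pearson's r

                         $$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$

                        Where:

                        r = correlation

                        N = number of pairs of data

                        ∑xy = sum of the products of paired scores

                        ∑x = sum of x scores

                        ∑y = sum of y scores

                        ∑x² = sum of squared x scores

                        ∑y² = sum of squared y scores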
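
The quantities defined above (N, ∑xy, ∑x, ∑y, ∑x² and ∑y²) combine in the usual raw-score formula for r. The Python sketch below computes each sum explicitly so that every term is visible; the function name pearson_r and the example data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the raw-score sums N, sum(xy), sum(x), sum(y), sum(x^2), sum(y^2)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of scores")
    n = len(x)                                   # N = number of pairs of data
    sum_x = sum(x)                               # sum of x scores
    sum_y = sum(y)                               # sum of y scores
    sum_xy = sum(a * b for a, b in zip(x, y))    # sum of the products of paired scores
    sum_x2 = sum(a * a for a in x)               # sum of squared x scores
    sum_y2 = sum(b * b for b in y)               # sum of squared y scores
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical paired scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```

The sketch does not handle the case where all x scores (or all y scores) are identical, because the denominator is then zero and r is undefined.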

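      Glass’s Δ – this is generally used to calculate effect size when we conduct t-tests with unequal variances. It standardizes the difference in means by the standard deviation of the control group alone rather than by a pooled value. A slight modification of this effect size, termed Glass’s Δ*, can be used with small sample sizes.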
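
Glass’s Δ is usually defined as the difference in means divided by the standard deviation of the control group only. A minimal Python sketch under that definition follows; the function name glass_delta and the example scores are hypothetical.

```python
from statistics import mean, stdev


def glass_delta(control, treatment):
    """Difference in means scaled by the control group's sample standard deviation."""
    return (mean(treatment) - mean(control)) / stdev(control)


# Hypothetical scores, for illustration only; the treatment group is more variable.
control = [10, 12, 14, 16, 18]
treatment = [11, 16, 21, 13, 24]
print(f"Glass's delta = {glass_delta(control, treatment):.2f}")
```

Because only the control group's standard deviation appears in the denominator, a larger spread in the treatment group does not alter the scaling.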

      Hedges’s g – this is used to estimate effect size for the difference between means and is similar to Cohen’s d. Hedges’s g is typically preferred to Cohen’s d. It has better small sample properties and has better properties when the sample sizes are significantly different. Again, these effect sizes can all be calculated using software, but looking at the formula gives you an understanding of how they are calculated. 

                      Hedges's g
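
One widely used form of Hedges’s g takes the pooled-standard-deviation effect size and multiplies it by a small-sample correction factor of approximately 1 − 3 / (4(n1 + n2) − 9). The Python sketch below assumes that form; some sources define g without the correction, so treat this as an illustration rather than the only definition. The function name hedges_g and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(group1, group2):
    """Standardized mean difference with an approximate small-sample correction."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance, weighting each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    d = (mean(group2) - mean(group1)) / sqrt(pooled_var)
    # Approximate correction factor; it approaches 1 as the samples grow.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# Hypothetical scores, for illustration only.
control = [10, 12, 14, 16, 18]
treatment = [12, 14, 16, 18, 20]
print(f"Hedges's g = {hedges_g(control, treatment):.2f}")
```

With only five scores per group the correction shrinks the estimate by roughly 10%; with hundreds of scores per group the two measures are nearly identical.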

 

      Odds Ratio/Risk rates – an odds ratio measures how many times larger the odds of an outcome are for one value of an independent variable compared with another value. The odds ratio is an effect size that tells you the direction and strength of the relationship: a value above 1 indicates higher odds, a value below 1 indicates lower odds, and a value of 1 indicates no difference. Odds ratios are commonly reported in logistic regression.
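
For a simple two-group comparison, the odds ratio and the risk ratio can be computed directly from the counts in a 2 × 2 table. The Python sketch below uses hypothetical counts and hypothetical function names purely to show the arithmetic.

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the outcome in group A divided by the odds in group B."""
    odds_a = events_a / non_events_a
    odds_b = events_b / non_events_b
    return odds_a / odds_b


def risk_ratio(events_a, non_events_a, events_b, non_events_b):
    """Proportion with the outcome in group A divided by the proportion in group B."""
    risk_a = events_a / (events_a + non_events_a)
    risk_b = events_b / (events_b + non_events_b)
    return risk_a / risk_b


# Hypothetical counts: group A has 20 events out of 100, group B has 10 out of 100.
print(f"Odds ratio = {odds_ratio(20, 80, 10, 90):.2f}")
print(f"Risk ratio = {risk_ratio(20, 80, 10, 90):.2f}")
```

The two measures differ because odds compare events with non-events, whereas risk compares events with the whole group; they move closer together when the outcome is rare.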

There are also typical rules of thumb used to interpret effect sizes and for Hedges’s g, Glass’s Δ and Cohen’s d these are:

      .20  weak effect (a difference of 1/5 of a standard deviation)

      .50  medium effect (a difference of 1/2 of a standard deviation)

      .80  strong effect (a difference of 4/5 of a standard deviation)

      .33  considered meaningful

 

For Pearson’s r the interpretations are:

      r = .1, d = .2 (small effect): the effect explains 1% of the total variance.

      r = .3, d = .5 (medium effect): the effect accounts for 9% of the total variance.

      r = .5, d = .8 (large effect): the effect accounts for 25% of the variance.

 

Statistical power is the probability that your study will find a statistically significant difference between interventions when an actual difference does exist. Before starting your study, calculate its power using an estimated effect size; if the power is too low, you may need more subjects. Because the true effect size is unknown before the study is carried out, the estimate usually comes from pilot data, from previously published studies, or from the smallest difference that would be practically meaningful.
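
Once an effect size has been estimated, a rough power calculation can be sketched in Python. The version below uses a normal approximation for a two-sided comparison of two equal-sized groups, so it will differ slightly from the t-based figures that dedicated statistical software reports. The function names and the default values (alpha = 0.05, power = 0.80) are assumptions made for this illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approximate_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized difference d (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect size is d.
    expected_z = abs(d) * sqrt(n_per_group / 2)
    return STD_NORMAL.cdf(expected_z - z_crit)


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate subjects needed per group to reach the requested power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / abs(d)) ** 2)


# Hypothetical planning scenario: an estimated medium effect of d = 0.5.
print(f"Power with 20 per group: {approximate_power(0.5, 20):.2f}")
print(f"Subjects per group for 80% power: {n_per_group_for_power(0.5)}")
```

Smaller estimated effect sizes drive the required sample size up sharply, because n grows with 1/d².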

 

Effect size helps readers understand the magnitude of differences found, whereas statistical significance examines whether the findings are likely to be due to chance. Both are essential for readers to understand the full impact of your work. Report both in the Abstract and Results sections. Effect sizes complement statistical hypothesis testing, and play an important role in power analyses, sample size planning, and in meta-analyses. Effect size is an essential component when evaluating the strength of a statistical claim.

 

Maximise your publication success with Charlesworth Author Services.

Charlesworth Author Services offers statistical analysis for researchers preparing articles in the medical and life sciences. This service helps researchers improve the accuracy and reporting of their data before submitting an article to a publisher.
