Statistical power: Before, but not after!

Stian Lydersen

Comments

Adelson Pinon

The author’s main point was to criticize retrospective power calculations. There is no disagreement here. This brief note simply addresses the definitions of Statistical Power (SP) stated in the article.
The author defined SP as “the probability of rejecting the null hypothesis in a future study”. He also asserted that SP “is the probability that the result of a study with a given number of participants will be statistically significant”. Both definitions are incorrect. They do not take into account that the null hypothesis may be true (in which case the probability of rejecting it is the type I error rate, not the power), and they seem to conflate power with replication probability (the probability of rejecting the null hypothesis in a future study, assuming the alternative hypothesis is true, depends not only on a sufficiently large sample size but also on methodological quality).
SP is the long-term probability of rejecting a false null hypothesis, given the population effect size (ES), a significance level (α), and a sample size (N). Any of the four parameters described (SP, ES, α, N) is a function of the other three, which means that when any three of them are fixed, the fourth is completely determined. For example, in research planning, the required N can be determined from a given SP, ES, and α.
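As an illustration of how the four quantities determine one another, the following is a minimal sketch, not a definitive implementation. It assumes a two-sided z-test comparing two independent groups of equal size, with a standardized effect size and known variance (a normal approximation). The function names `power_two_sample` and `required_n_per_group` are hypothetical and chosen only for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution, used for quantiles and tail probabilities.
_STD_NORMAL = NormalDist()


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power (SP) of a two-sided, two-sample z-test.

    Assumption: equal group sizes, known variance, and a standardized
    effect size (difference in means divided by the standard deviation).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the statistic falls in either rejection tail.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate N per group needed to reach the requested power.

    Uses the usual closed-form approximation, which ignores the
    negligible rejection probability in the opposite tail.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    n = required_n_per_group(effect_size=0.5, power=0.80, alpha=0.05)
    print("required N per group:", n)
    print("achieved power:", round(power_two_sample(0.5, n), 3))
    # With an effect size of zero the null hypothesis is true, and the
    # rejection probability equals alpha (the type I error rate).
    print("rejection probability under the null:", round(power_two_sample(0.0, n), 3))
```

Fixing ES, α and SP yields N; fixing ES, α and N yields SP. Setting the effect size to zero shows that the rejection probability then reduces to α, which is why a definition of SP needs to condition on a false null hypothesis.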
SP is based on a number of assumptions: the long-term probability is derived from a normal distribution, the value of the ES is obtained from perfect experimentation, and the N participants are randomly selected. These assumptions need to be formally evaluated for each study. For example, surgical treatment of obesity may be successful in reducing weight but is not capable of increasing weight after the intervention. This odd possibility is nevertheless expected under the assumption that the data are normally distributed. An SP calculation under these circumstances would inflate the required N.
Studies with low SP appear to be common in the biomedical sciences, and the comments above do not condone the surplus of underpowered studies. My main point was to explain the meaning of SP and to make its underlying assumptions explicit.

Stian Lydersen

I thank Adelson Pinon for his interest in my article. We agree that the message of my article was to criticize retrospective power calculations.
Pinon focuses on the definitions of statistical power and claims that my definitions are incorrect. He defines statistical power as the long-term probability of rejecting a false null hypothesis, given the population effect size, a significance level, and a sample size. I completely agree with Pinon that this is a correct and precise definition of statistical power. But I do not agree that my definitions are incorrect. Rather, they are less precise. There is a limit to the level of precision that is appropriate in an article in the column “Medisin og tall”.
I thank Pinon for adding the more precise definition in his comment.