## Different adjustment methods

Consider a study in which six hypothesis tests are performed. If every test is carried out at a significance level of 5 %, each of them has a 5 % probability of a Type I error, that is, of erroneously rejecting a true null hypothesis (1). The probability of a Type I error in at least one of the hypothesis tests, also referred to as the *family-wise error rate* (FWER) (2), will then be substantially higher than 5 %: about 26 % if the six tests are independent, and at worst 30 %. Sometimes it is desirable to control this error rate so that it does not exceed a pre-defined threshold, for example a significance level of 5 %.
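The arithmetic behind these figures can be checked with a minimal Python sketch. The function names are illustrative only, and the first calculation assumes that all six null hypotheses are true and that the tests are independent, which the text does not state explicitly.

```python
def fwer_independent(alpha: float, m: int) -> float:
    """FWER for m independent tests at level alpha when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def fwer_upper_bound(alpha: float, m: int) -> float:
    """Upper bound on the FWER that holds whatever the dependence between tests."""
    return min(1.0, m * alpha)


fwer_6 = fwer_independent(0.05, 6)
bound_6 = fwer_upper_bound(0.05, 6)
print(f"independent tests: {fwer_6:.3f}, upper bound: {bound_6:.2f}")
```

The gap between the two numbers is small for six tests, which is one way of seeing why the Šidák correction improves only marginally on Bonferroni.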

The simplest method is the so-called Bonferroni correction. This means multiplying the *p*-values by the number of hypotheses, in this case six, before comparing them with the significance level. However, the Bonferroni correction is very conservative, which means that the statistical power, and thereby the probability of detecting true effects, is greatly reduced. The Šidák correction gives only a marginal improvement. Alternative methods, in order of increasing statistical power, are Holm's *step-down* correction, Hochberg's *step-up* correction and the Hommel correction (3). These methods are valid under general assumptions, and can be generally recommended.
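As a rough illustration, four of these corrections can be sketched in plain Python. The function names are hypothetical, the formulas are the standard textbook definitions rather than anything specified in this article, and the Hommel correction is left out because its algorithm is considerably longer. The input is the set of six unadjusted *p*-values from Table 1.

```python
from typing import List


def bonferroni(p: List[float]) -> List[float]:
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(p)
    return [min(1.0, m * x) for x in p]


def sidak(p: List[float]) -> List[float]:
    """Sidak adjustment: 1 - (1 - p)^m."""
    m = len(p)
    return [1.0 - (1.0 - x) ** m for x in p]


def holm(p: List[float]) -> List[float]:
    """Step-down: start at the smallest p-value and carry a running maximum."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        running_max = max(running_max, (m - rank) * p[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def hochberg(p: List[float]) -> List[float]:
    """Step-up: start at the largest p-value and carry a running minimum."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):  # rank 0 is the largest p-value
        running_min = min(running_min, (rank + 1) * p[i])
        adjusted[i] = running_min
    return adjusted


p_values = [0.0003, 0.009, 0.013, 0.014, 0.04, 0.06]
results = {
    "Bonferroni": bonferroni(p_values),
    "Sidak": sidak(p_values),
    "Holm": holm(p_values),
    "Hochberg": hochberg(p_values),
}
for name, adjusted in results.items():
    print(name, [round(x, 4) for x in adjusted])
```

Holm and Hochberg use the same multipliers and differ only in the direction in which they are applied, which is why Hochberg's adjusted *p*-values can never be larger than Holm's.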

In some situations, a large number of hypotheses are tested. For example, genetics studies may involve several hundred thousand hypotheses. In practice it is then not feasible to control the family-wise error rate. Instead, we have to content ourselves with controlling the *false discovery rate* (FDR) (2). We allow a certain proportion, normally 5 %, of the findings that we declare significant in one and the same study to be false positives. When controlling the family-wise error rate, on the other hand, we would not 'accept' even a single false-positive finding. The most common method for controlling the false discovery rate is the Benjamini-Hochberg correction (4). Controlling the false discovery rate can also be relevant in trials with as few as 8 to 16 hypothesis tests, although its benefits are greater when a large number of hypotheses are tested (4).
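The Benjamini-Hochberg correction can be sketched in a few lines of Python. This is a minimal illustration using the standard definition of the adjusted *p*-values (each *p*-value multiplied by the number of tests and divided by its rank, followed by a running minimum from the largest value downwards). The function name is hypothetical, and the input is the set of six unadjusted *p*-values from Table 1.

```python
from typing import List


def benjamini_hochberg(p: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: p[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of p[i] in ascending order
        running_min = min(running_min, p[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.0003, 0.009, 0.013, 0.014, 0.04, 0.06]
bh_adjusted = benjamini_hochberg(p_values)
print([round(x, 4) for x in bh_adjusted])

# Findings declared significant when the false discovery rate is held at 5 %.
significant = [x <= 0.05 for x in bh_adjusted]
print(significant)
```

With these six values, five findings fall at or below 5 % after adjustment, compared with a single finding under the Bonferroni correction.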

Let us look at an example with six unadjusted *p*-values sorted in ascending order (Table 1). We can see how methods with higher statistical power typically give lower adjusted *p*-values. We also see that the lowest adjusted *p*-value is the same as that obtained by the Bonferroni correction, irrespective of method. The final column, with *p*-values adjusted by the Benjamini-Hochberg correction, controls only the false discovery rate. With only six hypothesis tests, another method would be used in practice.

##### Table 1

An example with six *p*-values, unadjusted and adjusted by different methods of correction.

| Unadjusted *p*-value | Bonferroni | Šidák | Holm's *step-down* | Hochberg's *step-up* | Hommel | Benjamini-Hochberg |
|---|---|---|---|---|---|---|
| 0.0003 | 0.0018 | 0.0018 | 0.0018 | 0.0018 | 0.0018 | 0.0018 |
| 0.009 | 0.054 | 0.053 | 0.045 | 0.042 | 0.028 | 0.021 |
| 0.013 | 0.078 | 0.076 | 0.052 | 0.042 | 0.039 | 0.021 |
| 0.014 | 0.084 | 0.081 | 0.052 | 0.042 | 0.042 | 0.021 |
| 0.04 | 0.24 | 0.22 | 0.08 | 0.06 | 0.06 | 0.048 |
| 0.06 | 0.36 | 0.31 | 0.08 | 0.06 | 0.06 | 0.06 |