Multiple comparison procedures

When we’re trying to find significant changes in expression, we’re essentially testing $H_{0k}$ : gene $k$ is not differentially expressed, for each gene $k$. We’ve seen two possible hypothesis tests based on ANOVA parameters, but many others can be formulated. Most will produce a p-value for each gene, which represents the probability of observing a result at least as extreme under the null hypothesis.

We need to decide how small a p-value needs to be before we consider it to be significant. When testing single hypotheses, we often fix a type I error rate of $\alpha = 0.05$ or $\alpha = 0.01$ . (Type I error = reject null hypothesis when it is true = false positive).

Setting a type I error rate of 0.05 means we are willing to make a type I error in 5% of our tests. If we’re testing 20,000 genes for differential expression, we would expect around 1,000 type I errors! To get around this problem we use multiple comparison procedures (MCPs).
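
As a quick sanity check, here is a minimal simulation. The setup is entirely hypothetical (20,000 genes that are all truly null, five samples per group, and a plain two-sample t-test per gene), but it shows roughly 1,000 p-values falling below 0.05 by chance alone:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup: 20,000 genes, all truly null (no differential expression),
# two groups of 5 samples each.  The sizes are purely illustrative.
n_genes, n_per_group = 20_000, 5
group_a = rng.normal(size=(n_genes, n_per_group))
group_b = rng.normal(size=(n_genes, n_per_group))

# One two-sample t-test per gene (tests run along the rows).
pvals = stats.ttest_ind(group_a, group_b, axis=1).pvalue

# With alpha = 0.05 we expect about 5% of 20,000 = 1,000 false positives.
print((pvals < 0.05).sum())   # roughly 1,000
```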

Family-wise error rate control

Many multiple testing procedures try to control the FWER, which is defined as the probability of making at least one type I error across the entire set (family) of hypotheses tested.

Bonferroni correction

The simplest MCP is the Bonferroni procedure: test each hypothesis at level $\alpha^{*} = \alpha / m$, where $m$ is the number of hypotheses tested. This guarantees FWER control regardless of how many of the null hypotheses are true, known as strong control. The Bonferroni correction is simple, but very conservative, and has low power.
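
A minimal sketch of the correction, using a handful of made-up p-values:

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H0_k when p_k < alpha / m (strong FWER control)."""
    pvals = np.asarray(pvals)
    m = pvals.size
    return pvals < alpha / m

# Illustrative p-values (made up for the example).
p = [0.001, 0.011, 0.02, 0.039, 0.5]
print(bonferroni(p))   # only p-values below 0.05 / 5 = 0.01 are rejected
```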

Fisher’s least significant difference test

Fisher also proposed the least significant difference (LSD) test, which involves performing an overall F-test (to test the global null hypothesis) and only proceeding with the individual significance tests if the global null is rejected. This controls the FWER only when all null hypotheses are true, called weak control.
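
A rough sketch of the two-stage logic, using toy data for a single gene measured in three groups. Note that it uses ordinary two-sample t-tests at the second stage rather than the pooled-variance LSD statistic, so it only illustrates the gating idea:

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Toy data: expression of one gene in three treatment groups (values are illustrative).
groups = {
    "control": np.array([5.1, 4.9, 5.3, 5.0]),
    "dose_1":  np.array([5.2, 5.4, 5.1, 5.3]),
    "dose_2":  np.array([6.0, 6.3, 5.9, 6.1]),
}

alpha = 0.05

# Stage 1: overall F-test of the global null (all group means equal).
anova = stats.f_oneway(*groups.values())

# Stage 2: pairwise comparisons, attempted only if the global null is rejected.
if anova.pvalue < alpha:
    for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
        p = stats.ttest_ind(a, b).pvalue
        print(f"{name_a} vs {name_b}: p = {p:.4f}, reject = {p < alpha}")
else:
    print("Global F-test not significant; no pairwise tests performed.")
```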

Multi-stage procedures: Holm’s sequential rejection procedure

Multi-stage procedures (of which Fisher’s LSD is an example) can give tight, strong control of the FWER.

Holm’s sequential rejection procedure provides strong control, but has more power than the Bonferroni procedure:

Put the p-values in ascending order, $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$, and let $\alpha^{*}_i = \alpha / (m - i + 1)$. Step-up procedure: start from the smallest p-value and ascend, rejecting while $p_{(i)} \lt \alpha^{*}_i$. Step-down procedure: start from the largest p-value and descend until $p_{(i)} \lt \alpha^{*}_i$, then reject that hypothesis and all those with smaller p-values.
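
A minimal implementation of the step-up version described above (the p-values are made up for illustration):

```python
import numpy as np

def holm(pvals, alpha=0.05):
    """Holm's sequential rejection: ascend from the smallest p-value,
    comparing p_(i) with alpha / (m - i + 1), and stop at the first failure."""
    pvals = np.asarray(pvals)
    m = pvals.size
    order = np.argsort(pvals)                  # ascending order of p-values
    thresholds = alpha / (m - np.arange(m))    # alpha/m, alpha/(m-1), ..., alpha/1
    passed = pvals[order] < thresholds
    # Keep rejecting only while every smaller p-value also passed its threshold.
    keep = np.cumprod(passed).astype(bool)
    reject = np.zeros(m, dtype=bool)
    reject[order] = keep
    return reject

p = [0.001, 0.011, 0.02, 0.039, 0.5]   # illustrative p-values
print(holm(p))
```

With these p-values Holm rejects two hypotheses, whereas Bonferroni (threshold $\alpha/m = 0.01$) rejects only one, illustrating the gain in power.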

FWER control gives a high level of certainty, but is very conservative, and at very small p-values our assumptions (e.g. normality) may not hold well.

False discovery rate

The FDR was introduced in 1995 by Benjamini and Hochberg. It provides a less conservative approach than the FWER methods: greater power, but at the cost of more type I errors.

Some notation: let $R$ be the total number of rejected null hypotheses, and $V$ the number of true null hypotheses that are rejected (i.e. the number of type I errors).

FWER control tries to make sure that $P(V \gt 0) \le \alpha$. FDR is instead concerned with controlling $V / R$, the proportion of rejections that are incorrect.

Formally, B&H defined the FDR as $E( \frac{V}{R} \mid R \gt 0 ) P(R \gt 0)$. Others have defined the FDR as $\frac{V}{R}$ itself, or as $E( \frac{V}{R} \mid R \gt 0 )$ (the positive FDR).

Procedure

Put the p-values in ascending order, $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$, and let $\alpha^{*}_i = i\alpha / m$. Step-up procedure: start from the smallest p-value and ascend, rejecting while $p_{(i)} \lt \alpha^{*}_i$. Step-down procedure: start from the largest p-value and descend until $p_{(i)} \lt \alpha^{*}_i$, then reject that hypothesis and all those with smaller p-values.
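
A minimal implementation of the step-down version, which corresponds to the usual B&H rule of rejecting every hypothesis up to the largest $i$ with $p_{(i)} \le i\alpha/m$ (same made-up p-values as in the earlier examples):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """B&H step-down: find the largest i with p_(i) <= i * alpha / m,
    then reject hypotheses 1, ..., i."""
    pvals = np.asarray(pvals)
    m = pvals.size
    order = np.argsort(pvals)                        # ascending order of p-values
    thresholds = (np.arange(1, m + 1) * alpha) / m   # i * alpha / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()               # largest index passing
        reject[order[:k + 1]] = True                 # reject everything up to it
    return reject

p = [0.001, 0.011, 0.02, 0.039, 0.5]   # illustrative p-values
print(benjamini_hochberg(p))
```

With these p-values the B&H rule rejects four of the five hypotheses, compared with two under Holm and one under Bonferroni.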

Operating characteristics

This guarantees $FDR \le \alpha$ regardless of how many null hypotheses are true. In fact, it has been proved that $FDR = \pi_0 \alpha$, where $\pi_0$ is the proportion of true null hypotheses, so when $\pi_0$ is small the realised FDR is considerably below the nominal level. Adaptive FDR procedures attempt to estimate $\pi_0$ and use this to compensate, but in practice it is hard to get good estimates of $\pi_0$.
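
As one illustration of the adaptive idea (a Storey-style estimator, chosen here as an assumption rather than the specific method referred to above): p-values from true nulls are uniform, so the density of p-values above a cut-off $\lambda$ is roughly $\pi_0$. The data below are simulated purely for illustration.

```python
import numpy as np

def estimate_pi0(pvals, lam=0.5):
    """Storey-type estimate of the proportion of true nulls: p-values above
    lambda come (mostly) from the uniform null distribution, so
    pi0 is roughly #{p > lambda} / (m * (1 - lambda))."""
    pvals = np.asarray(pvals)
    return min(1.0, (pvals > lam).mean() / (1 - lam))

# Illustrative mixture: 80% uniform (null) p-values plus a spike of small ones.
rng = np.random.default_rng(2)
p = np.concatenate([rng.uniform(size=1600), rng.beta(0.2, 5, size=400)])
print(estimate_pi0(p))   # roughly 0.8
```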

The FDR is defined as an expectation, which means there must be an underlying distribution of the proportion $V/R$ across repeated experiments. What is this distribution? How variable is it? How does changing $\pi_0$ affect it?
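
One way to explore these questions is by simulation. The sketch below applies the B&H rule to many simulated experiments and records the realised proportion $V/R$; all the parameters ($m$, $\pi_0$, the effect size and the number of repeats) are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical simulation: m tests per experiment, a fraction pi0 of which are
# truly null; the rest have a shifted mean.  All numbers are illustrative.
m, pi0, alpha, n_experiments = 2000, 0.8, 0.05, 500
m0 = int(pi0 * m)

def bh_rejected_indices(pvals, alpha):
    """Indices rejected by the Benjamini-Hochberg rule."""
    order = np.argsort(pvals)
    below = pvals[order] <= (np.arange(1, pvals.size + 1) * alpha) / pvals.size
    if not below.any():
        return np.array([], dtype=int)
    return order[: np.nonzero(below)[0].max() + 1]

proportions = []
for _ in range(n_experiments):
    # Null genes: z-statistics around 0; non-null genes: shifted by 3.
    z = np.concatenate([rng.normal(0, 1, m0), rng.normal(3, 1, m - m0)])
    pvals = 2 * stats.norm.sf(np.abs(z))
    rejected = bh_rejected_indices(pvals, alpha)
    v = (rejected < m0).sum()                        # rejections among true nulls
    proportions.append(v / len(rejected) if len(rejected) else 0.0)

proportions = np.array(proportions)
print("mean V/R:", proportions.mean())   # close to pi0 * alpha = 0.04
print("sd of V/R:", proportions.std())   # V/R varies from experiment to experiment
```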

Other MCP alternatives

Fixed rejection regions: assign an arbitrary rejection region first, then calculate the resulting error rates. This is the same as regular frequentist approaches.

Bayesian approaches use posterior probabilities. These work really well when the model is correct.

Summary

Choose a procedure based on the level of control you’re interested in, FWER or FDR. (You can’t have both!)

The MCPs above generally assume that the tests are independent, which is probably not true in practice. Various adjustments can overcome this.