Normalisation

So far we have used a linear models approach to remove experimental effects from the data. Removing experimental effects is also known as normalisation. Most methods assume that relatively few genes are differentially expressed.

Original method: divide the background-corrected values by the channel median (the average expression in each channel should be about the same).
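A minimal sketch of median normalisation in R. The intensities are simulated (the 1.5x brightness difference and all variable names are assumptions for illustration, not course data):

```r
# Simulated background-corrected intensities (hypothetical data)
set.seed(1)
n <- 1000
G <- rlnorm(n, meanlog = 8, sdlog = 1)
R <- 1.5 * G * rlnorm(n, meanlog = 0, sdlog = 0.2)  # red channel brighter overall

# Divide each channel by its own median, so both channels end up with median 1
R_norm <- R / median(R)
G_norm <- G / median(G)

# Median log-ratio before and after: the overall channel difference is removed
median(log2(R / G))
median(log2(R_norm / G_norm))
```

Only a single scale factor per channel is removed here; the shape of the data is untouched.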

ANOVA method: basically the same, but instead of dividing by the median we subtract the mean of the logged data from the logged data. This has better statistical properties, but is not really any different from the original method.
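The log-scale version can be sketched the same way. Again the data are simulated, with an assumed constant dye offset of 0.6 on the log2 scale:

```r
# Simulated log2 intensities with a constant dye offset (hypothetical data)
set.seed(2)
n <- 1000
logG <- rnorm(n, mean = 8, sd = 1)
logR <- logG + 0.6 + rnorm(n, sd = 0.3)

# Subtract each channel's mean log intensity from that channel
logR_c <- logR - mean(logR)
logG_c <- logG - mean(logG)

# Log-ratios before and after centring
M_before <- logR - logG
M_after  <- logR_c - logG_c
c(before = mean(M_before), after = mean(M_after))
```

Subtracting a constant on the log scale is the same as dividing by a constant on the raw scale, which is why the two methods behave so similarly.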

What if the experiment effects are non-linear?

Non-linear dye effect

Non-linear dye effects are often found (they can be detected with a control hybridisation) and can lead to false discoveries. The MA plot is a good graphical way of detecting these intensity-dependent effects, which are not easily seen on a simple R vs G plot.

$M = \log_2(R/G) = \log_2 R - \log_2 G$ , $A = \frac{1}{2} \log_2(RG) = \frac{1}{2} (\log_2 R + \log_2 G)$ .
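These definitions translate directly into R. The helper name ma_transform and the three example spots are made up for illustration:

```r
# Compute M (log-ratio) and A (average log intensity) from the two channels
ma_transform <- function(R, G) {
  data.frame(M = log2(R) - log2(G),
             A = 0.5 * (log2(R) + log2(G)))
}

# Three hypothetical spots: 4-fold up, unchanged, 4-fold down
ma <- ma_transform(R = c(1024, 400, 50), G = c(256, 400, 200))
ma

# Draw the MA plot only in an interactive session
if (interactive()) {
  plot(ma$A, ma$M, xlab = "A", ylab = "M")
  abline(h = 0, lty = 2)  # no differential expression
}
```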

The ANOVA approach would simply subtract off the average effect: this would centre the cloud about zero, but wouldn’t change its shape.

Alternative: assume that non-differentially expressed genes should be distributed around the diagonal of the R vs G plot, with roughly symmetric scatter on either side. This is equivalent to an MA plot symmetric about M = 0. A curved cloud can be straightened out using some kind of locally weighted regression (e.g. lowess or loess).

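In R: l <- lowess(A, M); o <- order(A); M[o] <- M[o] - l$y. (lowess returns its fitted values sorted by increasing A, so they have to be matched back to the original spot order before subtracting.)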
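A fuller sketch of the lowess correction on simulated data. The curved bias and noise level are assumptions chosen to give a visibly bent MA cloud:

```r
# Simulated MA values with an intensity-dependent (curved) dye bias
set.seed(3)
n <- 2000
A <- runif(n, 6, 14)
M <- 0.05 * (A - 10)^2 - 0.3 + rnorm(n, sd = 0.25)

# lowess() returns fit$x (A sorted ascending) and fit$y (fitted curve),
# so the fitted values are matched back to the spots through order(A)
fit <- lowess(A, M)
o <- order(A)
M_norm <- M
M_norm[o] <- M[o] - fit$y

# Size of the remaining trend before and after correction
trend_before <- max(abs(fit$y))
trend_after  <- max(abs(lowess(A, M_norm)$y))
c(before = trend_before, after = trend_after)
```

Plotting M_norm against A should give a cloud that is roughly flat and centred on zero.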

Then transform back to R and G values: $R' = \sqrt{2^{2A' + M'}}$ , $G' = \sqrt{2^{2A' - M'}}$ .
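The back-transformation can be checked with a round trip. The function name back_transform and the example intensities are hypothetical:

```r
# Recover R and G from M and A
back_transform <- function(M, A) {
  list(R = sqrt(2^(2 * A + M)),
       G = sqrt(2^(2 * A - M)))
}

# Round trip on three hypothetical spots, with no correction applied to M
R <- c(1024, 400, 50)
G <- c(256, 400, 200)
M <- log2(R / G)
A <- 0.5 * log2(R * G)
rg <- back_transform(M, A)
rg
```

With no correction the original intensities come back unchanged; with a corrected M the same function gives the normalised R and G.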

Problems

We assume most genes aren’t differentially expressed – what if they are? What if expression is not symmetric? Default smoothing parameters can also cause problems. Make sure to check the smoothed fit!
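One way to check the smoothed fit is to compare smoother spans. In this sketch the wiggly bias is simulated, so the true curve is known; with real data the comparison has to be made by eye on the MA plot. The span values are only examples:

```r
# Simulated MA values with a wiggly intensity-dependent bias (hypothetical)
set.seed(4)
n <- 2000
A <- runif(n, 6, 14)
M <- 0.4 * sin(A) + rnorm(n, sd = 0.25)

fit_default <- lowess(A, M)           # default span, f = 2/3
fit_narrow  <- lowess(A, M, f = 0.2)  # narrower span follows the curve more closely

# Root-mean-square distance of each fit from the true curve (known only in simulation)
rms <- function(fit) sqrt(mean((fit$y - 0.4 * sin(fit$x))^2))
c(default = rms(fit_default), narrow = rms(fit_narrow))

# Overlay both fits on the MA plot in an interactive session
if (interactive()) {
  plot(A, M, pch = ".")
  lines(fit_default, col = "red")
  lines(fit_narrow, col = "blue")
}
```

A narrower span is not automatically better: if it is too small, the curve may start to follow genuine differential expression.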

Pin tip correction

Some pin tips may have had problems, so a separate correction can be fitted for each pin tip. Beware: this can be a disaster!
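A sketch of a per-pin-tip correction, fitting one lowess curve per tip. The number of tips, the tip offsets and the variable names are all assumptions for illustration:

```r
# Simulated data: 4 pin tips, each with its own offset in M (hypothetical)
set.seed(5)
n_tips <- 4
n_per_tip <- 500
tip <- rep(seq_len(n_tips), each = n_per_tip)
A <- runif(n_tips * n_per_tip, 6, 14)
M <- c(-0.4, 0, 0.3, 0.6)[tip] + 0.05 * (A - 10) + rnorm(length(A), sd = 0.25)

# Fit and subtract a separate lowess curve within each tip
M_norm <- M
for (k in unique(tip)) {
  idx <- which(tip == k)
  o <- idx[order(A[idx])]  # spots of this tip, sorted by A
  M_norm[o] <- M[o] - lowess(A[idx], M[idx])$y
}

# Median M per tip before and after correction
med_before <- tapply(M, tip, median)
med_after  <- tapply(M_norm, tip, median)
rbind(before = med_before, after = med_after)
```

Each curve is fitted to only a fraction of the spots, so it is worth plotting the fit for every tip before trusting the result.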