Experimental design

Want to determine the best way to assign samples to microarrays, a problem much like designing field trials. We’re interested in the contrasts between treatments ($t_{k_1} - t_{k_2}$), known as elementary contrasts. We want to design an experiment that allows us to estimate these contrasts while accounting for other effects.

Blocking

Often we can’t observe different treatments under the same experimental conditions, so we try to account for this using random blocks.

Some notation:

$b$ blocks, $v$ treatments and $r_k$ replicates for each treatment
if $r_k = r$ , design is equireplicate
if all blocks same size, experiment is proper
if each treatment occurs at most once in each block, design is binary
design is connected if all elementary contrasts can be estimated

For microarrays, we’re interested in proper, equireplicate, binary, connected designs.
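The four properties above can be checked mechanically. The following is a minimal sketch, assuming a design is given as a list of blocks, each block a list of treatment labels; the function name `design_properties` and the example designs are illustrative, not from the original notes. Connectedness is checked by linking treatments that share a block and testing whether every treatment can be reached from any one of them.

```python
from collections import Counter


def design_properties(blocks):
    """Classify a non-empty block design (hypothetical helper).

    `blocks` is a list of blocks; each block is a list of treatment labels.
    """
    treatments = {t for block in blocks for t in block}

    # Proper: all blocks have the same size.
    proper = len({len(block) for block in blocks}) == 1

    # Equireplicate: every treatment appears the same number of times.
    replicates = Counter(t for block in blocks for t in block)
    equireplicate = len(set(replicates.values())) == 1

    # Binary: no treatment is repeated within a block.
    binary = all(len(set(block)) == len(block) for block in blocks)

    # Connected: link treatments that share a block, then check that
    # every treatment is reachable from an arbitrary starting treatment.
    neighbours = {t: set() for t in treatments}
    for block in blocks:
        for t in block:
            neighbours[t].update(block)
    start = next(iter(treatments))
    seen, stack = {start}, [start]
    while stack:
        for u in neighbours[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    connected = seen == treatments

    return {
        "proper": proper,
        "equireplicate": equireplicate,
        "binary": binary,
        "connected": connected,
    }


# Three treatments on three two-sample arrays, arranged in a loop.
loop = [["A", "B"], ["B", "C"], ["C", "A"]]
# Two arrays with no treatment in common: A - C cannot be estimated.
split = [["A", "B"], ["C", "D"]]

print(design_properties(loop))
print(design_properties(split))
```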

If the block size $k \lt v$ then not every treatment can occur in every block; this is called an incomplete block design. Best we can do is to find a balanced design, where the variance of every elementary contrast is equal, known as a balanced incomplete block design (BIBD).

If there is more than one blocking factor, we can use a Latin square design (provided all factors have the same number of levels).
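The usual layout for two blocking factors with equal numbers of levels is a Latin square, in which each treatment appears exactly once in every row and every column. A minimal sketch of the simplest (cyclic) construction follows; the function name is illustrative, and in practice rows, columns and labels would also be randomised, which is not shown here.

```python
def cyclic_latin_square(treatments):
    """Build an n x n Latin square by cyclically shifting the treatment list.

    Rows index the levels of one blocking factor, columns the other.
    """
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


square = cyclic_latin_square(["A", "B", "C"])
for row in square:
    print(" ".join(row))
```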

Microarray experimental design

Visualising microarray designs

Each arrow denotes a block (an array); two treatments joined by an arrow appear on the same array. The treatment at the pointy end is labelled red, the other green.

Want to minimise the variance of elementary contrasts, which depends on the distance between treatments on the design graph. Contrasts between treatments that share a line have the smallest variance.

BIBD

Because microarrays have a very small block size, a BIBD may not be the best option, as it will require many (expensive) arrays.
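To get a feel for how quickly the number of arrays grows, here is a rough sketch based on the standard counting conditions for a BIBD, $bk = vr$ and $\lambda(v - 1) = r(k - 1)$, where $\lambda$ (a symbol not used in the notes above) is the number of blocks in which each pair of treatments appears together. The function name and defaults are illustrative. These conditions are necessary but not in general sufficient; for $k = 2$ the design simply uses every pair of treatments $\lambda$ times.

```python
def bibd_parameters(v, k=2, lam=1):
    """Return (b, r): number of blocks and replicates per treatment.

    Uses the counting conditions r(k - 1) = lam(v - 1) and bk = vr.
    Raises ValueError if they have no whole-number solution.
    """
    r, remainder = divmod(lam * (v - 1), k - 1)
    if remainder:
        raise ValueError("no BIBD: r is not a whole number")
    b, remainder = divmod(v * r, k)
    if remainder:
        raise ValueError("no BIBD: b is not a whole number")
    return b, r


# Two samples per array (k = 2), each pair of treatments compared once.
for v in (4, 6, 10):
    arrays, replicates = bibd_parameters(v)
    print(v, "treatments:", arrays, "arrays,", replicates, "replicates each")
```

With two samples per array the array count grows as $v(v - 1)/2$, which is what makes the design expensive for more than a handful of treatments.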

Reference design

Each treatment compared to reference sample on same array.

Disadvantages: half available spots wasted, no direct comparison, often no dye balance.

Advantages: can easily add more later, or repeat arrays.

Loop design

Circular design graph.

Advantages: don’t waste any channels, dyes balanced.

Disadvantages: variance gets big for large loops; easy to disconnect loop, but tricky to fix; hard to add more treatments.
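The variance behaviour of the reference and loop designs can be checked numerically. The sketch below makes simplifying assumptions that are not in the original notes: fixed effects only, no dye effect, and each array contributing a single log-ratio that estimates the difference between its two treatments with independent errors of variance $\sigma^2$. Under those assumptions the information matrix is the Laplacian of the design graph, and the variance of a contrast (in units of $\sigma^2$) is obtained by solving a linear system. The function name `contrast_variance` is illustrative.

```python
from fractions import Fraction


def contrast_variance(n, arrays, i, j):
    """Variance of the estimate of t_i - t_j, in units of sigma^2.

    `n` is the number of samples (nodes), `arrays` a list of (a, b) pairs,
    one per array. Assumes one log-ratio per array, equal error variance.
    """
    # Information matrix of the log-ratio model: Laplacian of the design graph.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for a, b in arrays:
        lap[a][a] += 1
        lap[b][b] += 1
        lap[a][b] -= 1
        lap[b][a] -= 1

    # Contrast vector c = e_i - e_j.
    c = [Fraction(0)] * n
    c[i] += 1
    c[j] -= 1

    # Fix t_0 = 0 for identifiability: drop row and column 0, then solve
    # the reduced system by Gauss-Jordan elimination on an augmented matrix.
    m = [lap[row][1:] + [c[row]] for row in range(1, n)]
    size = n - 1
    for col in range(size):
        if m[col][col] == 0:
            raise ValueError("design graph is not connected")
        for row in range(size):
            if row != col and m[row][col] != 0:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    x = [Fraction(0)] + [m[row][size] / m[row][row] for row in range(size)]

    # Var(c' t_hat) / sigma^2 = c' x = x_i - x_j.
    return x[i] - x[j]


# Reference design: sample 0 is the reference, treatments 1..3.
reference = [(1, 0), (2, 0), (3, 0)]
# Loop design on 8 treatments: 0-1-2-...-7-0.
loop8 = [(a, (a + 1) % 8) for a in range(8)]

print(contrast_variance(4, reference, 1, 2))  # two treatments via the reference
print(contrast_variance(8, loop8, 0, 1))      # neighbours on the loop
print(contrast_variance(8, loop8, 0, 4))      # opposite sides of the loop
```

Under these assumptions, neighbours on the loop are compared more precisely than any pair in the reference design, but the variance grows with distance round the loop, and for loops longer than 8 the most distant pairs do worse than the reference design.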

Testing contrasts

With the designs above, as the number of treatments and arrays increases, the number of comparisons does too (even though the model stays the same). For each gene we need to test each combination of treatments (e.g. $T_1 - T_2$ , $T_2 - T_3$ , $T_3 - T_1$ ). A complicating factor is that the tests aren’t independent, e.g. if $T_1 = T_2$ and $T_2 = T_3$ , then $T_1 = T_3$ . Although random error may obscure this, the dependence is still present.
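Both points, the growth in the number of comparisons and their dependence, can be seen by writing each elementary contrast as a coefficient vector. This is a small illustrative sketch (the function name is hypothetical, and treatments are indexed from 0): with $v$ treatments there are $v(v - 1)/2$ pairwise contrasts, and they are linearly dependent because some are sums of others.

```python
from itertools import combinations


def pairwise_contrasts(v):
    """Coefficient vectors for all contrasts T_a - T_b with a < b."""
    contrasts = {}
    for a, b in combinations(range(v), 2):
        coeffs = [0] * v
        coeffs[a], coeffs[b] = 1, -1
        contrasts[(a, b)] = coeffs
    return contrasts


contrasts = pairwise_contrasts(3)

# (T_1 - T_2) + (T_2 - T_3) equals (T_1 - T_3), so the third test
# is determined by the first two.
total = [x + y for x, y in zip(contrasts[(0, 1)], contrasts[(1, 2)])]
print(total == contrasts[(0, 2)])

# Number of pairwise comparisons per gene for increasing v.
for v in (3, 6, 10):
    print(v, len(pairwise_contrasts(v)))
```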

R can test all pairwise comparisons using the multcomp library. The simtest function provides p-values adjusted for dependence, by using a multivariate t-distribution. Tukey’s HSD also provides a way of dealing with correlations between tests.

Things get trickier with more than 2 treatments on one array, but such arrays are uncommon anyway.

Additional variance components

Until now we have assumed that the model parameters are fixed, and only the error is random. It’s possible to describe additional variance components for effects we are interested in viewing as random variables. Mixed models are the area of statistics concerned with this.

In microarray analysis, we’re usually interested in modelling the array effect as random. Wolfinger et al. (2001) were the first to suggest this, using a two-stage model:

$Y_{ijkl} = \mu + A_i + T_j + AT_{ij} + \delta_{ijkl}$ . $\delta_{ijkl} = G_k + AG_{ik} + TG_{jk} + ATG_{ijk} + \epsilon_{ijkl}$ , where all array effects and interactions are considered to be normally distributed with mean 0 and appropriate variance.

In general, it’s probably more appropriate to treat these effects as random, but it does make calculations more complicated. This has implications for experimental design, as there will be increased variance for contrasts between treatments not on the same array (e.g. in a reference design).
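A short worked comparison shows where the extra variance comes from. It is a sketch under stated assumptions, not part of the original notes: it looks at a single gene $k$ in the second-stage model, treats the treatment-by-gene terms $TG_{jk}$ as fixed, takes all random terms to be independent, and writes $\sigma^2_{AG}$, $\sigma^2_{ATG}$ and $\sigma^2_{\epsilon}$ for the variances of $AG_{ik}$, $ATG_{ijk}$ and $\epsilon_{ijkl}$ (symbols introduced here). Comparing single observations of treatments $j$ and $j'$:

```latex
\begin{aligned}
\text{same array } i:\quad
  \operatorname{Var}\left(\delta_{ijkl} - \delta_{ij'kl'}\right)
  &= 2\sigma^2_{ATG} + 2\sigma^2_{\epsilon}, \\
\text{different arrays } i \neq i':\quad
  \operatorname{Var}\left(\delta_{ijkl} - \delta_{i'j'kl'}\right)
  &= 2\sigma^2_{AG} + 2\sigma^2_{ATG} + 2\sigma^2_{\epsilon}.
\end{aligned}
```

On the same array $G_k$ and $AG_{ik}$ cancel in the difference; across arrays the array-by-gene terms do not cancel and contribute the extra $2\sigma^2_{AG}$.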

IBD with random blocks

Possible to calculate intra- and inter-block estimates, which can be combined to give improved estimates.
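One common way to combine two estimates of the same contrast is to weight each by the inverse of its variance. The sketch below assumes the intra- and inter-block estimates are independent and that their variances are known; in practice the variances have to be estimated from the data, which this example ignores. The function name and the numbers are made up for illustration.

```python
def combine_estimates(intra, var_intra, inter, var_inter):
    """Inverse-variance weighted combination of two independent estimates.

    Returns the combined estimate and its variance.
    """
    w_intra = 1.0 / var_intra
    w_inter = 1.0 / var_inter
    estimate = (w_intra * intra + w_inter * inter) / (w_intra + w_inter)
    variance = 1.0 / (w_intra + w_inter)
    return estimate, variance


# Hypothetical values: a precise intra-block estimate and a noisier
# inter-block estimate of the same contrast.
estimate, variance = combine_estimates(1.2, 0.1, 0.8, 0.4)
print(estimate, variance)
```

The combined variance is never larger than the smaller of the two input variances, which is the sense in which the combined estimate is improved.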