Block structure

In experimental design, a set of treatments is applied to a set of experimental units (eu’s). Both the treatments and the experimental units can have a variety of structures, and these structures determine the analysis.

Notation

n, number of observations
k, number of blocks
r, number of repetitions within each block

Data model

As arrangements become more complicated, it is helpful to represent the data by a “data model”. For the block structure with a single set of eu’s, the data model is $y_i = \mu + e_i$, where $E[e_i] = 0$, $Var[e_i] = \sigma^2$, and $Cov[e_i, e_j] = 0$ for $i \neq j$.

Significance tests and confidence intervals require further assumptions about the errors. Often we assume they are i.i.d. normal(0, $\sigma^2$).

Why use a data model?

helps to determine analysis
helps to understand what can be learned from experiment
tells what can be estimated and how to evaluate precision of estimates

Block structure 1. Single set of eu’s

Structure: All eu’s in a single set.

Data model: $y_i = \mu + e_i$

Block structure 2. One-way classification.

Structure: Separate groups of eu’s, determined by a characteristic of the eu’s.

Data model: $y_{ij} = \mu_i + e_{ij}$, where i indexes the block and j the eu within the block.

Use ANOVA to examine differences between blocks. We can estimate the variance within blocks ($s^2$, the pooled variance) and the variance between block means ($s_G^2 = \sum_i \frac{(y_{i.} - y_{..})^2}{k-1}$). If there are no differences between blocks, then the block means are a random sample of size k from a single distribution with variance $\sigma^2 / r$, so $r \cdot s_G^2$ estimates $\sigma^2$.

So the ratio of $r s_G^2$ to $s^2$ measures variability among blocks relative to variability within blocks. We call this summary statistic F. When there are no block differences and the errors are normal, it has a known distribution which depends only on the numerator and denominator degrees of freedom, $k-1$ and $k(r-1)$.
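The ratio described above can be computed directly from the block means and the within-block variances. The following is a minimal sketch that assumes equal replication; the function name `one_way_f` and the data are made up for illustration.

```python
from statistics import mean, variance


def one_way_f(blocks):
    """Return (F, numerator df, denominator df) for k blocks of r observations."""
    k = len(blocks)
    r = len(blocks[0])
    if any(len(block) != r for block in blocks):
        raise ValueError("this sketch assumes equal replication")

    block_means = [mean(block) for block in blocks]
    # s_G^2: sample variance of the k block means (divisor k - 1).
    s2_between = variance(block_means)
    # s^2: pooled within-block variance. With equal r this is the plain
    # average of the k within-block sample variances.
    s2_within = mean([variance(block) for block in blocks])

    # r * s_G^2 and s^2 both estimate sigma^2 when the blocks do not differ.
    return r * s2_between / s2_within, k - 1, k * (r - 1)


# Made-up data: k = 3 blocks, r = 3 observations each.
example = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
f_stat, df_num, df_den = one_way_f(example)
print(f_stat, df_num, df_den)  # 21.0 2 6
```

Here the block means are 2, 3 and 7, so the variance of the means is 7, while every within-block variance is 1; the block means are scaled by r = 3 before being compared with the pooled variance.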

This set of calculations is usually displayed in an ANOVA table.

Source	df	SS
Block	$k-1$	$r\sum_i (y_{i.} - y_{..})^2 = r\sum_i y_{i.}^2 - kr\,y_{..}^2$
Error	$k(r-1) = n - k$	$\sum_i\sum_j (y_{ij} - y_{i.})^2 = \sum_i\sum_j y_{ij}^2 - r\sum_i y_{i.}^2$
Total	$n-1$	$\sum_i\sum_j (y_{ij} - y_{..})^2 = \sum_i\sum_j y_{ij}^2 - kr\,y_{..}^2$
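As a check on the table, the three sums of squares can be computed from their definitions and the block and error rows should add up to the total. This is a small sketch with equal replication; `one_way_anova_table` and the data are illustrative names and numbers, not part of the notes.

```python
def one_way_anova_table(blocks):
    """Return {source: (df, SS)} for k blocks with r observations each."""
    k = len(blocks)
    r = len(blocks[0])
    n = k * r
    grand_mean = sum(sum(block) for block in blocks) / n
    block_means = [sum(block) / r for block in blocks]

    # Block SS: r times the squared deviations of block means from the grand mean.
    ss_block = r * sum((m - grand_mean) ** 2 for m in block_means)
    # Error SS: squared deviations of observations from their own block mean.
    ss_error = sum((y - m) ** 2
                   for block, m in zip(blocks, block_means) for y in block)
    # Total SS: squared deviations of observations from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for block in blocks for y in block)

    return {
        "Block": (k - 1, ss_block),
        "Error": (n - k, ss_error),
        "Total": (n - 1, ss_total),
    }


table = one_way_anova_table([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]])
for source, (df, ss) in table.items():
    print(source, df, ss)
```

For these made-up numbers the block SS is 42 on 2 df, the error SS is 6 on 6 df, and the total SS is 48 on 8 df, so both the degrees of freedom and the sums of squares add up.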

When blocking eu’s, we normally select the blocks so that they are different, so we are not really testing the hypothesis that there is no difference. In this case the F-test just confirms that the blocks are indeed different and indicates how large the variability among blocks is relative to the variability within blocks.

ANOVA calculation rule: the multiplier applied to a squared mean is the number of observations included in that mean.

Intuitively, ANOVA separates total variability into two portions that represent the data’s structure: among blocks and within blocks.

Alternative data model: $y_{ij} = \mu + g_i + e_{ij}$. Here $y_{..}$ estimates $\mu$, $y_{i.} - y_{..}$ estimates $g_i$, and $MS_E$ estimates $\sigma^2$.

Unequal replication.

Equal replication is neither required nor always achievable.

If $r_i$ is the number of replications in block i, then $n = \sum_i r_i$, $SS_B = \sum_i r_i y_{i.}^2 - n y_{..}^2$, and $SS_E = \sum_i \sum_j y_{ij}^2 - \sum_i r_i y_{i.}^2$.
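The unequal-replication formulas can be checked numerically. The sketch below takes $y_{..}$ to be the mean of all n observations (an assumption about the notation) and uses made-up data; `unequal_replication_ss` is a hypothetical helper name.

```python
def unequal_replication_ss(blocks):
    """Return (SS_B, SS_E) for blocks that may have different sizes r_i."""
    n = sum(len(block) for block in blocks)
    grand_mean = sum(sum(block) for block in blocks) / n

    # sum_i r_i * (block mean)^2, shared by both formulas.
    weighted_sq_means = sum(len(block) * (sum(block) / len(block)) ** 2
                            for block in blocks)
    raw_sq = sum(y ** 2 for block in blocks for y in block)

    ss_b = weighted_sq_means - n * grand_mean ** 2
    ss_e = raw_sq - weighted_sq_means
    return ss_b, ss_e


# Made-up data with r_1 = 2, r_2 = 3, r_3 = 1.
ss_b, ss_e = unequal_replication_ss([[1.0, 3.0], [4.0, 5.0, 6.0], [10.0]])
print(ss_b, ss_e)
```

The block with a single observation contributes nothing to the error SS, which here comes entirely from the first two blocks (2 + 2 = 4).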

Block structure 3. Two-way classification.

Structure: Eu’s are “cross-classified” by two factors.

Data model: $y_{ij} = \mu + a_i + b_j + e_{ij}$, $e_{ij} \sim NID(0, \sigma^2)$. Assume $\sum_i a_i = 0$ and $\sum_j b_j = 0$.

ANOVA: The eu’s are classified two ways (by row and by column), so we must calculate both a row SS and a column SS. There is no replication in this structure, so there is no “pure error” SS to calculate. However, we can obtain a “residual SS”, which can be treated as the error SS under certain assumptions (chiefly that the additive model holds, i.e. rows and columns do not interact).

Source	df	SS
Rows	$a-1$	$b\sum_i y_{i.}^2 - ab\,y_{..}^2$
Columns	$b-1$	$a\sum_j y_{.j}^2 - ab\,y_{..}^2$
Error	$(a-1)(b-1)$	by subtraction
Total	$ab-1$	$\sum_i \sum_j y_{ij}^2 - ab\,y_{..}^2$

The expected value of $y_{ij}$ is estimated by $\hat{\mu} + \hat{a}_i + \hat{b}_j = y_{..} + (y_{i.} - y_{..}) + (y_{.j} - y_{..}) = y_{i.} + y_{.j} - y_{..}$. The residual is $y_{ij} - y_{i.} - y_{.j} + y_{..}$.
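The two-way table and the residual formula can be tied together numerically: the error SS obtained by subtraction should equal the sum of the squared residuals. A minimal sketch follows, with made-up data and a hypothetical function name `two_way_anova`.

```python
def two_way_anova(y):
    """y[i][j] is the single observation in row i, column j (a rows, b columns)."""
    a, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (a * b)
    row_means = [sum(row) / b for row in y]
    col_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]

    correction = a * b * grand ** 2
    ss_rows = b * sum(m ** 2 for m in row_means) - correction
    ss_cols = a * sum(m ** 2 for m in col_means) - correction
    ss_total = sum(v ** 2 for row in y for v in row) - correction
    ss_error = ss_total - ss_rows - ss_cols  # "by subtraction"

    # Residual: observation minus its fitted value y_i. + y_.j - y_..
    residuals = [[y[i][j] - row_means[i] - col_means[j] + grand
                  for j in range(b)] for i in range(a)]
    return ss_rows, ss_cols, ss_error, ss_total, residuals


data = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]  # a = 2 rows, b = 3 columns
ss_rows, ss_cols, ss_error, ss_total, residuals = two_way_anova(data)
ss_resid = sum(e ** 2 for row in residuals for e in row)
print(ss_rows, ss_cols, ss_error, ss_resid)
```

For these numbers the row SS is 24, the column SS is 9, and the error SS by subtraction is 1, which matches the sum of squared residuals.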

Hidden replication.

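In a two-way classification with one eu per row-column combination, there is no true replication. However, there is hidden replication in the following sense: each difference between two levels of A is observed b times (once in every column), and vice versa each difference between two levels of B is observed a times.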
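One way to see the hidden replication is to list the column-by-column differences between two rows: there are b of them, and their average is the difference between the two row means. The helper `row_difference_replicates` and the data below are illustrative only.

```python
def row_difference_replicates(y, i1, i2):
    """Return the b within-column differences between rows i1 and i2, and their mean."""
    b = len(y[0])
    diffs = [y[i1][j] - y[i2][j] for j in range(b)]
    return diffs, sum(diffs) / b


data = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
diffs, avg_diff = row_difference_replicates(data, 0, 1)

# The same quantity computed from the two row means.
row_mean_diff = sum(data[0]) / 3 - sum(data[1]) / 3
print(diffs, avg_diff, row_mean_diff)  # [-3.0, -4.0, -5.0] -4.0 -4.0
```

Each column supplies its own estimate of the row difference, so the comparison of the two rows rests on three observations of the difference rather than one.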

Block structure 4. Two-way classification with replication.

Must randomise the order of all combinations.

Notation: $y_{ijk}$ is observation k in row i and column j.

Data model: $y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk}$, $e_{ijk} \sim N(0, \sigma^2)$. $(ab)_{ij}$ is an additional term denoting “interaction” between A and B.
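The notes do not give the sums of squares for this structure. The sketch below uses the standard decomposition for a balanced layout (r observations in every cell), which is an assumption added here; the function name and data are made up.

```python
def two_way_replicated_ss(y):
    """y[i][j] is the list of r observations in row i, column j (balanced)."""
    a, b, r = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / r for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a

    ss_a = b * r * sum((m - grand) ** 2 for m in row)
    ss_b = a * r * sum((m - grand) ** 2 for m in col)
    # Interaction: what the cell means show beyond additive row + column effects.
    ss_ab = r * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    # Pure error: variation among replicates within the same cell.
    ss_e = sum((obs - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for obs in y[i][j])
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_e}


# Made-up data: a = 2, b = 2, r = 2.
data = [[[1.0, 3.0], [4.0, 6.0]],
        [[5.0, 7.0], [14.0, 16.0]]]
ss = two_way_replicated_ss(data)
print(ss)
```

With replication the error SS comes from within-cell variation, so the interaction term can be separated from it; for these numbers the pieces are 98, 72, 18 and 8, which add to the total SS of 196.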

Block structure 5. Nested factors.

Consider two blocking factors A and B. If the levels of B are different for each level of A then we have a nested factor structure.

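Data model: $y_{ijk} = \mu + a_i + b_{i,j} + e_{ijk}$, where $b_{i,j}$ denotes the effect of level j of factor B within level i of factor A.

We can do the ANOVA with a single SS for the nested factor, or an ANOVA that breaks the nested-factor SS down further by level of A.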
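One reading of "breaking down the nested factor" is to report the B-within-A sum of squares separately for each level of A. The sketch below follows that reading and assumes a balanced layout (the same number of B levels under every A level and r observations per combination); both are assumptions, and the names and data are illustrative.

```python
def nested_ss(y):
    """y[i][j]: list of r observations for level j of B nested in level i of A."""
    a, b, r = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / r for j in range(b)] for i in range(a)]
    a_means = [sum(cell[i]) / b for i in range(a)]
    grand = sum(a_means) / a

    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    # B within A, kept separate for each level of A.
    ss_b_in_a = [r * sum((cell[i][j] - a_means[i]) ** 2 for j in range(b))
                 for i in range(a)]
    ss_e = sum((obs - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for obs in y[i][j])
    return ss_a, ss_b_in_a, ss_e


data = [[[1.0, 3.0], [4.0, 6.0]],
        [[5.0, 7.0], [14.0, 16.0]]]
ss_a, ss_b_in_a, ss_e = nested_ss(data)
print(ss_a, ss_b_in_a, sum(ss_b_in_a), ss_e)
```

For this made-up data the B-within-A SS is 9 under the first level of A and 81 under the second, which shows where most of the variation among B levels sits.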

Variance component models.

Consider again the one-way classification ($y_{ij} = \mu + b_i + e_{ij}$). There are situations where blocks are picked randomly rather than deliberately. Then $b_i \sim NID(0, \sigma_b^2)$ and $e_{ij} \sim NID(0, \sigma^2)$, with $b_i$ and $e_{ij}$ independent. The objective of the analysis is now to estimate the two “variance components” in this model (e.g. is $\sigma_b^2$ so large that it causes quality problems?).

In the ANOVA, $E[MS_B] = r(\sigma_b^2 + \frac{\sigma^2}{r}) = r\sigma_b^2 + \sigma^2$ and $E[MS_E] = \sigma^2$, so it is possible to estimate each variance component: $\hat{\sigma}^2 = MS_E$ and $\hat{\sigma}_b^2 = (MS_B - MS_E)/r$.
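Equating the two mean squares to their expectations gives simple estimates of the variance components. The following is a minimal sketch for equal replication with made-up data; `variance_components` is a hypothetical name, and the estimate of $\sigma_b^2$ is returned as computed, so it can come out negative when $MS_B < MS_E$.

```python
def variance_components(blocks):
    """Return (sigma2_hat, sigma_b2_hat) for k random blocks of r observations."""
    k = len(blocks)
    r = len(blocks[0])
    block_means = [sum(block) / r for block in blocks]
    grand = sum(block_means) / k

    ms_b = r * sum((m - grand) ** 2 for m in block_means) / (k - 1)
    ms_e = sum((y - m) ** 2
               for block, m in zip(blocks, block_means)
               for y in block) / (k * (r - 1))

    sigma2_hat = ms_e                 # E[MS_E] = sigma^2
    sigma_b2_hat = (ms_b - ms_e) / r  # E[MS_B] = r * sigma_b^2 + sigma^2
    return sigma2_hat, sigma_b2_hat


sigma2_hat, sigma_b2_hat = variance_components(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
)
print(sigma2_hat, sigma_b2_hat)
```

For these numbers $MS_B = 21$ and $MS_E = 1$ with r = 3, so the between-block component is estimated as 20/3, much larger than the within-block component of 1.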

In some situations it can be difficult to decide whether a block should be regarded as random or fixed, and this can be a controversial issue. The litmus test: do we care about the individual blocks or not? (yes = fixed, no = random)