Introduction to sample surveys

Why sample?

Basic procedure:

  1. decide on objectives
  2. choose sampling units
  3. choose mode of sampling
  4. design measuring instrument
  5. pilot
  6. choose sampling design
  7. choose sample size
  8. select units
  9. train interviewers
  10. enter and check data
  11. analyse

Two types of surveys:

Sampling distributions

Why use random sampling?

Have chosen to use random sampling, but still have a lot of flexibility in choice of randomisation scheme, and some flexibility in choice of estimator. Can use MSE $= Var(\bar{y}) + bias^2$ to compare estimators and randomisation schemes.

Nonsampling errors

Conventional to split total survey error into sampling and non-sampling error. Non-sampling error includes all errors not attributable to incomplete enumeration.

In contrast to sampling errors, non-sampling errors often increase with sample size or complexity of sampling scheme. Non-sampling error likely to be at least as large as sampling error.

Types:

Simple random sampling

Basic ideas.

Simplest random scheme, where every subset of $n$ units has the same chance of forming the sample, $p = 1/\binom{N}{n}$ .

Distribution of sample mean.

\[E(\bar{y}) = \mu\] \[Var(\bar{y}) = (1-f)\frac{\sigma^2}{n} \] \[s^2 = \sum^n_1 \frac{(y_i - \bar{y})^2}{n-1}, \quad E(s^2) = \sigma^2 \]

With independent observations, the CLT ensures that $\bar{y}$ is approximately normal, which we can use to construct confidence intervals (CIs), etc.
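These results can be checked by simulation. A minimal sketch with a made-up population (the theoretical variance below uses the finite-population variance with divisor $N-1$, matching the unbiasedness of $s^2$):

```python
import random
import statistics

random.seed(1)
N, n = 1000, 50
pop = [random.gauss(10, 3) for _ in range(N)]
mu = statistics.mean(pop)
S2 = statistics.variance(pop)        # divisor N-1, so E(s^2) = S2
f = n / N

# Repeatedly draw SRS without replacement and record the sample mean
means = [statistics.mean(random.sample(pop, n)) for _ in range(20000)]
emp_var = statistics.variance(means)
theory = (1 - f) * S2 / n            # (1-f) * sigma^2 / n with the fpc

print(round(statistics.mean(means), 3), round(mu, 3))  # nearly equal
print(round(emp_var, 4), round(theory, 4))             # nearly equal
```

The factor $(1-f)$ is the finite population correction; dropping it overstates the variance when $n$ is a sizeable fraction of $N$.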

Estimating proportions

\[E(p) = \pi\] \[Var(p) = (1-f)\frac{\pi(1-\pi)}{n} , \quad \widehat{Var}(p) = (1-f)\frac{p(1-p)}{n-1} \]

Stratified sampling

Can improve on SRS given more information.

Basic idea is to split population into strata:

Advantages:

General theory

Population mean = weighted average of stratum means.

\[ \bar{y}_{st} = \sum_l W_l \bar{y}_l \]

Variance of the stratified mean = weighted sum of stratum variances:

\[ Var[\bar{y}_{st}] = \sum_l W_l^2 Var[\bar{y}_l] \]

Stratified sampling does well when $\sigma^2_l$ are small (values in each stratum are similar).
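The stratified estimator and its variance can be computed directly from per-stratum samples. A sketch with hypothetical stratum weights and data (the fpc is ignored for simplicity):

```python
import math
import statistics

# Hypothetical strata: (stratum weight W_l, observed sample in stratum l)
strata = [
    (0.5, [4.1, 3.9, 4.4, 4.0]),
    (0.3, [9.8, 10.1, 10.4]),
    (0.2, [15.0, 14.6, 15.3]),
]

# ybar_st = sum of W_l * ybar_l
y_st = sum(W * statistics.mean(ys) for W, ys in strata)
# Var[ybar_st] = sum of W_l^2 * s_l^2 / n_l   (fpc omitted)
var_st = sum(W**2 * statistics.variance(ys) / len(ys) for W, ys in strata)
print(round(y_st, 4), round(math.sqrt(var_st), 4))
```

Note how small within-stratum variances drive the overall variance down, matching the remark above.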

Allocation

\[n_l \propto \frac{W_l \sigma_l}{\sqrt{cost_l}} \]
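The allocation rule is easy to apply once the stratum weights, standard deviations, and per-unit costs are in hand (all hypothetical here):

```python
import math

def optimal_allocation(n, W, sigma, cost):
    """Split total sample size n across strata with
    n_l proportional to W_l * sigma_l / sqrt(cost_l)."""
    score = [w * s / math.sqrt(c) for w, s, c in zip(W, sigma, cost)]
    total = sum(score)
    return [round(n * sc / total) for sc in score]

# Hypothetical strata: cheap homogeneous strata get fewer units,
# variable strata get more, expensive strata are penalised
print(optimal_allocation(100, W=[0.5, 0.3, 0.2], sigma=[2, 6, 10], cost=[1, 1, 4]))
```

With equal costs and equal $\sigma_l$ this reduces to proportional allocation $n_l \propto W_l$.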

Post-stratification

How many strata?

Goal of stratification is to reduce within-stratum variation, so we want more strata, but this comes with increased complexity and cost. Additionally var $\propto \sigma^2_l / n_l$ , so decreasing stratum sample sizes will increase variance.

If the number of strata, $L$ , is chosen, then choose stratum boundaries to minimise $\sum W_l \sigma_l$ (needs a preliminary sample of the distribution of $y$ ). An approximate rule is to choose boundaries $y_l$ so that $W_l(y_l - y_{l-1})$ is constant. With optimal choice of boundaries var $\propto 1/L^2$ . Increasing $L$ too far – within-stratum variance dominates.

Conclusion: $L$ often small, < 6.

Cluster sampling

SRS and stratified sampling both need a list of all experimental units, and if you have to visit them it can be expensive. Cluster sampling reduces the problem by sampling only clusters of the population: cheaper, but higher standard errors (although usually lower for the same cost).

Basic results of cluster sampling

$N$ (possibly unknown) experimental units, grouped in $C$ (known) clusters of size $M_i$ .

\[ \mu_Y = \frac{\sum_C M_i \bar{Y}_i}{\sum_C M_i} = \frac{\sum_C T_i}{\sum_C M_i} \]

(with the obvious estimator). Selecting $c$ clusters with first-stage sampling fraction $f_1 = c/C$ ,

\[ Var(\bar{y}) = \frac{1- f_1}{c\bar{M}^2} \frac{\sum_C M_i^2 (\bar{Y}_i - \mu_Y)^2}{C - 1} , \]

which can be estimated with $\frac{1-f_1}{c\bar{m}^2}s_r^2$ , where $s_r^2 = \frac{\sum_c m_i^2 (\bar{y}_i - \bar{y})^2}{c-1}$ (using $\bar{M} = N / C$ if $N$ known). (Remember: t-values for confidence intervals depend on $c$ , not $N$ .)

To estimate $\tau$ we can treat clusters as sampling units and cluster totals as unit values, giving $\hat{\tau} = C\bar{m}\bar{y}$ , with standard error $C\sqrt{\frac{1-f_1}{c}}s_t$ , where $s_t^2$ is the variance of the sampled cluster totals.
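A sketch of the cluster estimate of $\mu_Y$ and its standard error, using a hypothetical sample of $c = 5$ clusters from $C = 40$ (cluster sizes and means are made up):

```python
import math

C = 40
# (cluster size m_i, cluster mean ybar_i) for each sampled cluster
sample = [(8, 5.0), (12, 6.5), (10, 5.8), (9, 4.9), (11, 6.1)]
c = len(sample)
f1 = c / C
m_bar = sum(m for m, _ in sample) / c

# Ratio-type estimate of the population mean
y_bar = sum(m * yb for m, yb in sample) / sum(m for m, _ in sample)
# s_r^2 = sum m_i^2 (ybar_i - ybar)^2 / (c - 1)
s_r2 = sum(m**2 * (yb - y_bar)**2 for m, yb in sample) / (c - 1)
se = math.sqrt((1 - f1) / c) * math.sqrt(s_r2) / m_bar
print(round(y_bar, 4), round(se, 4))
```

A CI would then use a t-value with $c - 1 = 4$ degrees of freedom.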

Sampling with probability proportion to size (PPS)

Another way to estimate $\mu_Y$ is to select clusters with probability proportional to size, then $\bar{y}_{pps} = (\bar{y}_1 + ... + \bar{y}_c) / c$ , which is an unbiased estimator of $\mu_Y$ . Variance is harder to calculate, but if the sampling fraction ($f_1$ ) is small then we can pretend the clusters are drawn independently (with replacement).

\[ \hat{V}_{pps}(\bar{y}_{pps}) = \frac{1}{c}\left( \sum_1^c \frac{(\bar{y}_i - \bar{y}_{pps})^2}{c-1} \right) \]

In general, precision is high when cluster means are similar. The PPS strategy tends to have a slight edge over SRS in practice.
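The PPS estimator is just the unweighted mean of the sampled cluster means, with the usual SE of a mean under the with-replacement approximation. A sketch with hypothetical cluster means:

```python
import math

# Cluster means from a hypothetical PPS draw of c = 6 clusters
ybars = [5.2, 6.1, 5.7, 5.9, 6.4, 5.5]
c = len(ybars)

y_pps = sum(ybars) / c
s2 = sum((y - y_pps)**2 for y in ybars) / (c - 1)
se = math.sqrt(s2 / c)   # with-replacement approximation, valid for small f1
print(round(y_pps, 4), round(se, 4))
```

Because the cluster means here are similar, the SE is small, illustrating the remark above.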

Special case: Equal cluster sizes

Both reduce to the same formula for the standard error, i.e. $se(\bar{y}) = \sqrt{\frac{1-f}{c}}s_1$ , where $s_1^2$ is the variance of the cluster means.

Special case: Estimating proportions

General formulae for estimator and standard errors don’t reduce much when estimating a population proportion.

\[ p = \frac{\sum_c m_i p_i}{\sum_c m_i} \] \[ se(p) = \sqrt{\frac{1-f_1}{c}} \frac{s_r}{\bar{m}} \]

where

\[ s_r^2 = \sum_c m_i^2 \frac{(p_i - p)^2}{c-1} \]

PPS estimator remains nearly unchanged

\[ \bar{\pi}_{pps} = \frac{\sum_c p_i}{c} \] \[ se(\bar{\pi}_{pps}) = \frac{s_p}{\sqrt{c}} \]

where $s_p^2$ is the variance of the cluster proportions.

As above, both reduce in the case of equal cluster sizes.

Special case: Systematic sampling

Choose a unit at random from the first k units, and then every k-th unit after that. Can think of this as a type of cluster sampling where the clusters are the partition of the population under mod k, and we select one cluster at random. Systematic sampling works well if a trend is present (built-in stratification effect) and for time series output, but badly for periodic data when the sampling interval is a multiple of the period. Operationally easy.
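Selecting a 1-in-k systematic sample is a couple of lines (indices here are 0-based; parameters are illustrative):

```python
import random

def systematic_sample(N, k, seed=None):
    """1-in-k systematic sample: random start among the first k units,
    then every k-th unit after that."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # 0-based random start
    return list(range(start, N, k))

idx = systematic_sample(N=100, k=10, seed=42)
print(idx)   # 10 indices, one from each block of k consecutive units
```

This is exactly the "one cluster chosen at random" view: the k possible samples are the k residue classes mod k.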

How to get the variance?

Multistage sampling

As with cluster sampling, we select $c$ of $C$ clusters, but now instead of sampling all units in each cluster, we take a random sample. Most large surveys carried out this way.

Advantages:

Disadvantages:

Basic results

\[ \hat{\mu}_R = \frac{\sum_C M_i \bar{y}_i}{\sum_C M_i} \]

Let $f_1 = \frac{c}{C}$ and $f_{2i} = \frac{m_i}{M_i}$ , then

\[ \hat{V}(\hat{\mu}_R) = \frac{1-f_1}{c} \sum_{sample} \frac{(M_i / \bar{M})^2 (\bar{y}_i - \hat{\mu}_R)^2}{c-1} + \frac{f_1}{c} \sum_{sample} \frac{(M_i / \bar{M})^2 (1-f_{2i})s^2_{2i}}{c\, m_i} \]

If number of sampled clusters is reasonably large, then $\hat{\mu}_R$ is approximately normally distributed.

Note: If $f_1$ is very small, then $\hat{V}(\hat{\mu}_R) \approx s_1^2 / c$ . This result holds for more general subsampling schemes than SRS; we only need a scheme with an unbiased sample mean. Using systematic sampling is common.

Sampling with probability proportional to size

Similar to PPS for cluster sampling, and if $f_1$ is small can pretend we are sampling with replacement and treat clusters like individuals.

Performance is similar to SRS subsampling (but need to know $M_i$ for every cluster). Is intuitively appealing because if we take same-sized sample from every cluster then every unit has same chance of being selected.

Equal cluster sizes

If $M_i = M$ for all clusters then both estimators reduce to mean of cluster means.

If $m_i = m$ as well then variance reduces to:

$\hat{V}(\hat{\mu}_R) = (1-f_1)\frac{s_1^2}{c} + f_1(1-f_2)\frac{s_2^2}{cm}$ , where $s_1^2 = \sum_i \frac{(\bar{y}_{i.} - \bar{y}_{..})^2}{c-1}$ , and $s_2^2 = \sum_i\sum_j \frac{(y_{ij} - \bar{y}_{i.})^2}{c(m-1)}$ .

Can obtain these results from a standard one-way ANOVA, where $s_1^2 = s_B^2 / m$ and $s_2^2 = s_W^2$ .

Estimating proportions

As with cluster sampling, formulae don’t simplify much. See formula sheet for details.

Optimal sub-sample sizes

For simplicity, we’ll only deal with equal cluster and sample sizes, when all estimators reduce to $\bar{y}$ . Suppose cost $= k_1 c + k_2 cm$ . Variance of $\bar{y} = (1-f_1)\frac{\sigma^2_1}{c} + (1-f_2)\frac{\sigma^2_2}{cm}$ . Minimised when $m = \sqrt{\frac{k_1}{k_2}} \frac{\sigma_2}{\sigma_1}$ .
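The optimal subsample size depends only on the cost ratio and the ratio of the variance components. A sketch with hypothetical costs ($k_1$ per cluster, $k_2$ per unit) and pilot estimates of $\sigma_1$, $\sigma_2$:

```python
import math

def optimal_m(k1, k2, sigma1, sigma2):
    """Optimal units per cluster when total cost = k1*c + k2*c*m:
    m = sqrt(k1/k2) * (sigma2 / sigma1)."""
    return math.sqrt(k1 / k2) * (sigma2 / sigma1)

# Hypothetical: visiting a cluster costs 100, measuring a unit costs 4
m = optimal_m(k1=100, k2=4, sigma1=2.0, sigma2=3.0)
print(round(m, 2))
```

In practice round $m$ to an integer and then choose $c$ to exhaust the budget.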

Stratified multistage sampling

In most large surveys the first-stage sample will be stratified. This introduces no new problems: use the results above to estimate the mean and SE within each stratum, then take a weighted average to get overall results.

Using auxiliary information in estimation

Sometimes we can measure extra characteristics that have known population totals. We can often use this information to improve the precision of our estimate.

Ratio estimator

Suppose we draw SRS and obtain $y_1, y_2, ..., y_n$ for primary variable $Y$ and $x_1, x_2, ..., x_n$ for some other variable with known population mean $\mu_X$ . If $R = \mu_Y / \mu_X$ , then $\mu_Y = R \mu_X$ .

$\hat{\mu}_Y = \mu_X \frac{\bar{y}}{\bar{x}}$ , $\hat{V}(\hat{\mu}_Y) = \frac{1-f}{n}s_r^2$ , where $s_r^2 = \sum \frac{(y_i - \hat{r}x_i)^2}{n-1}$ and $\hat{r} = \bar{y}/\bar{x}$ .

Called the ratio estimator; in large samples it is approximately normally distributed.
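A sketch of the ratio estimator on a small hypothetical SRS, with $\mu_X$ assumed known for the population and a nominal sampling fraction:

```python
import math

# Hypothetical SRS of n pairs (x_i, y_i); mu_X known for the population
xs = [2.0, 3.5, 4.0, 5.0, 2.5, 3.0]
ys = [4.1, 7.2, 8.3, 9.8, 5.2, 6.1]
mu_X = 3.4
n, f = len(xs), 0.05

r = sum(ys) / sum(xs)                 # rhat = ybar / xbar
mu_hat = mu_X * r                     # ratio estimate of mu_Y
# s_r^2 from residuals about the ratio line y = r*x
s_r2 = sum((y - r * x)**2 for x, y in zip(xs, ys)) / (n - 1)
se = math.sqrt((1 - f) / n * s_r2)
print(round(mu_hat, 4), round(se, 4))
```

The gain over $\bar{y}$ is largest when $y$ is nearly proportional to $x$, so the residuals $y_i - \hat{r}x_i$ are small.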

Regression estimator

Extends ratio estimator to more general linear regression case.

$\hat{\mu}_{LR} = \bar{y} + \hat{\beta}(\mu_X - \bar{x})$ , $\hat{V}(\hat{\mu}_{LR}) = \frac{1-f}{n} s^2_{res}$ , where $s^2_{res}$ is the usual residual mean square.

Approximately normally distributed (remember: two fewer degrees of freedom). Asymptotically has better variance than $\hat{\mu}_R$ or $\bar{y}$ .

Ratio and regression estimators in stratified sampling

Separate ratio estimator: estimator for each stratum and then combine using stratum weights.

Combined ratio estimator: form $\bar{y}_{st}$ and $\bar{x}_{st}$ and use these to form the ratio estimate.

Both have $\hat{V}(\hat{\mu}_{RC}) = \sum_l W_l^2 \frac{(1-f_l)}{n_l} s^2_{l,rs}$ where $s^2_{l,rs} = \sum_j \frac{(y_{lj} - \bar{r}x_{lj})^2}{n_l-1}$ .

Separate is more efficient if the sample sizes are large and slopes vary from stratum to stratum; combined if some of the stratum sample sizes are small.

Block structure

In experimental design a set of treatments is applied to a set of experimental units (EUs). Treatments and experimental units can have a variety of structures that determine the analysis. Notation

Data model

As we use more complicated arrangements, it is helpful to represent data by a “data model”. For the block structure of a single set of EUs, the data model is $y_i = \mu + e_i$ , where $E[e_i] = 0$ , $Var[e_i]=\sigma^2$ , $Cov[e_i, e_j] = 0$ .

Significance test and confidence intervals require further assumptions about the error. Often assume i.i.d normal(0, $\sigma^2$ ).

Why use a data model?

Block structure 1. Single set of eu’s

Structure: All eu’s in single set.

Data model: $y_i = \mu + e_i$

Block structure 2. One-way classification.

Structure: Separate groups of eu’s, determined by characteristic of eu

Data model: $y_{ij} = \mu_i + e_{ij}$

Use ANOVA to examine differences between blocks. Can estimate the variance within blocks ($s^2$ , pooled variance) and between blocks ($s_G^2 = \sum_i\frac{(y_{i.} - y_{..})^2}{b-1}$ ). If there are no differences between blocks, then the block means would be a random sample of size $b$ from a single distribution with variance $\sigma^2 / r$ , and $r\cdot s_G^2$ estimates $\sigma^2$ .

So the ratio of $r s_G^2$ to $s^2$ measures variability among blocks relative to variability within blocks. We call this summary statistic F, and it has a known distribution which depends only on the numerator and denominator degrees of freedom.

This set of calculations usually displayed in ANOVA table.

Source df SS
Block $b-1$ \[r\sum_i(y_{i.} - y_{..})^2 = r\sum_i y_{i.}^2 - br\,y_{..}^2\]
Error $b(r-1) = n - b$ \[\sum_i\sum_j (y_{ij} - y_{i.})^2 = \sum_i\sum_j y_{ij}^2 - r\sum_i y_{i.}^2\]
Total $n-1$ \[\sum_i\sum_j (y_{ij} - y_{..})^2 = \sum_i\sum_j y_{ij}^2 - br\,y_{..}^2 \]
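The table entries can be computed directly from the definitional formulas. A sketch with made-up data ($b = 3$ blocks, $r = 4$ observations each):

```python
# One-way ANOVA sums of squares from the definitional formulas.
# Hypothetical data: b = 3 blocks, r = 4 observations each.
data = [
    [12.1, 11.8, 12.5, 12.0],
    [14.2, 13.9, 14.5, 14.0],
    [13.0, 13.3, 12.8, 13.1],
]
b, r = len(data), len(data[0])
n = b * r
grand = sum(sum(row) for row in data) / n
block_means = [sum(row) / r for row in data]

SS_block = r * sum((m - grand)**2 for m in block_means)
SS_error = sum((y - m)**2 for row, m in zip(data, block_means) for y in row)
SS_total = sum((y - grand)**2 for row in data for y in row)

MS_block = SS_block / (b - 1)
MS_error = SS_error / (b * (r - 1))
F = MS_block / MS_error
print(round(SS_block, 3), round(SS_error, 3), round(F, 2))
```

The additivity $SS_{total} = SS_{block} + SS_{error}$ holds exactly, which is a useful arithmetic check.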

When blocking eu’s, we normally select them so that they are different, we aren’t testing the hypothesis there is no difference. In this case the F-test just provides confirmation they are indeed different and gives information on how large that variability is relative to variability within blocks.

ANOVA calculation rule: the multiplier applied to a squared mean is the number of observations included in that mean.

Intuitively, ANOVA separates total variability into two portions that represent the data’s structure: among blocks and within blocks.

Alternative data model: $y_{ij} = \mu + g_i + e_{ij}$ . $y_{..}$ estimates $\mu$ , $y_{i.} - y_{..}$ estimates $g_i$ , $MS_E$ estimates $\sigma^2$ .

Unequal replication.

Equal replication is not required, nor always achievable.

If $r_i$ is the number of replications in block $i$, then $n = \sum_i r_i$ , $SS_B = \sum_i r_i y_{i.}^2 - ny_{..}^2$ , $SS_E = \sum_i \sum_j y_{ij}^2 - \sum_i r_i y_{i.}^2$ .

Structure 3: Two-way classification.

EUs “cross-classified” by two factors.

Data model: $y_{ij} = \mu + a_i + b_j + e_{ij}$ , $e_{ij} \sim NID(0, \sigma^2)$ . Assume $\sum a_i = 0$ , $\sum b_j = 0$ .

ANOVA: Data classified two ways (by row and column) – we must calculate both row SS and column SS. There is no replication in this structure, so no “pure error” SS to calculate. However, we can obtain a “residual SS” which can be treated as error SS under certain assumptions.

Source df SS
Rows $a-1$ \[b\sum_i y_{i.}^2 - ab\,y_{..}^2\]
Columns $b-1$ \[a\sum_j y_{.j}^2 - ab\,y_{..}^2\]
Error $(a-1)(b-1)$ subtraction
Total $ab-1$ \[\sum_i \sum_j y_{ij}^2 - ab\,y_{..}^2\]

Expected value of $y_{ij}$ estimated by $\hat{\mu} + \hat{a}_i + \hat{b}_j$ = $y_{..} + (y_{i.} - y_{..}) + (y_{.j} - y_{..})$ = $y_{i.} + y_{.j} - y_{..}$ . Residual = $y_{ij} - y_{i.} - y_{.j} + y_{..}$ .

Hidden replication.

In a two-way classification with one EU per block there is no true replication. However, we have hidden replication in the following sense: each A difference is replicated b times (and vice versa).

Structure 4. Two-way classification with replication.

Must randomise the order of all combinations.

Notation: $y_{ijk}$ is row $i$, column $j$ and observation $k$.

Data model: $y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk}$ , $e_{ijk} \sim N(0, \sigma^2)$ . $(ab)_{ij}$ is an additional term denoting “interaction” between A and B.

Structure 5. Nested factors.

Consider two blocking factors A and B. If the levels of B are different for each level of A then we have a nested factor structure.

Data model: $y_{ijk} = \mu + a_i + b_{(i)j} + e_{ijk}$ , where $b_{(i)j}$ denotes the effect of level $j$ of factor B within level $i$ of factor A.

Can do ANOVA, and ANOVA breaking down nested factor.

Variance component models.

Consider again the one-way classification situation ($y_{ij} = \mu + b_i + e_{ij}$ ). There are situations where blocks are picked randomly rather than deliberately. $b_i \sim NID(0, \sigma_b^2)$ and $e_{ij} \sim NID(0, \sigma^2)$ , with $b_i$ and $e_{ij}$ independent. The objective of the analysis is now to estimate the two “variance components” in this model. (eg. Is $\sigma_b^2$ too large and causing quality problems?)

In the ANOVA, $E[MS_B] = r(\sigma_b^2 + \frac{\sigma^2}{r}) = r\sigma_b^2 + \sigma^2$ and $E[MS_E] = \sigma^2$ , so it is possible to estimate each variance component.
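Equating observed mean squares to their expectations gives method-of-moments estimates of the two components. A sketch with simulated data (block and error SDs are made up):

```python
import random
import statistics

random.seed(0)
sigma_b, sigma = 2.0, 1.0    # true component SDs (for the simulation only)
b, r = 4, 5
data = []
for _ in range(b):
    mu_i = random.gauss(10, sigma_b)                 # random block effect
    data.append([random.gauss(mu_i, sigma) for _ in range(r)])

grand = statistics.mean(y for row in data for y in row)
MS_B = r * sum((statistics.mean(row) - grand)**2 for row in data) / (b - 1)
MS_E = sum((y - statistics.mean(row))**2 for row in data for y in row) / (b * (r - 1))

sigma2_hat = MS_E                    # estimates sigma^2
sigma_b2_hat = (MS_B - MS_E) / r     # estimates sigma_b^2 (can come out negative)
print(round(sigma_b2_hat, 2), round(sigma2_hat, 2))
```

With only $b = 4$ blocks the estimate of $\sigma_b^2$ is very imprecise; a negative value is usually truncated to zero.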

In some situations can be difficult to decide whether a block should be regarded as random or fixed. Can be a controversial issue. The litmus test: do we care about individual blocks or not? (yes = fixed, no = random)

Treatment structure

Like block structure, treatment structure can involve one factor, or multiple factors, crossed or nested.

Single treatment factor.

Experimental objective: evaluate and compare effects of single treatment factor (with k levels) on response.

Method: Randomly assign treatments to EUs (experimental units), a completely randomised design.

Analysis: Same as for the one-way blocking structure, but we are now interested in the p-value for the ANOVA. The F-test is just a screening test; it provides evidence (when significant) that means differ more than could be expected by chance. Further analysis is needed to identify how.

Assumptions underlying F-test

Need to check assumptions before we go too far:

Checking for normality. Estimate $\mu_i$ by $y_{i.}$ . Residual is $y_{ij} - y_{i.}$ . Check for normality with a Q-Q plot, or the Shapiro-Wilk test for a numerical summary.

Data anomalies (eg. outliers, clusters) are generally due to “lurking variables”. Often it is not the normality assumption that has been violated, but the identically-distributed assumption, because the data are actually a mix of two or more data sets. It is up to the experimenter and others involved to think about “lurking variables” and design the experiment and protocol so they won’t invalidate the experiment.

Checking for equal variances. Graphical display (eg. from Minitab; overlapping confidence intervals indicate the assumption is ok), or formal statistical tests (eg. Bartlett’s, Levene’s).

Comparing treatment means

Confidence interval for a treatment mean, $\mu_i$

Treatment mean: $y_{i.}$ Standard error of mean: $SE = \sqrt{\frac{MS_E}{r}}$ , $df = df_E$ . From tables obtain $t = t_{0.025}(df_E)$ . 95% CI = $y_{i.} \pm t \cdot SE$ .

Meaning of confidence interval: if our conclusions are unchanged over the entire range of the confidence interval then we have a definitive result, otherwise we need more data.

Graphical comparison.

Plot treatment means on dot plot. Draw bell-shaped curve centered at $y_{..}$ with centre 95% spanning $\pm t \cdot SE$ . Interpret by mentally sliding curve along chart. Provides indication of which means could be from same distribution and which stand out. Minitab has a function which does this automatically.

“Formal” statistical comparison

Goal: control the type I error rate ($\alpha$ ) when comparing pairs of treatment means.

Problem: Non-independence of multiple comparisons among the same means makes it difficult to evaluate the type I error rate.

T-test: $SED = \sqrt{2 \cdot MS_E / r}$ . Can’t just use the t-test: if $\alpha = 0.05$ for a comparison of two means, then when we compare multiple means the probability that at least one of those comparisons is significant exceeds 0.05. Therefore, multiple use of the t-test will find too many significant differences.

Tukey’s studentised range. Basic idea: if all means are equal then the $y_{i.}$ are all from the same distribution. Look at the range of the sample means ($max(y_{i.}) - min(y_{i.})$ ), “studentise” by dividing by the SE: $q = (max(y_{i.}) - min(y_{i.}))/ \sqrt{MS_E / r}$ . Derive the reference distribution of q (depends only on the number of means $k$ and $df_E$ ) and compare to the observed value of q. Interpretation: look up $q_{k; df_E, \alpha}$ and multiply by SE; any differences larger than this are significant.
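The q statistic itself is simple arithmetic; only its reference distribution needs tables. A sketch with hypothetical treatment means and a hypothetical $MS_E$:

```python
import math

# Hypothetical: k = 4 treatment means, r = 6 replications, MS_E from the ANOVA
means = [20.1, 22.4, 19.8, 23.0]
r, MS_E = 6, 2.5

SE = math.sqrt(MS_E / r)
q = (max(means) - min(means)) / SE
print(round(q, 2))
# Compare q with the tabulated critical value q_{k; df_E, alpha}
# (here k = 4; df_E = k*(r-1) for a completely randomised design)
```

Any pair of means differing by more than $q_{k; df_E, \alpha} \cdot SE$ is declared significant.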

Duncan’s multiple range test. Once you’ve decided one group is significantly different, remove it and compare the remaining means.

Bonferroni. TSR requires special tables. Bonferroni is an alternative that only requires t tables. The number of pairwise comparisons is $m = \frac{k(k-1)}{2}$ , so do each t-test using $\alpha = 0.05 / m$ . The overall type I error rate will be less than 0.05.

Can express all of these in similar terms by calculating 95% CIs. Bonferroni is only slightly conservative relative to TSR.

Unequal replication.

If the number of replications in each treatment ($r_i$ ) is not constant, but the $r_i$ ’s don’t vary too much, apply the previous techniques using $\bar{r}$ = average($r_i$) or $\bar{r}$ = harmonic mean($r_i$).

Sizing an experiment: How many EU’s?

Sample size questions are often the most difficult to answer statistically – they require some idea of the unknown parameters that you want to estimate, and of costs and benefits to define reasonable goals and trade-offs.

Approach one: Using CI(confidence interval)s.

  1. Specify some criterion related to precision of results. (eg. $LSD = t_{0.025}(k(r-1))\sqrt{2 \cdot MS_E / r}$ ).
  2. Evaluate that criterion as a function of r (assume t-value $\approx$ 2 for large r).
  3. Specify threshold.
  4. Determine r that meets threshold level.

Once we state the problem we can solve for required number of replications. Big problem is deciding on quantitative threshold.
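The four steps above amount to a one-line search once the threshold is quantified. A sketch with a hypothetical pilot $MS_E$ and a hypothetical precision target:

```python
import math

def smallest_r(threshold, MS_E, t=2.0, r_max=1000):
    """Smallest replication r with LSD = t*sqrt(2*MS_E/r) <= threshold,
    using the approximation t ~ 2 for large r (step 2)."""
    for r in range(2, r_max):
        if t * math.sqrt(2 * MS_E / r) <= threshold:
            return r
    return None

# Hypothetical: pilot experiment gave MS_E = 4.0; want LSD no larger than 1.5
print(smallest_r(threshold=1.5, MS_E=4.0))
```

A more careful version would iterate, updating the t-value for the df implied by each candidate $r$.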

Approach two: using a decision rule

Eg. Economic analysis has shown that if fertiliser B yields at least 2 pounds more per plant than fertiliser A, fertiliser B will be economically justified. The investigator wants to be sure of the 2 lb. difference, so a decision rule is set: switch to fertiliser B if the lower 95% confidence bound on $(\mu_B - \mu_A)$ is at least 2. This recognises and accounts for uncertainty in estimating $(\mu_B - \mu_A)$ – good statistical practice.

Rule: select B if $y_{B.} - y_{A.} - t_{.05}(k(r-1))\,SED \ge 2$ , where $SED = \sqrt{2 MS_E / r}$ . Treat $\sigma$ as known based on a pilot experiment, use $\sqrt{MS_E} = \sigma$ , let $\delta = \mu_B - \mu_A$ and denote $d = y_{B.} - y_{A.}$ ; then $d \sim N(\delta, 2\sigma^2/r)$ .

The probability that the data will satisfy the decision rule is a function of $\delta$ and $r$ : $P(\text{choose B}) = P\left(z \ge \frac{2-\delta}{\sigma\sqrt{2/r}} + 1.645\right)$ . Draw curves for differing values of $\delta$ and $r$ (operating characteristic curves) and choose appropriately. Note that more replications can detect a smaller difference $\delta$ . Can extend by recognising that $\sigma$ is also estimated, and account for the df’s needed to estimate $\sigma$ .
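Points on the operating characteristic curves can be computed directly from the normal distribution ($\sigma$ is treated as known from a pilot; all numbers here are illustrative):

```python
import math

def norm_sf(z):
    """P(Z >= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_choose_B(delta, r, sigma, margin=2.0, z_alpha=1.645):
    """P(lower one-sided 95% bound on mu_B - mu_A exceeds `margin`),
    treating sigma as known."""
    sed = sigma * math.sqrt(2 / r)
    return norm_sf((margin - delta) / sed + z_alpha)

# Hypothetical pilot sigma = 1.5; true difference delta = 3 lb
for r in (5, 10, 20):
    print(r, round(p_choose_B(delta=3.0, r=r, sigma=1.5), 3))
```

As expected, the probability of (correctly) choosing B rises with $r$: more replication gives more power for the same true $\delta$.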

Other comparisons of treatment means

Context-driven comparisons. Eg. a deodorant manufacturer claims that “my deodorant lasts twice as long as brand X”. Experiment:

Special comparisons: contrasts

Contrast – a linear combination of treatment means $C = \sum_{i=1}^k c_i y_{i.}$ , with $\sum_{i=1}^k c_i = 0$ .

Contrasts are orthogonal if the dot product of their coefficient vectors is 0. If a treatment factor has k levels, we can find k-1 independent (orthogonal) contrasts. These can be tested separately, each with 1 degree of freedom.

Treatment structures with two or more factors

Write data models as for block structures. Crossed factors much more common than nested factors.

Now important to examine interaction – use interaction plot.

Experimental designs

Three components of experimental design:

Many possible experimental designs, with same two goals:

Completely randomised design

Blocks: single group
Treatments: $t$ treatments, any structure possible
Allocation: randomly assign treatment $i$ to $r_i$ EUs

Randomised complete block design

Block structure: $b$ groups
Treatments: $t$ treatments, any structure possible
Allocation: randomly assign each treatment to one EU in each block

Model: $y_{ij} = \mu + t_i + b_j + e_{ij}$ , $e_{ij} \sim NID(0, \sigma^2)$ .

Advantages: cost; block means cancel out when comparing treatments
Assumptions: no interaction between treatment and block; error distribution same for all treatments and blocks

Test for interaction

Possible to test for interaction using Tukey’s test, which detects a curvilinear relationship between $y - \hat{y}$ and $\hat{y}$ (transformable non-additivity).

  1. Calculate residuals
  2. Calculate squared predicted values $q_{ij} = (y_{i.} + y_{.j} - y_{..})^2$
  3. Calculate residuals for $q$ ’s
  4. Plot residuals vs. q-residuals. Interaction will show up as linear trend
  5. Calculate $P = \sum(\text{q-resid})(\text{resid})$ , $Q = \sum(\text{q-resid})^2$ , slope = $P / Q$ , $SS_{slope} = P^2 / Q$
  6. Partition the error SS in the ANOVA into non-additivity (slope) and remainder, perform F-test
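The steps above can be sketched for an unreplicated two-way layout (data here are hypothetical; the F-test itself would use df 1 and $(a-1)(b-1)-1$):

```python
# Tukey's one-degree-of-freedom test for non-additivity, following steps 1-6.
# Hypothetical 3 x 4 two-way table with one observation per cell.
y = [
    [10.0, 12.0, 15.0, 11.0],
    [12.0, 15.0, 19.0, 14.0],
    [11.0, 13.0, 17.0, 12.0],
]
a, b = len(y), len(y[0])
row = [sum(r) / b for r in y]
col = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
grand = sum(sum(r) for r in y) / (a * b)

def residuals(tab):
    """Residuals y_ij - y_i. - y_.j + y_.. for any two-way table."""
    rr = [sum(r) / b for r in tab]
    cc = [sum(tab[i][j] for i in range(a)) / a for j in range(b)]
    g = sum(sum(r) for r in tab) / (a * b)
    return [[tab[i][j] - rr[i] - cc[j] + g for j in range(b)] for i in range(a)]

res = residuals(y)                                              # step 1
q = [[(row[i] + col[j] - grand) ** 2 for j in range(b)] for i in range(a)]  # step 2
qres = residuals(q)                                             # step 3

P = sum(qres[i][j] * res[i][j] for i in range(a) for j in range(b))  # step 5
Q = sum(qres[i][j] ** 2 for i in range(a) for j in range(b))
SS_nonadd = P * P / Q
SS_error = sum(res[i][j] ** 2 for i in range(a) for j in range(b))
print(round(SS_nonadd, 3), round(SS_error - SS_nonadd, 3))      # step 6 split
```

Step 4 (the plot of residuals against q-residuals) is the graphical version of the same slope calculation.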

Randomised complete block design with replication

Model: $y_{ijk} = \mu + t_i + b_j + (tb)_{ij} + e_{ijk}$ , $e_{ijk} \sim NID(0, \sigma^2)$ .

Can relax the assumption of no block-treatment interaction (other possibilities: live with it, use less divergent blocks). Have $b$ blocks of $r \times t$ units, each unit receiving one randomly assigned treatment. Can now separate true replication from interaction, and have more sensitivity for testing the significance of differences. However, it will be more expensive.

In industrial experiments, we usually minimise the number of runs and maximise the number of factors, assuming low variability because processes are well controlled: RCBD with replication not appropriate. In the social and life sciences, we usually have fewer factors and more error: RCBD with replication useful and appropriate.

Repeated measures design

Repeated measures taken on same EU. Has two-way structure similar to RCBD, but subject is not group of EU’s with randomly applied treatments, nevertheless can still be appropriate to use same analysis.

Balanced incomplete block design

In RCBD designs the number of EUs in a block is constrained to be a multiple of the number of treatments (all treatments replicated in each block). What do we do if the block size is smaller than the number of treatments? BIBD.

Model: $y_{ij} = \mu + t_i + b_j + e_{ij}$ , $e_{ij} \sim NID(0, \sigma^2)$ . Same model as RCBD, but not all combinations are included, so a different analysis is needed: use a regression model to correct for the effects of correlation.

Hidden power: if A & B are paired $n$ times, the estimate of their difference is as precise as if they were measured $2n$ times! Relies on the assumptions of block-treatment independence and careful balancing.

$b$ blocks of $k$ EUs, $t$ treatments each occurring $r$ times, $bk = tr$ . Each pair of treatments occurs together in $\lambda = r(k-1) / (t-1)$ blocks. Relative efficiency: $e = t\lambda / rk$ .
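These parameter relations are easy to check mechanically; the sketch below uses the classic $t = 7$, $b = 7$, $k = 3$, $r = 3$ design as an example:

```python
def bibd_params(t, b, k, r):
    """Check BIBD consistency (bk = tr) and compute lambda and the
    relative efficiency e = t*lambda / (r*k)."""
    assert b * k == t * r, "must have bk = tr"
    lam = r * (k - 1) / (t - 1)
    e = t * lam / (r * k)
    return lam, e

# Classic example: 7 treatments in 7 blocks of size 3, each treatment 3 times
lam, e = bibd_params(t=7, b=7, k=3, r=3)
print(lam, round(e, 4))   # lambda must come out an integer for a BIBD to exist
```

If $\lambda$ is not an integer, no BIBD exists for those parameters.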

When we can model blocks as random effects, block totals can be analysed to provide another set of estimated treatment effects, an inter-block analysis. This is independent of the above analysis (the intra-block analysis). A weighted average of the two can be more precise than either set separately.

Latin square designs

Three factors (usually 2 blocking, 1 treatment) with the same number of levels. Want to compare treatments while removing variability from blocks. An RCBD would need 64 runs for factors with four levels; can we make do with fewer? A Latin square only needs 16, with one treatment tested for each combination of blocks in such a way that effects can be separated and eliminated.

Model: $y_{ijk} = \mu + t_i + r_j + c_k + e_{ijk}$ , $e_{ijk} \sim NID(0, \sigma^2)$ Key assumption: no interaction between rows, columns and treatments.

The error term in the model ($e_{ijk}$ ) represents the variability if the same conditions were repeated independently – not the true error variance, but we can estimate it assuming no interaction. However, because it is a type of (fractional) factorial design, if we decide one factor is unimportant we can collapse the design and analyse without it (similar to stepwise regression; controversial if overdone). Analyse using standard ANOVA: means can be calculated directly because all other effects cancel out.

Split-plot design