Introduction to sample surveys
Why sample?
- cheaper than census
- quicker
- better control of response quality
Basic procedure:
- decide on objectives
- choose sampling units
- choose mode of sampling
- design measuring instrument
- pilot
- choose sampling design
- choose sample size
- select units
- train interviewers
- enter and check data
- analyse
Two types of surveys:
- descriptive: estimate some characteristic of population
- analytic: examine some relationship between variables
Sampling distributions
Why use random sampling?
- removes subjective bias
- lets us assess accuracy of estimates
- results will be more acceptable to wider audience
- accuracy improves predictably with sample size
Have chosen to use random sampling, but still have a lot of flexibility in choice of randomisation scheme, and some flexibility in choice of estimator. Can use MSE $= Var(\bar{y}) + bias^2$ to compare estimators and randomisation schemes.
Nonsampling errors
Conventional to split total survey error into sampling and non-sampling error. Non-sampling error includes all errors not attributable to incomplete enumeration.
In contrast to sampling errors, non-sampling errors often increase with sample size or complexity of sampling scheme. Non-sampling error likely to be at least as large as sampling error.
Types:
- coverage errors: frame errors, non-response
- measurement errors: question & format effects, respondent errors, interviewer effects
- processing errors
Simple random sampling
Basic ideas.
Simplest random scheme: every subset of $n$ units has the same chance of forming the sample, $p = 1/\binom{N}{n}$. Distribution of sample mean.
\[E(\bar{y}) = \mu\] \[Var(\bar{y}) = (1-f)\frac{\sigma^2}{n} \] \[E(s^2) = \sigma^2, \quad \text{where } s^2 = \sum^n_1 \frac{(y_i - \bar{y})^2}{n-1} \]With independent observations, the CLT ensures that $\bar{y}$ is approximately normal for large $n$, which we can use to construct CI(confidence interval)s etc.
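These SRS formulas can be sketched in code (a minimal illustration; the function name `srs_estimate`, the made-up data and $N$, and the fixed t-value of 1.96 are assumptions, not from the notes):

```python
import math

def srs_estimate(sample, N, t=1.96):
    """SRS mean, SE with finite-population correction, and approximate CI."""
    n = len(sample)
    ybar = sum(sample) / n                               # sample mean
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)  # s^2, estimates sigma^2
    f = n / N                                            # sampling fraction
    se = math.sqrt((1 - f) * s2 / n)                     # SE of the sample mean
    return ybar, se, (ybar - t * se, ybar + t * se)

ybar, se, ci = srs_estimate([12, 15, 11, 14, 13, 16, 12, 15], N=1000)
```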
Estimating proportions
\[E(p) = \pi\] \[Var(p) = (1-f)\frac{\pi(1-\pi)}{n}, \quad \widehat{Var}(p) = (1-f)\frac{p(1-p)}{n-1} \]Stratified sampling
Can improve on SRS given more information.
Basic idea is to split population into strata:
- divide population into non-overlapping groups (strata)
- take an SRS within each stratum
- produce an estimate for each stratum
- overall estimate is the weighted average of the stratum estimates
Advantages:
- protects against getting a really bad sample
- improves precision of estimates
- can use different sampling methods in each stratum
- separate estimates in each stratum
- some protection against non-response
General theory
Population mean = weighted average of stratum means.
\[ \bar{y}_{st} = \sum_l W_l \bar{y}_l \]Variance of the stratified estimator combines the stratum variances with squared weights:
\[ Var[\bar{y}_{st}] = \sum_l W_l^2 Var[\bar{y}_l] \]Stratified sampling does well when the $\sigma^2_l$ are small (values within each stratum are similar).
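A sketch of the stratified estimator and its SE under SRS within strata (names and data invented; uses $Var[\bar{y}_l] = (1-f_l)s_l^2/n_l$ from the SRS section):

```python
import math

def stratified_estimate(strata_samples, stratum_sizes):
    """Weighted average of stratum means; variance sums W_l^2 (1-f_l) s_l^2 / n_l."""
    N = sum(stratum_sizes)
    est, var = 0.0, 0.0
    for sample, N_l in zip(strata_samples, stratum_sizes):
        n_l = len(sample)
        W_l = N_l / N                                    # stratum weight
        ybar_l = sum(sample) / n_l
        s2_l = sum((y - ybar_l) ** 2 for y in sample) / (n_l - 1)
        est += W_l * ybar_l
        var += W_l ** 2 * (1 - n_l / N_l) * s2_l / n_l
    return est, math.sqrt(var)

est, se = stratified_estimate([[10, 12, 11], [20, 22, 21, 23]], [300, 700])
```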
Allocation
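The cost-weighted allocation rule $n_l \propto W_l \sigma_l / \sqrt{cost_l}$ can be sketched as follows (function name and inputs invented; with equal costs it reduces to Neyman allocation $n_l \propto W_l \sigma_l$):

```python
import math

def allocate(n, W, sigma, cost):
    """Split total sample size n across strata: n_l ∝ W_l * sigma_l / sqrt(cost_l)."""
    raw = [w * s / math.sqrt(c) for w, s, c in zip(W, sigma, cost)]
    scale = n / sum(raw)
    return [r * scale for r in raw]

# Equal costs: strata with equal W_l * sigma_l get equal shares.
n_l = allocate(100, W=[0.5, 0.3, 0.2], sigma=[4, 2, 10], cost=[1, 1, 1])
```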
\[n_l \propto \frac{W_l \sigma_l}{\sqrt{cost_l}} \]Post-stratification
- e.g. can’t identify which stratum a unit belongs to before sampling
- cannot benefit from the sample design
- can still use the correct stratum weight $W_l$ in estimation
- if sample size large, post-stratification will be close to proportional allocation
- biggest benefit is overcoming some effects of non-response
How many strata?
Goal of stratification is to reduce within-stratum variation, so we want more strata, but this comes with increased complexity and cost. Additionally, var $\propto \sigma^2_l / n_l$, so with total sample size fixed, more strata means smaller $n_l$ and hence larger per-stratum variance contributions.
If the number of strata, $L$ , is chosen, then choose stratum boundaries to minimise $\sum W_l \sigma_l$ (need a preliminary sample of the distribution of $y$ ). Approximate rule is to choose $y_h$ so that $W_h(y_h - y_{h-1})$ is constant. With optimal choice of boundaries, var $\propto 1/L^2$ . Increasing $L$ too far – within-stratum variance dominates.
Conclusion: $L$ often small, < 6.
Cluster sampling
SRS and stratified sampling both need a list of all experimental units, and if you have to visit them it can be expensive. Cluster sampling reduces the problem by only sampling clusters of the population: cheaper, but with higher standard errors for the same sample size (though usually lower for the same cost).
Basic results of cluster sampling
$N$ (possibly unknown) experimental units, grouped into $C$ (known) clusters of sizes $M_i$ . $ \mu_Y = \frac{\sum_C M_i \bar{Y}_i}{\sum_C M_i} = \frac{\sum_C T_i}{\sum_C M_i} $ (with the obvious estimator). $Var(\bar{y}) = \frac{1- f_1}{c\bar{M}^2} \frac{\sum_C M_i^2 (\bar{Y}_i - \mu_Y)^2}{C - 1}$ , which can be estimated with $\frac{1-f_1}{c\bar{m}^2}s_r^2$ , where $s_r^2 = \frac{\sum_c m_i^2 (\bar{y}_i - \bar{y})^2}{c-1}$ (using $\bar{M} = N / C$ if $N$ known). (Remember: t-values for confidence intervals depend on $c$, not $N$.) To estimate $\tau$ we can treat clusters as sampling units and cluster totals as unit values, giving $\hat{\tau} = C\bar{m}\bar{y}$ , with standard error $C\sqrt{\frac{1-f_1}{c}}s_t$ , where $s_t$ is the standard deviation of the sampled cluster totals.
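A sketch of these cluster-sampling formulas (names and data invented; $C$ is the number of clusters in the population, $c$ the number sampled):

```python
import math

def cluster_estimate(clusters, C):
    """Mean per unit and its SE: ybar = sum(T_i)/sum(m_i), variance via s_r^2."""
    c = len(clusters)
    m = [len(cl) for cl in clusters]
    totals = [sum(cl) for cl in clusters]
    ybar = sum(totals) / sum(m)
    mbar = sum(m) / c
    means = [t / mi for t, mi in zip(totals, m)]
    s_r2 = sum(mi ** 2 * (yi - ybar) ** 2
               for mi, yi in zip(m, means)) / (c - 1)
    f1 = c / C                                           # first-stage fraction
    se = math.sqrt((1 - f1) * s_r2 / c) / mbar
    return ybar, se

ybar, se = cluster_estimate([[1, 2, 3], [4, 5], [2, 3, 4, 3]], C=50)
```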
Sampling with probability proportion to size (PPS)
Another way to estimate $\mu_Y$ is to select clusters with probability proportional to size, then $\bar{y}_{pps} = (\bar{y}_1 + ... + \bar{y}_c) / c$ , which is an unbiased estimator of $\mu_Y$ . Variance is harder to calculate, but if the sampling fraction ($f_1$ ) is small, then we can pretend the clusters are drawn independently (with replacement).
\[ \hat{V}_{pps}(\bar{y}_{pps}) = \frac{1}{c}\left( \sum_1^c \frac{(\bar{y}_i - \bar{y}_{pps})^2}{c-1} \right) \]In general, precision is high when cluster means are similar. The PPS strategy tends to have a slight edge over SRS in practice.
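A sketch of the PPS estimator (invented names; the input is just the list of sampled cluster means, since under PPS the estimator is their plain average):

```python
import math

def pps_estimate(cluster_means):
    """ybar_pps = mean of the cluster means; V_hat = var(cluster means) / c."""
    c = len(cluster_means)
    ybar = sum(cluster_means) / c
    v = sum((y - ybar) ** 2 for y in cluster_means) / (c - 1) / c
    return ybar, math.sqrt(v)

ybar, se = pps_estimate([3.0, 4.0, 5.0])
```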
Special case: Equal cluster sizes
Both reduce to the same formula for the standard error, i.e. $se(\bar{y}) = \sqrt{\frac{1-f}{c}}s_1$ where $s_1^2$ is the variance of the cluster means.
Special case: Estimating proportions
General formulae for estimator and standard errors don’t reduce much when estimating a population proportion.
\[ p = \frac{\sum_c m_i p_i}{\sum_c m_i} \] \[ se(p) = \sqrt{\frac{1-f_1}{c}} \frac{s_r}{\bar{m}} \]where
\[ s_r^2 = \sum_c m_i^2 \frac{(p_i - p)^2}{c-1} \]PPS estimator remains nearly unchanged
\[ \bar{\pi}_{pps} = \frac{\sum_c p_i}{c} \] \[ se(\bar{\pi}_{pps}) = \frac{s_p}{\sqrt{c}} \]where $s_p^2$ is the variance of the cluster proportions.
As above, both reduce in the case of equal cluster sizes.
Special case: Systematic sampling
Choose a unit at random from the first $k$ units, and then every $k$-th unit after that. Can be thought of as a type of cluster sampling where the clusters are the residue classes mod $k$, and we select one cluster at random. Systematic sampling works well if a trend is present (built-in stratification effect) and for time-series output, but badly for periodic data when the sampling interval is a multiple of the period. Operationally easy.
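Selecting a 1-in-$k$ systematic sample is a one-liner (sketch; names invented):

```python
import random

def systematic_sample(units, k, rng=random):
    """Random start among the first k units, then every k-th unit after that."""
    start = rng.randrange(k)
    return units[start::k]

sample = systematic_sample(list(range(100)), k=10)
```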
How to get the variance?
- take more than one systematic sample
- use SRS formula (overestimate)
- post-stratify (underestimate)
- build model for $Y$ as function of $i$ and use to suggest variance
Multistage sampling
As with cluster sampling, we select $c$ of $C$ clusters, but now instead of sampling all units in each cluster, we take a random sample. Most large surveys carried out this way.
Advantages:
- cost and speed
- convenience (only need list of clusters and individuals in selected clusters)
- usually more accurate than cluster sampling for the same total sample size
Disadvantages:
- less accurate than SRS of same size (but more accurate for same cost)
- further analysis is difficult
Basic results
\[ \hat{\mu}_R = \frac{\sum_{sample} M_i \bar{y}_i}{\sum_{sample} M_i} \]Let $f_1 = \frac{c}{C}$ and $f_{2i} = \frac{m_i}{M_i}$ , then
\[ \hat{V}(\hat{\mu}_R) = \frac{1-f_1}{c} \sum_{sample} \frac{(M_i / \bar{M})^2 (\bar{y}_i - \hat{\mu}_R)^2}{c-1} + \frac{f_1}{c} \sum_{sample} \frac{(M_i / \bar{M})^2 (1-f_{2i})s^2_{2i}}{c m_i} \]If the number of sampled clusters is reasonably large, then $\hat{\mu}_R$ is approximately normally distributed.
Note: If $f_1$ is very small, then $\hat{V}(\hat{\mu}_R) \approx s_1^2 / c$ . This result holds for more general subsampling schemes than SRS; we only need a scheme with an unbiased sample mean. Using systematic sampling is common.
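A sketch of the two-stage estimator and its variance above, with SRS at both stages (names and data invented; `clusters` is a list of $(M_i, \text{subsample})$ pairs):

```python
import math

def two_stage_estimate(clusters, C):
    """mu_hat_R and V_hat from the between- and within-cluster terms."""
    c = len(clusters)
    f1 = c / C
    Ms = [M for M, _ in clusters]
    Mbar = sum(Ms) / c
    means = [sum(s) / len(s) for _, s in clusters]
    mu = sum(M * yb for M, yb in zip(Ms, means)) / sum(Ms)
    between = sum((M / Mbar) ** 2 * (yb - mu) ** 2
                  for M, yb in zip(Ms, means)) / (c - 1)
    within = 0.0
    for (M, s), yb in zip(clusters, means):
        m = len(s)
        s2 = sum((y - yb) ** 2 for y in s) / (m - 1)     # within-cluster s^2
        within += (M / Mbar) ** 2 * (1 - m / M) * s2 / (c * m)
    v = (1 - f1) / c * between + f1 / c * within
    return mu, math.sqrt(v)

mu, se = two_stage_estimate([(10, [1, 2, 3]), (20, [4, 5, 6])], C=40)
```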
Sampling with probability proportional to size
Similar to PPS for cluster sampling, and if $f_1$ is small can pretend we are sampling with replacement and treat clusters like individuals.
Performance is similar to SRS subsampling (but we need to know $M_i$ for every cluster). It is intuitively appealing because if we take a same-sized sample from every cluster then every unit has the same chance of being selected.
Equal cluster sizes
If $M_i = M$ for all clusters then both estimators reduce to mean of cluster means.
If $m_i = m$ as well then variance reduces to:
$\hat{V}(\hat{\mu}_R) = (1-f_1)\frac{s_1^2}{c} + f_1(1-f_2)\frac{s_2^2}{cm}$ , where $s_1^2 = \sum_c \frac{(\bar{y}_i - \bar{y})^2}{c-1}$ and $s_2^2 = \sum_c \sum_j \frac{(y_{ij} - \bar{y}_{i.})^2}{c(m-1)}$ (pooled within-cluster variance). Can obtain these results from a standard one-way ANOVA, where $s_1^2 = s_B^2 / m$ and $s_2^2 = s_W^2$ .
Estimating proportions
As with cluster sampling, formulae don’t simplify much. See formula sheet for details.
Optimal sub-sample sizes
For simplicity, we’ll only deal with equal cluster and sample sizes, when all estimators reduce to $\bar{y}$ . Suppose cost $= k_1 c + k_2 cm$ . Variance of $\bar{y} = (1-f_1)\frac{\sigma^2_1}{c} + (1-f_2)\frac{\sigma^2_2}{cm}$ . Minimised when $m = \sqrt{\frac{k_1}{k_2}}\left( \frac{\sigma_2}{\sigma_1} \right)$ .
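The optimal subsample size is then a one-line calculation (sketch; names invented, $\sigma_1$ and $\sigma_2$ taken as known from a pilot study):

```python
import math

def optimal_m(k1, k2, sigma1, sigma2):
    """Subsample size m minimising variance for cost = k1*c + k2*c*m."""
    return math.sqrt(k1 / k2) * sigma2 / sigma1
```

For example, if first-stage costs dominate ($k_1 = 100$, $k_2 = 1$) and $\sigma_2/\sigma_1 = 2$, the rule gives $m = 20$: expensive cluster visits favour larger subsamples.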
Stratified multistage sampling
In most large surveys the first-stage sample will be stratified. Introduces no new problems: use the results above to estimate the mean and se for each stratum, then a weighted average to get overall results.
Using auxiliary information in estimation
Sometimes we can measure extra characteristics that have known population totals. We can often use this information to improve the precision of our estimate.
Ratio estimator
Suppose we draw SRS and obtain $y_1, y_2, ..., y_n$ for primary variable $Y$ and $x_1, x_2, ..., x_n$ for some other variable with known population mean $\mu_X$ . If $R = \mu_Y / \mu_X$ , then $\mu_Y = R \mu_X$ .
$\hat{\mu}_Y = \mu_X \frac{\bar{y}}{\bar{x}} = \mu_X \hat{r}$ , $\hat{V}(\hat{\mu}_Y) = \frac{1-f}{n}s_r^2$ , where $s_r^2 = \sum \frac{(y_i - \hat{r}x_i)^2}{n-1}$ . Called the ratio estimator; in large samples it is approximately normally distributed.
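A sketch of the ratio estimator (names and data invented):

```python
import math

def ratio_estimate(y, x, mu_X, N):
    """mu_hat = mu_X * ybar/xbar; variance from residuals y_i - r_hat * x_i."""
    n = len(y)
    ybar, xbar = sum(y) / n, sum(x) / n
    r_hat = ybar / xbar
    s_r2 = sum((yi - r_hat * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    se = math.sqrt((1 - n / N) * s_r2 / n)
    return mu_X * r_hat, se

est, se = ratio_estimate([2, 4, 6], [1, 2, 3], mu_X=2.5, N=100)
```

In this toy data $y$ is exactly proportional to $x$, so the residuals (and hence the estimated SE) are zero: the ratio estimator shines when $Y$ is nearly proportional to $X$.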
Regression estimator
Extends ratio estimator to more general linear regression case.
$\hat{\mu}_{LR} = \bar{y} + \hat{\beta}(\mu_X - \bar{x})$ , $\hat{V}(\hat{\mu}_{LR}) = \frac{1-f}{n} s^2_{Res}$ , where $s^2_{Res}$ is the usual residual mean square. Approximately normally distributed (remember: lose 2 degrees of freedom). Asymptotically has better variance than $\hat{\mu}_R$ or $\bar{y}$ .
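A sketch of the regression estimator (names and data invented; $\hat{\beta}$ is the least-squares slope):

```python
import math

def regression_estimate(y, x, mu_X, N):
    """ybar + beta_hat*(mu_X - xbar); variance uses the residual mean square."""
    n = len(y)
    ybar, xbar = sum(y) / n, sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    res_ms = sum((yi - ybar - beta * (xi - xbar)) ** 2
                 for xi, yi in zip(x, y)) / (n - 2)       # lose 2 df
    est = ybar + beta * (mu_X - xbar)
    se = math.sqrt((1 - n / N) * res_ms / n)
    return est, se

est, se = regression_estimate([1, 2, 3, 4], [1, 2, 3, 4], mu_X=3, N=100)
```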
Ratio and regression estimators in stratified sampling
Separate ratio estimator: estimator for each stratum and then combine using stratum weights.
Combined ratio estimator: form $\bar{y}_st$ and $\bar{x}_st$ and use to form ratio estimate
Both have $\hat{V}(\hat{\mu}_{RC}) = \sum W_l^2 \frac{(1-f_l)}{n_l} s^2_{l,rs}$ where $s^2_{l,rs} = \sum_j \frac{(y_{lj} - \hat{r}x_{lj})^2}{n_l-1}$ .
Separate more efficient if sample sizes are large and slopes vary from stratum to stratum, combined if some of the stratum sizes are small.
Block structure
In experimental design a set of treatments is applied to a set of EU(experimental units)’s. Treatments and experimental units can have a variety of structures that determine the analysis.
Notation
- n, number of observations
- k, number of blocks
- r, number of repetitions within each block
Data model
As we use more complicated arrangements, it is helpful to represent data by a “data model”. For the block structure of a single set of eu’s, the data model is $y_i = \mu + e_i$ , where $E[e_i] = 0$ , $Var[e_i]=\sigma^2$ , $Cov[e_i, e_j] = 0$ for $i \ne j$ .
Significance tests and confidence intervals require further assumptions about the error. Often assume i.i.d. normal(0, $\sigma^2$ ).
Why use a data model?
- helps to determine analysis
- helps to understand what can be learned from experiment
- tells what can be estimated and how to evaluate precision of estimates
Block structure 1. Single set of eu’s
Structure: All eu’s in single set.
Data model: $y_i = \mu + e_i$
Block structure 2. One-way classification.
Structure: Separate groups of eu’s, determined by characteristic of eu
Data model: $y_{ij} = \mu_i + e_{ij}$
Use ANOVA to examine differences between blocks. Can estimate the variance within blocks ($s^2$ , pooled variance) and between blocks ($s_G^2 = \sum_i\frac{(y_{i.} - y_{..})^2}{k-1}$ ). If there were no differences between blocks, the block means would be a random sample of size $k$ from a single distribution with variance $\sigma^2 / r$ , and $r\cdot s_G^2$ would estimate $\sigma^2$ .
So the ratio of $r s_G^2$ to $s^2$ measures variability among blocks relative to variability within blocks. We call this summary statistic F, and it has a known distribution which depends only on the numerator and denominator degrees of freedom.
This set of calculations usually displayed in ANOVA table.
Source | df | SS |
---|---|---|
Block | $k-1$ | \[r\sum_i(y_{i.} - y_{..})^2 = r\sum_i y_{i.}^2 - kry_{..}^2\] |
Error | $k(r-1) = n - k$ | \[\sum_i\sum_j (y_{ij} - y_{i.})^2 = \sum_i\sum_j y_{ij}^2 - r\sum_i y_{i.}^2\] |
Total | $n-1$ | \[\sum_i\sum_j (y_{ij} - y_{..})^2 = \sum_i\sum_j y_{ij}^2 - kry_{..}^2 \] |
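This one-way ANOVA decomposition can be sketched directly from the table (function name invented; assumes $k$ blocks each with $r$ observations):

```python
def one_way_anova(blocks):
    """Return SS_Block, SS_Error and the F ratio MS_B / MS_E."""
    k, r = len(blocks), len(blocks[0])
    grand = sum(sum(b) for b in blocks) / (k * r)
    means = [sum(b) / r for b in blocks]
    ss_b = r * sum((m - grand) ** 2 for m in means)       # between blocks
    ss_e = sum((y - m) ** 2                               # within blocks
               for b, m in zip(blocks, means) for y in b)
    f = (ss_b / (k - 1)) / (ss_e / (k * (r - 1)))
    return ss_b, ss_e, f

ss_b, ss_e, f = one_way_anova([[1, 2, 3], [4, 5, 6]])
```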
When blocking eu’s, we normally select them so that they are different, we aren’t testing the hypothesis there is no difference. In this case the F-test just provides confirmation they are indeed different and gives information on how large that variability is relative to variability within blocks.
ANOVA calculation rule: the multiplier applied to a squared mean is the number of observations included in that mean.
Intuitively, ANOVA separates total variability into two portions that represent the data’s structure: among blocks and within blocks.
Alternative data model: $y_{ij} = \mu + g_i + e_{ij}$ . $y_{..}$ estimates $\mu$ , $y_{i.} - y_{..}$ estimates $g_i$ , $MS_E$ estimates $\sigma^2$ .
Unequal replication.
Equal replication is not required, nor always achievable.
If $r_i$ is the number of replications in block $i$, then $n = \sum_i r_i$ , $SS_B = \sum_i r_i y_{i.}^2 - ny_{..}^2$ , $SS_E = \sum_i \sum_j y_{ij}^2 - \sum_i r_i y_{i.}^2$ .
Structure 3: Two-way classification.
Eu’s “cross-classified” by two factors.
Data model: $y_{ij} = \mu + a_i + b_j + e_{ij}$ , $e_{ij} \sim NID(0, \sigma^2)$ . Assume $\sum a_i = 0$ , $\sum b_j = 0$ .
ANOVA: Eu’s are classified two ways (by row and column) – we must calculate both row SS and col SS. No replication in this structure, so no “pure error” SS to calculate. However, we can obtain a “residual SS” which can be treated as error SS under certain assumptions.
Source | df | SS |
---|---|---|
Rows | a-1 | \[b\sum_i y_{i.}^2 - aby_{..}^2\] |
Columns | b-1 | \[a\sum_j y_{.j}^2 - aby_{..}^2\] |
Error | (a-1)(b-1) | subtraction |
Total | ab-1 | \[\sum_i \sum_j y_{ij}^2 - aby_{..}^2\] |
Expected value of $y_{ij}$ estimated by $\hat{\mu} + \hat{a}_i + \hat{b}_j = y_{..} + (y_{i.} - y_{..}) + (y_{.j} - y_{..}) = y_{i.} + y_{.j} - y_{..}$ . Residual = $y_{ij} - y_{i.} - y_{.j} + y_{..}$ .
Hidden replication.
In a two-way classification with one eu per block, there is no true replication. However, we have hidden replication in the following sense: each difference between levels of A is replicated b times (and vice versa).
Structure 4. Two-way classification with replication.
Must randomise order of all combinations.
Notation: $y_{ijk}$ is row i, col j and observation k.
Data model: $y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk}$ , $e_{ijk} \sim N(0, \sigma^2)$ . $(ab)_{ij}$ is an additional term denoting “interaction” between A and B.
Structure 5. Nested factors.
Consider two blocking factors A and B. If the levels of B are different for each level of A then we have a nested factor structure.
Data model: $y_{ijk} = \mu + a_i + b_{j(i)} + e_{ijk}$ , where $b_{j(i)}$ denotes the effect of level $j$ of factor B nested within level $i$ of factor A.
Can do an ANOVA, including an ANOVA breaking down the nested factor.
Variance component models.
Consider again the one-way classification situation ($y_{ij} = \mu + b_i + e_{ij}$ ). There are situations where blocks are picked randomly rather than deliberately. $b_i \sim NID(0, \sigma_b^2)$ and $e_{ij} \sim NID(0, \sigma^2)$ , with $b_i$ and $e_{ij}$ independent. The analysis objective is now to estimate the two “variance components” in this model. (e.g. is $\sigma_b^2$ too large and causing quality problems?)
In the ANOVA, $E[MS_B] = r(\sigma_b^2 + \frac{\sigma^2}{r}) = r\sigma_b^2 + \sigma^2$ and $E[MS_E] = \sigma^2$ , so it is possible to estimate each variance component.
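Solving those two expected-mean-square equations gives method-of-moments estimates of the components (sketch; truncating a negative $\hat{\sigma}_b^2$ at zero is a common convention, not from the notes):

```python
def variance_components(ms_b, ms_e, r):
    """From E[MS_B] = r*sigma_b^2 + sigma^2 and E[MS_E] = sigma^2."""
    sigma2 = ms_e
    sigma_b2 = max((ms_b - ms_e) / r, 0.0)  # truncate negative estimates at 0
    return sigma_b2, sigma2
```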
In some situations can be difficult to decide whether a block should be regarded as random or fixed. Can be a controversial issue. The litmus test: do we care about individual blocks or not? (yes = fixed, no = random)
Treatment structure
Like block structure, treatment structure can involve one factor, or multiple factors, crossed or nested.
Single treatment factor.
Experimental objective: evaluate and compare effects of single treatment factor (with k levels) on response.
Method: Randomly assign treatments to EU(experimental unit)s – a completely randomised design.
Analysis: Same as for the one-way blocking structure, but we are now interested in the p-value for the ANOVA. The F-test is just a screening test; it provides evidence (when significant) that means differ more than could be expected by chance. Further analysis is needed to identify how.
Assumptions underlying F-test
Need to check assumptions before we go too far:
- errors normally distributed
- variance of errors same for all treatments
- also check for sequence effects and effects of other variables
Checking for normality. Estimate $\mu_i$ by $y_{i.}$ . Residual is $y_{ij} - y_{i.}$ . Check for normality with a Q-Q plot, or the Shapiro-Wilk test for a numerical summary.
Data anomalies (eg. outliers, clusters) are generally due to “lurking variables”. Often it is not the normality assumption that has been violated, but the identically-distributed assumption, because the data are actually a mix of two or more data sets. It is up to the experimenter and others involved to think about “lurking variables” and design the experiment and protocol so they won’t invalidate the experiment.
Checking for equal variances. Graphical display (eg. from Minitab; overlapping confidence intervals indicate the assumption is ok), or formal statistical tests (eg. Bartlett’s, Levene’s).
Comparing treatment means
- graphically
- significance tests and confidence intervals
  - for individual treatment means
  - for differences between 2 treatment means
  - for “contrasts” of treatment means
- multiple comparisons
Confidence interval for a treatment mean, $\mu_i$
Treatment mean: $y_{i.}$ . Standard error of mean: $SE = \sqrt{\frac{MS_E}{r}}$ , $df = df_E$ . From tables obtain $t = t_{0.025}(df_E)$ . 95% CI = $y_{i.} \pm t \cdot SE$ .
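As a sketch (names and numbers invented; the t-value with $df_E$ degrees of freedom is supplied from tables):

```python
import math

def treatment_mean_ci(ybar_i, ms_e, r, t):
    """95% CI for a treatment mean: ybar_i ± t * sqrt(MS_E / r)."""
    se = math.sqrt(ms_e / r)
    return ybar_i - t * se, ybar_i + t * se

lo, hi = treatment_mean_ci(10.0, ms_e=4.0, r=4, t=2.0)
```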
Meaning of confidence interval: if our conclusions are unchanged over the entire range of the confidence interval then we have a definitive result, otherwise we need more data.
Graphical comparison.
Plot treatment means on a dot plot. Draw a bell-shaped curve centered at $y_{..}$ with central 95% spanning $\pm t \cdot SE$ . Interpret by mentally sliding the curve along the chart. Provides an indication of which means could be from the same distribution and which stand out. Minitab has a function which does this automatically.
“Formal” statistical comparison
Goal: control the type I error rate ($\alpha$ ) when comparing pairs of treatment means.
Problem: Non-independence of multiple comparisons among the same means makes it difficult to evaluate the type I error rate.
T-test: $SED = \sqrt{2 \cdot MS_E / r}$ . Can’t just use the t-test: if $\alpha = 0.05$ for a comparison of two means, then when we compare multiple means the probability that at least one of those comparisons is significant exceeds 0.05. Multiple use of the t-test will therefore find too many significant differences.
Tukey’s studentised range. Basic idea: if all means are equal then the $y_{i.}$ are all from the same distribution. Look at the range of the sample ($max(y_{i.}) - min(y_{i.})$ ), “studentise” by dividing by the SE: $q = (max(y_{i.}) - min(y_{i.}))/ \sqrt{MS_E / r}$ . Derive the reference distribution of q (depends only on the number of means $k$ and $df_E$ ) and compare to the observed value of q. Interpretation: look up $q_{k; df_E, \alpha}$ and multiply by the SE; any differences larger than this are significant.
Duncan’s multiple range test. Once you’ve decided one group is significantly different, remove it and compare the remaining means.
Bonferroni. TSR requires special tables; Bonferroni is an alternative that only requires t tables. The number of pairwise comparisons is $m = \frac{k(k-1)}{2}$ , so do each t-test using $\alpha = 0.05 / m$ . The overall type I error rate will then be less than 0.05.
Can express all in similar terms by calculating 95% CI. Bonferroni only slightly conservative relative to TSR.
Unequal replication.
If the number of replications in each treatment ($r_i$ ) is not constant, but the $r_i$ ’s don’t vary too much, apply the previous techniques using $\bar r = $ average($r_i$ ) or the harmonic mean of the $r_i$ .
Sizing an experiment: How many EU’s?
Sample size questions are often the most difficult to answer statistically – they require some idea of the unknown parameters you want to estimate, and of costs and benefits to define reasonable goals and trade-offs.
Approach one: Using CI(confidence interval)s.
- Specify some criterion related to precision of results. (eg. $LSD = t_{0.025}(k(r-1))\sqrt{2 \cdot MS_E / r}$ ).
- Evaluate that criterion as a function of r (assume t-value $\approx 2$ for large r).
- Specify threshold.
- Determine r that meets threshold level.
Once we state the problem we can solve for required number of replications. Big problem is deciding on quantitative threshold.
Approach two: using a decision rule
Eg. Economic analysis has shown that if fertiliser B yields at least 2 pounds more per plant than fertiliser A, fertiliser B will be economically justified. The investigator wants to be sure of the 2 lb. difference, so a decision rule is set: switch to fertiliser B if the lower 95% confidence limit on $(\mu_B - \mu_A)$ is at least 2. This recognises and accounts for uncertainty in estimating $(\mu_B - \mu_A)$ – good statistical practice.
Rule: select B if $y_{B.} - y_{A.} - t_{.05}(k(r-1))SED \ge 2$ , where $SED = \sqrt{2MS_E / r}$ . Treat $\sigma$ as known based on a pilot experiment, use $\sqrt{MS_E} = \sigma$ , let $\delta = \mu_B - \mu_A$ ; denote $d = y_{B.} - y_{A.}$ , then $d \sim N(\delta, 2\sigma^2/r)$ .
The probability that the data will satisfy the decision rule is a function of $\delta$ and $r$ : $P(\text{choose B}) = P\left(z \ge \frac{2-\delta}{\sigma\sqrt{2/r}} + 1.645\right)$ . Draw curves for differing values of $\delta$ and $r$ (operating characteristic curves) and choose appropriately. Note that more replications can detect a smaller difference $\delta$ . Can extend by recognising that $\sigma$ is also estimated, and account for the df’s needed to estimate $\sigma$ .
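The operating-characteristic calculation can be sketched as follows (names invented; z-based, treating $\sigma$ as known, with 1.645 for the one-sided 5% rule):

```python
import math
from statistics import NormalDist

def p_choose_b(delta, r, sigma, threshold=2.0, z_alpha=1.645):
    """P(decision rule selects B) as a function of the true difference delta."""
    sed = sigma * math.sqrt(2 / r)           # SE of d = ybar_B - ybar_A
    z = (threshold - delta) / sed + z_alpha
    return 1 - NormalDist().cdf(z)
```

At $\delta$ equal to the threshold the rule fires only about 5% of the time; a larger $\delta$ or more replications push the probability up, which is exactly what the OC curves display.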
Other comparisons of treatment means
Context-driven comparisons. Eg. a deodorant manufacturer claims that “my deodorant lasts twice as long as brand X”. Experiment:
- summary statistics: $y_{A.}$ , $y_{X.}$ , $MS_E$
- $H_0: \mu_A = 2\mu_X$
- $var(y_{A.} - 2y_{X.}) = \sigma^2(1/r + 4/r) = 5\sigma^2 / r$
- test statistic: $t = \frac{y_{A.} - 2y_{X.}}{\sqrt{5 MS_E / r}}$
Special comparisons: contrasts
Contrast – a linear combination of treatment means $C = \sum_{i=1}^k c_i y_{i.}$ , with $\sum_{i=1}^k c_i = 0$ . Contrasts are orthogonal if the dot-product of their coefficient vectors is 0. If a treatment has k levels, we can find k-1 independent (orthogonal) contrasts. These can be tested separately, each with 1 degree of freedom.
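A sketch of a contrast, its standard error $\sqrt{(MS_E/r)\sum c_i^2}$, and the orthogonality check (names and numbers invented; assumes equal replication $r$):

```python
import math

def contrast(means, coeffs, ms_e, r):
    """C = sum(c_i * ybar_i) with sum(c_i) = 0; SE = sqrt(MS_E/r * sum c_i^2)."""
    assert abs(sum(coeffs)) < 1e-9, "contrast coefficients must sum to 0"
    value = sum(c * m for c, m in zip(coeffs, means))
    se = math.sqrt(ms_e / r * sum(c * c for c in coeffs))
    return value, se

def orthogonal(c1, c2):
    """Two contrasts are orthogonal if their coefficient vectors have dot product 0."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < 1e-9

value, se = contrast([10, 12, 14], [1, 0, -1], ms_e=4.0, r=4)
```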
Treatment structures with two or more factors
Write data models as for block structures. Crossed factors much more common than nested factors.
Now important to examine interaction – use interaction plot.
Experimental designs
Three components of experimental design:
- block structure: organisation etc. of experimental units
- treatment structure: reflects the questions the experiment is set up to answer
- experimental design: allocation of treatments to EU(experimental unit)s
Many possible experimental designs, with same two goals:
- treatment comparisons as precise as possible
- estimates free of bias and inferences valid
Completely randomised design
Blocks: single group
Treatments: $t$ treatments, any structure possible
Allocation: randomly assign treatment $i$ to $r_i$ EU’s
Randomised complete block design
Block structure: $b$ groups
Treatments: $t$ treatments, any structure possible
Allocation: randomly assign each treatment to one EU in each block
Model: $y_{ij} = \mu + t_i + b_j + e_{ij}$ , $e_{ij} \sim NID(0, \sigma^2)$ .
Advantages: cost; block means cancel out when comparing treatments
Assumptions: no interaction between treatment and block; error distribution same for all treatments and blocks
Test for interaction
Possible to test for interaction using Tukey’s test which detects curvilinear relationship between $y - \hat{y}$ and $\hat{y}$ (transformable non-additivity).
- Calculate residuals
- Calculate squared predicted values $q_{ij} = (y_{i.} + y_{.j} - y_{..})^2$
- Calculate residuals for $q$ ’s
- Plot residuals vs. q-residuals. Interaction will show up as linear trend
- Calculate $P = \sum(q\text{-resid})(\text{resid})$ , $Q = \sum(q\text{-resid})^2$ , slope = $P / Q$ , $SS_{slope} = P^2 / Q$
- Partition the error SS in the ANOVA into non-additivity (slope) and remainder; perform F-test
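The steps above can be sketched as follows (names invented; `two_way_resid` removes row, column and grand means, and additive data should give $SS_{slope} = 0$):

```python
def two_way_resid(t):
    """Residuals after removing row means, column means and the grand mean."""
    a, b = len(t), len(t[0])
    grand = sum(map(sum, t)) / (a * b)
    row = [sum(r) / b for r in t]
    col = [sum(t[i][j] for i in range(a)) / a for j in range(b)]
    return [[t[i][j] - row[i] - col[j] + grand for j in range(b)]
            for i in range(a)]

def tukey_ss_nonadd(y):
    """One-df SS for non-additivity: SS_slope = P^2 / Q."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row = [sum(r) / b for r in y]
    col = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    # squared predicted values q_ij = (row_i + col_j - grand)^2
    q = [[(row[i] + col[j] - grand) ** 2 for j in range(b)] for i in range(a)]
    resid, qres = two_way_resid(y), two_way_resid(q)
    P = sum(resid[i][j] * qres[i][j] for i in range(a) for j in range(b))
    Q = sum(qres[i][j] ** 2 for i in range(a) for j in range(b))
    return P ** 2 / Q
```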
Randomised complete block design with replication
Model: $y_{ij} = \mu + t_i + b_j + (tb)_{ij} + e_{ij}$ , $e_{ij} \sim NID(0, \sigma^2)$ .
Can relax the assumption of no block-treatment interaction (other possibilities: live with it, use less divergent blocks). Have $b$ blocks of $r \times t$ units, each receiving one randomly assigned treatment. Can now separate true replication from interaction, and have more sensitivity for testing the significance of differences. However, it will be more expensive.
In industrial experiments, we usually minimise the number of runs and maximise the number of factors, assuming low variability because processes are well controlled: RCBD with replication is not appropriate. In the social and life sciences, we usually have fewer factors and more error: RCBD with replication is useful and appropriate.
Repeated measures design
Repeated measures are taken on the same EU. Has a two-way structure similar to RCBD, but a subject is not a group of EU’s with randomly applied treatments; nevertheless it can still be appropriate to use the same analysis.
Balanced incomplete block design
In RCBD designs the number of EU’s in a block is constrained to be a multiple of the number of treatments (all treatments replicated in each block). What do we do if the block size is smaller than the number of treatments? BIBD.
Model: $y_{ij} = \mu + t_i + b_j + e_{ij}$ , $e_{ij} \sim NID(0, \sigma^2)$ . Same model as RCBD, but not all combinations are included, so a different analysis is needed: use a regression model to correct for the effects of correlation.
Hidden power: if A & B are paired $n$ times, the estimate of their difference is as precise as if they were measured $2n$ times! Relies on the assumptions of block-treatment independence and careful balancing.
$b$ blocks of $k$ EU’s, $t$ treatments occurring $r$ times, $bk = tr$ . Each pair of treatments occurs together in $\lambda = r(k-1) / (t-1)$ blocks. Relative efficiency: $e = t\lambda / rk$ . When we can model blocks as random effects, block totals can be analysed to provide another set of estimated treatment effects – an inter-block analysis. Independent of the above (intra-block) analysis. A weighted average of the two can be more precise than either set separately.
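The BIBD bookkeeping can be checked in a few lines (names invented; the classic $t=4$, $b=6$, $k=2$, $r=3$ design pairs every pair of treatments exactly once):

```python
def bibd_params(b, k, t, r):
    """Check bk = tr; return lambda = r(k-1)/(t-1) and efficiency e = t*lambda/(rk)."""
    assert b * k == t * r, "bk must equal tr"
    lam = r * (k - 1) / (t - 1)
    e = t * lam / (r * k)
    return lam, e

lam, e = bibd_params(b=6, k=2, t=4, r=3)
```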
Latin square designs
Three factors (usually 2 blocking, 1 treatment) with the same number of levels. Want to compare treatments while removing variability from blocks. An RCBD would need 64 runs for factors with four levels; can we make do with fewer? A Latin square only needs 16: one treatment tested for each combination of blocks, in such a way that effects can be separated and eliminated.
Model: $y_{ijk} = \mu + t_i + r_j + c_k + e_{ijk}$ , $e_{ijk} \sim NID(0, \sigma^2)$ . Key assumption: no interaction between rows, columns and treatments.
The error term in the model ($e_{ijk}$ ) represents variability if the same conditions were repeated independently – not the true error variance, but we can estimate it assuming no interaction. However, because this is a type of factorial design, if we decide one factor is unimportant we can collapse the design and analyse without it (similar to stepwise regression; controversial if overdone). Analyse using standard ANOVA: means can be calculated directly because all other effects cancel out.