Treatment structure

Like block structure, treatment structure can involve one factor, or multiple factors, crossed or nested.

Single treatment factor.

Experimental objective: evaluate and compare effects of single treatment factor (with k levels) on response.

Method: Randomly assign treatments to EUs (experimental units): a completely randomised design.

Analysis: Same as for the one-way blocking structure, but we are now interested in the p-value from the ANOVA. The F-test is just a screening test: when significant, it provides evidence that the means differ more than could be expected by chance. Further analysis is needed to identify how.

Assumptions underlying F-test

Need to check assumptions before we go too far:

Checking for normality. Estimate $\mu_i$ by $\bar{y}_{i.}$. Residual is $y_{ij} - \bar{y}_{i.}$. Check for normality with a Q-Q plot; the Shapiro-Wilk test gives a numerical summary.
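The normality check above can be sketched in a few lines of Python; the treatment labels and yield values here are made up purely for illustration.

```python
# Sketch: checking residual normality for a one-way CRD.
# The data below are hypothetical (3 treatments, r = 5 replicates).
import numpy as np
from scipy import stats

groups = {
    "A": np.array([20.1, 21.3, 19.8, 20.7, 21.0]),
    "B": np.array([23.5, 22.9, 24.1, 23.0, 23.8]),
    "C": np.array([19.0, 18.4, 19.6, 18.9, 19.2]),
}

# Residuals: y_ij minus the treatment mean y-bar_i.
residuals = np.concatenate([y - y.mean() for y in groups.values()])

# Numerical check: Shapiro-Wilk (a large p-value gives no evidence
# against normality). A Q-Q plot would be the graphical companion.
W, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {W:.3f}, p = {p_value:.3f}")
```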

Data anomalies (eg. outliers, clusters) are generally due to “lurking variables”. Often it is not the normality assumption that has been violated, but the identically-distributed assumption, because the data are actually a mix of two or more data sets. It is up to the experimenter and others involved to think about “lurking variables” and design the experiment and protocol so they won’t invalidate the experiment.

Checking for equal variances. Graphical display (eg. from Minitab, overlapping confidence intervals indicate the assumption is OK), or formal statistical tests (eg. Bartlett’s, Levene’s).
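Both of the formal tests named above are available in SciPy; a minimal sketch with made-up data:

```python
# Sketch: formal tests for homogeneity of variances.
# The three samples are hypothetical.
import numpy as np
from scipy import stats

a = np.array([20.1, 21.3, 19.8, 20.7, 21.0])
b = np.array([23.5, 22.9, 24.1, 23.0, 23.8])
c = np.array([19.0, 18.4, 19.6, 18.9, 19.2])

# Bartlett's test is sensitive to non-normality; Levene's is more robust.
bart_stat, bart_p = stats.bartlett(a, b, c)
lev_stat, lev_p = stats.levene(a, b, c)
print(f"Bartlett p = {bart_p:.3f}, Levene p = {lev_p:.3f}")
```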

Comparing treatment means

Confidence interval for a treatment mean, $\mu_i$

Treatment mean: $\bar{y}_{i.}$. Standard error of the mean: $SE = \sqrt{\frac{MS_E}{r}}$, $df = df_E$. From tables obtain $t = t_{0.025}(df_E)$. 95% CI = $\bar{y}_{i.} \pm t \cdot SE$.
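The interval above is a one-liner once the ANOVA table is in hand; the $MS_E$, $df_E$, $r$, and mean used here are hypothetical numbers.

```python
# Sketch: 95% CI for a treatment mean from the ANOVA error mean square.
# All numbers below are hypothetical.
import math
from scipy import stats

ms_e = 1.25      # error mean square from the ANOVA table
df_e = 12        # error degrees of freedom, k(r - 1)
r = 5            # replicates per treatment
ybar_i = 20.58   # observed treatment mean

se = math.sqrt(ms_e / r)
t = stats.t.ppf(1 - 0.025, df_e)        # t_{0.025}(df_E)
ci = (ybar_i - t * se, ybar_i + t * se)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```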

Meaning of confidence interval: if our conclusions are unchanged over the entire range of the confidence interval then we have a definitive result, otherwise we need more data.

Graphical comparison.

Plot treatment means on a dot plot. Draw a bell-shaped curve centred at $\bar{y}_{..}$ with its central 95% spanning $\pm t \cdot SE$. Interpret by mentally sliding the curve along the chart. This provides an indication of which means could be from the same distribution and which stand out. Minitab has a function which does this automatically.

“Formal” statistical comparison

Goal: control the type I error rate ($\alpha$ ) when comparing pairs of treatment means.

Problem: non-independence among multiple comparisons of the same means makes it difficult to evaluate the type I error rate.

T-test: $SED = \sqrt{2 \cdot MS_E / r}$. Can’t just use the t-test: if $\alpha = 0.05$ for a comparison of two means, then when we compare many means the probability that at least one of those comparisons is significant exceeds 0.05. Multiple use of the t-test will therefore find too many significant differences.
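A quick calculation makes the inflation concrete. The comparisons are not actually independent, but treating them as if they were gives the flavour; $k = 6$ is an arbitrary choice.

```python
# Sketch: why repeated t-tests inflate the type I error rate.
# With k means there are m = k(k-1)/2 pairwise comparisons; if they
# were independent (they are not, but this illustrates the point),
# the chance of at least one false positive is 1 - (1 - alpha)^m.
k = 6
alpha = 0.05
m = k * (k - 1) // 2                      # 15 comparisons
familywise = 1 - (1 - alpha) ** m
print(f"m = {m}, P(at least one false positive) ~ {familywise:.2f}")
```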

Tukey’s studentised range. Basic idea: if all means are equal then the $\bar{y}_{i.}$ are all from the same distribution. Look at the range of the sample means ($\max(\bar{y}_{i.}) - \min(\bar{y}_{i.})$), and “studentise” by dividing by the SE: $q = (\max(\bar{y}_{i.}) - \min(\bar{y}_{i.})) / \sqrt{MS_E / r}$. Derive the reference distribution of $q$ (it depends only on the number of means $k$ and $df_E$) and compare to the observed value of $q$. Interpretation: look up $q_{k; df_E, \alpha}$ and multiply by the SE; any differences larger than this are significant.
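Rather than special tables, the studentised range distribution is available directly in SciPy; a sketch of the look-up step, with hypothetical $k$, $df_E$, and $MS_E$:

```python
# Sketch: Tukey's critical difference via scipy.stats.studentized_range.
# k, df_e, ms_e, and r are hypothetical.
import math
from scipy import stats

k = 3            # number of treatment means
df_e = 12        # error degrees of freedom
ms_e = 1.25
r = 5
se = math.sqrt(ms_e / r)

# Critical value q_{k; df_E, 0.05}, then the honest significant
# difference: any |mean difference| larger than hsd is significant.
q_crit = stats.studentized_range.ppf(0.95, k, df_e)
hsd = q_crit * se
print(f"q = {q_crit:.3f}, HSD = {hsd:.3f}")
```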

Duncan’s multiple range test. Once you’ve decided one group is significantly different, remove it and compare the remaining means.

Bonferroni. TSR requires special tables; Bonferroni is an alternative that only requires t tables. The number of pairwise comparisons is $m = \frac{k(k-1)}{2}$, so do each t-test using $\alpha = 0.05 / m$. The overall type I error rate will then be less than 0.05.
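The Bonferroni adjustment is simple enough to show inline; $k$ and $df_E$ here are arbitrary illustrative values.

```python
# Sketch: Bonferroni-adjusted pairwise t comparisons.
# k and df_e are hypothetical.
from scipy import stats

k = 4
m = k * (k - 1) // 2          # 6 pairwise comparisons
alpha = 0.05 / m              # per-comparison significance level
df_e = 16

# Two-sided critical t at the adjusted level; compare each pairwise
# |t| statistic against this instead of t_{0.025}.
t_crit = stats.t.ppf(1 - alpha / 2, df_e)
print(f"m = {m}, per-test alpha = {alpha:.4f}, t = {t_crit:.3f}")
```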

Can express all of these in similar terms by calculating 95% CIs. Bonferroni is only slightly conservative relative to TSR.

Unequal replication.

If the number of replications in each treatment ($r_i$) is not constant, but the $r_i$’s don’t vary too much, apply the previous techniques using $\bar{r}_. = \text{average}(r_i)$ or $\bar{r}_. = \text{harmonic average}(r_i)$.
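The two averages differ only slightly when the $r_i$ are close; a quick comparison with made-up replicate counts:

```python
# Sketch: effective replication under mildly unequal r_i.
# The replicate counts are hypothetical.
from statistics import harmonic_mean

r_i = [5, 4, 5, 6]
r_bar = sum(r_i) / len(r_i)        # arithmetic average
r_harm = harmonic_mean(r_i)        # harmonic average (slightly smaller)
print(f"arithmetic = {r_bar:.3f}, harmonic = {r_harm:.3f}")
```

The harmonic average is never larger than the arithmetic one, so using it gives a slightly more conservative SE.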

Sizing an experiment: How many EU’s?

Sample size questions are often the most difficult to answer statistically: they require some idea of the unknown parameters you want to estimate, and of the costs and benefits, to define reasonable goals and trade-offs.

Approach one: using CIs (confidence intervals).

  1. Specify some criterion related to the precision of the results. (eg. $LSD = t_{0.025}(k(r-1)) \cdot \sqrt{2 \cdot MS_E / r}$ ).
  2. Evaluate that criterion as function of r (assume t-value ~ 2 for large r).
  3. Specify threshold.
  4. Determine r that meets threshold level.

Once we state the problem we can solve for required number of replications. Big problem is deciding on quantitative threshold.
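The four steps above can be sketched as a small search over $r$; the $MS_E$ and the threshold are hypothetical, and the t-value is approximated by 2 as in step 2.

```python
# Sketch: smallest r for which the LSD falls below a chosen threshold.
# ms_e and threshold are hypothetical; t ~ 2 for moderate-to-large r.
import math

ms_e = 1.25
threshold = 1.0   # largest LSD we are willing to accept

r = 2
while 2 * math.sqrt(2 * ms_e / r) > threshold:
    r += 1
print(f"need r = {r} replicates (LSD = {2 * math.sqrt(2 * ms_e / r):.3f})")
```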

Approach two: using a decision rule

Eg. economic analysis has shown that if fertiliser B yields at least 2 pounds more per plant than fertiliser A, fertiliser B will be economically justified. The investigator wants to be sure of the 2 lb. difference, so a decision rule is set: switch to fertiliser B if the lower 95% confidence limit on $(\mu_B - \mu_A)$ is at least 2. This recognises and accounts for uncertainty in estimating $(\mu_B - \mu_A)$: good statistical practice.

Rule: select B if $\bar{y}_{B.} - \bar{y}_{A.} - t_{.05}(k(r-1)) \cdot SED \ge 2$, where $SED = \sqrt{2 \cdot MS_E / r}$. Treat $\sigma$ as known based on a pilot experiment, use $\sqrt{MS_E} = \sigma$, let $\delta = \mu_B - \mu_A$, and denote $d = \bar{y}_{B.} - \bar{y}_{A.}$; then $d \sim N(\delta, \sigma^2 (2/r))$.

The probability that the data will satisfy the decision rule is a function of $\delta$ and $r$: $P(\text{choose B}) = P\left(z \ge \frac{2-\delta}{\sigma\sqrt{2/r}} + 1.65\right)$. Draw curves for differing values of $\delta$ and $r$ (operating characteristic curves) and choose appropriately. Note that more replications can detect a smaller difference $\delta$. Can extend by recognising that $\sigma$ is also estimated, and accounting for the df’s needed to estimate $\sigma$.
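One point on an operating characteristic curve is easy to compute directly; $\sigma$, $\delta$, and the two choices of $r$ below are hypothetical.

```python
# Sketch: probability of satisfying the decision rule as a function
# of delta and r, treating sigma as known. All numbers hypothetical.
import math
from scipy import stats

sigma = 1.5

def p_choose_b(delta, r):
    # P(choose B) = P(Z >= (2 - delta)/(sigma*sqrt(2/r)) + 1.65)
    z = (2 - delta) / (sigma * math.sqrt(2 / r)) + 1.65
    return stats.norm.sf(z)

# More replication -> higher probability of detecting the same delta.
p_small = p_choose_b(delta=3.0, r=4)
p_large = p_choose_b(delta=3.0, r=16)
print(f"r=4: {p_small:.3f}   r=16: {p_large:.3f}")
```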

Other comparisons of treatment means

Context-driven comparisons. Eg. a deodorant manufacturer claims that “my deodorant lasts twice as long as brand X”. Experiment:

Special comparisons: contrasts

Contrast: a linear combination of treatment means, $C = \sum_{i=1}^k c_i \bar{y}_{i.}$, with $\sum_{i=1}^k c_i = 0$.

Contrasts are orthogonal if the dot product of their coefficient vectors is 0. If a treatment factor has k levels, we can find k-1 independent (orthogonal) contrasts. These can be tested separately, each with 1 degree of freedom.
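Testing a single contrast follows directly from the definitions above; the means, coefficients, $MS_E$, and $df_E$ here are all hypothetical.

```python
# Sketch: a 1-df t-test of one contrast among k = 3 treatment means.
# Means, coefficients, ms_e, df_e, and r are hypothetical.
import math
from scipy import stats

means = [20.58, 23.46, 19.02]     # treatment means y-bar_i
c = [1, -2, 1]                    # contrast coefficients
ms_e, df_e, r = 1.25, 12, 5

assert sum(c) == 0                # the contrast condition
C = sum(ci * m for ci, m in zip(c, means))
# SE of the contrast: sqrt(MS_E / r * sum(c_i^2))
se_C = math.sqrt(ms_e / r * sum(ci ** 2 for ci in c))
t_stat = C / se_C
p = 2 * stats.t.sf(abs(t_stat), df_e)
print(f"C = {C:.2f}, t = {t_stat:.2f}, p = {p:.4f}")
```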

Treatment structures with two or more factors

Write data models as for block structures. Crossed factors much more common than nested factors.

Now it is important to examine interaction: use an interaction plot.
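The quantity an interaction plot displays can be reduced to a difference of differences; a sketch for a 2×2 crossed structure with made-up cell means:

```python
# Sketch: interaction in a 2x2 crossed treatment structure.
# The cell means are hypothetical.
import numpy as np

# rows: levels of factor A, cols: levels of factor B
cell_means = np.array([[10.0, 14.0],
                       [12.0, 20.0]])

# Difference of differences: zero means the two profiles in an
# interaction plot are parallel (no interaction).
interaction = (cell_means[0, 0] - cell_means[0, 1]) \
            - (cell_means[1, 0] - cell_means[1, 1])
print(f"interaction contrast = {interaction}")
```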