Multistage sampling
As with cluster sampling, we select a sample of clusters, but now instead of sampling all units in each selected cluster, we take a random sample of them. Most large surveys are carried out this way.
Advantages:
- cost and speed
- convenience (only need list of clusters and individuals in selected clusters)
- usually more accurate than cluster sampling for the same total sample size
Disadvantages:
- less accurate than SRS of same size (but more accurate for same cost)
- further analysis is difficult
Basic results
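\[ \hat{\mu}_R = \frac{\sum_C M_i \bar{y}_i}{\sum_C M_i} \]
Let \(f_1\) be the first-stage sampling fraction (sampled clusters over clusters in the population) and \(f_{2i} = m_i / M_i\) the second-stage sampling fraction in cluster \(i\), then
\[ \hat{V}(\hat{\mu}_R) = \frac{1-f_1}{c} \sum_{\text{sample}} \frac{(M_i / \bar{M})^2 (\bar{y}_i - \hat{\mu}_R)^2}{c-1} + \frac{f_1}{c} \sum_{\text{sample}} \frac{(M_i / \bar{M})^2 (1-f_{2i})s^2_{2i}}{c m_i} \]
If the number of sampled clusters \(c\) is reasonably large, then \(\hat{\mu}_R\) is approximately normally distributed.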
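Note: If \(f_1\) is very small, then the second term is negligible and
\[ \hat{V}(\hat{\mu}_R) \approx \frac{1}{c} \sum_{\text{sample}} \frac{(M_i / \bar{M})^2 (\bar{y}_i - \hat{\mu}_R)^2}{c-1} \]
This result holds for more general subsampling schemes than SRS; we only need a scheme with an unbiased sample mean. Using systematic sampling is common.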
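As a rough illustration of the basic results, the sketch below computes the ratio estimate and its estimated variance from a two-stage sample. The function name and argument layout are hypothetical, and it assumes that the mean cluster size \(\bar{M}\) is estimated by the mean size of the sampled clusters and that every sampled cluster contributes at least two observations.

```python
from statistics import mean, variance


def two_stage_ratio_estimate(samples, cluster_sizes, total_clusters):
    """Ratio estimate of the mean and its estimated variance.

    samples        -- list of lists: observed y-values in each sampled cluster
    cluster_sizes  -- M_i for each sampled cluster (same order as samples)
    total_clusters -- number of clusters in the population (used for f_1)
    """
    c = len(samples)                      # number of sampled clusters
    f1 = c / total_clusters               # first-stage sampling fraction
    y_bar = [mean(s) for s in samples]    # cluster sample means
    m = [len(s) for s in samples]         # within-cluster sample sizes m_i

    # Ratio estimator: size-weighted mean of the cluster sample means.
    mu_hat = sum(M * y for M, y in zip(cluster_sizes, y_bar)) / sum(cluster_sizes)

    # Assumption: M-bar is taken as the mean size of the sampled clusters.
    M_bar = sum(cluster_sizes) / c
    rel = [(M / M_bar) ** 2 for M in cluster_sizes]

    # Between-cluster sum, divided by (c - 1).
    between = sum(r * (y - mu_hat) ** 2 for r, y in zip(rel, y_bar)) / (c - 1)

    # Within-cluster sum, with f_2i = m_i / M_i and divisor c * m_i.
    within = sum(
        r * (1 - mi / M) * variance(s) / (c * mi)
        for r, mi, M, s in zip(rel, m, cluster_sizes, samples)
    )

    var_hat = (1 - f1) / c * between + f1 / c * within
    return mu_hat, var_hat
```

For example, `two_stage_ratio_estimate([[1, 2, 3], [4, 6], [5, 5, 8, 10]], [10, 20, 10], 30)` returns the estimate together with its variance; the square root of the variance is the standard error.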
Sampling with probability proportional to size
Similar to PPS for cluster sampling: if \(f_1\) is small we can pretend we are sampling with replacement and treat clusters like individuals.
Performance is similar to SRS subsampling (but we need to know \(M_i\) for every cluster). It is intuitively appealing because if we take the same-sized sample from every cluster then every unit has the same chance of being selected.
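A minimal sketch of the "treat clusters like individuals" idea, assuming clusters are drawn with replacement with probability proportional to size. Under that assumption the usual estimate of the mean is the plain average of the cluster sample means, with variance estimated from their spread. The function names are hypothetical.

```python
import random
from statistics import mean, variance


def pps_select(cluster_sizes, c, rng):
    """Draw c cluster indices with replacement, P(cluster i) proportional to M_i."""
    return rng.choices(range(len(cluster_sizes)), weights=cluster_sizes, k=c)


def pps_estimate(cluster_means):
    """Treat the sampled cluster means like individual observations.

    Returns the mean of the cluster means and its estimated variance,
    s^2 / c, where s^2 is the sample variance of the cluster means.
    """
    c = len(cluster_means)
    mu_hat = mean(cluster_means)
    var_hat = variance(cluster_means) / c
    return mu_hat, var_hat
```

A typical use would be `pps_select(sizes, 5, random.Random(1))` to pick the clusters, then `pps_estimate` on the means of the subsamples taken within them.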
Equal cluster sizes
If \(M_i = M\) for all clusters then both estimators reduce to the mean of the cluster means.
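If \(m_i = m\) as well then the variance reduces to:
\[ \hat{V}(\bar{y}) = (1-f_1) \frac{s_1^2}{c} + f_1 (1-f_2) \frac{s_2^2}{c m} \]
where \(\bar{y}\) is the mean of the cluster means, \(s_1^2 = \sum_{\text{sample}} (\bar{y}_i - \bar{y})^2 / (c-1)\), \(s_2^2 = \sum_{\text{sample}} s^2_{2i} / c\), and \(f_2 = m / M\).
We can obtain these results from a standard one-way ANOVA where \(s_1^2 = \text{MSB} / m\) and \(s_2^2 = \text{MSW}\).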
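The one-way ANOVA route can be sketched as follows. This assumes the equal-size variance is what the general formula gives when every cluster has size M and every subsample has size m, with the between-cluster component taken as MSB / m and the within-cluster component as MSW. The function name and return layout are hypothetical.

```python
from statistics import mean


def equal_size_variance(samples, M, total_clusters):
    """Mean of cluster means and its estimated variance via one-way ANOVA.

    samples        -- list of c lists, each holding m observations
    M              -- common cluster size
    total_clusters -- number of clusters in the population (used for f_1)
    """
    c = len(samples)
    m = len(samples[0])
    cluster_means = [mean(s) for s in samples]
    grand_mean = mean(cluster_means)

    # Between-cluster mean square (c - 1 degrees of freedom).
    msb = m * sum((yb - grand_mean) ** 2 for yb in cluster_means) / (c - 1)
    # Within-cluster mean square (c * (m - 1) degrees of freedom).
    msw = sum(
        (y - yb) ** 2 for s, yb in zip(samples, cluster_means) for y in s
    ) / (c * (m - 1))

    f1 = c / total_clusters   # first-stage sampling fraction
    f2 = m / M                # second-stage sampling fraction
    var_hat = (1 - f1) * (msb / m) / c + f1 * (1 - f2) * msw / (c * m)
    return grand_mean, msb, msw, var_hat
```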
Estimating proportions
As with cluster sampling, formulae don’t simplify much. See formula sheet for details.
Optimal sub-sample sizes
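For simplicity, we’ll only deal with equal cluster and sample sizes, when all estimators reduce to the mean of the cluster means. Suppose the cost depends on the number of sampled clusters and on the number of units sampled within each. For a fixed total cost, the variance of the estimator is minimised at a particular sub-sample size per cluster.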
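To make the trade-off concrete, here is a sketch under a commonly used set of assumptions that may differ from the model intended here: a linear cost c1 per sampled cluster plus c2 per sampled unit, and a variance of the form S1²/c + S2²/(cm) with finite population corrections ignored. Under those assumptions the product of cost and variance does not depend on c, and minimising it over m gives m = sqrt(c1 S2² / (c2 S1²)). All names below are hypothetical.

```python
import math


def optimal_subsample_size(c1, c2, s1_sq, s2_sq):
    """Units per cluster minimising variance for a fixed budget.

    Assumes cost = c1 * c + c2 * c * m and
    variance = s1_sq / c + s2_sq / (c * m), ignoring sampling fractions.
    """
    return math.sqrt((c1 * s2_sq) / (c2 * s1_sq))


def clusters_for_budget(budget, c1, c2, m):
    """Number of clusters affordable when m units are sampled in each."""
    return budget / (c1 + c2 * m)


def variance_for_budget(budget, c1, c2, s1_sq, s2_sq, m):
    """Variance achieved when the whole budget is spent with m units per cluster."""
    c = clusters_for_budget(budget, c1, c2, m)
    return s1_sq / c + s2_sq / (c * m)
```

In practice m would be rounded to a whole number, and the variance at the neighbouring integers compared with `variance_for_budget`.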
Stratified multistage sampling
In most large surveys the first-stage sample will be stratified. This introduces no new problems: use the results above to estimate the mean and se for each stratum, then take a weighted average to get overall results.
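The weighted-average step might look like the sketch below. It assumes that the strata are sampled independently and that the weights are proportional to the stratum population sizes, so the squared overall se is the weighted sum of the squared stratum se values. The function name is hypothetical.

```python
import math


def combine_strata(stratum_sizes, means, ses):
    """Combine per-stratum estimates into an overall mean and se.

    stratum_sizes -- population size of each stratum (used as weights)
    means, ses    -- estimated mean and standard error within each stratum
    """
    total = sum(stratum_sizes)
    weights = [n / total for n in stratum_sizes]
    overall_mean = sum(w * mu for w, mu in zip(weights, means))
    # Independence between strata lets the variances add.
    overall_se = math.sqrt(sum((w * se) ** 2 for w, se in zip(weights, ses)))
    return overall_mean, overall_se
```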