Repeated measures
Any dataset in which subjects are measured repeatedly over time can be described as repeated measures data. Measurements can be made at pre-determined times or in an uncontrolled fashion; this determines the types of analysis available.
Fixed effect approaches
- Analyse mean response over time: satisfactory if the overall treatment effect is of interest and there are no missing data. Gives no information on the treatment.time interaction.
- Separate analyses at each time point: treatment SEs correctly estimated at the between-patient level. Multiple comparisons may lead to spurious significance, and the tests may be correlated. Treatment SEs less accurate because only observations at one time point are used.
- Analyse response factors: can generate summary values for each patient; must be careful with missing values and multiple testing.
- Analyse raw data with fixed patient effects: gives the same result as a mixed model if there are no missing data. Treatment SEs hard to calculate.
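The summary-measures idea ("analyse response factors") can be sketched as follows. This is an illustrative simulation, not an analysis from the notes: the variance components, group means, and the 1-unit treatment effect are all hypothetical. Each patient's profile is collapsed to a single mean, so the treatment comparison is an ordinary between-patient t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical repeated measures data: 20 patients per arm, 4 time points.
# Arm B is simulated with a response 1 unit higher than arm A.
n, m = 20, 4
arm_a = 10 + rng.normal(0, 1, (n, 1)) + rng.normal(0, 0.5, (n, m))
arm_b = 11 + rng.normal(0, 1, (n, 1)) + rng.normal(0, 0.5, (n, m))

# Summary-measures analysis: one value per patient, so the comparison
# is made at the between-patient level with a correctly-sized SE.
summary_a = arm_a.mean(axis=1)
summary_b = arm_b.mean(axis=1)
t_stat, p_value = stats.ttest_ind(summary_b, summary_a)
```

With no missing data every patient contributes a mean of the same number of observations; with missing data the summary values have unequal precision, which is where care is needed.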
Mixed model approaches
Advantages:
- a single model can estimate overall treatment effects and the effect at each time point
- treatment SEs at individual time points use all available information
- no problems with data missing at random (MAR)
- the covariance pattern can be determined and taken account of
There are several ways to use mixed models. The simplest is a random effects model with patient effects fitted as random. This allows for a constant correlation between all observations on the same patient, but this is often not the case; a covariance pattern or random coefficients model can be used instead.
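The constant-correlation property of the random effects model can be verified numerically. This sketch (with hypothetical variance components) simulates $y_{ij} = \mu + p_i + e_{ij}$ and checks that every pair of time points has the same implied correlation $\sigma^2_p / (\sigma^2_p + \sigma^2_e)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical variance components for the random effects model.
sigma2_p, sigma2_e = 4.0, 1.0
n_patients, m_times = 20000, 3

# y_ij = mu + patient_i + e_ij: a shared patient effect plus noise.
patient = rng.normal(0, np.sqrt(sigma2_p), (n_patients, 1))
y = 10 + patient + rng.normal(0, np.sqrt(sigma2_e), (n_patients, m_times))

# The model implies the SAME correlation between any two time points,
# regardless of how far apart they are.
empirical = np.corrcoef(y, rowvar=False)
implied = sigma2_p / (sigma2_p + sigma2_e)   # 0.8 here
```

The equal off-diagonal correlations are exactly compound symmetry, which is why that pattern is often too restrictive: real repeated measures usually show correlation decaying with separation in time.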
Covariance pattern models
Define the covariance structure directly rather than inducing it through random effects. Observations within each category of a blocking effect (usually patients) are assumed to have the same covariance structure, giving a block diagonal matrix $\mathbf{R} = \mathrm{diag}(\mathbf{R}_1, \dots, \mathbf{R}_n)$ with one block $\mathbf{R}_i$ per patient.
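The block diagonal structure of $\mathbf{R}$ can be sketched directly; the 2×2 compound-symmetry block below is hypothetical:

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical within-patient block R_i for two time points
# (variance 1.0, covariance 0.5 between the two measurements).
R_i = np.array([[1.0, 0.5],
                [0.5, 1.0]])

# Three patients -> a 6x6 block diagonal R. Off-block entries are
# zero: observations on different patients are independent.
R = block_diag(R_i, R_i, R_i)
```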
Covariance patterns
Large selection of covariance patterns available. Most depend on observations being taken at fixed times, and some are more easily justified when observations are evenly spaced. Some patterns take into account exact value of time, and are best used when intervals are irregular.
Some simple patterns are:
- general: no restrictions
- Toeplitz: constant variance; measurements $t$ steps apart have covariance $\theta_t$
- first order autoregressive: constant variance; measurements $t$ steps apart have correlation $\rho^t$
- compound symmetry: all variances $= \sigma^2$, all covariances $= \theta$ (equivalent to the random effects model)
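These simple patterns are easy to construct explicitly. The parameter values below ($\sigma^2$, $\theta$, $\rho$, and the Toeplitz covariances) are hypothetical, chosen only to show the structure of each matrix:

```python
import numpy as np
from scipy.linalg import toeplitz

m = 4          # number of time points
sigma2 = 2.0   # hypothetical common variance

# Compound symmetry: every covariance equals theta.
theta = 0.6
cs = np.full((m, m), theta)
np.fill_diagonal(cs, sigma2)

# First order autoregressive: correlation rho**t at lag t.
rho = 0.7
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
ar1 = sigma2 * rho ** lags

# Toeplitz: one free covariance per lag (hypothetical theta_1..theta_3).
tz = toeplitz([sigma2, 0.9, 0.5, 0.2])
```

Note the parameter counts: compound symmetry uses 2, AR(1) uses 2, Toeplitz uses $m$, and the general pattern uses $m(m+1)/2$.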
If the variability of a measurement differs at each time point, there are obvious heterogeneous generalisations of these patterns. A separate covariance structure can also be fitted to each treatment group.
If widely separated observations appear to be uncorrelated, a banded covariance structure can be created by setting all covariances more than $t$ steps apart to 0. This can be done to any covariance pattern, and reduces the number of variance components to be estimated.
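Banding is just zeroing entries beyond a chosen lag; a minimal sketch, applied to a hypothetical general 4×4 pattern:

```python
import numpy as np

def band(R, max_lag):
    """Zero the covariances between observations more than max_lag
    steps apart, leaving the rest of the pattern unchanged."""
    m = R.shape[0]
    lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return np.where(lags <= max_lag, R, 0.0)

# Hypothetical general covariance pattern for four time points.
R = np.array([[4.0, 2.0, 1.0, 0.5],
              [2.0, 4.0, 2.0, 1.0],
              [1.0, 2.0, 4.0, 2.0],
              [0.5, 1.0, 2.0, 4.0]])
R_banded = band(R, max_lag=1)   # keep only adjacent covariances
```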
Covariance patterns using the exact separation in time also exist (eg. power: $r_{ijk} = \sigma^2 \rho^{d_{ijk}}$, where $d_{ijk}$ is the separation in time between observations $j$ and $k$ on patient $i$). These are useful when time points do not occur at fixed intervals. Most cause the covariance to decrease exponentially with increasing distance.
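The power pattern can be built directly from the visit times; the times and parameters below are hypothetical:

```python
import numpy as np

# Hypothetical irregular visit times (in weeks) and parameters for the
# power pattern r_ijk = sigma^2 * rho**d_ijk.
times = np.array([0.0, 1.0, 3.0, 8.0])
sigma2, rho = 2.0, 0.8

# Pairwise separations in time, then the implied covariance matrix.
d = np.abs(np.subtract.outer(times, times))
R = sigma2 * rho ** d
```

Because the exponent is the actual time distance, the pattern stays coherent however unevenly the visits are spaced.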
Which covariance pattern should be used?
We want to choose the covariance pattern that best matches the true covariance pattern – not easy! Increasing the number of parameters will improve the fit, but can lead to over-fitting. Models can be compared using a likelihood-ratio test (provided they are nested), or by comparing goodness of fit statistics adjusted for the number of covariance parameters $q$ (eg. $\mathrm{AIC} = \log(L) - q$ or $\mathrm{BIC} = \log(L) - \tfrac{q}{2}\log(n)$ with $n$ subjects; in this parameterisation larger values are better).
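A minimal sketch of both comparisons, using hypothetical REML log-likelihoods for compound symmetry nested inside Toeplitz with four time points (the numbers are invented for illustration):

```python
from scipy import stats

# Hypothetical REML log-likelihoods and covariance parameter counts.
logL_cs, q_cs = -152.3, 2   # compound symmetry: sigma^2 and theta
logL_tz, q_tz = -149.1, 4   # Toeplitz: sigma^2 and theta_1..theta_3

# Likelihood-ratio test: 2 * (logL_full - logL_reduced) is compared
# with a chi-squared on the extra number of covariance parameters.
lr = 2 * (logL_tz - logL_cs)
p_value = stats.chi2.sf(lr, df=q_tz - q_cs)

# AIC in the "larger is better" parameterisation: logL - q.
aic_cs = logL_cs - q_cs
aic_tz = logL_tz - q_tz
```

Here the likelihood-ratio test and the AIC comparison happen to agree that the extra Toeplitz parameters earn their keep; with other numbers the two criteria can disagree.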
Which patterns should be considered? It is not usually practical to test large numbers. Either start with the simplest and work up, or fit the general pattern to get some idea of what it should look like. Often the covariance pattern makes little difference to the estimate of the treatment effect and its standard error; in this case compound symmetry is reasonable (roughly check by comparing with the general pattern), or the empirical variance estimator could simply be used.
General points
Missing data occur frequently in repeated measures experiments. They are less of a problem with a specified covariance structure, as patients with only a few observations still contribute information through the covariance pattern. Care is still needed.
Significance testing should be performed using F-tests with Satterthwaite's degrees of freedom. If Satterthwaite's method is not available, use the patient degrees of freedom to compare treatment effects (this will be conservative).
Fixed effect standard error estimates will be biased downwards because the covariance parameters are estimated, not known. The robust 'empirical' variance estimator described previously can be used, but this ignores the specified model and uses the covariance pattern from the data.
Residuals are assumed $\sim N(\mathbf{0}, \mathbf{R})$. This is difficult to check formally, but plots of residuals should be sufficient to identify any major outliers or deviations from normality.
Random coefficients models
Random coefficients models develop an explicit relationship between the measurement and time. The most common model is linear in time, with interest in whether the slope differs between treatment groups. Usually time and treatment.time are fitted as fixed effects, and patient and patient.time as random effects, to allow patients to vary randomly around their treatment mean.
Polynomial models can be fitted by successively adding polynomial terms of higher order (as fixed and random effects) until a variance component becomes negative (for random effects) or the effect becomes non-significant (for fixed effects).
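A sketch of what the linear random coefficients model implies, using hypothetical population values: each patient gets their own intercept and slope drawn around the population line, and per-patient least-squares slopes scatter around the population slope.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population intercept/slope and random coefficient SDs.
beta0, beta1 = 10.0, 2.0
n, times = 500, np.array([0.0, 1.0, 2.0, 3.0])

# Each patient's own line, randomly varying around the population line.
intercepts = rng.normal(beta0, 1.0, (n, 1))
slopes = rng.normal(beta1, 0.5, (n, 1))
y = intercepts + slopes * times + rng.normal(0, 0.3, (n, len(times)))

# Per-patient least-squares slopes (one fit per column of y.T):
# they vary patient to patient but centre on the population slope.
patient_slopes = np.polyfit(times, y.T, 1)[0]
```

A mixed model does better than these separate per-patient fits: it pools information across patients and shrinks each slope toward the population value, as noted below.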
General points
If a negative variance component estimate is obtained, refit the model without that component. However, not all software will report negative variance components. In SAS, non-convergence or a non-positive-semi-definite G matrix are signs of a negative variance component; in this case, remove components in order of complexity until the problem is resolved.
- the baseline measurement can be specified as a fixed effect (covariate) or as the first repeated measurement at time 0
- patient estimates of slope and intercept can be calculated, and will be "shrunken" towards the population values
- significance testing proceeds as for covariance pattern models
- check residuals are normal by plotting against predicted values
- it is difficult to check the assumption that the random coefficients are distributed MVN, but the usual plots should pick up any major departures
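The shrinkage of patient estimates toward population values can be sketched for the simplest case, a random intercept model, where the BLUP pulls the patient mean toward the population mean by a factor that depends on the variance components and the number of observations (all values below are hypothetical):

```python
# Shrinkage in a random intercept model:
#   BLUP = pop_mean + k * (patient_mean - pop_mean),
#   k = sigma2_p / (sigma2_p + sigma2_e / m).
# Hypothetical variance components and observed means.
sigma2_p, sigma2_e, m = 4.0, 2.0, 5
pop_mean, patient_mean = 10.0, 13.0

k = sigma2_p / (sigma2_p + sigma2_e / m)
blup = pop_mean + k * (patient_mean - pop_mean)
```

Patients with few observations (small $m$) or noisy data (large $\sigma^2_e$) get a smaller $k$ and are shrunk more strongly; the same idea applies to random slopes.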
Sample size estimation
Sample size is often calculated as for a simple between-patient trial; because of the correlation between repeated measures, fewer patients are actually required. The covariance pattern will obviously not be known in advance, but a compound symmetry pattern will probably be adequate. If no estimate of the within-patient correlation is known, a conservative value could be used.
$\mathrm{Var}(\sum_j y_{ij}) = m \sigma^2_p (1 + (m-1)\rho)$
$\mathrm{Var}(\bar{y}_i) = \sigma^2_p (1 + (m-1)\rho)/m$
$\mathrm{SE}(t_i - t_j) = \sqrt{2\,\mathrm{Var}(\bar{y}_i)/n}$
with $m$ repeated measurements, $n$ patients in each group, $\sigma^2_p$ the between-patient variance, and $\rho$ the correlation between measurements.
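These formulae translate directly into a planning calculation; the variance, correlation, and group sizes below are hypothetical planning values:

```python
import numpy as np

def se_diff(n, m, sigma2_p, rho):
    """SE of a treatment difference when each patient's m correlated
    measurements are averaged, under compound symmetry."""
    var_mean = sigma2_p * (1 + (m - 1) * rho) / m
    return np.sqrt(2 * var_mean / n)

# Hypothetical planning values.
sigma2_p, rho = 4.0, 0.5

# Averaging m = 4 correlated measurements per patient shrinks the SE
# relative to a single-measurement (m = 1) design with the same n.
se_repeated = se_diff(n=30, m=4, sigma2_p=sigma2_p, rho=rho)
se_single = se_diff(n=30, m=1, sigma2_p=sigma2_p, rho=rho)
```

Note the limit: as $\rho \to 1$ extra measurements add nothing (the variance of the patient mean stays $\sigma^2_p$), which is why a conservatively high $\rho$ gives a safe sample size.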