Repeated measures
Any dataset in which subjects are measured repeatedly over time can be described as repeated measures data. Measurements can be made at pre-determined times or in an uncontrolled fashion; this determines the types of analysis available.
Fixed effect approaches
- Analyse mean response over time: satisfactory if the overall treatment effect is of interest and there are no missing data. Gives no information on the treatment.time interaction.
- Separate analyses at each time point: treatment SEs correctly estimated at the between-patient level. Multiple comparisons may lead to spurious significance, and the tests may be correlated. Treatment SEs less accurate because only observations at one time point are used.
- Analyse response factors: can generate summary values for each patient; must be careful with missing values and multiple testing.
- Analyse raw data with fixed patient effects: gives the same result as a mixed model if there are no missing data. Treatment SEs hard to calculate.
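The summary-measures idea ("analyse response factors") can be sketched as follows. This is an illustrative simulation, not an analysis from the notes: the variance components, group means, and the 1-unit treatment effect are all hypothetical. Each patient's profile is collapsed to a single mean, so the treatment comparison is an ordinary between-patient t-test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical repeated measures data: 20 patients per arm, 4 time points.
# Arm B is simulated with a response 1 unit higher than arm A.
n, m = 20, 4
arm_a = 10 + rng.normal(0, 1, (n, 1)) + rng.normal(0, 0.5, (n, m))
arm_b = 11 + rng.normal(0, 1, (n, 1)) + rng.normal(0, 0.5, (n, m))

# Summary-measures analysis: one value per patient, so the comparison
# is made at the between-patient level with a correctly-sized SE.
summary_a = arm_a.mean(axis=1)
summary_b = arm_b.mean(axis=1)
t_stat, p_value = stats.ttest_ind(summary_b, summary_a)
```

With no missing data every patient contributes a mean of the same number of observations; with missing data the summary values have unequal precision, which is where care is needed.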
Mixed model approaches
Advantages:
- a single model can estimate overall treatment effects and the effect at each time point
- treatment SEs at individual time points use all available information
- no problems with data missing at random (MAR)
- the covariance pattern can be determined and taken account of
There are several ways to use mixed models. The simplest is a random effects model with patient effects fitted as random. This allows for a constant correlation between all observations on the same patient, but this is often not the case; a covariance pattern or random coefficients model can be used instead.
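The constant-correlation property of the random effects model can be verified numerically. This sketch (with hypothetical variance components) simulates $y_{ij} = \mu + p_i + e_{ij}$ and checks that every pair of time points has the same implied correlation $\sigma^2_p / (\sigma^2_p + \sigma^2_e)$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical variance components for the random effects model.
sigma2_p, sigma2_e = 4.0, 1.0
n_patients, m_times = 20000, 3

# y_ij = mu + patient_i + e_ij: a shared patient effect plus noise.
patient = rng.normal(0, np.sqrt(sigma2_p), (n_patients, 1))
y = 10 + patient + rng.normal(0, np.sqrt(sigma2_e), (n_patients, m_times))

# The model implies the SAME correlation between any two time points,
# regardless of how far apart they are.
empirical = np.corrcoef(y, rowvar=False)
implied = sigma2_p / (sigma2_p + sigma2_e)   # 0.8 here
```

The equal off-diagonal correlations are exactly compound symmetry, which is why that pattern is often too restrictive: real repeated measures usually show correlation decaying with separation in time.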
Covariance pattern models
Define the covariance structure directly rather than inducing it through random effects. Observations within each category of a blocking effect (usually patients) are assumed to have the same covariance structure, giving a block diagonal matrix $\mathbf{R} = \mathrm{diag}(\mathbf{R}_1, \dots, \mathbf{R}_n)$ with one block $\mathbf{R}_i$ per patient.
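The block diagonal structure of $\mathbf{R}$ can be sketched directly; the 2×2 compound-symmetry block below is hypothetical:

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical within-patient block R_i for two time points
# (variance 1.0, covariance 0.5 between the two measurements).
R_i = np.array([[1.0, 0.5],
                [0.5, 1.0]])

# Three patients -> a 6x6 block diagonal R. Off-block entries are
# zero: observations on different patients are independent.
R = block_diag(R_i, R_i, R_i)
```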
Covariance patterns
Large selection of covariance patterns available. Most depend on observations being taken at fixed times, and some are more easily justified when observations are evenly spaced. Some patterns take into account exact value of time, and are best used when intervals are irregular.
Some simple patterns are:
- general: no restrictions
- Toeplitz: constant variance; measurements $t$ steps apart have covariance $\theta_t$
- first order autoregressive: constant variance; measurements $t$ steps apart have correlation $\rho^t$
- compound symmetry: all variances $= \sigma^2$, all covariances $= \theta$ (equivalent to the random effects model)
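These simple patterns are easy to construct explicitly. The parameter values below ($\sigma^2$, $\theta$, $\rho$, and the Toeplitz covariances) are hypothetical, chosen only to show the structure of each matrix:

```python
import numpy as np
from scipy.linalg import toeplitz

m = 4          # number of time points
sigma2 = 2.0   # hypothetical common variance

# Compound symmetry: every covariance equals theta.
theta = 0.6
cs = np.full((m, m), theta)
np.fill_diagonal(cs, sigma2)

# First order autoregressive: correlation rho**t at lag t.
rho = 0.7
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
ar1 = sigma2 * rho ** lags

# Toeplitz: one free covariance per lag (hypothetical theta_1..theta_3).
tz = toeplitz([sigma2, 0.9, 0.5, 0.2])
```

Note the parameter counts: compound symmetry uses 2, AR(1) uses 2, Toeplitz uses $m$, and the general pattern uses $m(m+1)/2$.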
If the variability of a measurement differs at each time point, there are obvious heterogeneous generalisations of these patterns. A separate covariance structure can also be fitted to each treatment group.
If widely separated observations appear to be uncorrelated, a banded covariance structure can be created by setting all covariances more than $t$ steps apart to 0. This can be done to any covariance pattern, and reduces the number of variance components to be estimated.
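Banding is just zeroing entries beyond a chosen lag; a minimal sketch, applied to a hypothetical general 4×4 pattern:

```python
import numpy as np

def band(R, max_lag):
    """Zero the covariances between observations more than max_lag
    steps apart, leaving the rest of the pattern unchanged."""
    m = R.shape[0]
    lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return np.where(lags <= max_lag, R, 0.0)

# Hypothetical general covariance pattern for four time points.
R = np.array([[4.0, 2.0, 1.0, 0.5],
              [2.0, 4.0, 2.0, 1.0],
              [1.0, 2.0, 4.0, 2.0],
              [0.5, 1.0, 2.0, 4.0]])
R_banded = band(R, max_lag=1)   # keep only adjacent covariances
```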
Covariance patterns using the exact separation in time also exist (eg. power: $r_{ijk} = \sigma^2 \rho^{d_{ijk}}$, where $d_{ijk}$ is the separation in time between observations $j$ and $k$ on patient $i$). These are useful when time points do not occur at fixed intervals. Most cause the covariance to decrease exponentially with increasing distance.
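The power pattern can be built directly from the visit times; the times and parameters below are hypothetical:

```python
import numpy as np

# Hypothetical irregular visit times (in weeks) and parameters for the
# power pattern r_ijk = sigma^2 * rho**d_ijk.
times = np.array([0.0, 1.0, 3.0, 8.0])
sigma2, rho = 2.0, 0.8

# Pairwise separations in time, then the implied covariance matrix.
d = np.abs(np.subtract.outer(times, times))
R = sigma2 * rho ** d
```

Because the exponent is the actual time distance, the pattern stays coherent however unevenly the visits are spaced.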
Which covariance pattern should be used?
We want to choose the covariance pattern that best matches the true covariance pattern – not easy! Increasing the number of parameters will improve the fit, but can lead to over-fitting. Models can be compared using a likelihood-ratio test (provided they are nested), or by comparing goodness of fit statistics adjusted for the number of covariance parameters $q$ (eg. $\mathrm{AIC} = \log(L) - q$ or $\mathrm{BIC} = \log(L) - \tfrac{q}{2}\log(n)$ with $n$ subjects; in this parameterisation larger values are better).
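A minimal sketch of both comparisons, using hypothetical REML log-likelihoods for compound symmetry nested inside Toeplitz with four time points (the numbers are invented for illustration):

```python
from scipy import stats

# Hypothetical REML log-likelihoods and covariance parameter counts.
logL_cs, q_cs = -152.3, 2   # compound symmetry: sigma^2 and theta
logL_tz, q_tz = -149.1, 4   # Toeplitz: sigma^2 and theta_1..theta_3

# Likelihood-ratio test: 2 * (logL_full - logL_reduced) is compared
# with a chi-squared on the extra number of covariance parameters.
lr = 2 * (logL_tz - logL_cs)
p_value = stats.chi2.sf(lr, df=q_tz - q_cs)

# AIC in the "larger is better" parameterisation: logL - q.
aic_cs = logL_cs - q_cs
aic_tz = logL_tz - q_tz
```

Here the likelihood-ratio test and the AIC comparison happen to agree that the extra Toeplitz parameters earn their keep; with other numbers the two criteria can disagree.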
Which patterns should be considered? It is not usually practical to test large numbers. Either start with the simplest and work up, or fit the general pattern to get some idea of what it should look like. Often the covariance pattern makes little difference to the estimate of the treatment effect and its standard error; in this case compound symmetry is reasonable (roughly check by comparing with the general pattern), or the empirical variance estimator could simply be used.
General points
Missing data occur frequently in repeated measures experiments. They are less of a problem with a specified covariance structure, as patients with only a few observations still contribute information through the covariance pattern. Care is still needed.
Significance testing should be performed using F-tests with Satterthwaite's degrees of freedom. If Satterthwaite's method is not available, use the patient degrees of freedom to compare treatment effects (this will be conservative).
Fixed effect standard error estimates will be biased downwards because the covariance parameters are estimated, not known. The robust 'empirical' variance estimator described previously can be used, but this ignores the specified model and uses the covariance pattern from the data.
Residuals are assumed $\sim N(\mathbf{0}, \mathbf{R})$. This is difficult to check formally, but plots of residuals should be sufficient to identify any major outliers or deviations from normality.
Random coefficients models
Random coefficients models develop an explicit relationship between the measurement and time. The most common model is linear in time, with interest in whether the slope differs between treatment groups. Usually time and treatment.time are fitted as fixed effects, and patient and patient.time as random effects, to allow patients to vary randomly around their treatment mean.
Polynomial models can be fitted by successively adding polynomial terms of higher order (as fixed and random effects) until a variance component becomes negative (for random effects) or the effect becomes non-significant (for fixed effects).
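A sketch of what the linear random coefficients model implies, using hypothetical population values: each patient gets their own intercept and slope drawn around the population line, and per-patient least-squares slopes scatter around the population slope.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population intercept/slope and random coefficient SDs.
beta0, beta1 = 10.0, 2.0
n, times = 500, np.array([0.0, 1.0, 2.0, 3.0])

# Each patient's own line, randomly varying around the population line.
intercepts = rng.normal(beta0, 1.0, (n, 1))
slopes = rng.normal(beta1, 0.5, (n, 1))
y = intercepts + slopes * times + rng.normal(0, 0.3, (n, len(times)))

# Per-patient least-squares slopes (one fit per column of y.T):
# they vary patient to patient but centre on the population slope.
patient_slopes = np.polyfit(times, y.T, 1)[0]
```

A mixed model does better than these separate per-patient fits: it pools information across patients and shrinks each slope toward the population value, as noted below.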
General points
If a negative variance component estimate is obtained, refit the model without that component. However, not all software will report negative variance components. In SAS, non-convergence or a non-positive-semi-definite G matrix are signs of a negative variance component; in this case, remove components in order of complexity until the problem is resolved.
- the baseline measurement can be specified as a fixed effect (covariate) or as the first repeated measurement at time 0
- patient estimates of slope and intercept can be calculated, and will be "shrunken" towards the population values
- significance testing proceeds as for covariance pattern models
- check residuals are normal by plotting against predicted values
- it is difficult to check the assumption that the random coefficients are distributed MVN, but the usual plots should pick up any major departures
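The shrinkage of patient estimates toward population values can be sketched for the simplest case, a random intercept model, where the BLUP pulls the patient mean toward the population mean by a factor that depends on the variance components and the number of observations (all values below are hypothetical):

```python
# Shrinkage in a random intercept model:
#   BLUP = pop_mean + k * (patient_mean - pop_mean),
#   k = sigma2_p / (sigma2_p + sigma2_e / m).
# Hypothetical variance components and observed means.
sigma2_p, sigma2_e, m = 4.0, 2.0, 5
pop_mean, patient_mean = 10.0, 13.0

k = sigma2_p / (sigma2_p + sigma2_e / m)
blup = pop_mean + k * (patient_mean - pop_mean)
```

Patients with few observations (small $m$) or noisy data (large $\sigma^2_e$) get a smaller $k$ and are shrunk more strongly; the same idea applies to random slopes.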
Sample size estimation
Sample size is often calculated as for a simple between-patient trial; because of the correlation between repeated measures, fewer patients are actually required. The covariance pattern will obviously not be known in advance, but a compound symmetry pattern will probably be adequate. If no estimate of the within-patient correlation is known, a conservative value could be used.
$\mathrm{Var}(\sum_j y_{ij}) = m \sigma^2_p (1 + (m-1)\rho)$
$\mathrm{Var}(\bar{y}_i) = \sigma^2_p (1 + (m-1)\rho)/m$
$\mathrm{SE}(t_i - t_j) = \sqrt{2\,\mathrm{Var}(\bar{y}_i)/n}$
with $m$ repeated measurements, $n$ patients in each group, $\sigma^2_p$ the between-patient variance, and $\rho$ the correlation between measurements.
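These formulae translate directly into a planning calculation; the variance, correlation, and group sizes below are hypothetical planning values:

```python
import numpy as np

def se_diff(n, m, sigma2_p, rho):
    """SE of a treatment difference when each patient's m correlated
    measurements are averaged, under compound symmetry."""
    var_mean = sigma2_p * (1 + (m - 1) * rho) / m
    return np.sqrt(2 * var_mean / n)

# Hypothetical planning values.
sigma2_p, rho = 4.0, 0.5

# Averaging m = 4 correlated measurements per patient shrinks the SE
# relative to a single-measurement (m = 1) design with the same n.
se_repeated = se_diff(n=30, m=4, sigma2_p=sigma2_p, rho=rho)
se_single = se_diff(n=30, m=1, sigma2_p=sigma2_p, rho=rho)
```

Note the limit: as $\rho \to 1$ extra measurements add nothing (the variance of the patient mean stays $\sigma^2_p$), which is why a conservatively high $\rho$ gives a safe sample size.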