Logistic regression
Binary data
Grouped or ungrouped (in R, use tapply to go from ungrouped to grouped). Grouped data has the form: covariates, number of successes ($r$) and number of trials ($n$). Estimated probability of success = $r / n$. This is the binomial distribution: $r \sim \mathrm{Bin}(n, \pi)$!
Relationship between response and covariates: $\pi = P(Y = 1) = \frac{e^{\text{model formula}}}{1 + e^{\text{model formula}}}$. (The logit is used to avoid nasty boundary problems.) This is called logistic regression. Estimate the model coefficients by maximum likelihood (numerical method used in R: iteratively reweighted least squares), implemented by glm().
Grouped: glm(cbind(successes, failures) ~ covariates, family = binomial)
Ungrouped: glm(response ~ covariates, family = binomial)
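A minimal sketch with made-up data (dose, n, r and the fit names are hypothetical), showing that both forms give the same coefficients:

```r
# Hypothetical grouped data: r successes out of n trials at each dose
dose <- c(1, 2, 4, 8, 16)
n <- rep(40, 5)
r <- c(2, 8, 15, 26, 37)

# Grouped fit: two-column response of (successes, failures)
fit.grouped <- glm(cbind(r, n - r) ~ dose, family = binomial)

# The same data ungrouped: one 0/1 row per individual trial
y <- rep(rep(c(1, 0), length(dose)), times = c(rbind(r, n - r)))
x <- rep(dose, times = n)
fit.ungrouped <- glm(y ~ x, family = binomial)

coef(fit.grouped)  # identical to coef(fit.ungrouped)
```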
Probability, odds and log-odds
- Probability: $\pi = e^{\alpha + \beta x} / (1 + e^{\alpha + \beta x})$
- Odds: $\pi / (1 - \pi) = e^{\alpha + \beta x}$
- Log odds: $\alpha + \beta x$
$\beta$ measures the effect of a unit change in $x$: it increases the log odds by $\beta$, and the odds by a factor of $e^\beta$.
Prediction
Probability: predict(model, new.data, type = "response", se.fit = TRUE)
Log odds: predict(model, new.data, se.fit = TRUE)
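A sketch using the model and new.data above: build the confidence interval on the log-odds scale and then transform, so it stays inside (0, 1):

```r
# For a single new observation: 95% CI on the log-odds scale,
# then mapped back to a probability
p <- predict(model, new.data, se.fit = TRUE)
ci <- p$fit + c(-1.96, 1.96) * p$se.fit
plogis(ci)
```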
Plotting estimated probability
Easiest with grouped data (assuming the covariate x is sorted):
plot(x, r/n)
lines(x, predict(model, data.frame(x = x), type = "response"))
A bit harder with ungrouped, since the fitted values must be joined in covariate order:
plot(x, occurred)
ord <- order(x)
lines(x[ord], predict(model, type = "response")[ord])
Inference
Provided there's enough data, the estimated coefficients will be approximately normal. ML gives ways of estimating standard errors. Tests and confidence intervals therefore proceed much as for linear regression (with the normal in place of the $t$ distribution).
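For example (model is a hypothetical fitted glm; confint.default gives the Wald intervals):

```r
summary(model)                # z tests for each coefficient
confint.default(model)        # Wald 95% CIs on the log-odds scale
exp(confint.default(model))   # the same intervals as odds multipliers
```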
Deviance
The model has 2 parts: the binomial assumption ($r_i \sim \mathrm{Bin}(n_i, \pi_i)$) and the logistic assumption (the logit of $\pi$ is linear in the covariates). If we only make the binomial assumption, we have the most general model, since we put no restrictions on the probabilities. Maximising this (i.e. $\hat{\pi}_i = r_i / n_i$) gives the maximum value of the likelihood, $L_{max}$. If we compare this to the likelihood from the logistic model, $L_{mod}$, then the residual deviance $= 2(\log L_{max} - \log L_{mod}) \sim \chi^2_{m-k-1}$ approximately, where $m$ is the number of covariate patterns and $k$ the number of covariates. If 1 - pchisq(deviance, df) is small, the model is significantly different from the maximal model. (Only applies to grouped data, with small $m$ and large $n_i$.)
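In R, the goodness-of-fit p-value is (model a hypothetical grouped fit):

```r
# Large deviance relative to its df signals lack of fit
1 - pchisq(deviance(model), df.residual(model))
```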
At the other extreme, the most restrictive model has all probabilities equal, $\pi_i = \pi$, estimated by $\hat{\pi} = \sum r_i / \sum n_i$. The deviance of this model is called the null deviance. Use it to see if any model will be useful.
Deviances are also used when comparing nested models. If two models fit the data equally well, then the difference in their deviances will be approximately $\chi^2$ with degrees of freedom equal to the difference in numbers of parameters. (Works for both grouped and ungrouped data.)
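For example, to test whether a hypothetical extra covariate x2 is needed:

```r
small <- glm(cbind(r, n - r) ~ x1, family = binomial)
big   <- glm(cbind(r, n - r) ~ x1 + x2, family = binomial)
anova(small, big, test = "Chisq")  # difference in deviances vs. chi-squared
```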
Residuals
Two kinds:
- Pearson: useful for grouped only, similar to residuals in linear regression (actual - fitted), standardised to unit variance: $\frac{r_i - n_i \hat{\pi}_i}{\sqrt{n_i \hat{\pi}_i(1 - \hat{\pi}_i)}}$.
residuals(model.glm, type="pearson")
- Deviance: useful for both; measures the contribution of each covariate pattern to the total deviance. A big value indicates a poorly fitted covariate pattern.
residuals(model.glm, type="deviance")
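A quick sketch for inspecting them (model.glm hypothetical):

```r
# Deviance residuals against fitted probabilities; values beyond
# about +/- 2 flag poorly fitted covariate patterns
plot(fitted(model.glm), residuals(model.glm, type = "deviance"))
abline(h = c(-2, 2), lty = 2)
```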
Diagnostics
Influential points:
- Hat matrix diagonals: defined similarly to linear models. Measures how outlying the covariates are, but down-weighted according to the estimated binomial variance of the observation
- Cook’s distance
- Leave-one-out deviance change
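A sketch of all three (assuming model.glm was fitted with data = dat, both hypothetical); the leave-one-out change is computed by brute force here:

```r
h <- hatvalues(model.glm)        # hat matrix diagonals
d <- cooks.distance(model.glm)   # Cook's distance

# Drop each row in turn, refit, and record how much the deviance falls
dev.change <- sapply(seq_len(nrow(dat)), function(i) {
  deviance(model.glm) - deviance(update(model.glm, data = dat[-i, ]))
})
```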
Non-planarity
For a single x, plot $\mathrm{logit}(r_i / n_i)$ vs. $x$. For multiple x's, plot the Pearson residuals vs. each x. For either, use a GAM.
If covariate groups mostly have only one member, not much can be done.
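A sketch of both checks (r, n, x hypothetical grouped data; gam() comes from the mgcv package):

```r
# Single x: empirical logits against x (the 0.5s avoid log(0))
plot(x, log((r + 0.5) / (n - r + 0.5)))

# Or let a GAM estimate the shape; a roughly linear smooth
# supports the logistic assumption
library(mgcv)
check <- gam(cbind(r, n - r) ~ s(x), family = binomial)
plot(check)
```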
Over/under dispersion
The variance of the binomial distribution is $np(1-p)$, which is always less than the mean $np$. Sometimes the individuals with the same covariate pattern are positively (or negatively) correlated; this results in the variance being greater (or less) than $np(1-p)$. This is called over (or under) dispersion, and the standard errors will then be incorrect. Use the quasibinomial family, which estimates a scale parameter from the data.
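A sketch (r, n, x hypothetical):

```r
model.q <- glm(cbind(r, n - r) ~ x, family = quasibinomial)
summary(model.q)$dispersion  # estimated scale; near 1 means little dispersion
```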
Other points
ANOVA
Construction and interpretation same as for linear ANOVA, but in terms of log-odds rather than means. Remember to convert parameter estimates to odds form!
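For example, with hypothetical factors trt and sex:

```r
model.a <- glm(cbind(r, n - r) ~ trt * sex, family = binomial)
exp(coef(model.a))  # log-odds estimates converted to odds form
```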
Variable selection
Proceeds as for linear regression. AIC $= \text{deviance} + 2 \times (\text{number of parameters})$. Fit the full model, then use step() and anova() to select a simpler model if appropriate.
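A sketch with hypothetical covariates (step() does the AIC-based search):

```r
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
reduced <- step(full)                 # AIC-based stepwise search
anova(reduced, full, test = "Chisq")  # check the simplification is safe
```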