Models based on response distribution

Greatly influenced by generalised linear models, although GLMs form only a small, restricted subset of this class.

An alternative to thinking about a model in terms of signal and noise is to think about systematic and random components that can combine in a non-additive way. The systematic component is equivalent to the signal, but the random component refers to the basic distribution of the response variable, not to an additive error. In this type of model, we usually start by thinking about the type of distribution the response would have if the other factors in the model were held constant.

Generalised linear models

A class of non-linear models with response distributions from the exponential family. The responses do not have to be iid, but their distributions can differ only through their natural parameters (`theta_i`), not through the dispersion parameter (a constant `phi`).

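The random component is given by `f(y_i | theta_i) = exp( phi(y_i theta_i - b(theta_i)) + c(y_i, phi))`, with mean `mu_i = b'(theta_i)`. The systematic component consists of the link and the linear predictor `eta_i` (usually a typical linear model). The link is defined as `g(mu_i) = eta_i`, with `g` monotonic. The canonical links are a special set of link functions that make the linear predictor equal to the natural parameter: `g(mu_i) = b'^(-1)(mu_i) = theta_i`. Canonical links for common distributions: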

Normal: `g(mu_i) = mu_i`
Poisson: `g(mu_i) = log(mu_i)`
Binomial: `g(mu_i) = log(mu_i / (1-mu_i))`
Gamma: `g(mu_i) = 1/mu_i`
Inverse Gaussian: `g(mu_i) = 1/mu_i^2`
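
As a quick check of the exponential family form, the sketch below writes the Poisson distribution as `exp(phi(y theta - b(theta)) + c(y, phi))` and compares it with the usual probability mass function. The pieces used (`theta = log(mu)`, `b(theta) = exp(theta)`, `phi = 1`, `c = -log(y!)`) are the standard Poisson choices; the function names are made up for illustration.

```python
import math

def expfam_density(y, theta, phi, b, c):
    """Evaluate exp(phi * (y*theta - b(theta)) + c(y, phi))."""
    return math.exp(phi * (y * theta - b(theta)) + c(y, phi))

def poisson_pmf(y, mu):
    """Textbook Poisson probability mass function."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def poisson_b(theta):
    # b(theta) = exp(theta), so b'(theta) = exp(theta) = mu
    return math.exp(theta)

def poisson_c(y, phi):
    # c(y, phi) = -log(y!), computed with the log-gamma function
    return -math.lgamma(y + 1)

mu = 3.5
theta = math.log(mu)  # canonical link: theta = b'^(-1)(mu) = log(mu)

for y in range(6):
    via_family = expfam_density(y, theta, 1.0, poisson_b, poisson_c)
    direct = poisson_pmf(y, mu)
    print(y, round(via_family, 6), round(direct, 6))
```

The two columns should agree up to rounding, which is all the canonical log link for the Poisson amounts to: applying `log` to the mean recovers the natural parameter.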

Canonical links have nice properties, but are not especially wonderful. The most important thing is that the inverse link maps `RR` onto the appropriate range for `mu_i`.
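
To illustrate the range-mapping point, here is a minimal sketch of two inverse links applied to a few values of the linear predictor. The function names are illustrative only; the inverse log link always returns a positive mean and the inverse logit always returns a value strictly between 0 and 1.

```python
import math

def inv_log(eta):
    """Inverse of the log link: maps any real eta to a positive mean."""
    return math.exp(eta)

def logit(mu):
    """Logit link for a mean in (0, 1)."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse of the logit link: maps any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# A few moderate linear predictor values, negative and positive
etas = [-5.0, -1.0, 0.0, 1.0, 5.0]

for eta in etas:
    print(eta, inv_log(eta), inv_logit(eta))
```

Whatever value the linear predictor takes, the fitted mean stays inside the range that the response distribution allows.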

Log link: `g(mu_i) = log(mu_i)`
Power link: `g(mu_i) = mu_i^lambda`
Complementary log-log link: `g(mu_i) = log(-log(1-mu_i))`
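
A small sketch of the power and complementary log-log links with their inverses, assuming the standard form `log(-log(1 - mu))` for the latter. The helper names and the choice `lambda = 0.5` are purely illustrative.

```python
import math

def cloglog(mu):
    """Complementary log-log link: log(-log(1 - mu)) for mu in (0, 1)."""
    return math.log(-math.log1p(-mu))

def inv_cloglog(eta):
    """Inverse: mu = 1 - exp(-exp(eta)), written with expm1 for accuracy."""
    return -math.expm1(-math.exp(eta))

def power_link(mu, lam):
    """Power link: mu ** lam for mu > 0."""
    return mu ** lam

def inv_power_link(eta, lam):
    """Inverse of the power link for eta > 0."""
    return eta ** (1.0 / lam)

for mu in (0.1, 0.5, 0.9):
    eta = cloglog(mu)
    print(mu, eta, inv_cloglog(eta))

lam = 0.5  # illustrative choice: the square-root link
print(power_link(4.0, lam), inv_power_link(2.0, lam))
```

Unlike the logit, the complementary log-log link is not symmetric about `mu = 0.5`: `cloglog(0.5)` is `log(log(2))`, which is negative rather than zero.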

The other important aspect is the variance function `V(mu_i) = b''(theta_i)`, which is proportional to the variance of the response: `Var(y_i) = V(mu_i)/phi`, so the constant of proportionality is `1/phi`. It is not open to specification, but is determined by the random component.

Normal: `1`
Poisson: `mu_i`
Binomial: `mu_i(1-mu_i)`
Gamma: `mu_i^2`
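
The variance functions listed above can be checked numerically as second derivatives of `b(theta)`. The sketch below is only a rough finite-difference check, assuming the standard cumulant functions for each family and the sign convention `theta = -1/mu` for the gamma; the helper names are hypothetical.

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Each entry: (name, b(theta), theta as a function of mu, V(mu))
families = [
    ("normal", lambda t: 0.5 * t * t, lambda m: m, lambda m: 1.0),
    ("poisson", math.exp, math.log, lambda m: m),
    ("binomial", lambda t: math.log1p(math.exp(t)),
     lambda m: math.log(m / (1.0 - m)), lambda m: m * (1.0 - m)),
    # Assumed sign convention for the gamma: theta = -1/mu, b = -log(-theta)
    ("gamma", lambda t: -math.log(-t), lambda m: -1.0 / m, lambda m: m * m),
]

def variance_checks(mu):
    """Return (name, numerical b''(theta), closed-form V(mu)) per family."""
    results = []
    for name, b, theta_of_mu, v in families:
        theta = theta_of_mu(mu)
        results.append((name, second_derivative(b, theta), v(mu)))
    return results

# mu = 0.3 is a valid mean for all four families
for name, numeric, exact in variance_checks(0.3):
    print(name, round(numeric, 5), round(exact, 5))
```

The numerical and closed-form values should agree to several decimal places, which is one way to see that the variance function comes from the choice of distribution rather than being specified separately.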