Families of distributions useful in modelling

Exponential families

Multiple representations:

`f(Y | theta) = exp( sum q_j(eta) T_j(Y) ) c(eta) h(Y)`
`f(Y | theta) = a(theta) h(Y) exp(theta^T t(Y))`
`f(Y | theta) = exp( sum q_j(eta) T_j(Y) - B(eta)) c(Y)`
`f(Y | theta) = exp( sum theta_j T_j(Y) - B(theta) + c(Y))` (this is the one we use)
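
To make the canonical representation concrete, here is a minimal sketch that writes the Poisson distribution with mean `lambda` in the form `exp(theta T(Y) - B(theta) + c(Y))` and checks it against the familiar pmf. It uses `s = 1`, `T(Y) = Y`, `theta = log(lambda)`, `B(theta) = e^theta` and `c(Y) = -log(Y!)`. The Poisson example and the function names are illustrative choices, not part of these notes.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Familiar form: lam^y * exp(-lam) / y!."""
    return lam**y * math.exp(-lam) / math.factorial(y)

def poisson_canonical(y: int, theta: float) -> float:
    """Canonical form exp(theta*T(Y) - B(theta) + c(Y)) with
    T(Y) = Y, theta = log(lam), B(theta) = exp(theta), c(Y) = -log(Y!)."""
    t_y = y                      # sufficient statistic T(Y)
    b_theta = math.exp(theta)    # log-partition function B(theta)
    c_y = -math.lgamma(y + 1)    # lgamma(y + 1) = log(y!)
    return math.exp(theta * t_y - b_theta + c_y)

lam = 3.5
theta = math.log(lam)  # canonical parameter
for y in range(5):
    print(y, round(poisson_pmf(y, lam), 6), round(poisson_canonical(y, theta), 6))
```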

Properties:

The natural parameter space (`Theta`) is a convex set. To avoid difficulties we will assume that neither the `T_j(Y)` nor the `theta_j` satisfy a linear constraint (the representation is then minimal). If `Theta` contains an open s-dimensional rectangle, the family is said to be of full rank, or regular.
For a minimal, regular family the statistic `T = (T_1, ..., T_s)` is minimal sufficient for `theta`.
For any integrable function `h(*)`, integration and differentiation with respect to `theta` can be exchanged (in the interior of `Theta`), which gives
`E[T_j(Y)] = del/(del theta_j) B(theta)`
`cov(T_j(Y), T_k(Y)) = del^2/(del theta_j del theta_k) B(theta)`
The moment generating function of an exponential family is taken to be that of the sufficient statistics `T_j`: `M_T(u) = E[exp(u^T T(Y))] = exp(B(theta + u) - B(theta))` for `theta + u` in `Theta`.
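
The moment identities can be checked numerically. This sketch assumes a binomial with fixed `n = 10` in canonical form (`theta = logit(p)`, `B(theta) = n log(1 + e^theta)`, `c(Y) = log C(n, Y)`). It compares finite-difference derivatives of `B` with moments computed by direct summation, and checks that the moment generating function of `T(Y) = Y` equals `exp(B(theta + u) - B(theta))`. The helper names `d1` and `d2` are hypothetical.

```python
import math

n = 10  # fixed number of trials (illustrative choice)

def B(theta: float) -> float:
    """Log-partition function of Binomial(n, p) with canonical theta = logit(p)."""
    return n * math.log1p(math.exp(theta))

def pmf(y: int, theta: float) -> float:
    """Canonical form exp(theta*y - B(theta) + c(y)), with c(y) = log C(n, y)."""
    return math.exp(theta * y - B(theta) + math.log(math.comb(n, y)))

def d1(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-3):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

theta = 0.4
p = 1 / (1 + math.exp(-theta))
ys = range(n + 1)

# Moments by direct summation over the support
mean_direct = sum(y * pmf(y, theta) for y in ys)
var_direct = sum(y**2 * pmf(y, theta) for y in ys) - mean_direct**2

# The same moments from derivatives of B(theta)
mean_from_B = d1(B, theta)
var_from_B = d2(B, theta)

# MGF of T(Y) = Y: E[exp(u*Y)] = exp(B(theta + u) - B(theta))
u = 0.3
mgf_direct = sum(math.exp(u * y) * pmf(y, theta) for y in ys)
mgf_from_B = math.exp(B(theta + u) - B(theta))

print(mean_direct, mean_from_B, n * p)
print(var_direct, var_from_B, n * p * (1 - p))
print(mgf_direct, mgf_from_B)
```

The step sizes trade truncation error against rounding error; agreement to several decimal places is all that should be expected.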

The parameters `theta_j` are called canonical parameters. They are often the easiest to work with when deriving properties, but they are not always the best choice, so it is helpful to have other parameterisations with particular desirable properties.

Mean value parameterisation 1. The expected value is often of interest, but usually none of the canonical parameters corresponds to the mean. We can transform `(theta_1, ..., theta_s)` to `(mu = E(Y), theta_1, ..., theta_(s-1))`.

Mean value parameterisation 2. In the canonical representation there is a clear relationship between parameters and sufficient statistics, so we could parameterise using the expected values of the `T_j`, transforming `(theta_1, ..., theta_s)` to `(mu_1(theta), ..., mu_s(theta))` where `mu_j(theta) = E[T_j(Y)] = del/(del theta_j) B(theta)`. This has the potential advantage that each parameter of the density is the expected value of a random variable associated with an observable quantity.
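
For the normal distribution with `T(Y) = (Y, Y^2)`, the canonical parameters are `theta_1 = mu/sigma^2` and `theta_2 = -1/(2 sigma^2)`, and `B(theta) = -theta_1^2/(4 theta_2) + 1/2 log(-pi/theta_2)`. The sketch below maps canonical parameters to the mean value parameters `(E[Y], E[Y^2])`, checks them against a numerical gradient of `B`, and maps back. The function names are illustrative.

```python
import math

def B(t1, t2):
    """Log-partition function of N(mu, sigma^2) with T(Y) = (Y, Y^2);
    requires t2 < 0."""
    return -t1**2 / (4 * t2) + 0.5 * math.log(-math.pi / t2)

def canonical_from_usual(mu, sigma2):
    return mu / sigma2, -1.0 / (2.0 * sigma2)

def mean_value_from_canonical(t1, t2):
    """(E[Y], E[Y^2]): the gradient of B, written in closed form."""
    m1 = -t1 / (2 * t2)
    m2 = t1**2 / (4 * t2**2) - 1 / (2 * t2)
    return m1, m2

def canonical_from_mean_value(m1, m2):
    sigma2 = m2 - m1**2  # Var(Y) = E[Y^2] - E[Y]^2, must be > 0
    return canonical_from_usual(m1, sigma2)

mu, sigma2 = 1.5, 4.0
t1, t2 = canonical_from_usual(mu, sigma2)
m1, m2 = mean_value_from_canonical(t1, t2)

# Numerical gradient of B as an independent check
h = 1e-6
grad = ((B(t1 + h, t2) - B(t1 - h, t2)) / (2 * h),
        (B(t1, t2 + h) - B(t1, t2 - h)) / (2 * h))
print((m1, m2), grad)
```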

Mixed parameterisations. It is also possible to write an exponential family in terms of both canonical and mean value parameters, e.g. `(mu_1(theta), theta_2)`.
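
A minimal sketch of a mixed parameterisation for the normal distribution written with `T(Y) = (Y, Y^2)`, `theta_1 = mu/sigma^2` and `theta_2 = -1/(2 sigma^2)`. The pair `(mu_1(theta), theta_2) = (E[Y], -1/(2 sigma^2))` determines the canonical parameters because `E[Y] = -theta_1/(2 theta_2)`. The function names are hypothetical.

```python
def mixed_from_usual(mu, sigma2):
    """Mixed parameters (E[Y], theta_2) with theta_2 = -1/(2 sigma^2)."""
    return mu, -1.0 / (2.0 * sigma2)

def canonical_from_mixed(m1, theta2):
    """Recover (theta_1, theta_2): E[Y] = -theta_1/(2 theta_2) gives theta_1 = -2 theta_2 E[Y]."""
    return -2.0 * theta2 * m1, theta2

mu, sigma2 = 1.5, 4.0
m1, theta2 = mixed_from_usual(mu, sigma2)
theta1, theta2 = canonical_from_mixed(m1, theta2)
print((m1, theta2), (theta1, theta2))
```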

Reasons for choosing one parameterisation over another:

for interpretation (a reparameterisation can be performed after estimation)
for numerical stability
to connect the model more directly with elements of the experiment
to make it clearer how to incorporate covariates
to impose parameter restrictions more appropriate to the actual situation

Exponential dispersion families

We only discuss a small subset: essentially one-parameter families extended to include an additional dispersion parameter (the most common case in applications). An important family is the one where the sufficient statistic is `T(Y) = Y`; these are called natural exponential families, e.g. the binomial with fixed n, or the normal with fixed variance. Exponential dispersion families have the form `f(Y | theta, phi) = exp( phi(Y theta - b(theta)) + c(Y, phi) )`, with `E[Y] = b'(theta)` and `"Var"[Y] = 1/phi b''(theta) = 1/phi V(mu)`, where `V(mu)` is the variance function.

We have essentially coerced a two-parameter family to look “almost” like a natural exponential family by adding a nuisance parameter `phi`, called the dispersion parameter, which acts as a scale factor for the variance (through `1/phi` in this parameterisation).
This form is only possible if one of the sufficient statistics is the identity function (e.g. binomial, Poisson, normal, gamma, inverse Gaussian); these are also the families used for GLMs.
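
As an illustration of the dispersion form, the gamma distribution with shape `alpha` and rate `beta` can be written with `theta = -beta/alpha`, `phi = alpha`, `b(theta) = -log(-theta)` and `c(Y, phi) = (phi - 1) log Y + phi log phi - log Gamma(phi)`. Then `E[Y] = -1/theta = alpha/beta` and `Var[Y] = b''(theta)/phi = alpha/beta^2`, with variance function `V(mu) = mu^2`. The sketch below checks these numerically; the mapping is one standard choice and the names are illustrative.

```python
import math

def gamma_pdf(y, alpha, beta):
    """Gamma density with shape alpha and rate beta (computed on the log scale)."""
    return math.exp((alpha - 1) * math.log(y) + alpha * math.log(beta)
                    - beta * y - math.lgamma(alpha))

def b(theta):
    """Cumulant function for the gamma family; requires theta < 0."""
    return -math.log(-theta)

def c(y, phi):
    return (phi - 1) * math.log(y) + phi * math.log(phi) - math.lgamma(phi)

def edf_pdf(y, theta, phi):
    """Exponential dispersion form exp(phi*(y*theta - b(theta)) + c(y, phi))."""
    return math.exp(phi * (y * theta - b(theta)) + c(y, phi))

def V(mu):
    """Variance function of the gamma family: Var[Y] = V(mu) / phi."""
    return mu**2

alpha, beta = 2.5, 1.5
theta, phi = -beta / alpha, alpha   # map (alpha, beta) to (theta, phi)

mean = -1 / theta                   # b'(theta)
var = 1 / (phi * theta**2)          # b''(theta) / phi

h = 1e-6
b_prime = (b(theta + h) - b(theta - h)) / (2 * h)  # numerical check of b'(theta)
print(mean, alpha / beta, b_prime)
print(var, alpha / beta**2, V(mean) / phi)
```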

If we have more than one (iid) variable, their joint distribution is `f(Y_1, ..., Y_n | theta) = exp( sum_j theta_j sum_i T_j(Y_i) - n B(theta) + sum_i c(Y_i))`, which is still in exponential family form, with sufficient statistics `sum_i T_j(Y_i)`.
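
The reduction through sufficiency can be seen numerically. In this sketch (the Poisson model and the data values are made up for illustration), two samples with the same `n` and the same `sum Y_i` give log-likelihoods that differ only by the constant `sum c(Y_i)`, so they carry the same information about `theta` and share the same maximum likelihood estimate.

```python
import math

def loglik(theta, ys):
    """Joint Poisson log-likelihood in canonical form:
    theta * sum T(Y_i) - n * B(theta) + sum c(Y_i),
    with T(Y) = Y, B(theta) = exp(theta), c(Y) = -log(Y!)."""
    n = len(ys)
    return theta * sum(ys) - n * math.exp(theta) - sum(math.lgamma(y + 1) for y in ys)

sample_a = [0, 2, 3, 7]
sample_b = [3, 3, 3, 3]          # same n = 4 and same sum(Y_i) = 12
thetas = [-1.0, 0.0, 0.5, 1.2]

# The difference does not depend on theta: only sum(Y_i) enters the theta-terms
diffs = [loglik(t, sample_a) - loglik(t, sample_b) for t in thetas]
print(diffs)

# Both samples share the MLE theta_hat = log(sum(Y_i) / n)
theta_hat = math.log(sum(sample_a) / len(sample_a))
print(theta_hat)
```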

Location-Scale families

Families of distributions generated by classes of transformations are useful, particularly location-scale transformations. Let `U` be a random variable with distribution `F`. If `U` is transformed into `V = U + mu`, then `V` has distribution `F(Y - mu)`; the set of distributions generated by all `mu in (-oo, oo)` is called a location family. Similarly, the transformation `V = sigma U` (with `sigma > 0`) generates a scale family with distribution `F(Y/sigma)`, and the composition `V = mu + sigma U` has distribution `F((Y-mu)/sigma)`.
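
A Monte Carlo sketch of the location-scale construction, assuming a standard logistic base distribution `F(z) = 1/(1 + e^(-z))` and illustrative values `mu = 3`, `sigma = 2`: samples of `V = mu + sigma U` should have an empirical CDF close to `F((y - mu)/sigma)`. The helper names are hypothetical.

```python
import math
import random

def logistic_cdf(z):
    """Standard logistic CDF F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sample_standard_logistic(rng):
    """Inverse-CDF sampling: U = log(p / (1 - p)) with p ~ Uniform(0, 1)."""
    p = rng.random()
    while p == 0.0:          # random() can return 0.0, where log is undefined
        p = rng.random()
    return math.log(p / (1.0 - p))

rng = random.Random(2024)
mu, sigma = 3.0, 2.0
vs = [mu + sigma * sample_standard_logistic(rng) for _ in range(100_000)]

for y in (-1.0, 2.0, 3.0, 5.0, 8.0):
    empirical = sum(v <= y for v in vs) / len(vs)
    theoretical = logistic_cdf((y - mu) / sigma)
    print(y, round(empirical, 3), round(theoretical, 3))
```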

Examples of location-scale families include the double exponential, uniform, logistic and normal distributions.

Properties

`E(V) = mu + sigma E(U)`, `Var(V) = sigma^2 Var(U)`
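
A quick simulation of these moment properties. The sketch assumes `U ~ Exp(1)`, so `E(U) = 1` and `Var(U) = 1`; choosing a base variable with non-zero mean makes the `sigma E(U)` term in `E(V) = mu + sigma E(U)` visible. The values of `mu`, `sigma` and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(7)
mu, sigma = -1.0, 2.0
us = [rng.expovariate(1.0) for _ in range(50_000)]   # U ~ Exp(1): E(U) = 1, Var(U) = 1
vs = [mu + sigma * u for u in us]                    # V = mu + sigma U

mean_v = statistics.fmean(vs)
var_v = statistics.pvariance(vs)

# Sample identities (exact up to rounding) and population values
print(mean_v, mu + sigma * statistics.fmean(us), mu + sigma * 1.0)
print(var_v, sigma**2 * statistics.pvariance(us), sigma**2 * 1.0)
```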

Prominence and limitations of normal distribution

Can easily be expressed in terms of variation independent parameters (`mu` and `sigma^2` can vary freely of each other)
A `N(mu, sigma^2)` parameterisation gives location and scale directly
In samples, allows reduction through sufficiency to `(sum Y_i, sum Y_i^2)` (generally true for exponential families, but not for location-scale families)
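
To illustrate the reduction through sufficiency for the normal, this sketch (made-up data) computes the log-likelihood once from every observation and once from `(n, sum Y_i, sum Y_i^2)` alone, and obtains the maximum likelihood estimates from the same statistics.

```python
import math

ys = [1.2, -0.7, 3.3, 0.4, 2.1]
n, s1, s2 = len(ys), sum(ys), sum(y * y for y in ys)   # sufficient statistics

def loglik_full(mu, sigma2):
    """Normal log-likelihood computed from every observation."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2) for y in ys)

def loglik_suff(mu, sigma2):
    """Same log-likelihood using only (n, sum Y_i, sum Y_i^2):
    sum (Y_i - mu)^2 = s2 - 2*mu*s1 + n*mu^2."""
    return -0.5 * n * math.log(2 * math.pi * sigma2) - (s2 - 2 * mu * s1 + n * mu**2) / (2 * sigma2)

# MLEs also depend on the data only through (n, s1, s2)
mu_hat = s1 / n
sigma2_hat = s2 / n - mu_hat**2
print(mu_hat, sigma2_hat)
print(loglik_full(mu_hat, sigma2_hat), loglik_suff(mu_hat, sigma2_hat))
```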

Limitations:

Is continuous, so not suitable for discrete data
Has fixed “shape”
Range is the entire real line, so not natural for positive or bounded quantities