Maximum likelihood estimation
Basic likelihood estimation and inference
`bbY = (Y_1, Y_2, ..., Y_n)^T` vector of independent random variables with possible values in `Omega_1, ..., Omega_n`, and assume that `bbY in Omega = Omega_1 xx ... xx Omega_n` (positivity condition). `bbtheta = (theta_1, ..., theta_p)^T` vector of parameters such that `bbtheta in Theta sube RR^p`, `p < n`. Likelihood function `l_n(theta) = prod f_i(y_i | theta)`. The maximum likelihood estimator of `theta` is `hat theta in Theta` such that `l_n(hat theta) >= l_n(theta) AA theta in Theta`. Typically found by taking logs, then derivatives, and solving for the roots.
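A minimal numerical sketch (assuming Python with `numpy` and `scipy`, which these notes do not prescribe): maximize `L_n(theta)` by minimizing the negative log-likelihood, here for simulated Gamma data.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.5, size=200)  # simulated data, true theta = (2, 1.5)

def neg_log_lik(theta):
    a, scale = theta
    if a <= 0 or scale <= 0:                   # stay inside the parameter space
        return np.inf
    return -np.sum(gamma.logpdf(y, a, scale=scale))

# maximizing l_n(theta) = maximizing L_n(theta) = minimizing -L_n(theta)
res = minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)                                   # (hat a, hat scale), near (2, 1.5)
```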
Let `P_theta` be the distribution of the random variables, indexed by the parameter `theta`. Suppose that for `theta in Theta`:
- `P_theta` have common support
- random variables are iid with common density function `f(y_i | theta)`
- true value of `theta`, `theta_0` lies in interior of `Theta`
Then as `n -> oo`, `P[ prod f(y_i | theta_0) > prod f(y_i | theta) ] -> 1` for any fixed `theta != theta_0`. This provides the connection between the ML estimate and the true value: with probability tending to one, the likelihood is larger at `theta_0` than at any other fixed `theta`.
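A quick Monte Carlo check of this claim (a sketch; the normal model, sample sizes, and replication count are illustrative choices, not from the notes):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta0, theta1 = 0.0, 0.5                 # true mean vs. a fixed alternative
for n in (5, 50, 500):
    wins = 0
    for _ in range(2000):
        y = rng.normal(theta0, 1.0, size=n)
        # comparing log-likelihoods is equivalent to comparing the products
        if norm.logpdf(y, theta0).sum() > norm.logpdf(y, theta1).sum():
            wins += 1
    print(n, wins / 2000)                 # fraction of wins approaches 1
```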
If, in addition, the observations are iid (a concrete example follows this list):
- `l_n(theta) = prod f(y_i | theta)`
- `L_n(theta) = sum log(f(y_i | theta))`
- `U_(n,k)(theta) = sum 1/(f(y_i|theta)) (grad/(grad theta_k)) f(y_i | theta)` (score function)
- `I_(n,j,k)(theta) = nE[grad/(grad theta_k) log(f(y_i|theta)) grad/(grad theta_j) log(f(y_i | theta))]` (information matrix)
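For a concrete iid case (our choice of example), take the exponential density `f(y | lambda) = lambda e^(-lambda y)`: the score is `U_n(lambda) = n/lambda - sum y_i` and the information is `I_n(lambda) = n/lambda^2`.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
y = rng.exponential(1.0 / lam, size=1000)   # numpy parameterizes by scale = 1/lambda
n = len(y)

# score: U_n(lambda) = sum d/dlambda log f(y_i | lambda) = n/lambda - sum y_i
def score(lam):
    return n / lam - y.sum()

# information: I_n(lambda) = n E[(d/dlambda log f)^2] = n / lambda^2
info = n / lam**2

# the MLE solves U_n(lambda) = 0  =>  hat lambda = n / sum y_i
lam_hat = n / y.sum()
print(lam_hat, score(lam_hat), info)        # score vanishes at the MLE
```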
Properties of estimators
Developed under sets of technical conditions called regularity conditions. Many different sets exist, each yielding a range of properties. We focus on two: the first guarantees a consistent estimator of `theta`, and the second adds asymptotic normality.
Regularity conditions set 1
- Distributions of `Y_1, ..., Y_n` are distinct and have common support
- True value lies in interior of open interval contained within parameter space
- For almost all `y` the density function is differentiable with respect to all elements of `theta`
Corollary 1: If the parameter space is finite, then there is a sequence of consistent, unique ML estimates.
Corollary 2: If the likelihood equation has a unique root for each `n`, then that sequence of estimators is consistent.
There are four basic (essentially independent) things we want to happen:
- existence of mle (or sequence)
- existence of roots of the likelihood equation
- uniqueness of estimators
- consistency of sequences of estimators
Regularity conditions set 2
- `E[grad/(grad theta_k) log(f(Y|theta))] = 0`
- `I_(j,k)(theta) = -E[ grad^2/(grad theta_k grad theta_j) log(f(Y|theta))]`
- the information matrix is positive definite
- `I_n(theta) -> oo` as `n -> oo`
Conditions 1 and 2 can be replaced by the requirement that we may exchange differentiation and integration (differentiate under the integral sign).
`=>` there exists a sequence of solutions to the likelihood equations such that:
- `hat theta_n` is consistent for `theta`
- `n^(1/2)(hat theta_n - theta)` is asymptotically normal with mean `bb0` and covariance `nI_n^(-1)(bb theta)` (see the simulation sketch after this list)
- `hat theta_(n,k)` is asymptotically efficient
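A simulation sketch of the normality claim under the same illustrative exponential model: with iid Exp(`lambda`) data, `I_n(lambda) = n/lambda^2`, so `n^(1/2)(hat lambda_n - lambda)` should be roughly normal with variance `nI_n^(-1)(lambda) = lambda^2`.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.0, 500, 5000
draws = rng.exponential(1.0 / lam, size=(reps, n))
lam_hat = n / draws.sum(axis=1)             # MLE in each replication
z = np.sqrt(n) * (lam_hat - lam)
print(z.mean(), z.var())                    # roughly 0 and lambda**2 = 4
```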
Two additional properties of MLEs are useful:
- if a given scalar parameter `theta` has a single sufficient statistic, then the MLE must be a function of that statistic. If the statistic is minimal and complete, then the MLE is unique; if, further, the MLE is unbiased, then it is the UMVU estimator
- invariance: the MLE of a function of the parameters is that function of the MLEs of the parameters (demonstrated in the sketch below)
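A sketch of invariance under an assumed normal model (the example is ours): maximizing the likelihood directly over `sigma` recovers the square root of the closed-form MLE of `sigma^2`.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(4)
y = rng.normal(0.0, 3.0, size=300)
mu_hat = y.mean()

# closed-form MLE of sigma^2
sigma2_hat = np.mean((y - mu_hat) ** 2)

# maximize the likelihood directly in terms of sigma
res = minimize_scalar(lambda s: -norm.logpdf(y, mu_hat, s).sum(),
                      bounds=(1e-6, 10.0), method="bounded")
print(res.x, np.sqrt(sigma2_hat))   # agree: MLE of g(theta) equals g(MLE)
```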
Wald theory inference
`(hat theta_n - theta)^T I_n(hat theta_n) (hat theta_n - theta) -> Chi^2_p`. Let `b(theta) = (R_1(theta), ..., R_r(theta))^T` be an `r xx 1` vector of restrictions on the model parameters, and let `C(theta)` be the `r xx p` matrix with entries `c_(j,k) = grad/(grad theta_k) R_j(theta)`. Then `W_n = b^T (C I_n^(-1) C^T)^(-1) b -> Chi^2_r`, with `b`, `C`, and `I_n` evaluated at `hat theta_n`.
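A sketch of the Wald statistic for a single restriction, again under the illustrative exponential model with `R_1(lambda) = lambda - 2` (so `r = 1`, `C = 1`):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
y = rng.exponential(1.0 / 2.0, size=200)    # data generated with lambda = 2
n = len(y)
lam_hat = n / y.sum()

b = lam_hat - 2.0                           # restriction b(lambda) = lambda - 2
C = 1.0                                     # db/dlambda
I_n = n / lam_hat**2                        # information at hat lambda
W = b * (C * (1.0 / I_n) * C) ** (-1) * b   # b^T (C I_n^{-1} C^T)^{-1} b
print(W, chi2.ppf(0.95, df=1))              # compare W to the chi^2_1 cutoff
```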
Likelihood inference
Let `dim{Theta} = p` and `dim{Theta_0} = r` with `Theta_0 sube Theta`, and let `hat theta_n = arg max_(theta in Theta) L_n(theta)`, `bar theta_n = arg max_(theta in Theta_0) L_n(theta)`. Then `T_n = -2(L_n(bar theta_n) - L_n(hat theta_n)) -> Chi^2_(p-r)`.
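A sketch of the likelihood ratio statistic (normal model testing `H_0: mu = 0`, so `p = 2`, `r = 1`; the example is ours):

```python
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(6)
y = rng.normal(0.0, 2.0, size=150)          # data generated under H_0

# unrestricted MLEs over Theta (mu and sigma both free)
mu_hat = y.mean()
sig_hat = np.sqrt(np.mean((y - mu_hat) ** 2))
L_full = norm.logpdf(y, mu_hat, sig_hat).sum()

# restricted MLE over Theta_0 (mu fixed at 0)
sig0_hat = np.sqrt(np.mean(y ** 2))
L_restr = norm.logpdf(y, 0.0, sig0_hat).sum()

T = -2.0 * (L_restr - L_full)
print(T, chi2.ppf(0.95, df=1))              # T -> chi^2_(p-r) = chi^2_1 under H_0
```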