Maximum likelihood estimation

Basic likelihood estimation and inference

`bbY = (Y_1, Y_2, ..., Y_n)^T` vector of independent rv with possible values in `Omega_1, ..., Omega_n`, and assume that `bbY in Omega = Omega_1 xx ... xx Omega_n` (positivity condition). `bbtheta = (theta_1, ..., theta_p)^T` vector of parameters st `theta in Theta sube RR^p`, `p < n`.

Likelihood function `l_n(theta) = prod_(i=1)^n f_i(y_i | theta)`. The maximum likelihood estimator of `theta` is `hat theta in Theta` st `l_n(hat theta) >= l_n(theta) AA theta in Theta`. Typically found by taking logs, then setting the derivatives wrt each `theta_k` to zero and solving the resulting likelihood equations.
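
A minimal numerical sketch (my example, not from the notes), assuming an iid exponential model `f(y | lambda) = lambda e^(-lambda y)`: take logs, differentiate, and find the root of the score, which here agrees with the closed-form MLE `hat lambda = 1//bar y`.

```python
# Hedged sketch: numerical ML estimation for an assumed exponential model.
# Illustrates "take logs, then derivatives, then find roots".
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
y = rng.exponential(scale=1 / 2.5, size=200)    # simulated data, true lambda = 2.5

def log_likelihood(lam):
    # L_n(lambda) = sum_i log f(y_i | lambda) = n log(lambda) - lambda * sum(y)
    return y.size * np.log(lam) - lam * y.sum()

def score(lam):
    # dL_n/dlambda = n / lambda - sum(y); the MLE solves score(lambda) = 0
    return y.size / lam - y.sum()

lam_hat = brentq(score, 1e-6, 100.0)            # root of the likelihood equation
print(lam_hat, 1 / y.mean())                    # agrees with the closed form 1 / ybar
print(log_likelihood(lam_hat) >= log_likelihood(2.5))   # True: lam_hat maximizes L_n
```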

Let `P_theta` be the distribution of the rv indexed by the parameter `theta`, and let `theta_0` denote the true value. Suppose the model is identifiable: for `theta in Theta` with `theta != theta_0`, `P_theta != P_(theta_0)`.

Then as `n -> oo`, `P_(theta_0)[ prod f(y_i | theta_0) > prod f(y_i | theta) ] -> 1` for any fixed `theta != theta_0`. Provides the connection between the ML estimate and the true value: the likelihood is eventually largest at `theta_0`.
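
A hedged Monte Carlo check of this statement, under an assumed `N(theta, 1)` model with `theta_0 = 0` and a fixed competing value `theta = 0.5` (both choices are mine, for illustration):

```python
# Sketch: estimate P[ prod f(y_i | theta_0) > prod f(y_i | theta) ] as n grows.
import numpy as np

rng = np.random.default_rng(1)
theta0, theta = 0.0, 0.5

def loglik(y, th):
    # sum of log N(th, 1) densities; additive constants cancel in the comparison
    return -0.5 * np.sum((y - th) ** 2)

for n in (5, 20, 100, 500):
    wins = 0
    for _ in range(2000):
        y = rng.normal(theta0, 1.0, size=n)     # data drawn under theta_0
        wins += loglik(y, theta0) > loglik(y, theta)
    print(n, wins / 2000)                       # fraction approaches 1 as n grows
```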

If the `Y_i` are independent and identically distributed (iid) with common density `f`, the log-likelihood is the sum `L_n(theta) = log l_n(theta) = sum_(i=1)^n log f(y_i | theta)`.

Properties of estimators

Developed under sets of technical conditions called regularity conditions. There are many different sets, from which we can derive a range of properties. We will focus on two sets: the first guarantees a consistent estimator of `theta`, and the second provides asymptotic normality.

Regularity conditions set 1

`=>` Then there exists a sequence of values `{hat theta_n}` which solve the likelihood equations and form a consistent sequence of estimators for `theta`. (Does not say the solutions are unique, or that they are maximum likelihood estimators.)

Corollary 1: If the parameter space is finite, then there is a sequence of consistent, unique ML estimates.
Corollary 2: If the likelihood equation has a unique root for each `n`, then that sequence of estimators is consistent.
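
To illustrate what consistency of the sequence `{hat theta_n}` looks like, a sketch under the same assumed exponential model as above (the true `lambda = 2.5` is my choice):

```python
# Sketch: the likelihood-equation root hat(lambda)_n = 1 / ybar settles near the
# true value as n grows, as the consistency result describes.
import numpy as np

rng = np.random.default_rng(2)
lam_true = 2.5
for n in (10, 100, 1000, 10000, 100000):
    y = rng.exponential(scale=1 / lam_true, size=n)
    print(n, 1 / y.mean())   # hat(lambda)_n -> 2.5
```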

There are four basic (essentially independent) things we want to happen:

Regularity conditions set 2

  1. `E[grad/(grad theta_k) log(f(Y|theta))] = 0`
  2. `I_(j,k)(theta) = -E[ grad^2/(grad theta_k grad theta_j) log(f(Y|theta))]`
  3. the information matrix is positive definite
  4. `I_n(theta) -> oo`

Can replace 1 and 2 with being able to exchange differentiation and integration (differentiating under the integral sign).
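
A hedged numerical check of conditions 1 and 2 under an assumed `"Poisson"(theta)` model (per-observation score `y//theta - 1`, second derivative `-y//theta^2`, so `I(theta) = 1//theta`):

```python
# Sketch: Monte Carlo check that the mean score is 0 (condition 1) and that
# minus the expected second derivative equals Var(score) = 1 / theta (condition 2).
import numpy as np

rng = np.random.default_rng(3)
theta = 4.0
y = rng.poisson(theta, size=200_000)

score = y / theta - 1.0                        # d/dtheta log f(y | theta)
hess = -y / theta**2                           # d^2/dtheta^2 log f(y | theta)

print(score.mean())                            # approx 0       (condition 1)
print(-hess.mean(), score.var(), 1 / theta)    # all approx 0.25 (condition 2)
```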

`=>` there exists a sequence of solutions `{hat theta_n}` to the likelihood equations st `hat theta_n` is consistent for `theta` and asymptotically normal: `I_n(theta)^(1/2) (hat theta_n - theta) -> N(0, I)`, with `I` the `p xx p` identity.
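
A sketch of this claim for the assumed exponential model: there `I_n(lambda) = n//lambda^2`, so `sqrt(n)(hat lambda_n - lambda)//lambda` should look approximately standard normal for large `n`.

```python
# Sketch: the standardized MLE I_n(lambda)^(1/2) (hat(lambda)_n - lambda) is
# approximately N(0, 1) across repeated samples.
import numpy as np

rng = np.random.default_rng(4)
lam, n = 2.5, 500
z = np.array([
    np.sqrt(n) * (1 / rng.exponential(scale=1 / lam, size=n).mean() - lam) / lam
    for _ in range(5000)
])
print(z.mean(), z.std())   # approx 0 and 1
```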

Two additional properties of MLEs are useful:

Wald theory inference

`(hat theta_n - theta)^T I_n(hat theta_n) (hat theta_n - theta) -> Chi^2_p`

Let `b(theta) = (R_1(theta), ..., R_r(theta))^T` be an `r xx 1` vector of restrictions on the model parameters, tested as `H_0: b(theta) = 0`. Let `C(theta)` be the `r xx p` matrix with entries `c_(j,k) = grad/(grad theta_k) R_j(theta)`. Then under `H_0`, `W_n = b(hat theta_n)^T [C(hat theta_n) I_n(hat theta_n)^(-1) C(hat theta_n)^T]^(-1) b(hat theta_n) -> Chi^2_r`.
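
A hedged worked sketch of `W_n` for an assumed `N(mu, sigma^2)` model with the single restriction `R(theta) = mu - mu_0 = 0` (so `r = 1`, `p = 2`); the model, `mu_0`, and sample size are my choices for illustration.

```python
# Sketch: Wald test of R(theta) = mu - mu0 = 0 in an assumed N(mu, sigma^2) model.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
mu0 = 1.0
y = rng.normal(mu0, 2.0, size=100)             # data simulated under H_0

mu_hat, s2_hat = y.mean(), y.var()             # MLEs of (mu, sigma^2)
I_n = y.size * np.diag([1 / s2_hat, 1 / (2 * s2_hat**2)])   # information at the MLE
C = np.array([[1.0, 0.0]])                     # gradient of R(theta) wrt (mu, sigma^2)
b = np.array([mu_hat - mu0])                   # restriction evaluated at the MLE

W = b @ np.linalg.inv(C @ np.linalg.inv(I_n) @ C.T) @ b
print(W, chi2.sf(W, df=1))                     # statistic and Chi^2_1 p-value
```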

Likelihood inference

Let `Theta_0 sub Theta` with `dim{Theta} = p` and `dim{Theta_0} = r`, and let `hat theta_n = "arg sup"_(theta in Theta) L_n(theta)`, `bar theta_n = "arg sup"_(theta in Theta_0) L_n(theta)`. Then under `H_0: theta in Theta_0`, `T_n = -2(L_n(bar theta_n) - L_n(hat theta_n)) -> Chi^2_(p-r)`.
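
A sketch of `T_n` in the same assumed setting as the Wald example (full model `N(mu, sigma^2)`, null `Theta_0 = {mu = mu_0}`, so `p - r = 2 - 1 = 1`):

```python
# Sketch: likelihood ratio test comparing the restricted and unrestricted maxima.
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(6)
mu0 = 1.0
y = rng.normal(mu0, 2.0, size=100)

def loglik(mu, s2):
    return norm.logpdf(y, loc=mu, scale=np.sqrt(s2)).sum()

L_full = loglik(y.mean(), y.var())              # maximized over Theta
L_null = loglik(mu0, np.mean((y - mu0) ** 2))   # maximized over Theta_0

T = -2 * (L_null - L_full)
print(T, chi2.sf(T, df=1))                      # compare to Chi^2_(p-r) = Chi^2_1
```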