Cramer-Rao inequality
Motivation: Will eventually show that ML estimators converge in distribution at a $\sqrt{n}$ rate, subject to very general regularity conditions. I.e., if the data are iid, $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{D} N(0, V(\theta))$. For large $n$, the MLE is approximately $\sim N(\theta, V(\theta)/n)$.
Cramer-Rao inequality for $\theta \in \mathbb{R}$
Let:
- $X = (X_1, ..., X_n)$ be a sample from a distribution with joint pdf $f_n(x;\theta)$.
- $\delta(X)$ be any unbiased estimator of $\theta$ .
If:
- $Var_\theta(\delta) \lt \infty$ , and
- $\int \delta(x) f_n(x;\theta)\, dx$ and $\int f_n(x;\theta)\, dx$ can be differentiated wrt $\theta$ under the integral sign
Then:
- $Var_\theta(\delta) \ge 1 / I(\theta)$
- where $I(\theta) = E_\theta \left[ \left( \frac{d\, l(\theta;X)}{d\theta} \right)^2 \right]$ and $l(\theta;X) = \log f_n(X;\theta)$ is the log-likelihood
- $1 / I(\theta)$ is the Cramer-Rao lower bound
- $I(\theta)$ is the (expected) Fisher information that $X$ contains about $\theta$ . It quantifies the amount of info the random vector $X$ provides about $\theta$ . Large $I(\theta)$ is good.
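A quick worked example of computing $I(\theta)$ (not in the original notes): for a single observation $X \sim \text{Bernoulli}(\theta)$,
- $l(\theta;X) = X \log\theta + (1-X)\log(1-\theta)$, so $\frac{d\, l}{d\theta} = \frac{X}{\theta} - \frac{1-X}{1-\theta} = \frac{X - \theta}{\theta(1-\theta)}$
- $I_1(\theta) = E_\theta\left[ \left( \frac{X-\theta}{\theta(1-\theta)} \right)^2 \right] = \frac{Var_\theta(X)}{\theta^2(1-\theta)^2} = \frac{1}{\theta(1-\theta)}$
- for an iid sample of size $n$ (see the iid case below), $I_n(\theta) = \frac{n}{\theta(1-\theta)}$, so the CR lower bound is $\frac{\theta(1-\theta)}{n}$.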
Proof:
- want to show $Var_\theta (\delta )I(\theta ) \ge 1$
- $Var_\theta(\delta)\, I(\theta) = E\left[ (\delta(X) - \theta)^2 \right] E\left[ \left( \frac{d\, l(\theta;X)}{d\theta} \right)^2 \right] \ge E\left[ (\delta(X) - \theta) \left( \frac{d\, l(\theta;X)}{d\theta} \right) \right]^2$, by the Cauchy-Schwarz inequality; the first factor equals $Var_\theta(\delta)$ because $\delta$ is unbiased
- RHS = $( {\int {(\delta (x) - \theta )( {\frac{{d\,\log f_n }}{{d\theta }}} )f_n dx} } )^2 = ( {\int {(\delta (x) - \theta )\frac{{f'_n }}{{f_n }}f_n dx} } )^2 = ( {\int {(\delta (x) - \theta )f'_n dx} } )^2$
- $\int \theta f'_n\, dx = \theta \frac{d \int f_n\, dx}{d\theta} = \theta \frac{d\,1}{d\theta} = 0$
- and $\int \delta(x) f'_n\, dx = \frac{d}{d\theta} \int \delta(x) f_n\, dx = \frac{d}{d\theta} E_\theta[\delta] = \frac{d\theta}{d\theta} = 1$, using unbiasedness
- so the RHS $= (1 - 0)^2 = 1$, giving $Var_\theta(\delta)\, I(\theta) \ge 1$ as required
The CR inequality does not imply the existence of an unbiased estimator that achieves the lower bound, nor, in fact, of any unbiased estimator at all.
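For contrast, here is a minimal simulation sketch (not from the notes; $\theta = 0.3$, $n = 50$ and the replication count are arbitrary illustrative choices) of a case where the bound is attained: the sample mean of iid Bernoulli($\theta$) data is unbiased, and its variance equals the CR bound $\theta(1-\theta)/n$ from the example above.

```python
import numpy as np

# Monte Carlo sketch: for X_1..X_n iid Bernoulli(theta), I_1(theta) = 1/(theta(1-theta)),
# so the CR lower bound for any unbiased estimator of theta is theta(1-theta)/n.
# The sample mean is unbiased and its variance equals this bound.
rng = np.random.default_rng(0)
theta, n, reps = 0.3, 50, 200_000      # illustrative (assumed) values

samples = rng.binomial(1, theta, size=(reps, n))
theta_hat = samples.mean(axis=1)       # unbiased estimator: the sample mean

cr_bound = theta * (1 - theta) / n     # 1 / (n * I_1(theta))
print("empirical Var(theta_hat):", theta_hat.var())
print("Cramer-Rao lower bound:  ", cr_bound)
# The two numbers should agree closely, since the sample mean attains the bound here.
```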
CR inequality for $g(\theta)$
If $\delta(X)$ is an unbiased estimator of $g(\theta)$ then $Var_\theta(\delta) \ge \frac{g'(\theta)^2}{I(\theta)}$. This can be proved by a minor modification of the above proof, or, when $g(\theta)$ is invertible, by reparameterising the likelihood function in terms of $\zeta = g(\theta)$.
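As an illustration (a hypothetical example, not from the notes): with $n$ iid Bernoulli($\theta$) observations and $g(\theta) = \theta(1-\theta)$, we have $g'(\theta) = 1 - 2\theta$ and $I_n(\theta) = n/(\theta(1-\theta))$, so any unbiased estimator of $g(\theta)$ satisfies $Var_\theta(\delta) \ge \frac{(1-2\theta)^2\, \theta(1-\theta)}{n}$.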
Alternative formulae
- $I(\theta) = Var_\theta [ \frac{d l(\theta;X)}{d \theta} ]$
- $I(\theta) = -E_\theta [ \frac{d^2 l(\theta;X)}{d \theta^2} ]$
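A quick consistency check (not in the notes), using a single observation $X \sim \text{Poisson}(\theta)$: $l(\theta;X) = X\log\theta - \theta - \log X!$, so $\frac{dl}{d\theta} = \frac{X}{\theta} - 1$ and $\frac{d^2 l}{d\theta^2} = -\frac{X}{\theta^2}$. The variance formula gives $Var_\theta\left(\frac{X}{\theta} - 1\right) = \frac{Var_\theta(X)}{\theta^2} = \frac{1}{\theta}$, and the second-derivative formula gives $-E_\theta\left[-\frac{X}{\theta^2}\right] = \frac{\theta}{\theta^2} = \frac{1}{\theta}$, as expected.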
iid case
If the observations are independent, the likelihoods multiply, so the log-likelihoods add; if they are also identically distributed, each observation contributes the same information, hence $I_n(\theta) = n I_1(\theta)$, where $I_1(\theta)$ is the information in a single observation.
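E.g., continuing the assumed examples above, $n$ iid Poisson($\theta$) observations give $I_n(\theta) = n/\theta$, and $n$ iid Bernoulli($\theta$) observations give $I_n(\theta) = n/(\theta(1-\theta))$.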
Multiparameter case
Results as for single parameter case, but in vector form:
- $Var_\theta(\delta) \ge g'(\theta)^T I^{-1}(\theta)\, g'(\theta)$, where $\delta$ is an unbiased estimator of $g(\theta)$ and $g'(\theta)$ is the gradient vector
- $I_{ij}(\theta) = E\left[ \frac{\partial l}{\partial \theta_i} \frac{\partial l}{\partial \theta_j} \right]$
- $I_{ij}(\theta) = -E\left[ \frac{\partial^2 l}{\partial \theta_i\, \partial \theta_j} \right]$
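A standard worked example (not from the original notes): a single observation from $N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$ has $l = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}$. Taking second partial derivatives and expectations gives $-E\left[\frac{\partial^2 l}{\partial \mu^2}\right] = \frac{1}{\sigma^2}$, $-E\left[\frac{\partial^2 l}{\partial (\sigma^2)^2}\right] = \frac{1}{2\sigma^4}$ and $-E\left[\frac{\partial^2 l}{\partial \mu\, \partial \sigma^2}\right] = 0$, so $I_1(\mu, \sigma^2) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}$.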