Properties of maximum likelihood estimation
Nice things:
- intuitive
- very widely applicable, can combine data from multiple experiments
- unaffected by monotonic transformations of the data
- the MLE of a function of the parameters is that function of the MLE
- theory provides large sample properties
- asymptotically efficient estimators
- provides general methods of inference
Not-so-nice things:
- may be slightly biased
- a parametric model is required, and it must adequately describe the statistical process generating the data
- can be computationally demanding
- fails in some cases, e.g. if there are too many nuisance parameters
It is usually easier to maximise the log-likelihood than the likelihood itself.
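A minimal sketch of maximising a log-likelihood numerically, using a made-up i.i.d. exponential sample (the data values and the coarse grid search are illustrative assumptions, not part of the notes). Working with the sum of log-densities avoids the underflow that a product of small densities would cause:

```python
import math

# Hypothetical i.i.d. sample, assumed Exponential(rate=lam); values are made up.
data = [0.7, 1.2, 0.3, 2.1, 0.9, 1.5, 0.4, 1.1]

def log_likelihood(lam, xs):
    # Exponential log-likelihood: n*log(lam) - lam * sum(xs)
    return len(xs) * math.log(lam) - lam * sum(xs)

# Crude grid search over candidate rates; sums of logs are numerically stable,
# whereas the raw likelihood (a product of small densities) can underflow.
grid = [0.01 * k for k in range(1, 500)]
lam_hat_grid = max(grid, key=lambda lam: log_likelihood(lam, data))

# Closed-form MLE for the exponential model, for comparison: n / sum(x)
lam_hat_exact = len(data) / sum(data)
```

The grid maximiser agrees with the analytic MLE to within the grid spacing; in practice one would use a proper optimiser rather than a grid.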
ML estimation does not depend on the parameterisation of the model. If $g$ is a one-to-one function, then the MLE of $g(\theta)$ is $g(\hat{\theta})$, and more generally we define $g(\hat{\theta})$ to be the MLE of $g(\theta)$. This means we can use the most convenient parameterisation (although some parameterisations may have better properties than others).
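The invariance property can be checked numerically. A sketch using a made-up normal sample with known mean $0$ (the data and the grid search are illustrative assumptions): the MLE of $\sigma = \sqrt{\sigma^2}$ obtained by plugging in $\hat{\sigma}^2$ matches what you get by maximising the likelihood directly in the $\sigma$ parameterisation:

```python
import math

# Hypothetical normal sample with known mean 0; values are made up.
data = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]

# Closed-form MLE of sigma^2 when the mean is known to be 0: (1/n) * sum(x_i^2)
sigma2_hat = sum(x * x for x in data) / len(data)

# Invariance: the MLE of g(sigma^2) = sqrt(sigma^2) is g applied to the MLE.
sigma_hat = math.sqrt(sigma2_hat)

# Check by maximising the log-likelihood directly in the sigma parameterisation.
def log_lik_sigma(sigma, xs):
    return sum(-0.5 * math.log(2 * math.pi) - math.log(sigma)
               - x * x / (2 * sigma * sigma) for x in xs)

grid = [0.001 * k for k in range(1, 5000)]
sigma_hat_grid = max(grid, key=lambda s: log_lik_sigma(s, data))
```

Both routes give the same estimate (up to the grid spacing), as the invariance property predicts.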
ML estimation is invariant to one-to-one transformations of the observations. If $Y$ is a one-to-one function of $X$, then $f_Y(y;\theta) = |dx/dy|\, f_X(x;\theta)$, where the Jacobian $|dx/dy|$ does not depend on $\theta$, so maximising either likelihood over $\theta$ gives the same $\hat{\theta}$.
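A sketch of this invariance with made-up data (the sample and grid search are illustrative assumptions): take $X \sim N(\mu, 1)$ and $Y = e^X$, so $Y$ is lognormal with Jacobian $|dx/dy| = 1/y$. Since the $-\log y$ term does not involve $\mu$, maximising the likelihood based on $Y$ recovers the same $\hat{\mu}$ as the sample mean of the $X$'s:

```python
import math

# Hypothetical data: x assumed ~ N(mu, 1); y = exp(x) is then lognormal.
xs = [0.2, 1.5, -0.4, 0.9, 0.6]
ys = [math.exp(x) for x in xs]

# MLE of mu from x directly: the sample mean.
mu_hat_x = sum(xs) / len(xs)

# Lognormal log-likelihood of y: f_Y(y; mu) = (1/y) * f_X(log y; mu).
# The Jacobian contributes -log(y), which does not involve mu.
def log_lik_y(mu, ys):
    return sum(-0.5 * math.log(2 * math.pi) - math.log(y)
               - (math.log(y) - mu) ** 2 / 2 for y in ys)

grid = [0.001 * k for k in range(-1000, 2000)]
mu_hat_y = max(grid, key=lambda m: log_lik_y(m, ys))
```

The two estimates agree up to the grid spacing, because the Jacobian term shifts the log-likelihood by a constant in $\mu$ without moving its maximiser.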