Properties of maximum likelihood estimation
Nice things:
- intuitive
- very widely applicable, can combine data from multiple experiments
- unaffected by monotonic transformations of the data
- the MLE of a function of the parameters is that function of the MLE
- theory provides large sample properties
- asymptotically efficient estimators
- provides general methods of inference
Not-so-nice things:
- may be slightly biased
- a parametric model is required, and it must adequately describe the statistical process generating the data
- can be computationally demanding
- fails in some cases, e.g. if there are too many nuisance parameters
It is usually easier to maximise the log-likelihood than the likelihood itself.
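A minimal sketch of maximising a log-likelihood numerically, using a made-up i.i.d. exponential sample (the data values and the coarse grid search are illustrative assumptions, not part of the notes). Working with the sum of log-densities avoids the underflow that a product of small densities would cause:

```python
import math

# Hypothetical i.i.d. sample, assumed Exponential(rate=lam); values are made up.
data = [0.7, 1.2, 0.3, 2.1, 0.9, 1.5, 0.4, 1.1]

def log_likelihood(lam, xs):
    # Exponential log-likelihood: n*log(lam) - lam * sum(xs)
    return len(xs) * math.log(lam) - lam * sum(xs)

# Crude grid search over candidate rates; sums of logs are numerically stable,
# whereas the raw likelihood (a product of small densities) can underflow.
grid = [0.01 * k for k in range(1, 500)]
lam_hat_grid = max(grid, key=lambda lam: log_likelihood(lam, data))

# Closed-form MLE for the exponential model, for comparison: n / sum(x)
lam_hat_exact = len(data) / sum(data)
```

The grid maximiser agrees with the analytic MLE to within the grid spacing; in practice one would use a proper optimiser rather than a grid.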
ML estimation does not depend on the parameterisation of the model. If $g$ is a one-to-one function, then the MLE of $g(\theta)$ is $g(\hat{\theta})$, and more generally we define $g(\hat{\theta})$ to be the MLE of $g(\theta)$. This means we can use the most convenient parameterisation (although some parameterisations may have better properties than others).
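The invariance property can be checked numerically. A sketch using a made-up normal sample with known mean $0$ (the data and the grid search are illustrative assumptions): the MLE of $\sigma = \sqrt{\sigma^2}$ obtained by plugging in $\hat{\sigma}^2$ matches what you get by maximising the likelihood directly in the $\sigma$ parameterisation:

```python
import math

# Hypothetical normal sample with known mean 0; values are made up.
data = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]

# Closed-form MLE of sigma^2 when the mean is known to be 0: (1/n) * sum(x_i^2)
sigma2_hat = sum(x * x for x in data) / len(data)

# Invariance: the MLE of g(sigma^2) = sqrt(sigma^2) is g applied to the MLE.
sigma_hat = math.sqrt(sigma2_hat)

# Check by maximising the log-likelihood directly in the sigma parameterisation.
def log_lik_sigma(sigma, xs):
    return sum(-0.5 * math.log(2 * math.pi) - math.log(sigma)
               - x * x / (2 * sigma * sigma) for x in xs)

grid = [0.001 * k for k in range(1, 5000)]
sigma_hat_grid = max(grid, key=lambda s: log_lik_sigma(s, data))
```

Both routes give the same estimate (up to the grid spacing), as the invariance property predicts.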
ML estimation is invariant to one-to-one transformations of the observations. If $Y$ is a one-to-one function of $X$, then $f_Y(y;\theta) = |dx/dy|\, f_X(x;\theta)$, where the Jacobian $|dx/dy|$ does not depend on $\theta$, so maximising either likelihood over $\theta$ gives the same $\hat{\theta}$.
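A sketch of this invariance with made-up data (the sample and grid search are illustrative assumptions): take $X \sim N(\mu, 1)$ and $Y = e^X$, so $Y$ is lognormal with Jacobian $|dx/dy| = 1/y$. Since the $-\log y$ term does not involve $\mu$, maximising the likelihood based on $Y$ recovers the same $\hat{\mu}$ as the sample mean of the $X$'s:

```python
import math

# Hypothetical data: x assumed ~ N(mu, 1); y = exp(x) is then lognormal.
xs = [0.2, 1.5, -0.4, 0.9, 0.6]
ys = [math.exp(x) for x in xs]

# MLE of mu from x directly: the sample mean.
mu_hat_x = sum(xs) / len(xs)

# Lognormal log-likelihood of y: f_Y(y; mu) = (1/y) * f_X(log y; mu).
# The Jacobian contributes -log(y), which does not involve mu.
def log_lik_y(mu, ys):
    return sum(-0.5 * math.log(2 * math.pi) - math.log(y)
               - (math.log(y) - mu) ** 2 / 2 for y in ys)

grid = [0.001 * k for k in range(-1000, 2000)]
mu_hat_y = max(grid, key=lambda m: log_lik_y(m, ys))
```

The two estimates agree up to the grid spacing, because the Jacobian term shifts the log-likelihood by a constant in $\mu$ without moving its maximiser.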