Fundamental paradigms and principles of inference
- Frequentist/classical: widely used, built around the idea of repeated experiments
- Fisherian: uncommon, bases inference on the likelihood alone
- Bayesian: growing in popularity, bases inference on the likelihood together with prior probabilities, and views probabilities as expressions of belief
Two principles are widely considered to be fundamental: sufficiency and conditionality. Intuitively, if we partition the data $(x, y)$ into $x$ and $y \mid x$, sufficiency says that $y \mid x$ contains no information about $\theta$ (so inference may be based on $x$ alone), and conditionality says that $x$ contains no information about $\theta$ (so we should condition on it).
Let $\mathbf{X} = (X_1, X_2, \cdots, X_n)$ be a random vector of observations, with joint pdf $f_n(\mathbf{x}; \mathbf{\theta}) \equiv f_n(x_1, \cdots, x_n;\mathbf{\theta})$ , where $\mathbf{x} \in \mathbb{R}^n$ . Given $\mathbf{x} = (x_1, \cdots, x_n)$ , any $\mathbf{\hat{\theta}}$ that maximises $L(\mathbf{\theta}) \equiv L(\mathbf{\theta}; \mathbf{x}) = f_n(\mathbf{x}; \mathbf{\theta})$ over $\Theta$ is called a maximum likelihood estimate of the unknown true parameter.
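As a minimal numerical sketch of this definition (the exponential model and all numbers here are illustrative assumptions, not from the notes), we can maximise $L(\theta; \mathbf{x})$ over a grid and compare with the known closed-form MLE:

```python
import math
import random

random.seed(0)

# Hypothetical example: n i.i.d. Exponential(theta) observations,
# with density f(x; theta) = theta * exp(-theta * x).
n = 200
true_theta = 2.0
x = [random.expovariate(true_theta) for _ in range(n)]

def log_likelihood(theta, data):
    """log L(theta; x) = n*log(theta) - theta * sum(x)."""
    return len(data) * math.log(theta) - theta * sum(data)

# Crude grid search for the maximiser over a plausible range of Theta.
grid = [0.01 * k for k in range(1, 1000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, x))

# For this model the analytic MLE is 1 / (sample mean); the grid search
# should agree with it up to the grid resolution.
analytic = len(x) / sum(x)
print(theta_hat, analytic)
```

Maximising the log-likelihood rather than the likelihood itself is standard practice: the maximiser is the same and products of small densities become numerically stable sums.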
Sufficiency
$T(\mathbf{X})$ is sufficient for $\mathbf{\theta}$ if the distribution of $\mathbf{X}$ given $T(\mathbf{X})$ does not depend on $\mathbf{\theta}$ . This means we can partition the data into pieces $T(\mathbf{X})$ and $\mathbf{X} \mid T(\mathbf{X})$ . The latter doesn’t contain any information about $\theta$ , so we should base inference solely on $T(\mathbf{X})$ . You can establish that a statistic is sufficient by calculating the conditional distribution directly, or by using the factorisation theorem.
The factorisation theorem: $T(\mathbf{X})$ is sufficient for $\mathbf{\theta}$ iff we can factorise the joint density into two functions, one involving $T(\mathbf{x})$ and $\mathbf{\theta}$ , the other only $\mathbf{x}$ , ie. $f_n(\mathbf{x}; \mathbf{\theta}) = g(T(\mathbf{x}); \mathbf{\theta})\,h(\mathbf{x})$ . This shows that maximum likelihood estimators are functions of sufficient statistics, because the maximisation of $L(\mathbf{\theta}; \mathbf{x})$ wrt $\mathbf{\theta}$ depends on $\mathbf{x}$ only through the sufficient statistic (the factor $h(\mathbf{x})$ does not affect the maximiser).
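A small sketch of the factorisation at work, using an assumed Bernoulli example (not from the notes): for i.i.d. Bernoulli($\theta$) data, $f_n(\mathbf{x};\theta) = \theta^{T}(1-\theta)^{n-T}$ with $T = \sum_i x_i$ and $h(\mathbf{x}) = 1$, so two samples with the same $T$ have identical likelihood functions:

```python
import math

def bernoulli_loglik(theta, x):
    """log f_n(x; theta) = T*log(theta) + (n - T)*log(1 - theta),
    where T = sum(x) is the sufficient statistic; here h(x) = 1."""
    t, n = sum(x), len(x)
    return t * math.log(theta) + (n - t) * math.log(1 - theta)

# Two different samples sharing the same sufficient statistic T = 3, n = 6:
x1 = [1, 1, 1, 0, 0, 0]
x2 = [0, 1, 0, 1, 0, 1]

# Their likelihood functions coincide for every theta, so any inference
# based on the likelihood (e.g. the MLE T/n) is identical for both.
for theta in (0.2, 0.5, 0.8):
    assert bernoulli_loglik(theta, x1) == bernoulli_loglik(theta, x2)
print("likelihoods agree; MLE =", sum(x1) / len(x1))
```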
Conditioning
Let $E_1$ and $E_2$ be two experiments with the same parameter space $\Theta$ and the same unknown parameter, with densities $f_1$ and $f_2$ . Let $E$ be the mixture experiment in which $E_1$ is performed (yielding $(E_1, x_1)$ ) with probability $p$ and $E_2$ is performed (yielding $(E_2, x_2)$ ) with probability $(1-p)$ , where $p$ does not depend on the parameter.
The conditionality principle: If we observe $(E_i, x_i)$ , then the information from $(E, (E_i, x_i))$ is the same as that of $(E_i, x_i)$ . The idea is to condition on (ie. treat as fixed) random variables that don’t contain any information about $\theta$ – very important for frequentists, as it defines what “repetition of the experiment” means.
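To illustrate (with an assumed two-instruments setup in the spirit of Cox's example, not from the notes): suppose a coin flip with probability $p$ chooses between a precise and an imprecise normal measurement of $\theta$. The mixture likelihood of $(E_i, x_i)$ is $p_i f_i(x_i; \theta)$, which is proportional in $\theta$ to $f_i(x_i; \theta)$ alone, since $p_i$ is free of $\theta$:

```python
import math

# Hypothetical mixture: with probability p we run E1, measuring
# X ~ N(theta, 1); otherwise E2, measuring X ~ N(theta, 10^2).
p = 0.5

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_lik(theta, i, x):
    """Likelihood of observing (E_i, x) under the mixture experiment E."""
    prob_i = p if i == 1 else 1 - p
    sd = 1.0 if i == 1 else 10.0
    return prob_i * normal_pdf(x, theta, sd)

def component_lik(theta, i, x):
    """Likelihood of x under experiment E_i alone."""
    sd = 1.0 if i == 1 else 10.0
    return normal_pdf(x, theta, sd)

# The ratio is the constant p for every theta, so as functions of theta
# the two likelihoods are proportional: inference from (E, (E_i, x_i))
# matches inference from E_i alone.
x_obs = 1.7
ratios = {round(mixture_lik(t, 1, x_obs) / component_lik(t, 1, x_obs), 12)
          for t in (0.0, 1.0, 2.0)}
print(ratios)
```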
The likelihood principle
The likelihood principle: If $x_1$ observed from $E_1$ and $x_2$ observed from $E_2$ have the same likelihood function (up to a multiplicative constant), then the “information content” wrt inference about $\theta$ is the same.
It implies both the sufficiency and conditionality principles. However, it is not particularly palatable to frequentists, because it implies that information content does not depend on the sample space or on the repeatability of the experiment.
We will perform inference based on likelihoods because for almost all practical problems it (or some modification of it) works well. Later we will see that maximum likelihood leads to estimators with good asymptotic properties.