Least squares estimation

At heart, a geometric problem: we want to minimise `(y - X beta)^T A (y - X beta)`, and the minimum is attained at `beta = beta^* = (X^T A X)^(-1) X^T A y`.
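
A minimal numpy sketch of that closed form, on made-up data with an arbitrary symmetric positive-definite `A` (all names here are illustrative):

```python
import numpy as np

# Hypothetical data: any y, design matrix X, and SPD matrix A will do.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
A = np.diag(rng.uniform(0.5, 2.0, size=n))  # a simple SPD choice

# beta^* = (X^T A X)^(-1) X^T A y, via a linear solve rather than an explicit inverse
beta_star = np.linalg.solve(X.T @ A @ X, X.T @ A @ y)
```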

For OLS, where `Y = X beta + sigma epsi`, `hat beta = (X^T X)^(-1) X^T Y`. We attach statistical properties to `hat beta` through the Gauss-Markov theorem, which states that `hat beta` has minimum variance among all unbiased estimators that are linear functions of `Y` (ie. it is BLUE). `cov(hat beta) = (X^T X)^(-1) sigma^2`
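
A quick simulation of these OLS formulas, with hypothetical data and the usual plug-in estimate of `sigma^2`:

```python
import numpy as np

# Hypothetical OLS fit: simulate Y = X beta + sigma epsi and recover beta.
rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true, sigma = np.array([1.0, -2.0, 0.5]), 0.3
Y = X @ beta_true + sigma * rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # hat beta = (X^T X)^(-1) X^T Y
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)                # usual unbiased estimate of sigma^2
cov_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)  # cov(hat beta) = (X^T X)^(-1) sigma^2
```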

For WLS, `Y_i = x_i^T beta + (sigma / sqrt(w_i)) epsi_i`, which leads to `hat beta = (X^T W X)^(-1) X^T W Y` and `cov(hat beta) = (X^T W X)^(-1) sigma^2`, where `W = diag(w_i)`.
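
The same sketch with known weights, again on made-up data:

```python
import numpy as np

# Hypothetical WLS fit: observation i has standard deviation sigma / sqrt(w_i).
rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.normal(size=(n, p))
w = rng.uniform(0.2, 5.0, size=n)                    # known weights
beta_true, sigma = np.array([1.0, -2.0, 0.5]), 0.3
Y = X @ beta_true + (sigma / np.sqrt(w)) * rng.normal(size=n)

W = np.diag(w)
beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)  # hat beta = (X^T W X)^(-1) X^T W Y
cov_beta_hat = sigma**2 * np.linalg.inv(X.T @ W @ X)  # cov(hat beta) = (X^T W X)^(-1) sigma^2
```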

For GLS, Gauss-Markov no longer applies, but we can use an iterative method to get similar results (a code sketch follows the list):

  1. Calculate initial estimates `beta^((0))`
  2. Calculate the `n xx n` diagonal weight matrix `W^((j))`, with `w_i(beta^((j))) = 1 / g^2_2(x_i, beta^((j)), theta)`
  3. Calculate the `V^((j))` matrix, `V^((j))_(i,k) = del/(del beta_k) g_1(x_i, beta) |_(beta = beta^((j)))`
  4. Calculate `Y^((j))_i = Y_i - g_1(x_i, beta^((j)))`
  5. Calculate the step `delta^((j)) = (V^((j)T) W^((j)) V^((j)))^(-1) V^((j)T) W^((j)) Y^((j))`, update `beta^((j+1)) = beta^((j)) + delta^((j))`, and repeat steps 2-5 until convergence
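
A sketch of this loop in numpy, assuming user-supplied mean function `g1`, its gradient `g1_grad`, and scale function `g2`; these names and signatures are hypothetical, and the weights are taken as reciprocal variances to match the covariance formula below:

```python
import numpy as np

def gls_iterate(x, Y, beta0, g1, g1_grad, g2, theta, n_iter=5):
    """Iterative generalised least squares, a sketch of steps 1-5 above.

    g1(x_i, beta)        -> mean function
    g1_grad(x_i, beta)   -> gradient of g1 w.r.t. beta (length-p vector)
    g2(x_i, beta, theta) -> standard-deviation function
    All of these names and signatures are assumptions, not a fixed API.
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        # Step 2: diagonal weights, taken here as reciprocal variances
        w = np.array([1.0 / g2(xi, beta, theta) ** 2 for xi in x])
        # Step 3: Jacobian V of the mean function at the current beta
        V = np.array([g1_grad(xi, beta) for xi in x])
        # Step 4: working residuals
        r = Y - np.array([g1(xi, beta) for xi in x])
        # Step 5: weighted least-squares step and parameter update
        VtW = V.T * w                            # V^T W for diagonal W
        delta = np.linalg.solve(VtW @ V, VtW @ r)
        beta = beta + delta
    return beta
```

As a sanity check, with `g1(x_i, beta) = x_i^T beta` and constant `g2` this reduces to OLS in a single step from any starting value.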

We attach statistical properties through the fundamental theorem of generalised least squares, which states that if `beta^((0))` is `n^(1/2)`-consistent then each iterate `beta^((j))` is consistent for the true values and asymptotically normal. The plug-in estimates are `hat sigma^2 = 1/(n-p) sum ( (Y_i - g_1(x_i, hat beta)) / (g_2(x_i, hat beta, theta)))^2` and `hat cov(hat beta) = hat sigma^2/n [ 1/n sum (v(x_i, hat beta) v(x_i, hat beta)^T )/(g^2_2(x_i, hat beta, theta))]^(-1)`.
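
A sketch of these two plug-in estimates, reusing the hypothetical `g1`, `g1_grad`, and `g2` from the previous block:

```python
import numpy as np

def gls_variance_estimates(x, Y, beta_hat, g1, g1_grad, g2, theta):
    """Plug-in estimates of sigma^2 and cov(hat beta) after an iterative GLS fit."""
    n, p = len(Y), len(beta_hat)
    s = np.array([g2(xi, beta_hat, theta) for xi in x])      # fitted scale per observation
    r = (Y - np.array([g1(xi, beta_hat) for xi in x])) / s   # standardised residuals
    sigma2_hat = r @ r / (n - p)                              # hat sigma^2
    v = np.array([g1_grad(xi, beta_hat) for xi in x])        # rows are v(x_i, hat beta)^T
    M = (v.T / s**2) @ v / n                                  # 1/n sum v v^T / g_2^2
    cov_hat = sigma2_hat / n * np.linalg.inv(M)               # hat cov(hat beta)
    return sigma2_hat, cov_hat
```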