Decision theory

Decision rules. Loss functions and risks

D6.1.1 A statistical decision problem consists of the following elements:

- a sample space `(bbbX, ccX)` for the r. obs `X` and a family of pms `ccP` on it;
- an action space `(bbbA, ccA)`;
- a loss function `L(P, a): ccP xx bbbA -> [0, oo)`, measurable in `a` for each `P in ccP`.

A (non-randomised) decision rule (nrdr) is a measurable map `d: bbbX -> bbbA`; its risk is `R(P, d) = E_P L(P, d(X))`.

The goal of decision theory is to find the best decision rule given a loss function.

If `ccP` is parametric, we often write the loss as `L(theta, d(x))`.
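
To make the risk concrete, here is a minimal numerical sketch (not from the notes): assuming squared-error loss `L(theta, a) = (theta - a)^2` and `X = (X_1, ..., X_n)` iid `N(theta, 1)`, it Monte Carlo-estimates `R(theta, d) = E_theta L(theta, d(X))` for two nrdrs, the sample mean and the sample median.

```python
import numpy as np

rng = np.random.default_rng(0)

def risk(d, theta, n=10, reps=100_000):
    """Monte Carlo estimate of R(theta, d) = E_theta L(theta, d(X))
    under squared-error loss, X_1, ..., X_n iid N(theta, 1)."""
    x = rng.normal(theta, 1.0, size=(reps, n))
    return np.mean((d(x) - theta) ** 2)

d_mean = lambda x: x.mean(axis=1)        # nrdr d_1(x) = sample mean
d_med = lambda x: np.median(x, axis=1)   # nrdr d_2(x) = sample median

for theta in (0.0, 1.0, 5.0):
    print(theta, risk(d_mean, theta), risk(d_med, theta))
# the mean has smaller estimated risk than the median at every theta
# on this grid (~1/n vs ~pi/(2n)), consistent with d_1 being better
```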

D6.1.2 A dr `d_1` is as good as a dr `d_2` if `R(P, d_1) <= R(P, d_2) quad AA P in ccP`, and better than `d_2` if, in addition, `R(P, d_1) < R(P, d_2)` for some `P in ccP`.

Let `ccT` be a class of decision rules. A dr `d in ccT` is `ccT`-optimal if `d` is as good as any other dr in `ccT`. If `ccT` contains all possible drs, then `d` is optimal if `d` is `ccT`-optimal.

D6.1.3

Given a statistical decision problem, let `bbbD = {"all nrdrs"}` and `bbbD_b = {"all bdrs"}`, where a behavioural dr (bdr) `delta(x, A)` is a pm on `(bbbA, ccA)` for each `x in bbbX` and a measurable function of `x` for each `A in ccA`, with risk `R(P, delta) = E_P int_bbbA L(P, a) delta(X, da)`. An nrdr is a degenerate bdr with `delta_d(x, A) = I_A(d(x))`, hence `bbbD sub bbbD_b`.

D6.1.4 A randomised decision rule (rdr) `bar delta` is a pm on `(bbbD, ccD)`, for a suitable sigma-algebra `ccD` on `bbbD`. The risk is `R(P, bar delta) = int_bbbD R(P, g) d bar delta(g)`. Write `bbbD_r = {"all rdrs"}`.
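
A hedged sketch of the rdr risk formula (same assumed normal/squared-error setup as above): an rdr that picks the sample mean or the sample median with probability 1/2 each has risk equal to the average of the two component risks.

```python
import numpy as np

rng = np.random.default_rng(1)

def risks(theta, n=10, reps=200_000):
    x = rng.normal(theta, 1.0, size=(reps, n))
    d1 = x.mean(axis=1)            # nrdr d_1
    d2 = np.median(x, axis=1)      # nrdr d_2
    pick = rng.random(reps) < 0.5  # bar delta: choose d_1 w.p. 1/2
    dbar = np.where(pick, d1, d2)
    sq = lambda a: np.mean((a - theta) ** 2)
    return sq(d1), sq(d2), sq(dbar)

r1, r2, rbar = risks(2.0)
print(rbar, 0.5 * (r1 + r2))  # R(P, bar delta) ~= (R(P, d_1) + R(P, d_2)) / 2
```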

Remarks:

Admissibility and geometry of decision rules

D6.2.1 Let `ccT` be a class of drs. A dr `delta in ccT` is `ccT`-admissible if there is no dr in `ccT` that is better than `delta`.

The notion of admissibility is a retreat from `ccT`-optimality as the latter may not exist.

T6.2.1 Suppose `bbbA sub RR^k` is convex and `delta in bbbD_b` is st `int_bbbA ||a|| delta(x, da) < oo quad AA x in bbbX`. Let `d(x) = int_bbbA a delta(x, da) quad AA x in bbbX` (an nrdr). Then, if `L(P, a)` is convex in `a` for each `P in ccP`, `R(P, d) <= R(P, delta) quad AA P in ccP` (Jensen's inequality).
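
A quick check of T6.2.1's conclusion under the same assumed setup: replacing the fifty-fifty rdr above by the nrdr `d(x) = (1/2)(d_1(x) + d_2(x))` (its mean action) cannot increase the risk, since squared-error loss is convex.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 10, 200_000
x = rng.normal(theta, 1.0, size=(reps, n))
d1, d2 = x.mean(axis=1), np.median(x, axis=1)

pick = rng.random(reps) < 0.5
r_rdr = np.mean((np.where(pick, d1, d2) - theta) ** 2)  # risk of the fifty-fifty rule
r_mean = np.mean((0.5 * (d1 + d2) - theta) ** 2)        # risk of d(x) = int_bbbA a delta(x, da)
print(r_mean, r_rdr)  # r_mean <= r_rdr, by convexity of the loss (Jensen)
```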

Geometry of decision rules

A helpful device for understanding the basics of decision rules. Assume `ccP = {P_1, ..., P_k}` is a finite collection of pms. Given a dr `delta`, define the k-dim risk profile `y_delta = (R(P_1, delta), ..., R(P_k, delta))`. Let `ccR_(k,r) = {y_delta in RR^k : delta in bbbD_r}` and `ccR_(k,b) = {y_delta in RR^k : delta in bbbD_b}`.

T6.2.2 `ccR_(k,r)` and `ccR_(k,b)` are convex.
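
An illustrative sketch (assumed setup, not from the notes): `ccP = {"Bin"(n, 0.25), "Bin"(n, 0.75)}`, squared-error loss for estimating `p`, and two nrdrs with exact risks `p(1-p)//n` and `(1/2 - p)^2`. Randomising between them with weight `alpha` traces the segment between the two risk profiles, which is how `ccR_(k,r)` becomes convex.

```python
import numpy as np

n = 10
P = [0.25, 0.75]  # ccP = {Bin(n, 0.25), Bin(n, 0.75)}, k = 2

R1 = np.array([p * (1 - p) / n for p in P])  # d_1(x) = x/n, exact risk p(1-p)/n
R2 = np.array([(0.5 - p) ** 2 for p in P])   # d_2(x) = 1/2, exact risk (1/2 - p)^2

# an rdr choosing d_1 w.p. alpha has risk profile alpha*y_{d_1} + (1-alpha)*y_{d_2},
# so the whole segment between the two profiles lies in ccR_(k,r)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, alpha * R1 + (1 - alpha) * R2)
```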

D6.2.2 Let `X = (X_1, ..., X_k)` and `Y = (Y_1, ..., Y_k)` be points of `RR^k`. Write `X <= Y` if `X_i <= Y_i quad AA i`, and `X < Y` if `X <= Y` and `X != Y`.

D6.2.3 For `X in RR^k`, the lower quadrant of `X` is the set `Q_X = {Y in RR^k : Y <= X}`.

D6.2.4 For `A sub RR^k`, a point `X in bar A` is a lower point of `A` if `Q_X nn bar A = {X}`. The lower boundary `lambda(A)` is the set of all lower points of `A`.

T6.2.3 Let `y_delta in ccR_(k,b)`. If `y_delta in lambda(ccR_(k,b))`, then `delta` is admissible. The converse holds if `ccR_(k,b)` is closed.

Complete classes of decision rules

D6.3.1 Let `ccG sub bbbD_b` be a class of drs. `ccG` is a complete class if for every `delta in bbbD_b` with `delta !in ccG` there exists a `delta' in ccG` that is better than `delta`. A complete class `ccG` is a minimal complete class (MCC) if no proper subset of `ccG` is complete.

T6.3.1 Let `A(bbbD_b)` be the set of admissible drs in `bbbD_b`. If an MCC exists, then it is `A(bbbD_b)`.

T6.3.2 If `A(bbbD_b)` is complete, then it is an MCC.

T6.3.3 Suppose `ccP = {P_1, ..., P_k}` is finite. If `ccR_(k,b)` is closed from below, then `bbbD_0 = {delta in bbbD_b : y_delta in lambda(ccR_(k,b))}` is an MCC.

T6.3.4 If `T(X)` is sufficient for `ccP` and `delta in bbbD_b`, then `delta'(X, A) = E(delta(X, A) | T(X)) in bbbD_b` (well defined, independently of `P`, by sufficiency) and `R(P, delta') = R(P, delta) quad AA P in ccP`.

L6.3.5 Suppose `bbbA sub RR^k` is convex and `d_1, d_2 in bbbD`. Let `d(x) = (1/2)(d_1(x) + d_2(x))`. Then, if the loss is convex in `a`, `R(P, d) <= (1/2)(R(P, d_1) + R(P, d_2)) quad AA P in ccP`, with strict inequality whenever the loss is strictly convex in `a` and `P(d_1(X) != d_2(X)) > 0`.

C6.3.6 Suppose `bbbA sub RR^k` is convex and `d_1, d_2 in bbbD` have the same risk `AA P in ccP`. If the loss is convex in `a quad AA P in ccP` and strictly convex for some `P_0 in ccP` with `R(P_0, d_1) = R(P_0, d_2) < oo` and `P_0(d_1 != d_2) > 0`, then `d_1` and `d_2` are inadmissible (by L6.3.5, `d = (1/2)(d_1 + d_2)` is better).

T6.3.7 (Rao-Blackwell theorem). Let `bbbA` be a convex subset of `RR^k` and `T` be a sufficient statistic for `ccP`. Let `d` be an nrdr with `E_P ||d(X)|| < oo quad AA P in ccP` and `d_0(x) = E(d | T)(x)` (well defined by sufficiency). Then, if `L(P, a)` is convex in `a`, `R(P, d_0) <= R(P, d) quad AA P in ccP`; the inequality is strict for any `P` with `L(P, * )` strictly convex and `P(d(X) != d_0(X)) > 0`.
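
A standard illustration of T6.3.7 (assumptions mine): `X_1, ..., X_n` iid `N(theta, 1)`, crude nrdr `d(X) = X_1`, sufficient `T = sum X_i`; by symmetry `d_0 = E(d | T) = bar X`, and the risk drops from `1` to `1/n` under squared-error loss.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 1.0, 10, 200_000
x = rng.normal(theta, 1.0, size=(reps, n))

d = x[:, 0]           # crude nrdr d(X) = X_1, risk 1 under squared error
d0 = x.mean(axis=1)   # E(d | T) with T = sum X_i sufficient: E(X_1 | T) = bar X

print(np.mean((d - theta) ** 2), np.mean((d0 - theta) ** 2))  # ~1 vs ~1/n
```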

Bayes and minimax rules

So far we have compared drs `delta_1` and `delta_2` via their risk vectors/profiles; however, this multivariate comparison may not produce a "better" rule. Bayes and minimax rules use a univariate measure of risk.

D6.4.1 Given a statistical decision problem `(bbbX, ccX, ccP)`, `(bbbA, ccA)`, `L(P, a)`, a dr `delta in bbbD` produces a risk `R(P, delta)`. The Bayes risk wrt a prior pm `Pi` on `(ccP, ccF_P)` is `R_Pi(delta) = int_ccP R(P, delta) Pi(dP)`.
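
A worked sketch of the Bayes risk (assumed Beta-Binomial setup, not from the notes): `X ~ "Bin"(n, p)`, squared-error loss, uniform `"Beta"(1, 1)` prior; the exact risks are integrated against the prior numerically and compared with the closed forms `1/(6(n+2))` (posterior mean rule) and `1/(6n)` (MLE `x/n`).

```python
import numpy as np
from scipy.integrate import quad

n, a, b = 10, 1.0, 1.0  # X ~ Bin(n, p) observed, Beta(a, b) prior (uniform here)

def risk_post_mean(p):  # exact risk of delta(x) = (x + a)/(n + a + b)
    return (n * p * (1 - p) + (a - p * (a + b)) ** 2) / (n + a + b) ** 2

def risk_mle(p):        # exact risk of d(x) = x/n
    return p * (1 - p) / n

# Bayes risk R_Pi(delta) = int R(p, delta) dPi(p); the Beta(1,1) density is 1 on (0,1)
print(quad(risk_post_mean, 0, 1)[0], 1 / (6 * (n + 2)))  # agree
print(quad(risk_mle, 0, 1)[0], 1 / (6 * n))              # larger Bayes risk
```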

Remarks:

D6.4.2 Let `ccT` be a set of drs. A dr `delta_0 in ccT` is a `ccT`-Bayes rule wrt `Pi` if `R_Pi(delta_0) = inf_(delta in ccT) R_Pi(delta)`.

T6.4.1 Let `ccP = {P_1, ..., P_k}` and `ccT` be a family of drs. If `delta_0` is `ccT`-Bayes wrt a prior `Pi = (pi_1, ..., pi_k)` with `pi_i > 0` and `sum pi_i = 1`, then `delta_0` is `ccT`-admissible.

T6.4.2 Let `ccP = {P_theta; theta in Theta sub RR^k}` st every open ball with a non-empty intersection with the interior of `Theta` has positive `Pi`-probability, and `R(theta, delta) := R(P_theta, delta)` is continuous wrt `theta` on `Theta` for each `delta in ccT`. If `delta_0` is `ccT`-Bayes wrt `Pi` with `R_Pi(delta_0) < oo`,

then `delta_0` is `ccT`-admissible.

T6.4.5 If `ccP` is finite and `delta` is `ccT`-admissible, then there exists a prior `Pi` on `(ccP, ccF_P)` st `delta` is `ccT`-Bayes wrt `Pi`.

P6.4.4 (Lehmann's theorem) Let `T: (Omega, ccF) -> (Lambda, ccG)` and `phi: (Omega, ccF) -> (RR^k, B(RR^k))` be measurable. Then `phi` is measurable `(Omega, sigma(T)) -> (RR^k, B(RR^k))` iff there exists a measurable `psi: (Lambda, ccG) -> (RR^k, B(RR^k))` st `phi = psi @ T`.

D6.4.3 The conditional expectation of `X | Y=y` is `E(X | Y=y) = h(y)`, where `h` is a measurable function st `E(X | Y) = h(Y)` wp1 (such an `h` exists by P6.4.4).

P6.4.5 Let `X` and `Y` be n- and m-dimensional r. vectors. Suppose `P_((X,Y))`, the pm of `(X, Y)`, is dominated by `nu xx lambda` with density `f(x, y)`, where `nu` and `lambda` are `sigma`-finite measures on `(RR^n, B(RR^n))` and `(RR^m, B(RR^m))`. Let `g(x, y): (RR^(n+m), B(RR^(n+m))) -> (RR, B(RR))` be measurable st `E|g(X, Y)| < oo`. Then `E(g(X, Y) | Y) = (int g(x, Y) f(x, Y) nu(dx))/(int f(x, Y) nu(dx))` and `E(g(X, Y) | Y=y) = (int g(x, y) f(x, y) nu(dx))/(int f(x, y) nu(dx))`.
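
A numerical check of the density-ratio formula (assumed standard bivariate normal with correlation `rho` and `g(x, y) = x`, so `E(X | Y=y) = rho y`):

```python
import numpy as np
from scipy.integrate import quad

rho, y = 0.6, 1.3  # standard bivariate normal density f(x, y); E(X | Y=y) = rho*y

def f(x, yy):
    z = (x * x - 2 * rho * x * yy + yy * yy) / (1 - rho ** 2)
    return np.exp(-z / 2) / (2 * np.pi * np.sqrt(1 - rho ** 2))

num = quad(lambda x: x * f(x, y), -np.inf, np.inf)[0]  # int g(x,y) f(x,y) nu(dx)
den = quad(lambda x: f(x, y), -np.inf, np.inf)[0]      # int f(x,y) nu(dx)
print(num / den, rho * y)  # the ratio reproduces E(X | Y=y)
```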

T6.4.6 (Existence of conditional distribution in a general case). Let `X` be an n-dim r. vec on `(Omega, ccF, P)` and `Y: (Omega, ccF) -> (Lambda, ccG)` be measurable. Then there exists a regular cond pm `P_(X|Y)( * | y)`, called the conditional distribution of `X | Y=y`, st:

- `P_(X|Y)( * | y)` is a pm on `(RR^n, B(RR^n))` for each `y in Lambda`;
- for each `A in B(RR^n)`, `P_(X|Y)(A | * )` is measurable on `(Lambda, ccG)` and `P_(X|Y)(A | Y) = P(X in A | Y)` wp1.

Furthermore, if `E|g(X,Y)| < oo` for `g: RR^n xx Lambda -> RR` measurable, then `E(g(X,Y) | Y=y) = E(g(X, y) | Y=y) = int_(RR^n) g(x,y) dP_(X|Y)(x|y) "wp1" P_Y`

Remark:

Construction of Bayes rules

T6.4.7 (Bayes formula). Assume `ccP = {P_theta; theta in Theta}` is dominated by a `sigma`-finite `nu`, and the cond density `f_theta(x) = (dP_(X|theta)( * | theta))/(d nu)(x)`, as a function of `(x, theta)`, is measurable on `(bbbX xx Theta, sigma(ccX xx ccF_Theta))`. Let `Pi` be a prior pm on `(Theta, ccF_Theta)`. Assume `m(x) = int_Theta f_theta(x) dPi(theta) > 0`. Then the posterior distribution of `theta | X=x` is dominated by `Pi`, with density `(dPi(theta | x))/(dPi(theta)) = f_theta(x)/m(x)`.
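
A grid-based sketch of T6.4.7 (assumed conjugate setup: one obs `X ~ N(theta, 1)`, prior `theta ~ N(0, tau^2)`): compute `m(x)`, form the posterior density `f_theta(x)` times the prior density over `m(x)`, and compare its mean with the closed-form posterior mean `x tau^2/(1 + tau^2)`.

```python
import numpy as np
from scipy.stats import norm

x, tau = 1.5, 2.0  # one obs X ~ N(theta, 1), prior theta ~ N(0, tau^2)
theta = np.linspace(-10, 10, 20001)
dt = theta[1] - theta[0]

like = norm.pdf(x, loc=theta, scale=1.0)     # f_theta(x) as a function of theta
prior = norm.pdf(theta, loc=0.0, scale=tau)  # prior density dPi/d theta
m = np.sum(like * prior) * dt                # m(x) = int_Theta f_theta(x) dPi > 0
post = like * prior / m                      # posterior density by T6.4.7

post_mean = np.sum(theta * post) * dt
print(post_mean, x * tau**2 / (1 + tau**2))  # matches the closed-form posterior mean
```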

Remarks:

T6.4.8 Under the conditions of T6.4.6 and with `Theta, bbbA` convex, the Bayes rule is obtained by minimising the posterior expected loss: `delta_0(x) = "argmin"_(a in bbbA) int_Theta L(theta, a) dPi(theta | x)` (whenever the minimiser exists and is measurable in `x`).

Minimax rules

D6.4.4 A dr `delta_0 in ccT` is the minimax rule if `sup_(P in ccP) R(P, delta_0) = inf_(delta in ccT) sup_(P in ccP) R(P, delta)`. A minimax rule has the smallest worst-case risk.

D6.4.5 A dr `delta` with constant risk is called an equaliser rule, ie `R(P, delta) = c quad AA P in ccP`.

T6.4.6 If `delta` is an equaliser rule and is admissible, then it is minimax.

T6.4.9 Suppose `{delta_i}_(i >= 1)` is a sequence of drs, and each `delta_i` is Bayes wrt `Pi_i`. If `R_(Pi_i)(delta_i) -> c < oo` and `delta_0` is a dr with `R(P, delta_0) <= c quad AA P in ccP`, then `delta_0` is minimax.

C6.4.10 If `delta_0` is an equaliser rule and is Bayes wrt some `Pi_0`, then it is minimax.
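
A classic instance of C6.4.10 (assumptions mine, not from the notes): for `X ~ "Bin"(n, p)` under squared-error loss, the Bayes rule wrt the `"Beta"(sqrt(n)/2, sqrt(n)/2)` prior, `delta(x) = (x + sqrt(n)/2)/(n + sqrt(n))`, has constant risk `1/(4(sqrt(n) + 1)^2)`, so it is an equaliser Bayes rule and hence minimax.

```python
import numpy as np
from scipy.stats import binom

n = 16
a = np.sqrt(n) / 2  # delta(x) = (x + a)/(n + 2a) is Bayes wrt Beta(a, a)

def exact_risk(p):  # R(p, delta) = sum_x (delta(x) - p)^2 P(X = x)
    x = np.arange(n + 1)
    d = (x + a) / (n + 2 * a)
    return np.sum((d - p) ** 2 * binom.pmf(x, n, p))

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, exact_risk(p))             # constant in p: an equaliser rule
print(1 / (4 * (np.sqrt(n) + 1) ** 2))  # the constant risk 1/(4(sqrt(n)+1)^2)
```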

Remarks:

D6.4.6 A prior `Pi_0` is least favourable if `R_(Pi_0) = sup_Pi R_Pi`, where `R_Pi = inf_(delta in ccT) R_Pi(delta)` denotes the Bayes risk of `Pi`.

A Bayes rule wrt a least favourable prior has the best chance of being minimax.

C6.4.11 If `delta` is Bayes wrt `Pi` and `R(P, delta) <= R_Pi = inf_(delta' in ccT) R_Pi(delta') quad AA P in ccP`, then `delta` is minimax and `Pi` is least favourable.

Unbiased estimators and invariant drs

D6.5.1 Given `(bbbX, ccX, ccP = {P_theta, theta in Theta})`, a statistic `d(X)` is an unbiased estimator (UE) of a parametric fn `gamma(theta)` if `E_theta d(X) = gamma(theta) quad AA theta in Theta`. A UE is a best UE (BUE) if it is as good as every other UE.

T6.5.1 (Lehmann-Scheffe) Suppose that `T(X)` is sufficient and complete for `ccP`.

If there exists a UE of a parametric fn `gamma(theta)`, then there exists a best UE (BUE) of `gamma(theta)`: the (wp1 unique) UE that is a function of `T(X)`.
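
A standard Lehmann-Scheffe example (assumptions mine): for `X_1, ..., X_n` iid Poisson`(theta)`, `T = sum X_i` is complete and sufficient, and `((n-1)/n)^T` is the BUE of `gamma(theta) = P_theta(X_1 = 0) = e^(-theta)`; the sketch checks unbiasedness by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 1.3, 5, 400_000

x = rng.poisson(theta, size=(reps, n))
T = x.sum(axis=1)           # T = sum X_i, complete and sufficient for Poisson
ue = ((n - 1) / n) ** T     # a function of T, unbiased for e^(-theta)

print(ue.mean(), np.exp(-theta))  # E_theta ((n-1)/n)^T = e^(-theta)
```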

Invariant drs

It would be nice to have decision rules that are invariant under transformations of the r. obs.

D6.5.2 Let `G != O/` and `@` be a binary operation on `G`. `(G, @)` is called a group if:

- `g_1 @ g_2 in G quad AA g_1, g_2 in G` (closure);
- `(g_1 @ g_2) @ g_3 = g_1 @ (g_2 @ g_3) quad AA g_1, g_2, g_3 in G` (associativity);
- there exists `e in G` st `e @ g = g @ e = g quad AA g in G`;
- for each `g in G` there exists `h in G` st `g @ h = h @ g = e`.

`e` is unique and is the identity element of `G`; `h` is the inverse of `g`.

D6.5.3 Let `(bbbX, ccX)` be a measurable space. A set `G` of one-to-one transformations of `bbbX` onto `bbbX` is a group of measurable transformations (GMT) if:

- every `g in G` and its inverse `g^(-1)` are measurable `(bbbX, ccX) -> (bbbX, ccX)`;
- `G` is a group under composition.

D6.5.4 Let `(bbbX, ccX, ccP = {P_theta, theta in Theta})` be the ps for the r. obs `X` and `G` be a GMT on `(bbbX, ccX)`. Then `ccP` is invariant under `G` if `AA theta in Theta, g in G` there exists a unique `theta' in Theta` st `P_theta(g^(-1)(A)) = P_(theta')(A) quad AA A in ccX`.
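
A concrete instance of D6.5.4 (assumed location family): for `ccP = {N(theta, 1); theta in RR}` and the shift `g(x) = x + c`, taking `theta' = theta + c` gives `P_theta(g^(-1)(A)) = P_(theta')(A)`; checked below for `A = (-oo, t]`.

```python
import numpy as np
from scipy.stats import norm

theta, c, t = 0.7, 1.2, 2.0  # g(x) = x + c acting on X ~ N(theta, 1)

# D6.5.4 with A = (-oo, t]: g^{-1}(A) = (-oo, t - c]
lhs = norm.cdf(t - c, loc=theta, scale=1.0)      # P_theta(g^{-1}(A))
rhs = norm.cdf(t, loc=theta + c, scale=1.0)      # P_{theta'}(A) with theta' = theta + c
print(lhs, rhs)  # equal: the normal location family is invariant under shifts
```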