Advanced calculus
Elementary set theory
A set is a collection of objects, usually defined by a common defining property.
Set operations
- Subset: `B sub A` if `x in B => x in A`
- Union: `A uu B = { omega : omega in A or omega in B }`
- Intersection: `A nn B = {omega: omega in A and omega in B}`
- Complement: `A^c = {omega: omega !in A}`
- Symmetric difference: `A delta B = (A^c nn B) uu (A nn B^c)`
- Cartesian product: `A xx B` is all ordered pairs `(omega_1 in A, omega_2 in B)`
- Product space: Let `{Omega_alpha : alpha in I}` be a collection of nonempty sets; then the product space `X_(alpha in I) Omega_alpha` is defined as `{f : f " defined on " I " st " AA alpha, f(alpha) in Omega_alpha}`. The axiom of choice states that this space is non-empty.
Equivalence relationships
Let `G` define a relationship (`G sub Omega xx Omega`). It is an equivalence relationship if it is:
- reflexive: `AA x in Omega, (x,x) in G`
- symmetric: `x ~ y => y ~ x`
- transitive: ` x ~ y, y~z => x~z`
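On a finite set the three axioms can be checked directly against a relation `G sub Omega xx Omega`. A minimal Python sketch; the mod-3 relation below is a hypothetical example:

```python
# Check the equivalence-relation axioms for a finite relation G ⊆ Ω×Ω.

def is_equivalence(omega, G):
    reflexive = all((x, x) in G for x in omega)
    symmetric = all((y, x) in G for (x, y) in G)
    transitive = all((x, z) in G
                     for (x, y) in G for (y2, z) in G if y == y2)
    return reflexive and symmetric and transitive

omega = range(6)
# hypothetical example: congruence mod 3 on {0, ..., 5}
G = {(x, y) for x in omega for y in omega if (x - y) % 3 == 0}
# the equivalence classes partition omega
classes = {frozenset(y for y in omega if (x, y) in G) for x in omega}
```

An equivalence relation always partitions `Omega` into its equivalence classes, as `classes` illustrates.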
Functions
A function is a correspondence between elements in a set `X` (domain) and `Y` (range). `AA x in X` there is a unique `y in Y` that corresponds to it: `y = f(x)`.
- onto: if `f(X) = Y`
- one to one: if `AA y in f(X) EE "a unique" x in X "st" f(x) = y`
- if `f` is onto and one-to-one then `X` and `Y` have same cardinality
- monotone: if `x > y => f(x) > f(y)`
A set `X` is:
- finite if `EE n in NN` st `X` and `Y = {1,2, ..., n}` have the same cardinality
- countable if `X` and `NN` have the same cardinality
- uncountable if neither finite nor countable
Induction
The set `NN` of natural numbers has the well-ordering property: every nonempty subset `A` of `NN` has a smallest element `s`, ie. `s in A` and `a in A => a >= s`. The principle of induction is a consequence of this property, and it states:
Let `{ P(n) : n in NN}` be a collection of propositions which are either true or false. If `P(1)` is true and `P(n) "is true" => P(n+1)` then `P(n)` is true for all `n`.
Real numbers
At least two approaches to defining the real numbers:
Start with `NN`, then construct `Z = ( NN uu {0} uu (- NN) )`, then `QQ`, then `RR` as either the set of all Cauchy sequences of rationals or as Dedekind cuts.
OR, can define the real numbers as the set that satisfies three axioms: `RR` is a field (algebraic), ordered and complete, ie (complete ordered field).
Algebraic axioms:
- commutative group under +
- multiplicative group under `**` (excluding 0)
- distributive axioms
Order axiom states that there is a set `P sub R` (the positive numbers) st:
- `x, y in P => xy in P, x+y in P`
- `x in P => -x !in P`
- `x in R => x = 0 or x in P or x in -P`
Given such a `P`, can define an order on `R` by defining `x < y` to mean `y - x in P`, and it follows that `AA x, y in R`, exactly one of `x = y`, `x < y`, `x > y` holds (linear order)
Let `A sub R`. `M` is an upper bound of `A` if `a in A => a <= M`. The supremum of `A`, `spr A`, is the least upper bound of `A`:
- `a in A => a <= spr A`
- `K < spr A => EE a in A "st" K < a`
Completeness axiom states that if a nonempty `A sub R` has an upper bound `< oo` then `EE stackrel~M < oo "st" stackrel~M = spr A`. Consequences: (axiom of Eudoxus, the Archimedean property) `AA x in R EE n in NN "st" x < n`; `QQ` is dense in `R`.
Extended real numbers `barR = {-oo} uu RR uu {+oo}`, extending the definitions of `+, *, <` intuitively.
Sequences and limits
A sequence of real numbers is a function `f: NN -> RR`, written `{x_n}` with `x_n = f(n)`.
`lim_n x_n = a in R` if `AA epsilon > 0 EE N_epsilon "st" n >= N_epsilon => |x_n - a| < epsilon`.
Let `{x_n} sub R` and `y_n = spr{x_j : j >= n}`; then `y_n` is a non-increasing sequence. It can be shown that every monotone non-increasing sequence in `R` converges to `-oo` or a finite real number, so `lim_n y_n = y` exists and is called the lim sup, written `bar{lim_n} x_n = inf_n (spr_(j>=n) x_j)`.
An alternative definition of `bar(lim_n)`: `bar(lim_n) x_n = a in R` iff `AA epsilon > 0 EE N_epsilon "st" n > N_epsilon => x_n < a + epsilon`, and `AA k EE n >= k "st" x_n > a - epsilon`.
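The tail-supremum characterisation can be illustrated numerically. The helper below and the sequence `x_n = (-1)^n + 1/n` (whose lim sup is 1) are illustrative assumptions; a finite truncation only approximates the limit:

```python
# Numerical sketch of lim sup: y_n = sup{x_j : j >= n} is non-increasing,
# and bar(lim) x_n = inf_n y_n.

def lim_sup(x):
    # approximate inf_n sup_{j>=n} x_j from a truncated sequence; only tail
    # suprema over the first half are used, so the estimate is not distorted
    # by the very end of the truncation
    m = max(1, len(x) // 2)
    tails = [max(x[n:]) for n in range(m)]
    assert all(a >= b for a, b in zip(tails, tails[1:]))  # y_n non-increasing
    return min(tails)

# hypothetical example: lim sup is 1, lim inf is -1
x = [(-1) ** n + 1.0 / n for n in range(1, 2001)]
approx = lim_sup(x)
```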
A sequence `{x_n} in R` is a Cauchy sequence if `AA epsilon > 0, EE N_epsilon "st" n,m >= N_epsilon => |x_m - x_n| < epsilon`
Prop: `{x_n} sub R` is convergent in `R <=> {x_n}` is Cauchy. (proof requires completeness axiom).
Series
Let `{x_n}_{n>=1}` be a sequence of real numbers. Let `s_n = sum_{j=1}^n x_j, n >= 1` be the nth partial sum. The series `sum_1^oo x_j` converges to `s` in `R` if `lim_n s_n = s`. If `x_j >= 0` then `lim_n s_n = s in R or oo`. A series `sum_1^oo x_j` converges absolutely if `sum_1^oo |x_j|` converges. A series converges uniformly if its sequence of partial sums converges uniformly.
Power series
Let `{a_n}_(n>=0)` be a sequence of real numbers. The series `sum_0^oo a_n x^n` is called a power series and is said to be convergent on `B` if it converges `AA x in B`.
The radius of convergence `R = (barlim |a_n|^(1/n))^-1` (Cauchy-Hadamard). The power series converges (absolutely) when `|x| < R` and diverges when `|x| > R`.
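The Cauchy-Hadamard formula can be sketched numerically. The helper below and its tail-based lim sup estimate are assumptions for illustration, not a robust algorithm:

```python
# Sketch of R = (lim sup |a_n|^(1/n))^(-1), estimated from a late tail.

def radius_of_convergence(a, start=50):
    # crude lim sup estimate: sup of |a_n|^(1/n) over n in [start, 4*start)
    roots = [abs(a(n)) ** (1.0 / n) for n in range(start, 4 * start)]
    return 1.0 / max(roots)

# hypothetical example: a_n = 2^n, i.e. the geometric series sum (2x)^n,
# which converges exactly for |x| < 1/2
R = radius_of_convergence(lambda n: 2.0 ** n)
```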
Taylor Series
`f: (a - eta, a + eta) -> RR, a in RR`. Suppose `f` is `n` times differentiable in `I = (a - eta, a + eta)`. Let `a_n = (f^((n))(a))/(n!)`; then the power series `sum a_n (x-a)^n` is called the Taylor series of `f` at `a`. The Taylor remainder theorem says that `AA x in I` and any `n >= 1`, `|f(x) - sum_(j=0)^n a_j (x-a)^j| <= (|f^((n+1))(y_n)|)/((n+1)!) |x-a|^(n+1)` for some `y_n in I`. So if `lambda_k = spr_(y in I) |f^((k))(y)|` and `sum lambda_k/(k!)` converges then the remainder `-> 0`.
Metric spaces
Let `S != O/` and `d: S xx S -> RR^+` then if:
- `d(x,y) = d(y,x)`
- `d(x,z) <= d(x,y) + d(y,z)`
- `d(x,y) = 0 iff x=y`
then `d` is called a metric on `S` and the pair `(S, d)` is a metric space
A sequence `{x_n} sub (S, d)` converges to `x in S` if `AA epsilon > 0 EE N_epsilon "st" n >= N_epsi => d(x_n, x) < epsi` and is written `lim_n x_n = x`. A sequence in a metric space is Cauchy if `AA epsi > 0 EE N_epsi "st" n,m >= N_epsi => d(x_n, x_m) < epsi`. (any convergent sequence is Cauchy). A metric space is complete if every Cauchy sequence converges.
Continuous functions
Let `(S,d) and (T, rho)` be two metric spaces and `f:S -> T` a map between them. `f` is:
- continuous at a point `p in S` if `AA epsi > 0 EE delta_epsi > 0 "st" d(x,p) < delta_epsi => rho(f(x), f(p)) < epsi`,
- continuous on a set, `B` if continuous at every `b in B`,
- uniformly continuous if `AA epsi > 0 EE delta > 0 "st" AA x,y in S, d(x,y) < delta => rho(f(x), f(y)) < epsi`
A map `f: S -> T` is continuous iff `AA O` open in `T` `f^{-1}(O)` is open in `(S,d)`
A collection of open sets `{O_alpha : alpha in I}` is an open cover for a set `B sub (S,d)` if `AA x in B, EE a in I "st" x in O_alpha`.
A set is compact if every open cover has a finite subcover.
Any `K sub R` is compact iff it is bounded and closed (Heine-Borel).
Measures
Classes of sets
`Omega != O/`, `P(Omega) = {A: A sub Omega}`, `ccF sub P(Omega)`
Class | Definition | Notes |
---|---|---|
`pi`-system | `A, B in ccF => A nn B in ccF` | If `C` is a `pi`-system, then `lambda(: C :) = sigma(: C :)`. If `C` is a `pi`-system and `ccL` is a `lambda`-system containing `C`, then `ccL sup sigma(: C :)`. |
Semi-algebra | `A, B in C => A nn B in C`; `AA A in C, A^c = uu_(i=1)^k B_i` for some disjoint `B_1, ..., B_k in C` | The smallest algebra containing a semi-algebra `C` is the class of finite disjoint unions of sets from `C`. |
Algebra | `Omega in ccF`; `A in ccF => A^c in ccF`; `A, B in ccF => A uu B in ccF` | An intersection of `sigma`-algebras is always a `sigma`-algebra, but the union may not even be an algebra. If `ccF sub P(Omega)` then the `sigma`-algebra generated by `ccF`, `sigma(: ccF :)`, is the intersection of all `sigma`-algebras containing `ccF`. |
`sigma`-algebra | `ccF` is an algebra closed under countable (monotone) unions | A useful class of `sigma`-algebras are those generated by the open sets of a topological space. A topological space is a pair `(S, T)` where `S` is a non-empty set and `T` is a collection of subsets st `O/, S in T` and `T` is closed under finite intersections and arbitrary unions. Elements of `T` are called open sets. The Borel `sigma`-algebra on a topological space `S` (in particular a metric or Euclidean space) is defined as the `sigma`-algebra generated by the collection of all open sets in `S`, `sigma(: T :)`. |
`lambda`-system | `Omega in ccF`; `A, B in ccF, A sub B => B \\ A in ccF`; `A_n in ccF, A_n sub A_(n+1) AA n >= 1 => uu_(n>=1) A_n in ccF` | Every `sigma`-algebra is a `lambda`-system. The intersection of two `lambda`-systems is a `lambda`-system. `lambda(: C :)` is defined similarly to `sigma(: C :)`. |
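On a finite `Omega`, `sigma(: C :)` can be computed by brute-force closure under complements and unions (countable unions reduce to finite ones here). A sketch with a hypothetical generating class:

```python
# On a finite Omega, sigma(: C :) equals the closure of C under complements
# and (finite = countable) unions.  Omega and C below are hypothetical.

def generated_sigma_algebra(omega, C):
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(A) for A in C}
    while True:
        new = {omega - A for A in F} | {A | B for A in F for B in F}
        if new <= F:          # fixpoint: closed under complement and union
            return F
        F |= new

omega = {1, 2, 3, 4}
F = generated_sigma_algebra(omega, [{1}, {1, 2}])
# the atoms are {1}, {2}, {3,4}, so |F| = 2^3 = 8
```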
Measures
A set function is an extended real-valued function on a class of subsets of `Omega`. A set function `mu : A -> [0, oo]` is a measure if:
- `mu(O/) = 0`
- countably additive (equivalently, for a finitely additive set function, monotone continuous from below).
A measure is infinite if `mu(Omega) = oo`, finite otherwise. A measure with `mu(Omega) = 1` is a probability measure. A measure is `sigma`-finite if there is a countable collection of sets `{A_n}` such that `uu A_n = Omega` and `mu(A_n) < oo AA n`.
Common measures:
- counting measure: `mu(A) = |A|`, `mu` is finite iff `Omega` is finite, `sigma`-finite if `Omega` is countable
- discrete probability mass: point mass on integers
- Lebesgue-Stieltjes measures on `RR`: arise from any nondecreasing function `F: RR -> RR` via `mu_F (a,b] = F(b+) - F(a+)`; all are `sigma`-finite.
A measure `mu` on an algebra `F` has the following properties:
- monotonicity: `A sub B => mu(A) <= mu(B)`
- finite subadditivity: `mu(A_1 uu .. uu A_k) <= mu(A_1) + ... + mu(A_k)`
- inclusion exclusion formula: `mu(A uu B) = mu(A) + mu(B) - mu(A nn B)`
- monotone continuity from above (for `A_n darr A`), provided `mu(A_(n_0)) < oo` for some `n_0 in NN`
- countable subadditivity
Uniqueness of measures: let `mu_1, mu_2` be two measures on a measurable space `(Omega, F)`. Let `C sub F` be a `pi`-system st `F = sigma(: C :)`, with `mu_1, mu_2` `sigma`-finite on `C`. If `mu_1(B) = mu_2(B) AA B in C` then `mu_1(A) = mu_2(A) AA A in F`.
The extension theorems and Lebesgue-Stieltjes measures
A set function `mu: C -> [0, oo]` defined on a semi-algebra is a measure if:
- `mu(O/) = 0`
- countably additive where possible
Given a measure on a semi-algebra `C`, the outer measure induced by `mu` is the set function `mu^**` on `P(Omega)` defined as `mu^**(A) = inf{ sum mu(A_n) : {A_n}_{n>=1} sub C, A sub uu_{n>=1} A_n}` (the infimum of the total measure over all countable covers of `A` by elements of `C`). A set `A` is `mu^**`-measurable if `mu^**(E) = mu^**(E nn A) + mu^**(E nn A^c) AA E sub Omega`.
`mu^**` satisfies: (any set functions with these properties is called an outer measure)
- `mu^**(O/) = 0`
- monotonicity: `A sub B => mu^**(A) <= mu^**(B)`
- countable subadditivity
Let `mu^**` be an outer measure on `Omega`. Let `ccM = ccM_{mu^**} = {A: A " is " mu^** "measurable"}`. Then:
- `ccM` is a `sigma`-algebra
- `mu^**` restricted to `ccM` is a measure
- `mu^**(A) = 0 => P(A) sub ccM`
Caratheodory's extension theorem*: Let `mu` be a measure on a semialgebra `C` and `mu^**` be defined as above. Then:
- `mu^**` is an outer measure
- `C sub M_{mu^**}`
- `mu^** = mu` on `C`
If `A in M_(mu^**)` and `mu^**(A) < oo`, then `AA epsi > 0 EE B_1, B_2, ..., B_k in C, k < oo`, `B_j` mutually disjoint, st `mu^**(A o+ uu_(j=1)^k B_j) < epsi`, where `E_1 o+ E_2` is the symmetric difference. That is, every `mu^**`-measurable set of finite measure is nearly a finite union of disjoint elements of the semialgebra `C` (e.g. any Lebesgue-measurable set of finite measure can be approximated by a finite union of intervals).
Lebesgue-Stieltjes measures on R
Let `F: RR -> RR` be nondecreasing. Let `C = { (a,b]: -oo <= a <= b < oo} uu { (a, oo): -oo <= a < oo}` and define `mu_F (a,b] = F(b+) - F(a+)`, then:
- `C` is a semialgebra
- `mu_F` is a measure on `C` (requires Heine-Borel theorem)
- `(RR, M_{mu_F^**}, mu_F^**)` is the complete measure space constructed with the Caratheodory extension method, and is called the Lebesgue-Stieltjes measure generated by `F`.
`F(x) = x` is known as the Lebesgue measure.
A Radon measure on `RR` is one that is finite on bounded intervals. `mu` is a Radon measure iff it is a Lebesgue-Stieltjes measure.
Completeness of measures
A measure space is called complete if `AA A in F, mu(A) = 0 => P(A) sub F`. The measure space generated by the extension theorem is complete.
It is always possible to complete an incomplete measure space: Let `(Omega, F, mu)` be a measure space. Let `barF = {A : B_1 sup A sup B_2 " for some " B_1, B_2 in F " st " mu(B_1 \\ B_2) = 0}`. For any `A in barF` set `bar mu(A) = mu(B_1) = mu(B_2)`. Then:
- `barF` is a `sigma`-algebra and `F sub barF`
- `bar mu` is well defined
- `(Omega, barF, bar mu)` is a complete measure space and `bar mu = mu "on" F`.
Integration
Measurable transformations
Let `(Omega, F, P)` be a probability space. Then `X: Omega -> RR` is called a random variable if `X^(-1)(-oo, a] = {omega: X(omega) <= a} in F AA a in RR`. This is equivalent to the apparently stronger condition that `X^(-1)(A) in F` for all `A in B(RR)`.
A mapping `T: Omega_1 -> Omega_2` between two probability spaces is measurable with respect to the `sigma`-algebras if `T^(-1)(A) in F_1 AA A in F_2`. (and will be measurable for any `barF_1 sup F_1` and `barF_2 sub F_2`).
In general, this is a difficult property to prove directly, but:
- if `F_2 = sigma(: C :)` and `T^(-1)(A) in F_1 AA A in C`, then `T` is `(: F_1, F_2 :)` measurable
- and if `T_1: Omega_1 -> Omega_2` is `(: F_1, F_2 :)`-measurable, and `T_2: Omega_2 -> Omega_3` is `(: F_2, F_3 :)`-measurable, then the composition `T = T_2 @ T_1` is `(:F_1, F_3:)`-measurable.
- any continuous function is measurable (converse not true, but can find an approximation that is exact except over a set of small measure)
- the identity function is measurable
And if `AA n in NN`, `f_n : Omega -> barRR` is `(:F, B(barRR) :)` measurable, then:
- `spr_n f_n`, `inf_n f_n`, `barlim f_n`, `ul lim f_n` are all measurable
- `A = {omega: lim_(n->oo) f_n(omega) " exists and is finite"}` lies in `F` and `h = [lim_(n->oo) f_n] * I_A` is measurable.
Let `{ f_lambda: lambda in Lambda }` be a family of mappings from `Omega_1 -> Omega_2`, and `F_2` a `sigma`-algebra on `Omega_2`, then `sigma(: {f_lambda^{-1} (A) : A in F_2, lambda in Lambda} :)` is called the `sigma`-algebra generated by `{f_lambda: lambda in Lambda}`. This is the smallest `sigma`-algebra such that all `f_lambda` are measurable.
Let `{f_lambda: lambda in Lambda}` be an uncountable collection of maps `Omega_1 -> Omega_2`. Then `AA B in sigma(: {f_lambda: lambda in Lambda} :)` there exists a countable set `Lambda_B sub Lambda` st `B in sigma(: {f_lambda: lambda in Lambda_B} :)` (ie. each such `B` depends on only countably many of the `f_lambda`)
Induced measures, distribution functions
Suppose X is an rv defined on `(Omega, F, P)`, then `P` governs probabilities assigned to events like `X^(-1)[a,b]`. Since `X` takes values in the real line, it should be possible to express such probabilities as function of `[a, b]`, eg. `P_X(A) = P(X^(-1)(A))`. Is this a probability measure?
Let `(Omega_i, F_i)`, `i=1,2` be measurable spaces, and let `T: Omega_1 -> Omega_2` be a `(: F_1, F_2 :)` measurable function. Then for any measure `mu` on `(Omega_1, F_1)`, `mu_T(A) = mu T^(-1) (A) = mu(T^(-1) (A))` is a measure on `F_2`.
For a rv `X` defined on `(Omega, F, P)` the probability distribution of X is the induced measure of `X` under `P` on `RR`.
The cumulative distribution function (CDF) = `F_X(x) = P_X(-oo, x] = P{omega: X(omega) <= x}`, and has the following properties: (any function with these properties is called a cdf)
- `x_1 < x_2 => F(x_1) <= F(x_2)`
- right continuous
- `F(oo) = 1`, `F(-oo) = 0`
Given any cdf `F` it is possible to construct a probability space and a random variable `X` st `F` is the cdf of `X` (eg. `(RR, B(RR), mu_F)` with `X(x) = x`).
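A closely related standard construction takes `Omega = (0,1)` with Lebesgue measure and the quantile function `X(u) = inf{x : F(x) >= u}`, which also has cdf `F`. A sketch for a hypothetical discrete cdf:

```python
# Quantile construction: on ((0,1), B, mu_L), X(u) = inf{x : F(x) >= u}
# has cdf F.  Hypothetical discrete cdf with mass 1/2 at 0 and 1/2 at 1.

def quantile(F_points, u):
    # F_points: sorted (x, F(x)) pairs of a discrete cdf; returns inf{x: F(x) >= u}
    for x, Fx in F_points:
        if Fx >= u:
            return x
    raise ValueError("u must lie in (0, 1]")

F_points = [(0, 0.5), (1, 1.0)]   # P(X = 0) = P(X = 1) = 1/2
# {u : quantile(F_points, u) = 0} = (0, 0.5], which has Lebesgue measure 1/2
```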
A random variable is discrete if there exists a countable set `A sub RR` st `P(X in A) = 1`, or continuous if `P(X=x)=0 AA x in RR`. The decomposition theorem states that any cdf can be written as the weighted sum of a continuous and a discrete cdf.
Generalisation to higher dimensions is straightforward, eg. for `k=2`
- `AA x = (x_1, x_2) <= y = (y_1, y_2)`: `F(y_1, y_2) - F(x_1, y_2) - F(y_1, x_2) + F(x_1, x_2) >= 0` (monotone)
- right continuous
- `F(oo, oo) = 1`, `F(-oo, a) = F(a, -oo) = 0`
- `F(a, oo) = P(X_1 <= a)`
Integration
Let `(Omega, F, mu)` be a measure space, and `f: Omega -> R` be a measurable function.
A function `f: Omega -> barRR` is called simple if there exist a finite set `{c_1, c_2, ..., c_k} sub barRR` and sets `A_1, A_2, ..., A_k in F, k in NN` st `f = sum c_i I_(A_i)`. The integral of a non-negative simple function is `int_Omega f dmu = sum c_i mu(A_i)`. Note: `0 <= int f dmu <= oo`
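On a finite measure space the definition can be evaluated directly; the weights and sets below are hypothetical:

```python
# Integral of a simple function on a finite measure space:
# int f dmu = sum_i c_i * mu(A_i).

def integrate_simple(mu, terms):
    # mu: dict omega -> mass; terms: list of (c_i, A_i) with A_i a set
    return sum(c * sum(mu[w] for w in A) for c, A in terms)

mu = {"a": 1.0, "b": 2.0, "c": 3.0}      # hypothetical point masses
# f = 2 * I_{a,b} + 5 * I_{c}:  int f dmu = 2*(1+2) + 5*3 = 21
val = integrate_simple(mu, [(2.0, {"a", "b"}), (5.0, {"c"})])
```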
If `f` and `g` are two simple non-negative functions then:
- linearity: for `alpha,beta >= 0` `int (alpha*f + beta*g) dmu = alpha int f dmu + beta int g dmu`
- monotonicity: if `f >= g "a.e." (mu)` then `int f dmu >= int g dmu`
- if `f = g "a.e." (mu)` then `int f dmu = int g dmu`
Can extend this definition to any non-negative measurable function by discretising and taking limits (need to confirm that all admissible approximating sequences give the same value), ie. `lim_(n->oo) int f_n dmu = int lim_(n->oo) f_n dmu = int f dmu`
The monotone convergence theorem proves that `int f dmu = lim_(n->oo) int f_n dmu`, for any increasing sequence of `f_n`. Corollaries:
- `int sum h_n dmu = sum int h_n dmu`
- `nu(A) = int f I_A dmu` is a measure.
Can extend to any function by defining `f = f^+ - f^-` (where `f^+ = max(0, f)`, `f^- = max(0, -f)`), or more generally `f = f_1 + f_2` for some `f_1, f_2 >= 0`. Said to be integrable if `int |f| < oo`.
Theorem | Hypotheses | Results |
---|---|---|
MCT | `f_n >= 0`, `f_n uarr f "ae" (mu)` | `int f_n uarr int f` |
Fatou | `f_n >= 0` | `ullim int f_n >= int ullim f_n` |
LDCT | `|f_n| <= g`, `g in L^1(mu)`, `f_n -> f` | `f in L^1(mu)`, `int f_n -> int f`, `int |f_n - f| -> 0` |
BCT | `mu(Omega) < oo`, `|f_n| <= K`, `f_n -> f` | `f in L^1(mu)`, `int |f_n - f| -> 0` |
`L^p` spaces
- `L^p(Omega, F, mu) = {f: int |f|^p dmu < oo}`
- `L^oo(Omega, F, mu) = {f: mu({|f| > K}) = 0 " for some " K in RR^+}`
If `f` and `g` are integrable (ie. `in L^1(Omega, F, mu)`) then:
- linearity: for `alpha,beta in RR` `int (alpha*f + beta*g) dmu = alpha int f dmu + beta int g dmu`
- monotonicity: if `f >= g "a.e." (mu)` then `int f dmu >= int g dmu`
- if `f = g "a.e." (mu)` then `int f dmu = int g dmu`
Riemann and Lebesgue integrals
Let `f` be a real-valued function bounded on a bounded interval `[a,b]`. Let `P = {x_0, x_1, ..., x_n}` be a finite partition of `[a,b]`, and `Delta = Delta(P) = max{(x_1 - x_0), ...}` be the diameter (mesh) of `P`. Let `M_i = spr{f(x): x_i <= x <= x_(i+1)}` and `m_i = inf{f(x): x_i <= x <= x_(i+1)}`.
The upper and lower Riemann sums of `f` wrt `P` are defined as:
- `U(f, P) = sum M_i * (x_(i+1) - x_i)`
- `L(f, P) = sum m_i * (x_(i+1) - x_i)`
The upper and lower Riemann integrals are defined as:
- `bar int f = inf_(P in sfP) U(f, P)`
- `ul int f = spr_(P in sfP) L(f, P)`
Where `sfP` is the set of all partitions.
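The upper and lower sums can be approximated numerically. Sampling each cell to estimate `M_i` and `m_i` is an assumption of this sketch (it is exact only when `f` is monotone on each cell, as in the example):

```python
# Upper and lower Riemann sums U(f,P), L(f,P) for a partition of [a,b],
# with sup/inf on each cell estimated by sampling.

def riemann_sums(f, partition, samples=200):
    U = L = 0.0
    for x0, x1 in zip(partition, partition[1:]):
        ys = [f(x0 + (x1 - x0) * k / samples) for k in range(samples + 1)]
        U += max(ys) * (x1 - x0)      # approximates M_i * (x_(i+1) - x_i)
        L += min(ys) * (x1 - x0)      # approximates m_i * (x_(i+1) - x_i)
    return U, L

P = [i / 100 for i in range(101)]         # uniform partition of [0,1]
U, L = riemann_sums(lambda x: x * x, P)   # both near int_0^1 x^2 dx = 1/3
```

As the mesh shrinks, `U` and `L` squeeze together for a Riemann-integrable `f`.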
`f` is said to be Riemann integrable if `bar int f = ul int f`. For `f` bounded on a bounded interval `[a,b]`: if `f` is continuous on `[a,b]` then `f` is Riemann integrable, and if `f` is Riemann integrable then `f` is Lebesgue integrable and the two integrals are equal.
More on convergence
Name | Symbol | Definition |
---|---|---|
Pointwise | `f_n -> f` | `lim_(n->oo) f_n(omega) = f(omega) AA omega in Omega` |
Almost everywhere | `f_n -> f "ae" mu` | `lim_(n->oo) f_n(omega) = f(omega) AA omega in B^c, mu(B) = 0` |
In measure | `f_n stackrel m -> f` | `lim_(n->oo) mu{|f_n - f| > epsi} = 0` |
In `L^p` | `f_n stackrel (L^p) -> f` | `lim_(n->oo) int |f_n - f|^p = 0` or `{:|| f_n - f||:}_p -> 0` where `{:|| g ||:}_p = (int |g|^p dmu)^(1/p)` |
Uniformly | | `lim_(n->oo) spr_{omega in Omega} {|f_n(omega) - f(omega)|} = 0` |
Nearly (almost) uniformly | | `AA epsilon > 0 EE A in F "st" mu(A) < epsilon` and `f_n -> f` uniformly on `A^c` |
- If `mu(Omega) < oo` then convergence almost everywhere implies convergence in measure
- If `f_n` converges in measure, then there is some subsequence that converges almost everywhere
- If `f_n` converges in `L^p` `f_n` converges in measure
- (Scheffé) If `f_n >= 0`, `f_n in L^1`, `f_n -> f "ae"` or `f_n stackrel m -> f`, and `int f_n -> int f < oo`, then `f_n -> f` in `L^1`
Uniform integrability
Let `a_f(t) = int_{|f| > t} |f| dmu`. If `f in L^1` then `a_f(t) -> 0` as `t -> oo`, that is `AA epsilon > 0 EE t_epsilon "st" t >= t_epsilon => a_f(t) < epsilon`. A collection of functions `{f_lambda : lambda in Lambda}` is uniformly integrable (UI) if `AA epsilon > 0 EE t_epsi "st" t > t_epsi => spr_(lambda in Lambda) a_(f_lambda)(t) <= epsilon`.
- If `|Lambda| < oo` then `{f_lambda: lambda in Lambda}` is UI
- If `K = spr{ int |f_lambda|^(1+epsilon) dmu: lambda in Lambda} < oo` for some `epsilon > 0` then `{f_lambda: lambda in Lambda}` is UI
- If `|f_lambda| <= g "ae" (mu)` and `g in L^1(mu)` then `{f_lambda: lambda in Lambda}` is UI
- If `{f_lambda: lambda in Lambda}` and `{g_gamma: gamma in Gamma}` are UI, then so is `{f_lambda + g_gamma: lambda in Lambda, gamma in Gamma}`
- If `{f_lambda: lambda in Lambda}` is UI and `mu(Omega) < oo` then `spr_{lambda in Lambda} int |f_lambda| dmu < oo`
Let `mu(Omega) < oo` and `{f_n : n >= 1} sub L^1` with `f_n -> f "ae" (mu)`, `f` measurable. If `{f_n}_(n>=1)` is UI then `lim_(n->oo) int |f_n - f| dmu = 0`
Egorov's theorem: Let `f_n -> f "ae" (mu)`, `mu(Omega) < oo`; then `f_n -> f` almost uniformly.
`L^p` spaces
Inequalities
Inequality | Hypotheses | Result |
---|---|---|
Markov's inequality (T3.1.1, p58) | `f` a non-negative measurable function, `0 < t < oo` | `mu{f >= t} <= (int f dmu)/t` |
(C3.1.2, p58) | `X` an RV on `(Omega, F, P)`, `r > 0` | `P(|X| >= t) <= (E|X|^r)/(t^r)` |
Chebychev's inequality (C3.1.3, p58) | `X` an RV, `E[X] = mu`, `"Var"(X) = sigma^2 < oo` | `P(|X - mu| >= k sigma) <= 1/(k^2)` |
(C3.1.4, p58) | `phi: RR^+ -> RR^+` non-decreasing | `P(|X| >= t) <= (E phi(|X|))/(phi(t))` |
Cramer's inequality (C3.1.5, p58) | `X` an RV, `t > 0` | `P(|X| >= t) <= inf_(theta > 0) (E(e^(theta |X|)))/(e^(theta t))` |
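Markov's inequality can be checked exactly on a small discrete probability space; the masses and function values below are hypothetical:

```python
# Check mu{f >= t} <= (int f dmu)/t on a discrete probability space.

def markov_gap(weights, values, t):
    # weights: probabilities of the atoms; values: f >= 0 on the same atoms
    lhs = sum(w for w, v in zip(weights, values) if v >= t)       # mu{f >= t}
    rhs = sum(w * v for w, v in zip(weights, values)) / t         # (int f)/t
    return lhs, rhs

# hypothetical atoms: P = (0.2, 0.3, 0.5), f = (0, 1, 4), threshold t = 2
lhs, rhs = markov_gap([0.2, 0.3, 0.5], [0.0, 1.0, 4.0], t=2.0)
# lhs = P(f >= 2) = 0.5;  rhs = E f / 2 = 2.3 / 2 = 1.15
```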
A function `phi: (a,b) -> RR` is called convex if `AA 0 <= lambda <= 1, a < x <= y < b`, `phi(lambda x + (1-lambda)y) <= lambda phi(x) + (1-lambda)phi(y)` (chords rotate counterclockwise). Or equivalently `(phi(x_2) - phi(x_1))/(x_2-x_1) <= (phi(x_3) - phi(x_2))/(x_3-x_2)` for `a < x_1 < x_2 < x_3 < b`.
- left (`phi_-^'`) and right (`phi_+^'`) derivatives exist and are finite
- and are both non-decreasing
- `phi'(x)` exists except on countable set of discontinuity points
- for any `a < c < d < b`, `|phi(x) - phi(y)| <= K|x-y|` for some `K in RR` and all `x, y in [c,d]`, ie. `phi` is Lipschitz (hence uniformly continuous) on every closed subinterval
- for `a < x < b`, `phi(x) - phi(c) >= phi_+^'(c)(x-c)` and `phi(x) - phi(c) >= phi_-^'(c)(x-c)` (supporting lines)
If `phi` is twice differentiable on `(a,b)`, and `phi''(x) >= 0` then `phi` is convex.
- `sum p_i exp(a_i) >= exp(sum p_i a_i)` (for weights `p_i >= 0`, `sum p_i = 1`)
- arithmetic mean `>=` geometric mean
Inequality | Hypotheses | Result |
---|---|---|
Holder's inequality (T3.1.11, p61) | `1 < p < oo`, `1/p + 1/q = 1`, `f in L^p(mu), g in L^q(mu)` | `{:||fg||:}_1 <= {: ||f|| :}_p {: ||g|| :}_q` |
Cauchy-Schwarz inequality (C3.1.12, p62) | `f, g in L^2(mu)` | `{:||fg||:}_1 <= {:||f||:}_2 {:||g||:}_2` |
(C3.1.13, p62) | `1 < p < oo`, `1/p + 1/q = 1`, sequences `{a_i}, {b_i}` and weights `c_i >= 0` | `sum |a_i b_i| c_i <= (sum |a_i|^p c_i)^(1/p) (sum |b_i|^q c_i)^(1/q)` |
Minkowski inequality (C3.1.14, p62) | `1 < p < oo`, `f, g in L^p(mu)` | `{:||f + g||:}_p <= {:||f||:}_p + {:||g||:}_p` |
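A quick numeric check of Hölder and Minkowski under counting measure; the vectors and exponents are hypothetical:

```python
# Hölder:    sum |f g|  <= ||f||_p ||g||_q   with 1/p + 1/q = 1
# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p

def p_norm(xs, p):
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

f = [1.0, -2.0, 3.0]          # hypothetical vectors
g = [0.5, 4.0, -1.0]
p, q = 3.0, 1.5               # conjugate exponents: 1/3 + 2/3 = 1

holder_lhs = sum(abs(a * b) for a, b in zip(f, g))
holder_rhs = p_norm(f, p) * p_norm(g, q)
mink_lhs = p_norm([a + b for a, b in zip(f, g)], p)
mink_rhs = p_norm(f, p) + p_norm(g, p)
```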
`L^p` spaces
`L^p(mu)` is a vector space. The usual metric is `d_p(f,g) = {:||f-g||:}_p`, where `f` is identified with `g` if `f = g "ae" (mu)` (this equivalence relation partitions `L^p(mu)`). With this metric `L^p(mu)` forms a complete metric space.
Dual spaces
`1 <= p < oo`, `1/p + 1/q = 1`, `g in L^q(mu)`, `T_g(f) = int fg dmu`, `f in L^p(mu)`. Clearly `T_g` is linear, and such a function `L^p(mu) -> RR` is called a *linear functional*. A linear functional `T` is bounded if `EE c in (0, oo)` st `|T(f)| <= c {:||f||:}_p`
- the norm of a linear functional is defined as `||T|| = spr{ |Tf|: f in L^p(mu), ||f||_p = 1}`
- `|T_g(f)| <= ||g||_q ||f||_p` by Holder’s inequality
- `T_g` is uniformly continuous on the metric space `(L^p(mu), d_p)`.
The set of all continuous linear functionals on `L^p(mu)` is called the dual space of `L^p(mu)` and denoted `L^p(mu)^**`.
Theorem | Hypotheses | Result |
---|---|---|
Riesz representation theorem (T3.2.3, p57) | `1 <= p < oo`, `1/q + 1/p = 1`, `T: L^p(mu) -> RR` linear and continuous | `EE g in L^q(mu) "st" T(f) = T_g(f)`. Not valid for `p = oo`. |
Banach and Hilbert spaces
Banach spaces
A Banach space is a complete normed vector space. All `L^p(mu)` spaces are Banach spaces. A closed subspace of a Banach space is also a Banach space.
A norm must satisfy:
- `v_1, v_2 in V => ||v_1 + v_2|| <= ||v_1|| + ||v_2||`
- `alpha in RR, v in V => ||alpha v|| = |alpha| ||v||`
- `||v|| = 0 iff v = 0`.
A linear transformation is a function `T: V_1 -> V_2` st `alpha_1, alpha_2 in RR, x, y in V_1 => T(alpha_1 x + alpha_2 y) = alpha_1 T(x) + alpha_2 T(y)`.
Hilbert spaces
A vector space `V` is a real inner product space if `EE f: V xx V -> RR`, denoted by `f(x,y) = (:x,y:)`, that satisfies:
- `(: x,y :) = (: y,x :) AA x,y in V`
- `(: alpha_1 x_1 + alpha_2 x_2, y :) = alpha_1 (: x_1, y :) + alpha_2 (: x_2, y :)`
- `(: x,x :) >= 0 AA x in V`, equality `iff x = 0`
A Hilbert space is a complete real inner product space. Every Hilbert space is isomorphic to an `L^2(Omega, F, mu)` space for some `(Omega, F, mu)` (eg. `l^2`, with counting measure). A Hilbert space is called separable if `EE` a dense countable subset.
Orthogonal vectors: `x _|_ y iff (: x, y :) = 0`; orthonormal (ON) if additionally `||x|| = 1`. If `B sub H` is an orthogonal set of nonzero vectors and `H` is separable, then `B` is at most countable. Can convert any linearly independent set to an orthonormal set using Gram-Schmidt orthogonalisation.
The Fourier coefficients of a vector `x in V` wrt an ON set `B sub V` are `{(:x, b:): b in B}`.
Bessel's inequality: `{b_i}_{i>=1}` ON in an inner product space `(V, (: *, * :))`: `AA x in V, sum (: x, b_i :)^2 <= ||x||^2`.
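Gram-Schmidt and Bessel's inequality can be illustrated in `RR^3` with the usual dot product; the input vectors are hypothetical:

```python
# Gram–Schmidt orthonormalisation, then a check of Bessel's inequality
# sum <x, b_i>^2 <= ||x||^2 for the resulting ON set.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:                      # subtract projections onto basis
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:                     # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis

B = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]])   # hypothetical inputs
x = [3.0, 4.0, 5.0]
bessel = sum(dot(x, b) ** 2 for b in B)     # <= ||x||^2 = 50; here 25,
                                            # since B spans only the xy-plane
```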
The following are equivalent, for an ON set `B` in a Hilbert space `H`:
- `B` is complete
- `B` is maximal (ie. `B sub B'` with `B'` ON `=> B = B'`)
- `B` is an ON basis for `H` (the linear span of `B` is dense in `H`)
- `AA x EE {b_i}_{i>=1} sub B => ||x||^2 = sum (: x, b_i :)^2`
Riesz representation theorem: `H` a separable Hilbert space; then every bounded linear functional `T: H -> RR` can be represented as `T(y) = (: y, x_0 :)` for some `x_0 in H`.
Differentiation
The Lebesgue-Radon-Nikodym theorem
Let `(Omega, F)` be a measurable space and let `mu` and `nu` be two measures on `(Omega, F)`. `mu` is dominated by `nu`, `mu << nu` if `nu(A) = 0 => mu(A) = 0 AA A in F`.
Let `f` be a non-negative measurable function, `mu(A) = int_A f dnu AA A in F`, then `mu` is also a measure and `mu << nu`.
`mu` is singular wrt `nu`, `mu _|_ nu`, if `EE B in F "st" mu(B) = 0` and `nu(B^c) = 0`. If `mu` and `nu` are mutually singular, then for `B` as above `mu(A) = mu(A nn B^c)` and `nu(A) = nu(A nn B)`. Let `mu` and `nu` be `sigma`-finite:
- Lebesgue decomposition theorem: `mu` can be uniquely decomposed as `mu = mu_a + mu_s` where `mu_a << nu` and `mu_s _|_ nu`, with `mu_a` and `mu_s` `sigma`-finite.
- Radon-Nikodym theorem: there exists a non-negative measurable function `h` on `(Omega, F)` st `mu_a(A) = int_A h dnu AA A in F`
If `h` is a non-negative measurable function st `mu(A) = int_A h dnu AA A in F`, then `h` is called the Radon-Nikodym derivative of `mu` wrt `nu` and is written `(dmu)/(dnu)`.
Let `nu, mu_1, mu_2, ...` be `sigma`-finite measures, then:
- `mu_1 << mu_2, mu_2 << mu_3` `=> mu_1 << mu_3` and `(dmu_1)/(dmu_3) = (dmu_1)/(dmu_2) (dmu_2)/(dmu_3) " ae " (mu_3)`
- `mu_1 << mu_3`, `mu_2 << mu_3` `=> AA alpha, beta >= 0`, `alpha mu_1 + beta mu_2 << mu_3` and `(d(alpha mu_1 + beta mu_2))/(dmu_3) = alpha (dmu_1)/(dmu_3) + beta (dmu_2)/(dmu_3) " ae " (mu_3)`
- `mu << nu` and `dmu/dnu > 0 ae (nu)` then `nu << mu` and `dnu/dmu = (dmu/dnu)^(-1)`
- Let `{mu_n}_(n>=1)` be a sequence of measures and `{a_n}_(n>=1) sub RR^+`; define `mu = sum a_n mu_n`. Then `mu << nu iff mu_n << nu AA n`, with `(dmu)/(dnu) = sum a_n (dmu_n)/(dnu) " ae " (nu)`; and `mu _|_ nu iff mu_n _|_ nu AA n >= 1`
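On a finite `Omega` with `F = P(Omega)`, the Radon-Nikodym derivative is simply the ratio of point masses wherever `nu` charges a point; the masses below are hypothetical:

```python
# Discrete Radon–Nikodym: if mu << nu on a finite Omega, then
# dmu/dnu(w) = mu({w}) / nu({w}) where nu({w}) > 0, and
# mu(A) = int_A (dmu/dnu) dnu.

nu = {1: 0.5, 2: 0.25, 3: 0.25}      # hypothetical dominating measure
mu = {1: 0.1, 2: 0.6, 3: 0.3}        # mu << nu: nu charges every atom

h = {w: mu[w] / nu[w] for w in nu}   # the Radon–Nikodym derivative dmu/dnu

def integral(f, measure, A):
    # int_A f d(measure) on a finite space
    return sum(f[w] * measure[w] for w in A)

A = {2, 3}
lhs = sum(mu[w] for w in A)          # mu(A) = 0.9
rhs = integral(h, nu, A)             # int_A h dnu, should equal mu(A)
```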
Signed measures
Let `mu_1, mu_2` be finite measures on a measurable space `(Omega, F)`, and let `nu(A) = mu_1(A) - mu_2(A) AA A in F`. Such a `nu` is a finite signed measure, and satisfies:
- `nu(O/) = 0`
- `nu(A) = sum nu(A_i)` for any countable partition `{A_i}` of `A`
- `||nu|| = spr_("all partitions") {sum |nu(A_i)|} < oo`
Let `nu` be a finite signed measure and `|nu|(A) = spr_("all partitions of " A) {sum |nu(A_i)|}`; then `|nu|` is a finite measure (the total variation measure).
A set function `nu` is a finite signed measure iff `EE` finite measures `mu_1, mu_2 "st" nu = mu_1 - mu_2`, or equivalently iff `EE mu` and `f in L^1(mu)` st `AA A in F`, `nu(A) = int_A f dmu`.
`A in F` is called a *negative set* for `nu` if `nu(B) <= 0` for every `B sub A`, `B in F` (positive sets are defined analogously). Hahn decomposition theorem: `nu` a finite signed measure; then `EE` a positive set `Omega^+` and a negative set `Omega^-` st `Omega = Omega^+ uu Omega^-` and `Omega^+ nn Omega^- = O/`. Jordan decomposition theorem: `nu = nu^+ - nu^-`, where `nu^+(A) = nu(A nn Omega^+)` and `nu^-(A) = -nu(A nn Omega^-)`.
`S = {nu: nu "is a finite signed measure"}` is a vector space, with total variation norm `||nu|| = |nu|(Omega)`.
Functions of bounded variation
`f: [a,b] -> RR`; for a partition `Q = {x_0, x_1, ..., x_n}` of `[a,b]`:
- positive variation of `f`: `P(f, Q) = sum (f(x_i) - f(x_(i-1)))_+`
- negative variation of `f`: `N(f, Q) = sum (f(x_i) - f(x_(i-1)))_-`
- total variation of `f`: `T(f, Q) = sum |f(x_i) - f(x_(i-1))|`
For the interval itself, take the supremum over all partitions, eg. `T(f, [a,b]) = spr_Q T(f, Q)`. `f` is said to be of bounded variation (`f in BV[a,b]`) if `T(f, [a,b]) < oo`. If `f in BV[a,b]` and `f_1(x) = P(f, [a,x])`, `f_2(x) = N(f, [a,x])`, then `f_1, f_2` are non-decreasing and `f(x) = f(a) + f_1(x) - f_2(x)`. `f in BV[a,b]` iff `EE` a finite signed measure `mu` on `(RR, B(RR))` st `f(x) = mu[a,x]`.
Absolutely continuous functions
A function `F` is absolutely continuous (ac) if `AA epsi > 0 EE delta > 0 "st"` for any finite collection of disjoint intervals `I_j = [a_j, b_j]`, `sum (b_j - a_j) < delta => sum |F(b_j) - F(a_j)| < epsi`. By the mean value theorem, if `F` is differentiable with `F'` bounded, then `F` is ac.
A function `F: [a,b] -> RR` is absolutely continuous if its extension `barF = F(a) I(x < a) + F(x) I(a <= x < b) + F(b) I(x >= b)` is ac.
Fundamental theorem of Lebesgue integral calculus: `F: [a,b] -> RR` is ac iff there is a function `f: [a,b] -> RR` st `f` is Lebesgue measurable and integrable and `F(x) = F(a) + int_[[a,x]] f d_(mu_L) AA a<= x <= b`.
Product measures
Product spaces and product measures
Given two measure spaces `(Omega_i, F_i, mu_i)`, is it possible to construct a measure `mu_1 xx mu_2` on the product space `Omega_1 xx Omega_2` st `mu(A xx B) = mu_1(A) * mu_2(B)` for `A in F_1, B in F_2`?
- `Omega_1 xx Omega_2` is the Cartesian product of `Omega_1` and `Omega_2`
- the set `A_1 xx A_2`, `A_1 in F_1, A_2 in F_2`, is called a measurable rectangle, `C = ` all measurable rectangles
- the product `sigma`-algebra is `F_1 xx F_2 = sigma (: {A_1 xx A_2: A_1 in F_1, A_2 in F_2} :) = sigma(: C :)`
Starting with `mu` defined on `C`, `mu(A_1 xx A_2) = mu_1(A_1) * mu_2(A_2)`, we can use the extension procedure to extend `mu` to all `F_1 xx F_2`. Another approach, which allows us to calculate the values directly, proceeds as follows.
- `A_(omega_1) -= {omega_2 in Omega_2 : (omega_1, omega_2) in A}`, called the `omega_1`-section of A and is in `F_2`
- If `f: (Omega_1 xx Omega_2) -> Omega_3` is a `(: F_1 xx F_2, F_3 :)` measurable function, then the `omega_1`-section of `f`, `f_(omega_1): Omega_2 -> Omega_3`, `f_(omega_1)(omega_2) = f(omega_1, omega_2)`, is `(: F_2, F_3 :)` measurable
- `mu_2(A_(omega_1))` and `mu_1(A_(omega_2))` are `F_1` and `F_2` measurable (as functions of `omega_1` and `omega_2` respectively)
- `mu_12(A) = int_(Omega_1) mu_2(A_(omega_1)) mu_1(d omega_1)` and `mu_21(A)` are measures on `F_1 xx F_2` and `mu_21(E) = mu_12(E) AA E in F_1 xx F_2`
- `mu_12 = mu_21 -= mu` is `sigma`-finite and is the only measure satisfying `mu(A_1 xx A_2) = mu_1(A_1)*mu_2(A_2) AA A_1 xx A_2 in C`
- the unique measure on `F_1 xx F_2` is called the product measure and is denoted `mu_1 xx mu_2`
- the measure space `(Omega_1 xx Omega_2, F_1 xx F_2, mu_1 xx mu_2)` is called the product measure space
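On finite spaces the section-based construction can be computed directly and checked against `mu_1(A_1) mu_2(A_2)` on rectangles; the masses are hypothetical:

```python
# Product measure on a finite product space, computed from sections:
# mu(A) = int mu_2(A_{w1}) mu_1(dw1), which reduces to a double sum.

mu1 = {"x": 0.3, "y": 0.7}           # hypothetical measures on the factors
mu2 = {0: 0.4, 1: 0.6}

def product_measure(A):
    # A: a set of pairs (w1, w2); integrate the w1-section masses against mu1
    return sum(mu1[w1] * sum(mu2[w2] for (v1, w2) in A if v1 == w1)
               for w1 in mu1)

rect = {("x", 0), ("x", 1)}          # the rectangle {x} xx {0, 1}
val = product_measure(rect)          # = mu1({x}) * mu2({0,1}) = 0.3
```

The same function also evaluates non-rectangular sets, where the product formula no longer applies directly but the section integral still does.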
The product measure space may not be complete, even if both original measure spaces are complete.
Fubini-Tonelli Theorems
An integral over the product space can be treated as an iterated integral if `f: Omega_1 xx Omega_2 -> RR^+` (Tonelli's theorem) or `f in L^1(mu)` (Fubini's theorem)
Tonelli: Let `(Omega_i, F_i, mu_i)` be `sigma`-finite spaces, `f` non-negative. Then:
- `g_1(omega_1) = int_(Omega_2) f(omega_1, omega_2) mu_2(d omega_2) : Omega_1 -> R^+` is `(: F_1, B(bar RR) :)` measurable
- `int_(Omega_1 xx Omega_2) f dmu = int_(Omega_1) g_1 dmu_1 = int_(Omega_2) g_2 dmu_2`
Fubini: If `f in L^1(mu)` then there exist sets `B_i in F_i` st
- `mu_i(B_i^c) = 0` for `i = 1,2`
- for `omega_1 in B_1, f(omega_1, *) in L_1(Omega_2, F_2, mu_2)`
- `g_1(omega_1) = int_(Omega_2) f(omega_1, omega_2) I_(B_1)(omega_1) mu_2(d omega_2)` is measurable and `int_(Omega_1) g_1 dmu_1 = int_(Omega_1 xx Omega_2) f d(mu_1 xx mu_2)`
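A finite sketch of the Tonelli statement, not from the notes: for a non-negative `f` on a product space the two iterated integrals agree. Here both marginal measures are Lebesgue measure on `[0,1]`, approximated by midpoint Riemann sums, and the integrand is an illustrative choice.

```python
# Illustrative non-negative integrand on [0,1]^2 (hypothetical example)
def f(x, y):
    return x * x + y

N = 400
pts = [(i + 0.5) / N for i in range(N)]   # midpoints of a uniform grid
w = 1.0 / N                               # weight of each midpoint cell

# g1(x) = integral over y first, then integrate over x
int_xy = sum(w * sum(w * f(x, y) for y in pts) for x in pts)
# g2(y) = integral over x first, then integrate over y
int_yx = sum(w * sum(w * f(x, y) for x in pts) for y in pts)

# both orders approximate the double integral 1/3 + 1/2 = 5/6
assert abs(int_xy - int_yx) < 1e-9
assert abs(int_xy - 5 / 6) < 1e-3
```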
Integration by parts: Let `F_1, F_2` be `uarr`, right-continuous functions on `[a,b]` with no common points of discontinuity. Then `int_((a,b]) F_1(x) dF_2(x)` `= F_1(b)F_2(b) - F_1(a)F_2(a) - int_((a,b]) F_2(x) dF_1(x)`. If `F_1, F_2` are ac with non-negative densities `f_1, f_2` then `int_a^b F_1(x) f_2(x) dx = F_1(b)F_2(b) - F_1(a)F_2(a) - int_a^b F_2(x) f_1(x) dx`. (Can always decompose into two non-negative functions, so this also holds for all Lebesgue integrable functions.)
Extensions to higher-order products use the same extension procedure.
Convolutions
Sequences
Given `{a_n}, {b_n} in L^1`, let `c_n = (a ** b)_n = sum_(j=0)^n a_j b_(n-j)`, then:
- `{c_n} in L^1`
- `sum c_n = (sum a_n)(sum b_n)`
- `C(s) = sum c_n s^n = A(s) * B(s)`
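The three properties above can be sketched numerically; the geometric sequences and the evaluation point `s` are illustrative choices (truncated to 60 terms, so equalities hold up to a tiny tail error).

```python
# discrete convolution c_n = sum_{j<=n} a_j b_{n-j}
def convolve(a, b):
    n = len(a)
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(n)]

a = [0.5 ** n for n in range(60)]        # sum a_n = 2 (up to truncation)
b = [(1 / 3) ** n for n in range(60)]    # sum b_n = 3/2

c = convolve(a, b)
# sum c_n = (sum a_n)(sum b_n)
assert abs(sum(c) - sum(a) * sum(b)) < 1e-6

# generating functions: C(s) = A(s) B(s)
s = 0.4
A = sum(x * s ** n for n, x in enumerate(a))
B = sum(x * s ** n for n, x in enumerate(b))
C = sum(x * s ** n for n, x in enumerate(c))
assert abs(C - A * B) < 1e-9
```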
Functions
If `f, g in L^1` then:
- `h_1(x,y) = f(x-y) g(y)`, `h_2(x,y) = f(x) g(y-x)` are both `RR^2 -> RR`, Borel measurable
- `EE B_1 in B(RR) "st" mu_L(B_1^c) = 0` and `x in B_1 => h_1(x, *) in L^1(mu_L)`
- `psi(x) = (int h_1(x,y) mu_L(dy)) I_(B_1)(x) = (f ** g)(x) in L^1(mu_L)`, and `int psi dmu_L = (int f dmu_L)(int g dmu_L)`
Measures
`mu_1, mu_2` `sigma`-finite measures on `(RR, B(RR))`, `(mu_1 ** mu_2)(A) = int mu_1(A-y) mu_2(dy)`. `I_A(x+y) = I_A oo phi : RR^2 -> RR`, `phi(x, y) = x+y`. `(mu_1 ** mu_2)(A) = int int_(RR xx RR) I_A(x+y) mu_1(dx) mu_2(dy)` = `int_RR (int_RR I_A(x+y) mu_1(dx)) mu_2(dy)` (and vice versa, by Tonelli).
Independence
Independent events and random variables
- finite collection of events: independent if `P(nnn_j B_(i_j)) = Pi_j P(B_(i_j))`
- collection of events: independent if every finite subcollection is independent
- `A` nonempty, `AA alpha in A, G_alpha sub F` a collection of events, family `{G_alpha: alpha in A}`: independent if `AA B_alpha in G_alpha` for `alpha in A`, `{B_alpha: alpha in A}` is independent
- collection of random variables:
- independent if the family of `sigma`-algebras is independent (if a family of `pi`-systems is independent, then the `sigma`-algebras generated from them will also be independent)
- independent if the joint cdf is always the product of the marginals (if ac, can equivalently replace cdf with pdf)
- independent if `E[Pi h_i(X_i)] = Pi E[h_i(X_i)]` for all bounded Borel measurable functions `h_i: RR -> RR`
- If `X_1` and `X_2` are independent and `E|X_1|, E|X_2| < oo` then `E|X_1 X_2| < oo` and `E[X_1 X_2] = E[X_1] * E[X_2]`
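The factorization `E[X_1 X_2] = E[X_1] E[X_2]` can be checked exactly for independent discrete rv's by building the joint pmf as the product of the marginals; the supports and probabilities below are arbitrary illustrative choices.

```python
from fractions import Fraction as F

px = {0: F(1, 4), 1: F(1, 2), 3: F(1, 4)}   # pmf of X1 (illustrative)
py = {-1: F(1, 3), 2: F(2, 3)}              # pmf of X2 (illustrative)

def E(pmf):
    return sum(x * p for x, p in pmf.items())

# joint pmf under independence: P(X1=x, X2=y) = px[x] * py[y]
E_xy = sum(x * y * px[x] * py[y] for x in px for y in py)

# exact equality with rational arithmetic
assert E_xy == E(px) * E(py)
```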
Borel-Cantelli lemmas, tail `sigma`-algebras and Kolmogorov’s zero-one law
Let `(Omega, F)` be a measurable space, and `{A_n}_(n>=1)` be a sequence of sets, then:
- `barlim A_n = nnn_(k=1)^(oo)(uuu_(n>=k) A_n ) = {omega: omega in A_n "infinitely often"}`
- `ul lim A_n = uuu_(k=1)^(oo)(nnn_(n>=k) A_n ) = {omega: omega in A_n "for all but finitely many" n}`
- If `sum P(A_n) < oo` then `P(barlim A_n) = 0`.
- If `sum P(A_n) = oo` and `{A_n}` are pairwise independent, then `P(barlim A_n) = 1`.
Together these are called the Borel zero-one law.
- If `sum P(|X_n| > epsi) < oo AA epsi > 0` then `P(lim_(n->oo) X_n = 0) = 1`
- If `{X_n}` are pairwise independent, and `P(lim_(n->oo) X_n = 0) = 1` then `sum P(|X_n| > epsi) < oo AA epsi > 0`
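A simulation sketch of the two lemmas (not from the notes): independent events `A_n` with `P(A_n) = p(n)`. With `p(n) = 1//n^2` (summable) occurrences should die out; with `p(n) = 1//n` (divergent) they keep happening. Horizon, path count, and cutoffs are illustrative.

```python
import random

random.seed(0)
N, PATHS = 4000, 500

def last_occurrence(p):
    """Largest n <= N for which A_n occurs along one sample path."""
    last = 0
    for n in range(1, N + 1):
        if random.random() < p(n):
            last = n
    return last

# summable case: sum 1/n^2 < oo, so P(A_n i.o.) = 0
late_sq = sum(last_occurrence(lambda n: 1 / n ** 2) > 1000
              for _ in range(PATHS))
# divergent case: sum 1/n = oo and events independent, so P(A_n i.o.) = 1
late_hm = sum(last_occurrence(lambda n: 1 / n) > 1000
              for _ in range(PATHS))

assert late_sq / PATHS < 0.05   # occurrences past n = 1000 are rare
assert late_hm / PATHS > 0.5    # most paths still see events past n = 1000
```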
The tail `sigma`-algebra of a sequence of random variables `{X_n}` on a ps, is `tau = nnn_(n=1)^(oo) sigma(: {X_j: j >= n} :)`, any `A in tau` is called a tail event, and any `tau`-measurable rv is called a tail random variable. Tail events are determined by behaviour for large n and remain unchanged if any finite subcollection is dropped or changed.
Kolmogorov’s 0-1 law: Let `{X_n}_(n>=1)` be a sequence of independent rv’s on a probability space `(Omega, F, P)`, and `tau` the tail `sigma`-algebra of `{X_n}`, then `P(A) = 0 " or " 1 AA A in tau`.
Let `tau` be the tail `sigma`-algebra of `{X_n}_(n>=1)`, and let `X` be a tail random variable `X: Omega -> barRR` then `EE c in barRR "st" P(X = c) = 1`.
Probability spaces
Kolmogorov’s probability model
- finite sample spaces
- countably infinite sample spaces
- uncountable sample spaces: random variables, random vectors, random trajectories
Random variables and random vectors
Change of variable formula: Let `X` be an rv on `(Omega, F, P)`, and `h: RR -> RR` be measurable. Let `Y = h(X)` then:
- `int_Omega |Y| dP = int_RR |h(x)| P_X(dx) = int_RR |y| P_Y(dy)`
- if `int_Omega |Y| dP < oo` then `int_Omega Y dP = int_RR h(x) P_X(dx) = int_RR y P_Y(dy)`
- if `h >= 0` the relationship still holds even if `EY = oo`
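For a discrete rv the change-of-variable formula says `E[h(X)]` can be computed either against the law of `X` or against the pushed-forward law of `Y = h(X)`; the pmf and `h` below are illustrative choices.

```python
from fractions import Fraction as F

px = {-2: F(1, 5), 0: F(2, 5), 2: F(1, 5), 3: F(1, 5)}  # law of X

def h(x):
    return x * x

# int h(x) P_X(dx): integrate h against the law of X
E_hX = sum(h(x) * p for x, p in px.items())

# P_Y: push the mass forward through h, then integrate y against it
py = {}
for x, p in px.items():
    py[h(x)] = py.get(h(x), F(0)) + p
E_Y = sum(y * p for y, p in py.items())

assert E_hX == E_Y == F(17, 5)   # 4/5 + 0 + 4/5 + 9/5
```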
Moments
For any positive integer `n`, the nth moment `mu_n` of an rv `X` is defined by `mu_n = EX^n`. The moment generating function is defined as `M_X(t) = E(e^(tX)) AA t in RR`. Since `e^(tX) > 0`, `E(e^(tX))` is well defined (possibly infinite). If `X` is a non-negative rv then `M_X(t) = sum_(n=0)^oo (t^n mu_n)/(n!)` (by MCT, for `t >= 0`).
`X` an rv, `M_X(t) < oo AA |t| < epsi` for some `epsi > 0`, then:
- `E|X^n| < oo AA n >= 1`
- `M_X(t) = sum (t^n mu_n)/(n!) AA |t| < epsi`
- `M_X(t)` is infinitely differentiable on `(-epsi, +epsi)` and the `r`th derivative is `E(e^(tX) X^r)`.
If the mgf is finite `AA |t| < epsi, epsi > 0` then `M_X(t)` has a power series expansion in `t` around 0, and `mu_n/n!` is simply the coefficient of `t^n`. Note, the MGF uniquely defines the moments, but the moments do not uniquely define the MGF without extra conditions eg. `sum mu_(2n)^(-1//(2n)) = oo` (Carleman's condition) or the distribution has bounded support.
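The moment-extraction property `M_X^((r))(0) = mu_r` can be sketched numerically with finite differences; the choice `X ~ "exponential"(1)`, where `M(t) = 1/(1-t)` for `t < 1` and `mu_n = n!`, and the step size are illustrative.

```python
# mgf of an exponential(1) rv, finite for t < 1
def M(t):
    return 1.0 / (1.0 - t)

h = 1e-3
# central finite differences approximate M'(0) and M''(0)
mu1 = (M(h) - M(-h)) / (2 * h)               # should be 1! = 1
mu2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2   # should be 2! = 2

assert abs(mu1 - 1.0) < 1e-4
assert abs(mu2 - 2.0) < 1e-3
```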
Product moments
Let `vecX = (X_1, X_2, ..., X_k)` be a random vector. The product moment of order `r = (r_1, r_2, ..., r_k)` is `mu_r = mu_(r_1, r_2, ..., r_k) = E(X_1^(r_1) X_2^(r_2) ... X_k^(r_k))`. Similar properties to those above apply.
Kolmogorov’s consistency theorem
A stochastic process with index set `A` is a family `{X_alpha : alpha in A}` of random variables defined on a ps. May also be viewed as a random real valued function on the set `A` by the identification `omega -> f(omega, *)` where `f(omega, alpha) = X_alpha (omega)` for `alpha in A`.
The family `{mu_((alpha_1, alpha_2, ..., alpha_k)) : alpha_i in A}`, where `mu_((alpha_1, ..., alpha_k))(B) = P((X_(alpha_1), X_(alpha_2), ..., X_(alpha_k)) in B)`, of probability distributions is called the family of finite dimensional distributions (fdd) associated with `{X_alpha: alpha in A}`. It satisfies the following consistency conditions:
- `mu_((alpha_1, alpha_2, ..., alpha_k))(B_1 xx B_2 xx ... xx B_(k-1) xx RR) = mu_((alpha_1, alpha_2, ..., alpha_(k-1)))(B_1 xx B_2 xx ... xx B_(k-1))`
- for any permutation `(i_1, i_2, ..., i_k)` of `(1, 2, ..., k)`, `mu_((alpha_(i_1), ..., alpha_(i_k)))(B_(i_1) xx ... xx B_(i_k)) = mu_((alpha_1, ..., alpha_k))(B_1 xx ... xx B_k)`
(If `A` is countable, and indexed with `NN`, these conditions are equivalent to: `mu_n` is a pm on `(RR^n, B(RR^n))`, and `mu_(n+1)(B xx RR) = mu_n(B), AA n in NN`)
Given a family of probability distributions `Q_A`, does there exist a real-valued stochastic process `{X_alpha: alpha in A}` st. its fdds coincide with `Q_A`? Kolmogorov’s consistency theorem: Let `A != O/`, `Q_A = {nu_((alpha_1, ..., alpha_k)) : alpha_i in A}` st. each `nu_((alpha_1, ..., alpha_k))` is a probability distribution on `(RR^k, B(RR^k))` and the consistency conditions hold. Then there exists a probability space and a stochastic process st `Q_A` is the family of fdds associated with the stochastic process.
In other words, if `Q_A` satisfies the appropriate conditions, there exists `f: A xx Omega -> RR` st `AA omega, f(*, omega)` is a function on `A`, and for each `(alpha_1, ..., alpha_k) in A^k` the vector `(f(alpha_1, omega), ..., f(alpha_k, omega))` is a random vector with probability distribution `nu_((alpha_1, ..., alpha_k))`
Sketch proof
`A = {a}`
`Q_A = {nu_a}`, a single probability distribution. Take `(Omega = RR, F = B(RR), P = nu_a), X(omega) = omega`. Then `X` is an rv on `(Omega, F, P)` with `PX^(-1) = nu_a`.
`A = {a_1, a_2, ..., a_k}, k < oo`
`Q_A = {nu_(a_(alpha_1), ..., a_(alpha_k)): alpha_1, ..., alpha_k in (1,2,...,k)}`. Take `(Omega = RR^k, F = B(RR^k), P = nu_(a_1, ..., a_k)), X(omega) = omega`.
`A = NN`
`Omega = RR^NN = {omega: omega = (x_1, x_2, ...)}`. `F = sigma (: C :)`, where `C` is the semi-algebra of fd events; an event `A` is a fd event if `EE n_0 < oo` and `B in B(RR^(n_0))` st `A = {omega: omega = (x_1, x_2, ...), (x_1, ..., x_(n_0)) in B}`, aka a finite dimensional cylinder set. Use the given fd distributions to define `P` on `C`, then apply the extension theorem, checking the conditions on `(C, P)`.
Take `(Omega = RR^NN, F = sigma(: C :), P), X_n(omega) = x_n` if `omega = (x_1, ...)`. Then `{X_n(omega); n=1,...}` is a stochastic process on `(Omega, F, P)` and has fdds `Q_NN`.
`A != O/`
Let `Q_A = ` family of fdds satisfying the consistency conditions. Want to construct a ps `(Omega, F, P)` and a family of random variables `{X_alpha: alpha in A}` with `Q_A` as its fdds.
- A subset `D sub RR^A` is a finite dimensional cylinder set if there exists a finite subset `A_1 sub A`, `A_1 = {alpha_1, ..., alpha_k}`, and a Borel set `B in B(RR^k)` st `D = {f: f in RR^A and (f(alpha_1),..., f(alpha_k)) in B}`. `B` is called a base for `D`. The collection `C` of all fd cylinder sets is an algebra, and if `A` is finite, then `C` is a `sigma`-algebra.
- `R^A = sigma(: C :)` is called the product `sigma`-algebra on `RR^A`.
- A projection map `pi_(alpha_1, ..., alpha_k)(f) = (f(alpha_1), ..., f(alpha_k)): RR^A -> RR^k`, and for `alpha in A`, `pi_alpha(f) = f(alpha)` is called a coordinate map. Projection and coordinate maps are measurable.
Let `Omega = RR^A`, and `F = R^A`. Define a set function `P(D) = mu_((alpha_1, ..., alpha_k))(B)` for a `D in C` with representation `D = {omega: omega in RR^A, (omega(alpha_1), ..., omega(alpha_k)) in B}`. Now show that `P(D)` is independent of the representation of `D`, and is countably additive on `C`. Then by the extension procedure, there exists a unique extension of `P` to `F` st. `(Omega, F, P)` is a ps. Defining `X_alpha(omega) = pi_alpha(omega) = omega(alpha)` for `alpha in A` yields a stochastic process with fdds `Q_A`.
Limitations
`Omega = RR^A` is rather large, and `F = B(RR)^A` is rather small: it can be shown that `F` coincides with the class of events that depend on only a countable number of coordinates of `omega`. When `A` is an interval on `RR` this can be overcome (eg.) by restricting `Omega` to continuous functions.
Convergence in distribution
Definitions and basic properties
Let `{X_n}_(n>=0)` be a collection of rv, and let `F_n` denote the cdf of `X_n`. Then `{X_n}_(n>=0)` is said to converge in distribution, or weakly, written `X_n ->^d X_0`, if:
- `lim_(n->oo) F_n(x) = F_0(x) quad AA x in C(F_0)` where `C(F_0) = {x in RR: F_0 "continuous at" x}`, or
- `mu_n(a, b] -> mu_0(a, b]` for all `a, b in C(F_0)`
Does not require that the random variables be defined on a common ps.
Prop: If `X_n ->^p X_0` then `X_n ->^d X_0`. Converse false in general, but if `X_n ->^d X_0` and `P(X_0 = c) = 1, c in RR`, then `X_n ->^p c`
Prop: If a cdf `F` is continuous on `RR` then it is uniformly continuous on `RR`.
Th: `{X_n}_(n>=0)` a collection of rv, with cdfs `{F_n}_(n>=0)`.
Then `X_n ->^d X_0` iff there exists a dense set `D sub RR` st `lim_(n->oo) F_n(x) = F_0(x) quad AA x in D`.
Polya's th: `{X_n}_(n>=0)` a collection of rvs, with cdfs `{F_n}_(n>=0)`; if `X_n ->^d X_0` and `F_0` is continuous on `RR` then `spr_(x in RR) |F_n(x) - F_0(x)| -> 0` as `n->oo`.
Slutsky's th: `{X_n}_(n>=1), {Y_n}_(n>=1)` sequences of rv, st. `(X_n, Y_n)` is defined on a ps `(Omega_n, F_n, P_n)`.
If `X_n ->^d X_0` and `Y_n ->^p a in RR` then
- `X_n + Y_n ->^d X_0 + a`
- `X_n Y_n ->^d a X_0`
- `X_n/Y_n ->^d X_0 / a` provided `a != 0`.
Asymptotic normality
A special case of `F_n -> F_0`. A seq of rv's `{X_n}_(n>=1)` is said to be asymptotically normal with asymptotic mean `mu_n` and variance `sigma_n^2 > 0` if for sufficiently large `n` (`EE n_0 > 0 "st" AA n >= n_0`) `(X_n - mu_n)/(sigma_n) ->^d N(0,1) "as" n->oo`. Write `X_n` as `"AN"(mu_n, sigma_n^2)`.
- `{mu_n}, {sigma_n^2}` are not necessarily the mean and variance of `X_n`, (`X_n` might not have moments)
- if `X_n` is `AN(mu_n, sigma_n^2)` it is uncertain what `X_n` itself will converge to, eg. if `mu_n = mu = 0` and `sigma_n -> 0`, then `X_n ->^p 0`
- if `X_n` is `AN(mu_n, sigma_n^2)` then `X_n` is `AN(bar mu_n, bar sigma_n^2)` iff `(sigma_n)/(bar sigma_n) -> 1` and `(bar mu_n - mu_n)/(sigma_n) -> 0`
- if `X_n` is `AN(mu_n, sigma_n^2)`, the sequence of asymptotic means and variances is not unique
- if `X_n` is `AN(mu_n, sigma_n^2)` then `a_n X_n + b_n` is `AN(mu_n, sigma_n^2)` iff `a_n -> 1` and `(mu_n(a_n - 1) + b_n)/(sigma_n) -> 0`
Vague convergence, Helly-Bray theorems and tightness
Bolzano-Weierstrass th: If `A sub [0,1]` is infinite, then `EE {x_n}_(n>=1) sub A "st" lim_(n->oo) x_n = x` exists in `[0,1]` (but not necessarily in `A` unless `A` is closed). There is an analogue of this for sub-probability measures (ie. `mu(RR) <= 1`).
`{mu_n}_(n>=1), mu` sub-probability measures on `(RR, B(RR))`. `mu_n ->^v mu` (converges vaguely) if `EE D sub RR, D "dense"` st `mu_n(a,b] -> mu(a,b] qquad AA a,b in D`. For probability measures, `->^d <=> ->^v`.
Helly's selection th: If `A` is an infinite collection of sub-probability measures on `(RR, B(RR))`.
Then there exists a sequence `{mu_n}_(n>=1) sub A` and a sub-probability measure `mu` st `mu_n ->^v mu`.
Helly-Bray theorem for vague convergence: `{mu_n}_(n>=1), mu` sub-pm on `(RR, B(RR))`. Then `mu_n ->^v mu` iff `int f dmu_n -> int f dmu quad AA f in C_0(RR)`, where `C_0(RR) = {g | g: RR -> RR " is continuous and " lim_(|x| -> oo) g(x) = 0}`.
Helly-Bray theorem for weak convergence: `{mu_n}_(n>=1), mu` pm on `(RR, B(RR))`. Then `mu_n ->^d mu` iff `int f dmu_n -> int f dmu AA f in C_B(RR) = {g | g: RR -> RR " is continuous and bounded"}`.
Tightness
A sequence of pm's on `(RR, B(RR))` is called tight if `AA epsi > 0 EE M_epsi in (0, oo) "st" spr_n mu_n([-M_epsi, M_epsi]^c) < epsi`
A sequence of rv's is called tight or stochastically bdd if the sequence of probability dists is tight, ie `AA epsi > 0 EE M_epsi in (0, oo) "st" spr_n P(|X_n| > M_epsi) < epsi`. Denoted `X_n = O_p(1)`.
- `X_n ->^p 0` called stochastically small, `X_n = o_p(1)`
- any finite collection of pms is tight
- property of tightness analogous to notion of boundedness of sequence of real numbers
- a tight sequence may not converge, but will have one or more weakly convergent subsequences
In general, given a stochastic quantity `T_n`, the stochastic order of `T_n - E(T_n)` is determined by the order of the standard deviation `sigma_n`, if the variance exists.
`P(|T_n - mu_n|/(sigma_n) > m) = P(|T_n - mu_n| > m sigma_n) <= (sigma_n^2)/(m^2 sigma_n^2) = m^(-2)` (Chebyshev).
`AA epsi > 0 quad EE m "st" m^(-2) < epsi`, so `P(|T_n - mu_n|/(sigma_n) > m) < epsi => (T_n - mu_n)/(sigma_n) = O_p(1) => T_n - mu_n = O_p(sigma_n)`
If `sigma_n^2 -> 0` then `T_n - mu_n = o_p(1)`.
If `mu_n = 0`, then `T_n = O_p(sigma_n)`
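The Chebyshev step can be checked exactly for a concrete `T_n`; here `T_n ~ "Binomial"(n, p)`, with pmf computed exactly via `math.comb`, and `n`, `p`, `m` are illustrative choices.

```python
import math

n, p, m = 200, 0.3, 2.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

def pmf(k):
    # exact binomial pmf (comb is an exact integer)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# exact tail probability P(|T_n - mu| > m * sigma)
tail = sum(pmf(k) for k in range(n + 1) if abs(k - mu) > m * sigma)

assert tail <= m ** -2   # Chebyshev bound: 1/m^2 = 0.16
assert tail < 0.05       # the binomial tail is in fact much smaller
```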
T1.2.8 `{X_n}, {Y_n}` sequences of rv's. `X_n = O_p(1), Y_n = o_p(1)`.
Then:
- `X_n + Y_n = O_p(1)`
- `X_n Y_n = o_p(1)`
T1.2.9 Let `{mu_n}_(n>=1)` be pm's.
`{mu_n}` is tight iff it is relatively compact, ie for all subsequences `{mu_(n_i)}_(i>=1)` there exists a further subsequence `{mu_(m_i)}_(i>=1)` of `{mu_(n_i)}_(i>=1)` and pm `mu` on `(RR, B(RR))` st `mu_(m_i) ->^d mu`.
T1.2.10 `{mu_n}_(n>=1), mu`, pm's on `(RR, B(RR))`. Then `mu_n ->^d mu` iff `{mu_n}` is tight and all weakly convergent subsequences converge to `mu`.
Convergence of probability and sub-probability measures on general metric spaces
- interior: `A^@ = {x in A | EE epsi >0 "st" B(x, epsi) sub A}`
- closure: `barA = {x in S| EE {x_n}_(n>=1) sub A "st" x_n -> x}`
- boundary: `del A = barA - A^@`
- diameter: `delta(A) = spr_(x,y in A){d(x,y)}`
A set `A` is:
- bounded if `delta(A) < oo`
- compact if every open cover has a finite subcover (for subsets of `RR^k`, equivalent to closed and bounded)
A sequence is Cauchy if `AA epsi > 0 EE N_epsi` st. `AA n,m > N_epsi, d(x_n, x_m) < epsi`
A metric space `(S, d)` is:
- complete if every Cauchy sequence converges to a point in the space
- separable if `EE` a countable dense set `D sub S`
- Polish if complete and separable
D1.3.1 Let `{mu_n}, mu` be pm on `(S, ccS)`. We say `mu_n ->^d mu` if `int f dmu_n -> int f dmu quad AA f in C_B(S)`.
L1.3.1 If `F` is closed in `(S, d)` then `AA epsi > 0 EE f in C_B(S)` st `f(x) = 1` if `x in F`, `f(x) = 0` if `d(x, F) >= epsi`, and `f(x) in [0,1]` otherwise. The `f` can be chosen uniformly continuous.
T1.3.1 Let `{mu_n}_(n>=1)` be pm on `(S, ccS)`. Then the following are equivalent:
- `mu_n ->^d mu`
- `lim_(n->oo) int f dmu_n = int f dmu AA f "bdd and uniformly continuous"`
- `bar lim mu_n(F) <= mu(F) AA F "closed"`
- `ul lim mu_n(G) >= mu(G) AA G "open"`
- `lim_(n->oo) mu_n(B) = mu(B) AA B in ccS "st" mu(del B)=0`
D1.3.2
- A pm on `(S, ccS)` is tight if `AA epsi > 0` there exists a compact `K` st `mu(K) > 1-epsi`.
- Let `{mu_n}_(n>=1)` be a sequence of pms on `(S, ccS)`, `{mu_n}` is tight if `AA epsi > 0 quad EE "a compact" K "st" inf_(n>=1) mu_n(K) > 1- epsi`
- A sequence of random variables is tight if the associated sequence of probability measures is tight
D1.3.3 A family of probability measures `Pi` on `(S, ccS)` is relatively compact if every sequence of pm's in `Pi` contains a weakly convergent subsequence, ie `{mu_(n_i)}_(i>=1)` and a pm `mu` (not necessarily in `Pi`) st `mu_(n_i) ->^d mu`.
T1.3.3 (Prohorov direct half) For a family of pm's, tightness `=>` relative compactness. T1.3.4 (Prohorov converse half) If `(S, ccS)` is Polish, then relative compactness `=>` tightness.
Skorokhod's construction and continuous mapping theorems
Let `F` be a df on `RR`, and for any `0 < p < 1` define the quantile function `F^(-1)(p) = inf{x | F(x) >= p} = spr{x | F(x) < p}`.
L1.4.1 Let `F` be a df, then `F^(-1)` is non-decreasing and left-continuous, also satisfying:
- `F^(-1)(F(x)) <= x quad AA x in RR`
- `F(F^(-1)(t)) >= t quad AA t in [0,1]`
- `F(x) >= t iff x >= F^(-1)(t)`
L1.4.2 If `F_n -> F` then the set `D = {t | t in [0,1], F_n^(-1) !-> F^(-1)}` is at most countable.
T1.4.3 (Skorokhod). Let `{X_n}_(n>=1)` and `X` be rv's on `(RR, B(RR))` st `X_n ->^d X`.
Then there exists rv's `{Y_n}_(n>=1)` and `Y` on `((0,1), B(0,1), "LM")` st `X_n =^d Y_n` and `X =^d Y` and `Y_n ->^(wp1) Y`
- valid for more general spaces
- for a df `F`, if `U ~ U(0,1)` then `F^(-1)(U)` is an rv with `F` as its df
- we know `X_n ->^(wp1) X => X_n ->^p X => X_n ->^d X`, T1.4.3 is a converse of this in a sense
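The inverse-transform fact behind the construction, `F^(-1)(U) ~ F` for `U ~ U(0,1)`, can be sketched for a simple discrete df (the support and cdf values are illustrative):

```python
import random

xs = [0, 1, 2]          # support points
cdf = [0.2, 0.7, 1.0]   # F at the support points

def F_inv(p):
    """F^{-1}(p) = inf{x : F(x) >= p} for this step-function df."""
    for x, Fx in zip(xs, cdf):
        if Fx >= p:
            return x
    return xs[-1]

# quantile-function behaviour at and just past a jump (cf. L1.4.1)
assert F_inv(0.2) == 0 and F_inv(0.21) == 1 and F_inv(0.7) == 1

# inverse transform: F^{-1}(U) puts mass F(0) = 0.2 on x = 0
random.seed(1)
n = 100_000
sample = [F_inv(random.random()) for _ in range(n)]
freq0 = sample.count(0) / n
assert abs(freq0 - 0.2) < 0.01
```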
Continuous mapping theorems
`f: RR -> RR`, Borel measurable, st `P(X in D_f) = 0`, where `D_f` is the set of discontinuity points of `f`
P1.4.4 If `X_n -> X` wp1 (or in probability) then `f(X_n) -> f(X)` wp1 (or in probability, respectively).
T1.4.5 If `X_n ->^d X` then `f(X_n) ->^d f(X)`.
Convergence of moments
`X_n ->^d X iff Ef(X_n) -> Ef(X) quad AA f in C_B(RR)`. However to ensure `E|X_n|^k -> E|X|^k` we need extra conditions.
D1.5.1 A sequence of random variables `{X_n}_(n>=1)` is uniformly integrable if `lim_(A->oo) spr_n E(|X_n| I(|X_n| > A)) = 0`, ie `lim_(A->oo) int_(|X_n| > A) |X_n| dP = 0` uniformly over `n`.
L1.5.1 A sequence of random variables is u.i. iff:
- `spr_n E|X_n| < oo` and
- `AA epsi >0 EE delta_epsi >0 " st " AA E in ccF P(E) < delta => int_E |X_n| dP < epsi AA n`
L1.5.2 If `EE epsi > 0 "st" spr_n E|X_n|^(1+epsi) < oo` then `{X_n}` is u.i.
T1.5.3 If `X_n ->^d X "in" (RR, B(RR))` and `{|X_n|^r}` is u.i. for some `r > 0` then:
- `E|X|^r < oo`
- `EX_n^r -> EX^r`
- `E|X_n|^r -> E|X|^r`
T1.5.4 If `X_n ->^d X` and `E|X_n|^r -> E|X|^r < oo, r > 0` then `{|X_n|^r}` is u.i.
T1.5.5 (Frechet-Shohat). Let `{X_n}` be a sequence of random variables st `EX_n^k -> m_k < oo AA k in NN`. If the moment sequence `{m_k}` determines a unique df `F`, then `X_n ->^d X`, where `X` has df `F`.
Sufficient conditions for convergence
- `C_B^oo = {f | f "has bdd derivatives of all orders on" RR}`.
- `n_h(x) = 1/(sqrt(2pi)h) e^(-x^2/(2h^2))`
- `f_h(x) = int_(-oo)^oo f(x-y) n_h(y) dy = int_(-oo)^oo n_h(x-y) f(y) dy`
T2.5.1 `f in C_B => AA h > 0 f_h in C_B`. `f in C_BU => f_h -> f` uniformly in `RR` as `h -> 0`.
T2.5.2 Let `{mu_n}` and `mu` be pms on `(RR, B(RR))`. If `AA f in C_B^oo` `int f dmu_n -> int f dmu` then `mu_n ->^d mu`.
Characteristic functions
Definition and basic properties
Let `X` be a random variable on `(RR, B(RR))` with probability measure `mu` and distribution function `F`.
D2.1.1 The characteristic function of `X` is: `phi_X(t) = E(e^(itX)) = int_Omega e^(itX(omega)) dP(omega) = int_RR e^(itx) dmu(x) = int_(-oo)^(oo) e^(itx) dF(x) AA t in RR`
- `phi_X(t)` is an integral of a complex function over the real domain
- if `g(x) = g_1(x) + ig_2(x)` then `int g(x) = int g_1(x) + i int g_2(x)`
- every characteristic function has an associated probability measure and random variable
Properties:
- `|phi_X(t)| <= phi_X(0) = 1 quad AA t in RR `
- `bar(phi_X(t)) = phi_X(-t) quad AA t in RR `
- `phi_X` is uniformly continuous on `RR`
- if `{phi_i}` are cf's, and `p_i >= 0, sum p_i = 1`, then `sum p_i phi_i` is a cf
- if `phi_X` is a cf, then so is `bar(phi_X), |phi_X|^2, Re(phi_X)`
P2.1.2 `|phi_X(t_0)| = 1 " for some " t_0 != 0 iff X "is a lattice rv"`
P2.1.3 If `F` is absolutely continuous, then `lim_(|t|->oo) phi_X(t) = 0`
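The basic properties above can be sketched numerically for a discrete rv; the three-point pmf is an illustrative choice.

```python
import cmath

pmf = {0: 0.3, 1: 0.5, 4: 0.2}   # illustrative discrete law

def phi(t):
    # cf of a discrete rv: sum over atoms of p * e^{itx}
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

# phi(0) = 1
assert abs(phi(0.0) - 1) < 1e-12
for t in (0.3, 1.7, -2.5, 10.0):
    # |phi(t)| <= 1
    assert abs(phi(t)) <= 1 + 1e-12
    # conj(phi(t)) = phi(-t)
    assert abs(phi(t).conjugate() - phi(-t)) < 1e-12
```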
Inversion formula
- `AA y > 0, 0 <= sgn(alpha) int_0^y (sin (alpha x))/(x) dx <= int_0^pi (sin x)/x dx`
- `int_0^oo (sin (alpha x))/x dx = pi/2 sgn(alpha)`
- `int_0^oo (1-cos (alpha x))/(x^2) dx = pi/2 |alpha|`
T2.2.1 `mu(x_1, x_2) + 1/2(mu{x_1} + mu{x_2}) = lim_(T->oo) 1/(2pi) int_(-T)^T (e^(-itx_1) - e^(-itx_2))/(it) phi_X(t) dt`
- `int_(-oo)^oo (e^(-itx_1) - e^(-itx_2))/(it) phi_X(t) dt` will only exist if `phi_X` is integrable, in which case we can replace the limit above
C2.2.2 If `x_1, x_2 in C(F)` then `mu(x_1, x_2) = lim_(T->oo) 1/(2pi) int_(-T)^T (e^(-itx_1) - e^(-itx_2))/(it) phi_X(t) dt`
T2.2.3 (Uniqueness theorem) If 2 pm's `mu_1, mu_2` have the same cf's `phi_1, phi_2` then `mu_1 = mu_2`.
T2.2.4 If `phi_X in L^1` then `F` is absolutely continuous with density `f(x) = 1/(2pi) int_RR e^(-itx) phi_X(t) dt`.
T2.2.6 `AA x_0 in RR, mu{x_0} = lim_(T->oo) 1/(2T) int_(-T)^T e^(-itx_0) phi_X(t) dt`.
C2.2.7 `sum_(x in RR) mu{x}^2 = lim_(T->oo) 1/(2T) int_(-T)^T |phi_X(t)|^2 dt`
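C2.2.7 can be sketched numerically for a Bernoulli(`p`) rv (an illustrative choice), where `phi(t) = (1-p) + p e^(it)` and the sum of squared masses is `p^2 + (1-p)^2`; the time average is computed by a trapezoidal rule.

```python
import math

p, T, steps = 0.3, 500.0, 200_000
h = 2 * T / steps

def phi_sq(t):
    # |(1-p) + p e^{it}|^2 expanded in real form
    return (1 - p) ** 2 + p ** 2 + 2 * p * (1 - p) * math.cos(t)

# trapezoidal approximation of (1/2T) * integral over [-T, T]
avg = h * (sum(phi_sq(-T + k * h) for k in range(1, steps))
           + 0.5 * (phi_sq(-T) + phi_sq(T))) / (2 * T)

# time average of |phi|^2 recovers the sum of squared point masses
assert abs(avg - (p ** 2 + (1 - p) ** 2)) < 1e-2
```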
Convergence theorems and applications
Let `{X_n}_(n>=1), X` be random variables on `(RR, B(RR))` with characteristic functions `{phi_n}_(n>=1), phi` respectively.
T2.3.1 If `X_n ->^d X` then `phi_n -> phi` uniformly on any finite interval, ie. `AA K < oo, spr_(|t| <= K) |phi_n(t) - phi(t)| -> 0`
L2.3.1 `AA delta > 0, mu({x : |x| delta > 2}) <= 1/delta int_(-delta)^delta (1 - "Re" phi(t)) dt`
L2.3.2 If `phi_n(t) -> phi(t) < oo AA |t| <= delta_0`, and `phi` is continuous at 0, then `{mu_n}` is tight.
T2.3.4 (Levy-Cramer) If `lim_(n->oo) phi_n(t) = phi(t) AA t in RR` and `phi` is continuous at 0, then:
- `mu_n ->^d mu`, `mu` a pm
- `phi` is the characteristic function of `mu`
T2.3.5 If `E|X|^r < oo, r in NN` then `phi_X` is r-times continuously differentiable and `phi_X^((r))(t) = E((iX)^r e^(itX))`. Conversely, if `phi_X^((r))(0)` exists for an even integer `r` then `X` has finite rth order (absolute) moments.
T2.3.6 If `E|X|^r < oo`, `r > 1`, then `phi_X` admits the following Taylor expansion around `t=0`:
- `phi_X(t) = sum_(j=0)^r ((it)^j)/(j!) EX^j + o(|t|^r)`
- `phi_X(t) = sum_(j=0)^(r-1) ((it)^j)/(j!) EX^j + theta (|t|^r)/(r!) E|X|^r`, `|theta| <= 1`
P2.3.7 Suppose `{c_n} sub CC` with `c_n -> c`, then `lim_(n->oo) (1+(c_n)/n)^n = e^c`.
Characteristic functions in `RR^k`
D2.4.1 Let `X = (X_1, ..., X_k)` be a random vector on `RR^k` with pm `mu`. The characteristic function of `X` is `phi_X(vec t) = E e^(i vec t^T X) = int e^(i vec t^T x) mu(dx_1, ..., dx_k)`.
- inversion formula generalises
- uniqueness theorem generalises
- multivariate version of convergence is valid
P2.4.1 A pm `mu` on `(RR^k, B(RR^k))` is determined by its values on the half-spaces `ccH = {H_(a,c) : a in RR^k, c in RR}`, `H_(a,c) = {x in RR^k : a'x <= c}`
T2.4.2 (Cramer-Wold). Let `{X_n}` be a sequence of random vectors and `X` an rvec on `(RR^k, B(RR^k))`, then `X_n ->^d X iff a'X_n ->^d a'X quad AA a in RR^k`.
Central limit theorems
Liapounov's theorem
D3.1.1 For each `n >= 1`, let `{X_(n1), X_(n2), ..., X_(n k_n)}` be a collection of rvs on `(Omega_n, F_n, P_n)` such that `X_(n1), ..., X_(n k_n)` are independent, where `k_n -> oo` as `n->oo`. Then `{X_(nj), 1 <= j <= k_n}_(n>=1)` is called a double array (DA). If `k_n = n`, it is called a triangular array.
- `S_n = sum_(j=1)^(k_n) X_(nj)`
- `alpha_(nj) = E[X_(nj)]`, `alpha_n = sum alpha_(nj)`
- `sigma_(nj)^2 = var(X_(nj))`, `sigma_n^2 = sum sigma_(nj)^2`
- `gamma_(nj) = E|X_(nj) - alpha_(nj)|^3`, `gamma_n = sum gamma_(nj)`
Want to establish `(S_n - a_n)/(b_n) ->^d N(0,1)` for `{a_n}, {b_n} sub RR`.
L3.1.1 Let `{theta_(nj)}` be a DA of complex numbers such that as `n -> oo`:
- `max_(1<= j <= k_n) |theta_(nj)| -> 0`
- `sum_(j=1)^(k_n) |theta_(nj)| <= M < oo quad AA n`
- `sum_(j=1)^(k_n) theta_(nj) -> theta in CC`
then `prod_(j=1)^(k_n) (1 + theta_(nj)) -> e^theta`
Proof: Use the fact that `log(1 + z) = sum_(m>=1) ((-1)^(m-1) z^m)/m`. Show that `|log(1 + theta_(nj)) - theta_(nj)|` is bounded uniformly, hence `|log(1 + theta_(nj)) - theta_(nj)| <= k|theta_(nj)|^2`. Look at the sum, and note the first part `-> theta` and the second part `-> 0`
T3.1.2 (Liapounov) For a DA `{X_(nj)}`, `gamma_n < oo AA n`, if `(gamma_n)/(sigma_n^3) -> 0` then `(S_n - alpha_n)/(sigma_n) ->^d N(0,1)`
Proof: Let `phi_(nj)(t)` be the cf of `(X_(nj) - alpha_(nj))/(sigma_n)`. Show `theta_(nj) = phi_(nj)(t) - 1` meets the assumptions of L3.1.1, and hence `prod_j phi_(nj)(t) -> e^(-t^2/2)`
T3.1.3 `{X_n}_(n>=1)` a sequence of independent rvs, `gamma_n < oo`, if `(sum gamma_j)/(sigma_n^3) -> 0` then `(S_n - sum alpha_j)/(sigma_n) ->^d N(0,1)`
C3.1.4 For a DA `{X_(nj)}`, if `|X_(nj) - alpha_(nj)| <= M_(nj)` wp1 and `lim_(n->oo) max_j M_(nj)/sigma_n = 0` then `(S_n - E(S_n))/(sigma_n) ->^d N(0,1)`.
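A simulation sketch of the normal limit (iid rows, so the Liapounov condition holds): standardized sums of iid Uniform(0,1) rv's are compared to the `N(0,1)` df at a grid of points; sample sizes and the grid are illustrative choices.

```python
import math
import random

random.seed(2)
n, reps = 50, 4000
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and sd of Uniform(0,1)

# standardized sums (S_n - n*mu) / (sigma * sqrt(n))
z = sorted((sum(random.random() for _ in range(n)) - n * mu)
           / (sigma * math.sqrt(n)) for _ in range(reps))

def Phi(x):
    # N(0,1) cdf via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# empirical df of the standardized sums vs Phi on a grid
sup_diff = max(abs(sum(v <= x for v in z) / reps - Phi(x))
               for x in (-2, -1, -0.5, 0, 0.5, 1, 2))
assert sup_diff < 0.03
```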
D3.1.2 A null array satisfies one of the following conditions:
- `AAj lim_(n->oo) P(|X_(nj) - alpha_(nj)| > epsi sigma_n) = 0`
- `lim_(n->oo) max_j P(|X_(nj) - alpha_(nj)| > epsi sigma_n) = 0`
- `lim_(n->oo) P(max_j |X_(nj) - alpha_(nj)| > epsi sigma_n) = 0`
- `lim_(n->oo) sum_j P(|X_(nj) - alpha_(nj)| > epsi sigma_n) = 0`
- `lim_(n->oo) max_j |phi_(nj)(t) - 1| = 0`
Lindeberg-Feller CLT
D3.2.1 A DA satisfies the Lindeberg condition (LC) if `lim_(n->oo) 1/(sigma_n^2) sum_(j=1)^(k_n) E(X_(nj)^2 I(|X_(nj)| > epsi sigma_n)) = 0 quad AA epsi > 0`
L3.2.1 Let `u(m,n): NN xx NN -> RR` st `AA m, lim_(n->oo) u(m,n) = 0`. Then there exists a `uarr` sequence `{M_n}_(n>=1) -> oo` st `lim_(n->oo) u(M_n,n) = 0`
T3.2.1 (Lindeberg-Feller). For a DA, assume `Var(X_(nj)) = sigma_(nj)^2 < oo AA n, j` and `alpha_(nj) = 0`. If LC holds then:
- `(S_n - alpha_n)/(sigma_n) ->^d N(0,1)`
- the DA is a null array
D3.2.2 A sequence of rv `{X_n}` is m-dependent if `EE m in NN` st `AA n >= 1, j > m`, `X_(n+j)` is independent of `F_n = sigma{X_i : 1 <= i <= n}`. If `m = 0` then `{X_n}` is an independent sequence.
T3.2.3 Let `{X_n}` be a sequence of `m`-dependent uniformly bounded random variables st `(sigma_n)/(n^(1/3)) -> oo`, then `(S_n - alpha_n)/(sigma_n) ->^d N(0,1)`.
Proof. Split up into large and small blocks. Write `S_n = S_n^' + S_n^('') + S_n^(''')`, show `(S_n^(''))/(sigma_n) "and" (S_n^('''))/(sigma_n) -> 0`, then `(sigma_n^')/(sigma_n) -> 1 " and" (S_n^')/(sigma_n^') ->^d N(0,1)` as the LC condition is fulfilled.
Functional central limit theorem
D3.3.1 The Wiener measure is a probability measure on `(C, ccC)` corresponding to a stochastic process `X_t, t in [0,1]`, having two properties:
- `W(X_t <= alpha) = 1/sqrt(2 pi t) int_(-oo)^alpha e^(-y^2/(2t)) dy`, ie `X_t ~ N(0,t)`
- For a finite number of times `0 <= t_1 < t_2 < ... < t_k <= 1` the random variables `X_(t_2) - X_(t_1), ..., X_(t_k) - X_(t_(k-1))` are independent under `W` (independent increments)
Remarks:
- the stochastic process `X = {X_t}_(t in [0,1])` is called a Wiener process or Brownian motion, commonly denoted `W = {W_t}_(t in [0,1])`
- `W` plays the role in the functional CLT that the standard normal `Z` plays in the ordinary CLT
- a stationary normal process has the following properties:
- any finite dimensional distribution `(N_(t_1), ..., N_(t_k))` is k-variate normal
- `mu = E(N_t) AA t`, `sigma^2(h) = Cov(N_t, N_(t+h))`
- `W_t ~ N(0, t)`, `(W_(t_1), ..., W_(t_k)) ~ N(0, Sigma_k)`, `Sigma_k = ((min{t_i, t_j}))_(k xx k)`
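The covariance structure `Cov(W_s, W_t) = min{s, t}` can be sketched by approximating `W` with scaled random-walk partial sums, `W_t ~~ S_(|__nt__|)//sqrt(n)`; step counts, times, and tolerances are illustrative choices.

```python
import math
import random

random.seed(3)
n, paths = 400, 3000
s, t = 0.25, 0.75

cov_st, var_t = 0.0, 0.0
for _ in range(paths):
    # +/-1 random walk; partial sums scaled by sqrt(n)
    steps = [random.choice((-1.0, 1.0)) for _ in range(n)]
    Ws = sum(steps[:int(n * s)]) / math.sqrt(n)
    Wt = sum(steps[:int(n * t)]) / math.sqrt(n)
    cov_st += Ws * Wt
    var_t += Wt * Wt
cov_st /= paths
var_t /= paths

assert abs(var_t - t) < 0.08            # Var(W_t) = t
assert abs(cov_st - min(s, t)) < 0.08   # Cov(W_s, W_t) = min(s, t)
```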
T3.3.1 Let `{P_n}_(n>=1)` and `P` be probability measures. If all finite dimensional distributions of `P_n` converge to `P` and `{P_n}` is tight, then `P_n ->^d P`
T3.3.2 Consider `{X_n}` and `X` continuous random functions on `(C, ccC)`, and if
- `(X_(n t_1), ..., X_(n t_k)) ->^d (X_(t_1), ..., X_(t_k))`
- `{X_n}` is tight
then `X_n ->^d X`
T3.3.3 If `{X_n(0)}_(n>=1)` is tight and `AA epsi, eta > 0 EE delta in (0,1) "and" n_0 "st" AA n >= n_0, 1/delta P(sup_(t <= s <= t+delta) |X_n(s) - X_n(t)| > epsi) < eta AA t in [0,1]` then `{X_n}` is tight.
C3.3.4 Let `{X_n}` and `X` be random functions on `(C, ccC)`, if
- `(X_(nt_1), ..., X_(nt_k)) ->^d (X_(t_1), ..., X_(t_k))`
- `AA epsi, eta > 0 EE delta in (0,1) "and" n_0 "st" AA n >= n_0 1/delta P(sup_(t <= s <= t+delta) |X_n(s) - X_n(t)| > epsi) < eta AA t in [0,1]`
then `X_n ->^d X`
Donsker's theorem: `Z_n ->^d W`, where `Z_n` is the random function built from the standardized partial sums of iid mean-0, variance-`sigma^2` rv's, `Z_n(t) = S_(|__nt__|)/(sigma sqrt(n))` (linearly interpolated). The continuous mapping theorem can then be applied on `(C, ccC)`.
Conditional expectation and probability
Conditional expectation
D4.1.1 Let `(Omega, ccF, P)` be a ps, and `ccG sub ccF` a sub-`sigma`-field. Let `X` be a random variable with `E|X| < oo`, then the conditional expectation of `X` given `ccG` is a function `E[X|ccG] : Omega -> RR` st:
- `E(X|ccG)` is `ccG`-measurable
- `int_A E(X|ccG) dP = int_A X dP quad AA A in ccG`
Remarks:
- `Omega in ccG` so `E(E(X|ccG)) = E(X)`
- any `ccG` measurable function which differs from `E(X|ccG)` on a set of measure 0 also qualifies as a conditional expectation of `X`
P4.1.1 For any rv `X` with `E|X| < oo` and a sub-`sigma`-field `ccG`, `E(X|ccG)` exists and is unique wp1.
P4.1.2 If `X` an rv `E|X| < oo`, `ccG` a sub `sigma`-field.
If `Z` is an integrable `ccG`-measurable rv st for a `pi`-class `D` with `sigma(D) = ccG`, `EX = EZ` and `int_A Z dP = int_A X dP quad AA A in D`, then `Z = E(X|ccG) "wp1"`.
C4.1.3 Let `ccG_1, ccG_2` be sub `sigma`-fields of `ccF` and `X, Y` be integrable rvs.
- if `sigma(X)` and `ccG_1` are independent, then `E(X|ccG_1) = E(X) "wp1"`
- if `int_(A_1 nn A_2) X dP = int_(A_1 nn A_2) Y dP quad AA A_i in ccG_i`, then `E(X|sigma(ccG_1 uu ccG_2)) = E(Y | sigma(ccG_1 uu ccG_2))`
P4.1.4 (Properties of CE). Let `X, Y` be random variables on `(Omega, ccF, P)`, `ccG sub ccF` a sub-`sigma`-field. Then:
- `E(1 | ccG) = 1`
- `E(X | ccG) >= 0 "wp1"` if `X >= 0 "wp1"`
- `E(cX | ccG) = c E(X|ccG) "wp1"` if `E|X| < oo`, `c in RR`
- `E(X + Y | ccG) = E(X | ccG) + E(Y | ccG) "wp1"` if `E|X|, E|Y| < oo`
- `E(X|ccG) = X "wp1"` if `X` is `ccG`-measurable
- If `ccG_1 sub ccG_2 sub ccF`, `E|X| < oo`, then `E(E(X|ccG_2)|ccG_1) = E(X|ccG_1) = E(E(X|ccG_1)|ccG_2)`
T4.1.5 `X` an rv, `E|X| < oo` and `ccG sub ccF`. If `Y` is a finite valued `ccG`-measurable random variable st `E|XY| < oo`, then `E(XY | ccG) = Y E(X | ccG) "wp1"`.
T4.1.6 Let `{X_n}, X, Y` be random variables on `(Omega, ccF, P)` with `E|Y| < oo` and `ccG sub ccF`, then:
- (MCT) If `Y <= X_n uarr X "wp1"` then `E(X_n | ccG) -> E(X | ccG) "wp1"`
- (Fatou) If `Y <= X_n quad AA n >=1 "wp1"` then `E(ul lim X_n | ccG) <= ul lim E(X_n | ccG) "wp1"`
- (DCT) If `X_n ->^"wp1" X` and `|X_n| <= |Y| "wp1" quad AA n >= 1` then `E(X_n | ccG) -> E(X | ccG) "wp1"`
T4.1.7 (Jensen). Let `Y` be `ccF`-measurable with `E|Y| < oo` and `g` be a finite convex function on `RR` with `E|g(Y)| < oo`. If for some `sigma`-field `ccG`:
- `X = E(Y | ccG) "wp1"` OR
- `X` is `ccF`-measurable with `X <= E(Y|ccG) "wp1"` and `g uarr`
then `g(X) <= E(g(Y) | ccG) "wp1"`
D4.1.2 If `EY^2 < oo`, the conditional variance of `Y` given sub-`sigma`-field `ccG` is `Var(Y | ccG) = E(Y^2 | ccG) - E^2(Y | ccG)`
T4.1.8 Let `EY^2 < oo` then `Var(Y) = Var(E(Y| ccG)) + E(Var(Y|ccG))`
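T4.1.8 can be verified exactly for a finite joint pmf with `ccG = sigma(X)`; the joint distribution below is an illustrative choice.

```python
from fractions import Fraction as F

# P(X = x, Y = y) for an illustrative finite joint law
joint = {(0, 1): F(1, 6), (0, 3): F(1, 6),
         (1, 2): F(1, 3), (1, 5): F(1, 3)}

px = {}
for (x, y), p in joint.items():
    px[x] = px.get(x, F(0)) + p

EY = sum(y * p for (_, y), p in joint.items())
EY2 = sum(y * y * p for (_, y), p in joint.items())
var_Y = EY2 - EY ** 2

# conditional mean and variance of Y given X = x
cm = {x: sum(y * p for (x2, y), p in joint.items() if x2 == x) / px[x]
      for x in px}
cv = {x: sum(y * y * p for (x2, y), p in joint.items() if x2 == x) / px[x]
         - cm[x] ** 2 for x in px}

var_cm = sum(px[x] * cm[x] ** 2 for x in px) - EY ** 2   # Var(E(Y|X))
mean_cv = sum(px[x] * cv[x] for x in px)                 # E(Var(Y|X))

# total variance decomposition, exact in rational arithmetic
assert var_Y == var_cm + mean_cv
```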
Conditional probability
D4.2.1 Let `(Omega, ccF, P)` be a ps. For a `B in ccF` and sub `sigma`-field `ccG sub ccF`, the conditional probability of `B` given `ccG`, `P(B | ccG) = E(I_B | ccG)`.
Remarks:
- conditional probability is just a conditional expectation of an indicator rv, satisfying:
- `P(B | ccG)` is `ccG`-measurable
- `int_A P(B | ccG) dP = int_A I_B dP = int I_(A nn B) dP quad AA A in ccG`, ie. `P(A nn B) = E( P(B|ccG)I_A)`
The properties of conditional expectation lead to the following properties of conditional probability:
- `0 <= P(A | ccG) <= 1 "wp1"`
- `P(A|ccG) = 0 "wp1" iff P(A)=0`
- `P(A|ccG) = 1 "wp1" iff P(A)=1`
- If `{A_n} sub ccF` are disjoint sets, then `P(UU_i A_i | ccG) = sum_i P(A_i | ccG) "wp1"`
- If `A_n in F quad n >= 1` and `lim_(n->oo) A_n = A` then `lim_(n->oo) P(A_n | ccG) = P(A | ccG) "wp1"`
The above suggests that `mu_(ccG)(*) = P(* | ccG)` is countably additive wp1 for a given sequence `{A_i}`. However, the probability-1 set may change from sequence to sequence, so we may not be able to find a common probability-1 set `=> P(* | ccG)` may not be a pm on `ccF`.
D4.2.2 Let `ccF_1, ccG` be sub-`sigma`-fields of events in `(Omega, ccF, P)`. A regular conditional probability on `ccF_1 | ccG` is a function `mu: ccF_1 xx Omega -> [0,1]` satisfying:
- `AA B in ccF_1`, `mu(B, *) = P(B | ccG) "wp1"`
- `AA omega in Omega`, `mu(*, omega)` is a pm on `ccF_1`
T4.2.1 Let `P_omega(A) := mu(A, omega)` be a regular conditional probability given `ccG`. If `X` is `ccF`-measurable with `|EX| < oo` then `E(X|ccG)(omega) = int_Omega X dP_omega "wp1"`
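When `ccG` is generated by a finite partition, the regular conditional probability is explicit: `P_omega` is `P` conditioned on the cell containing `omega`, and T4.2.1 reduces to cell-wise averaging. A small discrete sketch (the sample space and rv below are assumptions for illustration):

```python
# For G = sigma of a finite partition, P_omega(.) = P(. | cell of omega) and
# E(X|G)(omega) = int X dP_omega is the average of X over that cell.
# Toy setup: Omega = {0,...,11}, uniform P, partition by residue mod 3.
omega_space = list(range(12))
p = {w: 1 / 12 for w in omega_space}          # uniform pm
x = {w: w * w for w in omega_space}           # the rv X(w) = w^2

def cell(w):
    return w % 3                               # index of the partition cell

def cond_exp(w):
    """E(X|G)(w) = int X dP_w, where P_w = P(. | cell containing w)."""
    a = [v for v in omega_space if cell(v) == cell(w)]
    mass = sum(p[v] for v in a)
    return sum(x[v] * p[v] for v in a) / mass

# E(X|G) is G-measurable: constant on each cell
print([cond_exp(w) for w in (0, 3, 6, 9)])     # all equal
# Averaging property: E(E(X|G)) = E(X)
print(sum(cond_exp(w) * p[w] for w in omega_space),
      sum(x[w] * p[w] for w in omega_space))
```

The second print shows the defining property `int_Omega E(X|ccG) dP = int_Omega X dP` holding exactly in this finite case.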
Fundamentals of statistical inference
Basic concepts
- underlying probability space: `(Omega, ccF, P)`
- sampling probability space: `(bbX, ccx, ccP^x)`
- statistics: `(RR^k, B(RR^k))`
Parametric vs non-parametric probability models:
- if `ccP^x = {P_theta, theta in Theta sub RR^d, d in NN^+}` then `ccP^x` is indexed by a finite-dimensional parameter, and `Theta` is called the parameter space
- `ccP^x` is non-parametric if it can't be indexed by a finite-dimensional parameter
D5.1.1 (Exponential family). A parametric family of pms `{P_theta}_(theta in Theta)` that is dominated by a `sigma`-finite measure `nu` on `(bbX, ccx)` is called an exponential family iff `(dP_theta)/(dnu)(omega) = exp[ (eta(theta))^T T(omega) - zeta(theta)] h(omega)`, `omega in bbX`, where:
- `T` is a random vector on `RR^p` where `p in NN^+`
- `eta: Theta -> RR^p`
- `h: (bbX, ccx) -> RR` is non-negative, measurable
- `zeta: Theta -> RR` is a normalising constant to make the RHS a real density
Remarks:
- often `nu` is counting or Lebesgue measure
- the representation is not unique, as any transformation `tilde eta(theta) = D eta(theta)` with `D` a `p xx p` invertible matrix gives another representation
- natural exponential family: writing `eta = eta(theta)` for the natural parameter and `Xi = {eta(theta) | theta in Theta}` for the natural parameter space, `f_eta(x) = (dP_eta)/(dnu)(x) = exp{eta^T T(x) - xi(eta)} h(x)`. If `Xi` contains an open set, the family is of full rank
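A concrete instance (the standard Poisson example, not specific to these notes): Poisson(`lambda`) wrt counting measure has `T(x) = x`, natural parameter `eta = log lambda`, `xi(eta) = e^eta` and `h(x) = 1//(x!)`. A quick numeric check that the exponential-family form reproduces the pmf:

```python
# Poisson(lam) as a natural exponential family wrt counting measure:
# pmf(x) = exp{eta * T(x) - xi(eta)} h(x) with T(x)=x, eta=log(lam),
# xi(eta)=e^eta, h(x)=1/x!.
import math

lam = 2.5
eta = math.log(lam)

def pois_pmf(x):
    return math.exp(-lam) * lam ** x / math.factorial(x)

def expfam_density(x):
    # exp{eta^T T(x) - xi(eta)} h(x)
    return math.exp(eta * x - math.exp(eta)) / math.factorial(x)

ok = all(abs(pois_pmf(x) - expfam_density(x)) < 1e-12 for x in range(30))
print(ok)
```

The same template covers binomial, normal, gamma, etc., with different `T`, `eta`, `xi`, `h`.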
D5.1.2 (Location-scale family) Let `P` be a known pm on `(RR^k, B(RR^k))`, `nu sub RR^k`, and `M_k = {k xx k " positive definite matrices"}`. The family of pms `{P_((mu, Sigma)) = P(Sigma^(-1/2)( * - mu)), mu in nu, Sigma in M_k}` is called a location-scale family on `(RR^k, B(RR^k))`
- `{ P_((mu, I_k)), mu in nu}` is called a location family
- `{ P_((0, Sigma_k)), Sigma_k in M_k}` is called a scale family
Sufficiency and completeness
D5.2.1 Given a random observable `X`, a measurable function `T: bbX -> RR^d, d in NN^+` is called a statistic if `T(X)` is known whenever `X` is. Note `sigma(T(X)) sub sigma(X)`
D5.2.2 Let `(bbbX, sfX, sfP)` be an ops of `X`, and let `sfG` be a sub-`sigma`-field. `sfG` is sufficient for `sfP` if `AA A in sfX quad P(A|sfG) = E_P(I_A | sfG)` does not depend on `P in sfP`. That is, the conditional probability of `A | sfG` is the same for all `P in sfP`.
P5.2.1 `sfG` is sufficient for `sfP` iff for any bounded `sfX`-measurable function `f: bbbX -> RR` there exists a `sfG`-measurable function `g` st `g = E_P(f | sfG) quad AA P in sfP`
L5.2.2 Suppose the ops is dominated by a `sigma`-finite measure `lambda`; then there exists a countable subset `sfP_0 sub sfP` st `sfP << sfP_0`
C5.2.3 If a family `sfP` of pms on `(bbbX, sfX)` is dominated by a `sigma`-finite measure `lambda`, then `sfP` is dominated by a pm `Q = sum_i c_i P_i, quad c_i > 0, sum c_i = 1, P_i in sfP`
T5.2.4 (Halmos-Savage) Let `(bbbX, sfX, sfP)` be a dominated ops and `sfB` be a sub-`sigma`-field.
- if there exists a pm `mu` st `sfP << mu` and `dP/dmu` is `sfB`-measurable `AA P in sfP`, then `sfB` is sufficient for `sfP`
- conversely, if `sfB` is sufficient then there exists a pm `mu` st `sfP ~ mu` and `dP/dmu` is `sfB`-measurable `AA P in sfP`
T5.2.5 (Factorisation theorem). Suppose `(bbbX, sfX, sfP)` is an ops dominated by a `sigma`-finite measure `lambda`. Then `T(x)` is sufficient for `sfP` iff there exist a non-negative measurable function `h: bbbX -> RR` that does not depend on `P in sfP` and a non-negative `sigma(T(X))`-measurable `g_P` st `dP/dlambda (x) = g_P(T(x))h(x) quad AA x in bbbX`
Sufficiency is very much determined by the structure of `sfP`. If `sfP` is not a proper set of models the discussion of sufficiency is quite hypothetical.
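For an iid `N(theta, 1)` sample wrt Lebesgue measure the factorisation of T5.2.5 can be written explicitly (a standard example): `T(x) = sum x_i`, `g_theta(t) = exp(theta t - n theta^2 // 2)`, `h(x) = (2 pi)^(-n//2) exp(-sum x_i^2 // 2)`, so the sum is sufficient. A numeric check of the identity:

```python
# Factorisation for an iid N(theta, 1) sample: the joint Lebesgue density
# equals g_theta(T(x)) * h(x) with T(x) = sum(x).
import math

def joint_density(xs, theta):
    return math.prod(math.exp(-(xi - theta) ** 2 / 2) / math.sqrt(2 * math.pi)
                     for xi in xs)

def g(t, theta, n):
    # the part depending on theta, through T(x) only
    return math.exp(theta * t - n * theta ** 2 / 2)

def h(xs):
    # the part free of theta
    n = len(xs)
    return (2 * math.pi) ** (-n / 2) * math.exp(-sum(xi ** 2 for xi in xs) / 2)

xs = [0.3, -1.2, 2.0, 0.7]
for theta in (-1.0, 0.0, 1.5):
    lhs = joint_density(xs, theta)
    rhs = g(sum(xs), theta, len(xs)) * h(xs)
    print(abs(lhs - rhs) < 1e-12)
```

Expanding `-(x_i - theta)^2 // 2 = -x_i^2 // 2 + theta x_i - theta^2 // 2` and summing over `i` gives the split directly.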
Minimal sufficient
D5.2.3 Let `(bbbX, sfX, sfP^x)` be an ops and `sfC sub sfX` be a sub-`sigma`-field. `sfC` is necessary for `sfP` if for all given sufficient `sigma`-fields `sfB` for `sfP` and any `C in sfC` there exists `B in sfB` st `P(B o+ C) = 0 quad AA P in sfP` (`o+` = XOR).
- `sfC = {o/, Omega}` is necessary
- If `sfC` is necessary and `sfB` is sufficient then `sfC sub sfB` up to P-null sets, in the sense that `AA C in sfC EE B in sfB "st" P(C o+ B) = 0 quad AA P in sfP`
D5.2.4 A statistic `T:(bbbX, sfX) -> (RR^d,B(RR^d))` is necessary for `sfP` if for any sufficient statistic `S: (bbbX, sfX) -> (RR^q, B(RR^q))` there exists a measurable `H: RR^q -> RR^d` st `T = H(S) "wp1" P quad AA P in sfP`, `q in NN^+, d in NN^+ uu {oo}`
A `sigma`-field is minimal sufficient if it is sufficient and necessary.
T5.2.6 Let `(bbbX, sfX, sfP)` be an ops, dominated by `sigma`-finite `lambda`. Then a minimal sufficient `sigma`-field for `sfP` exists.
T5.2.7 Let `(bbbX, sfX, sfP)` be an ops and `sfP_1 sub sfP` st `sfP << sfP_1`. If `T(X)` is sufficient for `sfP` and minimal sufficient for `sfP_1` then `T` is minimal sufficient for `sfP`
T5.2.8 Let `(bbbX, sfX, sfP)` be an ops where `sfP = {P_0, ..., P_k}` is a finite set of pms with densities `{f_0, f_1, ..., f_k}` wrt a `sigma`-finite `lambda`. Then `T(x) = ((f_1(x))/(f_0(x)), ..., (f_k(x))/(f_0(x)))I_A(x)`, with `A = {x: f_0(x) > 0}`, is a minimal sufficient statistic for `sfP`
Completeness
D5.3.1 Let `(bbbX, sfX, sfP)` be an ops, and `sfB` be a sub-`sigma`-field of `sfX`.
- `sfB` is complete for the family `sfP` if for every `sfB`-measurable function `f` with `E_P|f| < oo quad AA P in sfP`, `int f dP = 0 quad AA P in sfP => f = 0 "wp1" P quad AA P in sfP`
- `sfB` is bounded complete for `sfP` if the above holds for all bounded measurable f.
- A statistic `T:bbbX -> RR^d` is bounded complete if `sigma(T)` is bounded complete
- Let `(bbbY, sfY)` be a measurable space and `sfP^y` be a set of pms on `(bbbY, sfY)`. `sfP^y` is complete if for all `sfY`-measurable real valued functions `f` (with `int |f| dP < oo quad AA P in sfP^y`), `int f dP = 0 quad AA P in sfP^y => f = 0 "wp1" P quad AA P in sfP^y`
P5.3.1 Let `Y: bbbX -> bbbY` be measurable, st `sigma(Y)` is complete for `sfP`. Let `sfP^y = {P @ Y^(-1): P in sfP}`; then `sfP^y` is complete.
L5.3.2 Let `sfP = {P_eta: eta in Xi}` be a natural exponential family with density `(dP_eta)/(dnu)(x) = exp(eta^T T(x) - xi(eta))h(x) quad x in bbbX`. Suppose `T(X) = (Y(X), U(X))` and `eta = (theta, phi)` where `Y` and `theta` have the same dimension. Then `Y` has density `f_theta(y) = exp(theta^T y - xi(eta))` wrt a `sigma`-finite measure `lambda_phi` depending on `phi`
If `T = Y` then `T(X)` has a distribution in natural exponential family form.
Given an rv `X`, its moment generating function (MGF) is defined as `psi_X(t) = E(e^(t^T X))`. It has similar properties to the characteristic function, but `psi_X(t)` can take the value `oo`.
L5.3.3 Let `X` and `Y` be rvs in `RR^k`. If `psi_X(t) = psi_Y(t) < oo` for `|t| < delta, delta > 0`, then `X` and `Y` have the same distribution.
T5.3.4 Let `sfP = {P_eta; eta in Xi}` be a natural exponential family of full rank with density `(dP_eta)/(dnu)(x) = exp(eta^T T(x) - xi(eta))h(x)`. Then `T(X)` is complete.
T5.3.5 Let `(bbbX, sfX, sfP)` be an ops. If `S sub sfX` is a bounded complete and sufficient `sigma`-field for `sfP`, then `S` is minimal sufficient for `sfP`
D5.3.2 Let `(bbbX, sfX, sfP)` be an ops. A `sigma`-field `sfB sub sfX` is ancillary for `sfP` if for every `sfB`-measurable statistic `V(X)` the distribution of `V` does not depend on `P in sfP`
T5.3.6 If `sfB` is a complete and sufficient `sigma`-field and `sfC` is an ancillary `sigma`-field for `sfP`, then `sfB` and `sfC` are independent under each `P in sfP`.
Let `V(X)` and `T(X)` be two statistics of `X` on `(bbbX, sfX, sfP)`. If `V` is ancillary and `T(X)` is complete and sufficient for `sfP`, then `V(X)` and `T(X)` are independent under each `P in sfP`
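T5.3.6 underlies the classical fact that for an iid `N(mu, 1)` sample the mean (complete and sufficient) and the sample variance (ancillary, being location invariant) are independent. A Monte Carlo sketch via their sample correlation (toy simulation, not from the notes):

```python
# Sample mean vs sample variance for iid N(mu, 1): independent by the
# complete-sufficient / ancillary argument, so their correlation should be ~0.
import random
import statistics

random.seed(1)
means, varis = [], []
for _ in range(20_000):
    xs = [random.gauss(0.5, 1) for _ in range(5)]
    means.append(statistics.fmean(xs))
    varis.append(statistics.variance(xs))

mm, mv = statistics.fmean(means), statistics.fmean(varis)
cov = statistics.fmean([(a - mm) * (b - mv) for a, b in zip(means, varis)])
corr = cov / (statistics.pstdev(means) * statistics.pstdev(varis))
print(corr)  # close to 0
```

Note the independence here is exact, not asymptotic; the Monte Carlo correlation is only a sanity check.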
Decision theory
Decision rules, loss functions and risks
D6.1.1 A statistical decision problem consists of the following elements:
- a ps `(bbbX, ccX, ccP)` for a random observable `X`
- an action space `(bbbA, ccA)`. `bbbA` is the set of allowable actions, and `ccA` is the `sigma`-field on `bbbA`
- a decision rule (dr) `d: bbbX -> bbbA`, measurable
- a loss function `L(P, d(x))`, which specifies the loss from taking action `d(x)` when the underlying model is `P in ccP`; `L: ccP xx bbbA -> [0, oo)`, measurable wrt `ccA` for each fixed `P`
- the risk of a decision rule: `R(P, d) = E_P(L(P, d(X)))`
The goal of decision theory is to find the best decision rule given a loss function.
If `ccP` is parametric, we often write `L(theta, d(x))`
D6.1.2 A dr is `d_1`:
- as good as `d_2` if `R(P, d_1) <= R(P, d_2) quad AA P in ccP`
- better than `d_2` if it is as good as `d_2` and `R(P, d_1) < R(P, d_2)` for at least one `P in ccP`
- equivalent to `d_2` if `R(P, d_1) = R(P, d_2) quad AA P in ccP`
Let `ccT` be a class of decision rules. A dr d is `ccT`-optimal if `d` is as good as any other dr in `ccT`. If `ccT` contains all the possible dr's, then `d` is optimal if `d` is `ccT`-optimal
D6.1.2
- A non-random decision rule (nrdr) `d: bbbX -> bbbA` is a measurable function
- A behavioural decision rule (bdr) is a function `delta: bbbX xx bbbA -> [0,1]` st:
- `AA x in bbbX` `delta(x, *)` is a pm on `(bbbA, ccA)`
- `AA A in ccA` `delta(*, A)` is `ccX`-measurable
The loss function of a bdr is `L(P, delta)(x) = int_bbbA L(P, a) delta(x, da)`. The risk is `R(P, delta) = int_bbbX L(P, delta)(x) dP(x)`
D6.1.3 A randomised decision rule (rdr) `bar delta` is a pm on `(bbbD, ccD)`. The risk is `R(P, bar delta) = int_bbbD R(P, g) d bar delta(g)`
Remarks:
- rdr and bdr called random drs
- `R(P, bar delta)` is a further average of `R(P, g)` wrt a pm on all `g in bbbD`. `bar delta` is a model for preference of nrdrs - like a prior on `bbbD`
- A bdr is more natural, but a rdr is easy to analyse.
- It can be shown that if `bbbA` is a complete separable (Polish) metric space and `ccA = B(bbbA)` then `AA delta in bbbD_b EE bar delta in bbbD_r "st" R(P, delta) = R(P, bar delta) quad AA P in ccP` for a given decision problem
Admissibility and geometry of decision rules
D6.2.1 Let `ccT` be a class of drs. A dr `delta in ccT` is `ccT`-admissible if there is no dr in `ccT` that is better than `delta`.
The notion of admissibility is a retreat from `ccT`-optimality as the latter may not exist.
T6.2.1 Suppose `bbbA sub RR^k` is convex and `delta in bbbD_b` st `int_bbbA ||a|| delta(x, da) < oo quad AA x in bbbX`. Let `d(x) = int_bbbA a delta(x, da) quad AA x in bbbX` (an nrdr). Then:
- If `L(P, a)` is convex then `R(P, d) <= R(P,delta)`
- If `L(P, a)` is strictly convex in `a`, `R(P, delta) < oo` and `P({x: delta(x, *) "is not degenerate"}) > 0` then `R(P, d) < R(P,delta)`
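T6.2.1 can be seen numerically under squared-error loss (a hypothetical example: estimating `theta` from `X ~ N(theta, 1)`): a bdr that randomises 50/50 between `d_1(x) = x` and `d_2(x) = 0` is beaten by the nonrandom average `d(x) = x//2`:

```python
# Averaging out the randomisation of a bdr reduces risk under convex loss.
# Toy rule (assumed for illustration): delta(x,.) puts mass 1/2 on each of
# the actions x and 0; the averaged nrdr is d(x) = x/2.
import random

random.seed(2)
theta, n = 1.0, 100_000
risk_bdr = risk_avg = 0.0
for _ in range(n):
    x = random.gauss(theta, 1)
    a = x if random.random() < 0.5 else 0.0   # draw an action from delta(x, .)
    risk_bdr += (a - theta) ** 2
    risk_avg += (x / 2 - theta) ** 2
print(risk_bdr / n, risk_avg / n)
```

The exact risks are `R(theta, delta) = (1 + theta^2)//2 = 1` and `R(theta, d) = 1//4 + theta^2//4 = 1//2` at `theta = 1`, so the inequality is strict, as the strict convexity clause predicts.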
Geometry of decision rules
A helpful device for understanding the basics of decision rules. Assume `ccP = {P_1, ..., P_k}` is a finite collection of pms. Given a dr `delta`, define the k-dim risk profile `y_delta = (R(P_1, delta), ..., R(P_k, delta))`. Let `ccR_(k,r) = {y_delta in RR^k: delta in bbbD_r}`, `ccR_(k,b) = {y_delta in RR^k: delta in bbbD_b}`
T6.2.2 `ccR_(k,r)` and `ccR_(k,b)` are convex.
D6.2.2 Let `X = (X_1, ..., X_k )` and `Y = (Y_1, ..., Y_k)`:
- `X <= Y` if `X_i <= Y_i quad AA i = 1, ..., k`
- `X < Y` if `X <= Y` and `EE i_0 "st" X_(i_0) < Y_(i_0)`
D6.2.3 `X in RR^k`, the lower quadrant of X is the set `Q_x = {Y in RR^k; Y <= X}`
D6.2.4
- `X in RR^k` is a lower boundary point of a set `A sub RR^k` if `Q_x nn A = {x}`
- `lambda(A) = {x ; Q_x nn A = {x}}`
- a set `A sub RR^k` is closed from below if `lambda(A) sub A`
T6.2.3 `y_delta in ccR_(k,b)`. If `y_delta in lambda(ccR_(k,b))` then `delta` is admissible. The converse is true if `ccR_(k,b)` is closed.
Complete classes of decision rules
D6.3.1 Let `ccC sub bbbD_b` be a class of drs.
- `ccC` is a complete class (CC) if `AA delta !in ccC EE delta_1 in ccC` st `delta_1` is better than `delta`
- `ccC` is an essentially complete class (ECC) if `AA delta !in ccC EE delta_1 in ccC` st `delta_1` is as good as `delta`
- `ccC` is a minimal complete class (MCC) if it is complete and is a subset of every other CC.
- a minimal essentially complete class is defined similarly
T6.3.1 Let `A(bbbD_b)` be the set of admissible drs in `bbbD_b`. If a MCC exists then it is `A(bbbD_b)`
T6.3.2 If `A(bbbD_b)` is complete, then it is a MCC.
T6.3.3 Suppose `ccP = {P_1, ..., P_k}` is finite. If `ccR_(k,b)` is closed from below, then `bbbD_0 = {delta in bbbD_b: y_delta in lambda(ccR_(k,b))}` is a MCC.
T6.3.4 If `T(X)` is sufficient for `ccP` and `delta in bbbD_b`, then `delta'(x, A) = E(delta(X, A) | T(X))` defines a bdr in `bbbD_b` with `R(P, delta') = R(P, delta) quad AA P in ccP`
L6.3.5 Suppose `bbbA sub RR^k` is convex and `d_1, d_2 in bbbD`. Let `d(x) = (1/2)(d_1 + d_2)` then:
- `d in bbbD`
- If `L(P,a)` is convex in `a quad AA P in ccP` and `R(P, d_1) = R(P, d_2)`, then `R(P, d) <= R(P, d_1)`
- If `L(P,a)` is strictly convex in `a quad AA P in ccP` and `R(P, d_1) = R(P, d_2) < oo` and `P(d_1 != d_2) > 0` then `R(P, d) < R(P, d_1)`
C6.3.6 Suppose `bbbA sub RR^k` is convex and `d_1, d_2 in bbbD` have the same risk `AA P in ccP`. If the loss is convex in `a quad AA P in ccP`, strictly convex for one `P_0 in ccP`, `R(P_0, d_1) = R(P_0, d_2) < oo` and `P_0(d_1 != d_2) > 0`, then `d_1` and `d_2` are inadmissible.
T6.3.7 (Rao-Blackwell theorem). Let `bbbA` be a convex subset of `RR^k` and `T` be a sufficient statistic for `ccP`. Let `d` be an nrdr with `E_P||d(X)|| < oo quad AA P in ccP` and `d_0(x) = E(d | T)(x)` (well defined, free of `P`, by sufficiency). Then
- `d_0` is an nrdr st `E_P||d_0(X)|| < oo`
- if `L(P,a)` is convex in `a quad AA P in ccP`, `R(P, d_0) <= R(P, d)`
- if `L(P,a)` is strictly convex in `a quad AA P in ccP`, `R(P, d) < oo` and `P(d_0 != d) > 0`, then `R(P, d_0) < R(P, d)`
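A classical Rao-Blackwell worked example (standard, not from the notes): estimating `gamma(lambda) = P(X = 0) = e^(-lambda)` from an iid Poisson(`lambda`) sample. The crude UE `d = I(X_1 = 0)` conditioned on the sufficient `T = sum X_i` gives `d_0 = ((n-1)//n)^T`, with visibly smaller squared-error risk:

```python
# Rao-Blackwellisation for gamma(lam) = P(X=0) = exp(-lam), X_i iid Poisson(lam):
# d = I(X_1 = 0) improves to d0 = E(d | T) = ((n-1)/n)^T with T = sum(X_i).
import math
import random

random.seed(3)

def poisson(lam):
    # Knuth's multiplicative sampler (returns k with P(k) = e^-lam lam^k / k!)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

lam, n, reps = 2.0, 10, 50_000
target = math.exp(-lam)
mse_d = mse_d0 = 0.0
for _ in range(reps):
    xs = [poisson(lam) for _ in range(n)]
    d = 1.0 if xs[0] == 0 else 0.0
    d0 = ((n - 1) / n) ** sum(xs)      # E(d | T), computed in closed form
    mse_d += (d - target) ** 2
    mse_d0 += (d0 - target) ** 2
print(mse_d / reps, mse_d0 / reps)     # d0 has much smaller risk
```

Both rules are unbiased; the risk drop is pure variance reduction, as the theorem's convex-loss clause guarantees.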
Bayes and minimax rules
So far we have compared drs `delta_1` and `delta_2` via their risk vectors/profiles; however, this multivariate comparison may not produce a "better" rule. Bayes and minimax rules use a univariate summary of risk.
D6.4.1 Given a statistical decision problem `(bbbX, ccX, ccP)`, `(bbbA, ccA)`, `L(P,a)` and `delta in bbbD` producing a risk `R(P, delta)`, the Bayes risk wrt a prior pm `Pi` on `(ccP, ccF_P)` is `R_Pi(delta) = int_ccP R(P, delta) Pi(dP)`.
Remarks:
- if `ccP` is finite, then this is equivalent to a weighted average
- if `ccP = { P_theta, theta in Theta}` is a parametric family with `ccF_theta` a `sigma`-field on `Theta`, the prior `Pi` can be regarded as a pm on `(Theta, ccF_theta)` and `R_Pi(delta) = int_Theta R(P_theta, delta) Pi(d theta)`
- the Bayes risk has the same flavour as the risk of an rdr, but averages over `P in ccP` rather than `d in bbbD`
D6.4.2 Let `ccT` be a set of drs. A dr `delta_0 in ccT` is the `ccT`-Bayes rule if `R_Pi(delta_0) = inf_(delta in ccT) R_Pi(delta)`
T6.4.1 Let `ccP = {P_1, ..., P_k}` and `ccT` a family of drs. If `delta_0` is `ccT`-Bayes wrt a prior `Pi = (pi_1, ..., pi_k)` with `pi_i > 0, sum pi_i = 1`, then `delta_0` is admissible.
T6.4.2 Let `ccP = {P_theta; theta in Theta sub RR^k}` st every open ball in `Theta` has positive `Pi`-probability, and `R(theta, delta) := R(P_theta, delta)` is continuous in `theta` on `Theta` for each `delta in ccT`. If:
- `delta_0` is `ccT`-Bayes wrt to a prior `Pi`
- `R_Pi(delta_0) < oo`
then `delta_0` is `ccT`-admissible.
T6.4.5 If `ccP` is finite and `delta` is `ccT`-admissible, then there exists a prior `Pi` on `(ccP, ccF_P)` st `delta` is `ccT`-Bayes wrt `Pi`.
P6.4.4 (Lehmann's theorem) Let `T: (Omega, ccF) -> (Lambda, ccG)` and `phi: (Omega, ccF) -> (RR^k, B(RR^k))` be measurable. Then `phi` is measurable `(Omega, sigma(T)) -> (RR^k, B(RR^k))` iff there exists a `psi: (Lambda, ccG) -> (RR^k, B(RR^k))` st `phi = psi @ T`
D6.4.3 The conditional expectation of `X` given `Y = y`, `y in RR^k`, is `E(X|Y=y) = h(y)`, where `h` is a measurable function st `E(X|Y) = h(Y) "wp1"` (such an `h` exists by Lehmann's theorem)
P6.4.5 Let `X` and `Y` be n- and m-dimensional r. vectors. Suppose `P_((X,Y))`, the pm of `(X,Y)`, is dominated by `nu xx lambda` with density `f(x,y)`, where `nu` and `lambda` are `sigma`-finite measures on `(RR^n, B(RR^n))` and `(RR^m, B(RR^m))`. Let `g: (RR^(n+m), B(RR^(n+m))) -> (RR, B(RR))` be measurable st `E|g(X,Y)| < oo`. Then `E(g(X,Y) | Y) = (int g(x,Y) f(x,Y) nu(dx))/(int f(x,Y) nu(dx)) "wp1"` and `E(g(X,Y) | Y=y) = (int g(x,y) f(x,y) nu(dx))/(int f(x,y) nu(dx))`.
T6.4.6 (Existence of conditional distribution in a general case). Let `X` be an n-dim r. vector on `(Omega, ccF, P)` and `Y: (Omega, ccF) -> (Lambda, ccG)` measurable. Then there exists a regular cond pm `P_(X|Y)( * | y)`, called the conditional distribution of `X|Y=y`, st
- `P_(X|Y)(B | y) = P(X in B | Y = y) "wp1" P_Y quad AA B in B(RR^n)`
- `P_(X|Y)(* | y)` is a pm on `(RR^n, B(RR^n)) quad AA y in Lambda`
Furthermore, if `E|g(X,Y)| < oo` for `g: RR^n xx Lambda -> RR` measurable, then `E(g(X,Y) | Y=y) = E(g(X, y) | Y=y) = int_(RR^n) g(x,y) dP_(X|Y)(x|y) "wp1" P_Y`
Remark:
- the theorem assures existence of the cond dist in a wide range of cases, and frees us from verifying the conditions of P6.4.5 (though P6.4.5 does give more details of the cond dist)
- the cond dist is a regular cond prob of an rv given another rv `Y` at `y`
Construction of Bayes Rules
T6.4.7 (Bayes formula). Assume `ccP = {P_theta; theta in Theta}` is dominated by a `sigma`-finite `nu` and the cond density `f_theta(x) = (dP_(X|theta))/(dnu)(x)`, as a function of `(x, theta)`, is measurable on `(bbbX xx Theta, sigma(ccX xx ccF_Theta))`. Let `Pi` be a prior pm on `(Theta, ccF_Theta)` and assume `m(x) = int_Theta f_theta(x) Pi(d theta) > 0`. Then:
- the posterior distribution of `theta | X` is `P_(theta|X) << Pi ` and `(dP_(theta|X))/dPi = (f_theta(x)) / (m(x))`
- if `Pi << lambda` and `dPi/dlambda = pi(theta)` for `sigma`-finite `lambda` then `(dP_(theta|X))/(dlambda) = (f_theta(x) pi(theta))/(m(x))`
Remarks:
- `M(A) = int_A m(x) nu(dx) quad AA A in ccX` is the marginal distribution of `X`, with marginal density `m`
- the posterior distribution `P_(theta|X)` plays a pivotal role in Bayesian inference and yields the Bayes rule in most situations
For an nrdr, `R_Pi(delta) = int_bbbX E(L(theta, delta(x)) | X=x) dM(x)`, where `E(L(theta, delta(x)) | X=x)` is the posterior expected loss given `X=x`
T6.4.8 Under the conditions of T6.4.6 and with `Theta, bbbA` convex:
- with squared error loss and `Pi "st" int theta^2 dPi < oo`, the Bayes rule is `d(x) = E(theta | X = x)`
- with `L_1` loss and `Pi "st" int |theta| dPi < oo`, the Bayes rule is `d(x) = "median of" P_(theta|X=x)`
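A standard conjugate instance of the squared-error case (Beta-Binomial, assumed for illustration): for `X ~ "Binomial"(n, theta)` and prior `Pi = "Beta"(a, b)`, the posterior is `"Beta"(a + x, b + n - x)` and the Bayes rule is the posterior mean `d(x) = (a + x)//(a + b + n)`. Checking this against a direct numeric integration of the T6.4.7 posterior:

```python
# Bayes rule under squared-error loss = posterior mean; Beta(a,b) prior,
# Binomial(n, theta) likelihood, posterior Beta(a+x, b+n-x).
from fractions import Fraction
from math import comb

a, b, n = 2, 3, 10

def posterior_mean(x):
    # closed-form conjugate answer
    return Fraction(a + x, a + b + n)

def numeric_posterior_mean(x, steps=20_000):
    # posterior density proportional to f_theta(x) * pi(theta); Riemann sum
    num = den = 0.0
    for i in range(1, steps):
        t = i / steps
        w = comb(n, x) * t ** x * (1 - t) ** (n - x) * t ** (a - 1) * (1 - t) ** (b - 1)
        num += t * w
        den += w
    return num / den

print(posterior_mean(4), numeric_posterior_mean(4))
```

The closed form gives `6//15 = 2//5`; the numeric integral agrees to the grid resolution.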
Minimax rules
D6.4.4 A dr `delta_0 in ccT` is a minimax rule if `sup_(P in ccP) R(P, delta_0) = inf_(delta in ccT) sup_(P in ccP) R(P, delta)`. A minimax rule has the smallest worst-case risk.
D6.4.5 A dr `delta` with constant risk, ie `R(P, delta) = c quad AA P in ccP`, is called an equaliser rule.
T6.4.6 If `delta` is an equaliser rule and is admissible then it is minimax.
T6.4.9 Suppose `{delta_i}_(i>=1)` is a sequence of drs, each `delta_i` Bayes wrt `Pi_i`. If `R_(Pi_i)(delta_i) -> c < oo` and `delta_0` is a dr with `R(P, delta_0) <= c quad AA P in ccP`, then `delta_0` is minimax.
C6.4.10 If `delta_0` is an equaliser rule and is Bayes wrt some `Pi_0`, then it is minimax.
Remarks:
- constant risk plus either admissibility or Bayes implies minimax, but the converse may not be true
- `ul V = sup_Pi inf_(delta in ccT) R_Pi(delta)`, `bar V = inf_(delta in ccT) sup_(P in ccP) R(P,delta)`. Let `R_Pi = inf_(delta in ccT) R_Pi(delta)` (the minimum Bayes risk wrt `Pi`). It can be shown that `R_Pi <= bar V`
D6.4.6 A prior `Pi_0` is least favourable if `R_(Pi_0) = sup_Pi R_Pi`.
A dr which is Bayes wrt a least favourable prior has the best chance of being minimax.
C6.4.11 If `delta` is Bayes wrt `Pi` and `R(P, delta) <= R_Pi = inf_(delta' in ccT) R_Pi(delta') quad AA P in ccP`, then `delta` is minimax and `Pi` is least favourable.
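C6.4.10 in action (the classical Binomial example, not specific to these notes): for `X ~ "Binomial"(n, theta)` under squared-error loss, the rule `d(x) = (x + sqrt(n)//2)//(n + sqrt(n))` is Bayes wrt the `"Beta"(sqrt(n)//2, sqrt(n)//2)` prior and has constant risk `n//(4(n + sqrt(n))^2)`, hence is minimax. Verifying the equaliser property by exact summation:

```python
# The minimax Binomial(n, theta) estimator d(x) = (x + sqrt(n)/2)/(n + sqrt(n))
# is an equaliser rule: its squared-error risk is constant in theta.
from math import comb, sqrt

n = 16

def d(x):
    return (x + sqrt(n) / 2) / (n + sqrt(n))

def risk(theta):
    # exact risk: expectation over the binomial pmf
    return sum(comb(n, x) * theta ** x * (1 - theta) ** (n - x) * (d(x) - theta) ** 2
               for x in range(n + 1))

risks = [risk(t) for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
print(risks)  # all equal to n / (4 * (n + sqrt(n))**2)
```

For `n = 16` the constant risk is `16//(4 * 400) = 0.01`, below the worst-case risk `1//(4n)` of the unbiased `x//n` near `theta = 1//2` only for small `n`; the equaliser property itself holds for every `n`.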
Unbiased estimators and invariant drs
D6.5.1 `(bbbX, ccX, ccP = {P_theta, theta in Theta})`
- a measurable function `gamma: Theta -> RR^k` is called a parametric function
- an nrdr `d(x)` is an unbiased estimator (UE) of a parametric fn `gamma(theta)` if `E_theta(d(X)) = gamma(theta) quad AA theta in Theta`
- a parametric function `gamma(theta)` is U-estimable if there exists a UE for `gamma(theta)`
T6.5.1 (Lehmann-Scheffe) Suppose that:
- `ccB` is complete and sufficient for `ccP`
- `bbbA sub RR^k` is convex
- `L(theta, a)` is convex in `a quad AA theta in Theta`
If there exists a UE of a parametric fn `gamma(theta)`, then there exists a best UE (BUE) of `gamma(theta)`
Invariant drs
It would be nice to have decision rules that are invariant to transformations of the r. obs.
D6.5.2 Let `G != o/`, and `@` be a binary operator on `G`. `(G, @)` is called a group if:
- `AA g_1, g_2 in G` `g_1 @ g_2 in G`
- `AA g_1, g_2, g_3 in G` `(g_1 @ g_2) @ g_3 = g_1 @ (g_2 @ g_3)`
- `EE e in G` st `g @ e = e @ g = g quad AA g in G`
- `AA g in G EE h in G "st" g @ h = h @ g = e`
`e` is unique and is the identity element of `G`; `h` is the inverse of `g`.
D6.5.3 Let `(bbbX, ccX)` be a measurable space. `G` is a group of measurable transformations (GMT) if :
- `G sub {g | g:bbbX -> bbbX, "measurable and one-to-one"}`
- `(G, @)` is a group, where `@` is the composition operator
D6.5.4 Let `(bbbX, ccX, {P_theta, theta in Theta})` be the ps for r. obs `X` and `G` a GMT on `(bbbX, ccX)`. Then `ccP` is invariant under `G` if `AA theta in Theta, g in G` there exists a unique `theta' in Theta` st `P_theta(g^(-1)(A)) = P_(theta')(A) quad AA A in ccX`
- we use `bar g(theta)` to denote the unique `theta'`