Advanced calculus

Elementary set theory

A set is a collection of objects, usually defined by a common property.

Set operations

Equivalence relationships

Let `G` define a relation on `Omega` (`G sub Omega xx Omega`), writing `x~y` for `(x,y) in G`. It is an equivalence relation if it is reflexive (`x~x`), symmetric (`x~y => y~x`) and transitive (`x~y, y~z => x~z`).

`[x]` denotes the equivalence class `{y: x~y}`. Equivalence classes form a partition of `Omega`.

Functions

A function is a correspondence between elements in a set `X` (domain) and `Y` (range). `AA x in X` there is a unique `y in Y` that corresponds to it: `y = f(x)`.

A set `X` is:

Induction

The set `NN` of natural numbers has the well-ordering property: every nonempty subset `A` of `NN` has a smallest element `s`, ie `s in A` and `a in A => a >= s`. The principle of induction is a consequence of this property, and it states:

Let `{ P(n) : n in NN}` be a collection of propositions which are either true or false. If `P(1)` is true and `P(n) " true" => P(n+1) " true"`, then `P(n)` is true for all `n`.

Real numbers

At least two approaches to defining the real numbers:

Start with `NN`, then construct `ZZ = NN uu {0} uu (-NN)`, then `QQ`, then `RR` as either the set of all Cauchy sequences of rationals or as Dedekind cuts.

OR, can define the real numbers as the set that satisfies three axioms: `RR` is a field (algebraic), ordered and complete, ie a complete ordered field.

Algebraic axioms:

The order axiom states that there is a set `P sub RR` (the positive numbers) st:

Given such a `P`, can define an order on `RR` by defining `x < y` to mean `y - x in P`, and it follows that `AA x, y in RR`, exactly one of `x = y`, `x < y`, `x > y` holds (linear order).

Let `A sub RR`. `M` is an upper bound of `A` if `a in A => a <= M`. The supremum of `A`, `spr A`, is the least upper bound of `A` (`spr A` is an upper bound for `A`, but if `K < spr A` then `K` is not an upper bound for `A`).

The completeness axiom states that if `A sub RR` is nonempty and bounded above then `EE stackrel~M in RR "st" stackrel~M = spr A`. Consequences: (Axiom of Eudoxus) `AA x in RR EE n in NN "st" x < n`; `QQ` is dense in `RR`.

Extended real numbers: `barRR = RR uu {-oo, +oo}`, and extend the definitions of `+, *, <` in the natural way.

Sequences and limits

A sequence of real numbers is a function `f: NN -> RR`, written `{x_n}` with `x_n = f(n)`.

`lim_n x_n = a in RR` if `AA epsilon > 0 EE N_epsilon "st" n >= N_epsilon => |x_n - a| < epsilon`

Let `{x_n} sub RR`, `y_n = spr{x_j : j >= n}`; then `y_n` is a non-increasing sequence, and it can be shown that every monotone sequence in `RR` converges to `+-oo` or a finite real number, so `lim_n y_n = y` exists (in `barRR`) and is called the lim sup, written `bar(lim_n) x_n = inf_n (spr_(j>=n) x_j)`.

An alternative definition: `bar(lim_n) x_n = a in RR` if `AA epsilon > 0 EE N_epsilon "st" n > N_epsilon => x_n < a + epsilon`, and `AA k EE n >= k "st" x_n > a - epsilon`.

A sequence `{x_n} in R` is a Cauchy sequence if `AA epsilon > 0, EE N_epsilon "st" n,m >= N_epsilon => |x_m - x_n| < epsilon`

Prop: `{x_n} sub RR` is convergent in `RR <=> {x_n}` is Cauchy. (Proof requires the completeness axiom.)

Series

Let `{x_n}_(n>=1)` be a sequence of real numbers. Let `s_n = sum_(j=1)^n x_j, n >= 1` be the nth partial sum. The series `sum_1^oo x_j` converges to `s` in `RR` if `lim_n s_n = s`. If `x_j >= 0` then `lim_n s_n = s in RR` or `oo`. A series `sum_1^oo x_j` converges absolutely if `sum_1^oo |x_j|` converges. A series converges uniformly if its sequence of partial sums converges uniformly.
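A quick numerical illustration (a sketch in Python; the two example series are mine, not from the text): the partial sums of `sum 1/j^2` settle at `pi^2/6`, while the harmonic partial sums keep growing.

```python
import numpy as np

# Partial sums s_n = sum_{j=1}^n x_j for two classic series.
n = 100_000
j = np.arange(1, n + 1)

s_basel = np.cumsum(1.0 / j**2)   # converges to pi^2/6
s_harmonic = np.cumsum(1.0 / j)   # diverges, grows like log n

print(s_basel[-1], np.pi**2 / 6)  # ~1.64492 vs ~1.64493
print(s_harmonic[-1])             # ~12.09 and still increasing
```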

Power series

Let `{a_n}_(n>=0)` be a sequence of real numbers. The series `sum_0^oo a_n x^n` is called a power series and is said to be convergent on `B` if it converges `AA x in B`.

The radius of convergence is `R = (bar(lim) |a_n|^(1/n))^(-1)`. The power series converges when `|x| < R` and diverges when `|x| > R`.
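A numeric check of the root formula (a sketch; the coefficients `a_n = 2^n` are my example): `bar(lim) |a_n|^(1/n) = 2`, so `R = 1/2`, matching the geometric series `sum 2^n x^n = 1/(1-2x)`.

```python
import numpy as np

# Estimate R = (limsup |a_n|^(1/n))^(-1) for a_n = 2^n.
n = np.arange(1, 200)
a = 2.0**n
roots = np.abs(a) ** (1.0 / n)   # |a_n|^(1/n) -> 2
print(1.0 / roots[-1])           # 0.5: converges iff |x| < 1/2
```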

Taylor Series

`f: (a - eta, a + eta) -> RR, a in RR`. Suppose `f` is infinitely differentiable on `I = (a - eta, a + eta)`. Let `a_n = (f^((n))(a))/(n!)`; the power series `sum a_n (x-a)^n` is called the Taylor series of `f` at `a`. The Taylor remainder theorem says `AA x in I` and any `n >= 1`, `|f(x) - sum_(j=0)^n a_j (x-a)^j| <= (|f^((n+1))(y_n)|)/((n+1)!) |x-a|^(n+1)` for some `y_n` between `a` and `x`. So if `lambda_k = spr_(y in I) |f^((k))(y)|` and `sum lambda_k eta^k/(k!)` converges then the remainder `-> 0`.
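For a concrete case (my example): `f(x) = e^x` at `a = 0` has `lambda_k = spr_(y in I) |f^((k))(y)| = e^eta`, so the remainder bound tends to 0 and the partial sums converge. A sketch:

```python
import math

# Taylor partial sums of exp at a = 0, and the remainder bound
# |f^{(n+1)}(y)|/(n+1)! * |x|^{n+1} <= e^{|x|} |x|^{n+1}/(n+1)!.
x = 1.5
partial = 0.0
for n in range(12):
    partial += x**n / math.factorial(n)
    bound = math.exp(abs(x)) * abs(x)**(n + 1) / math.factorial(n + 1)
    print(n, partial, bound)
print(math.exp(x))   # the partial sums approach this as bound -> 0
```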

Metric spaces

Let `S != O/` and `d: S xx S -> RR^+`. If `d(x,y) = 0 iff x = y`, `d(x,y) = d(y,x)`, and `d(x,z) <= d(x,y) + d(y,z) AA x, y, z in S`, then `d` is called a metric on `S` and the pair `(S,d)` is a metric space.

A sequence `{x_n} sub (S, d)` converges to `x in S` if `AA epsilon > 0 EE N_epsilon "st" n >= N_epsilon => d(x_n, x) < epsilon`, written `lim_n x_n = x`. A sequence in a metric space is Cauchy if `AA epsi > 0 EE N_epsi "st" n,m >= N_epsi => d(x_n, x_m) < epsi`. (Any convergent sequence is Cauchy.) A metric space is complete if every Cauchy sequence converges.

Continuous functions

Let `(S,d)` and `(T, rho)` be two metric spaces and `f: S -> T` a map between them. `f` is:

`O sub (S,d)` is open if `x in O => EE delta > 0 "st" d(x,y) < delta => y in O`, ie at every `x in O` there is an open ball `B_x(delta)` of positive radius `delta` with `B_x(delta) sub O`. A set is closed if its complement is open.

A map `f: S -> T` is continuous iff `AA O` open in `T`, `f^(-1)(O)` is open in `(S,d)`

A collection of open sets `{O_alpha : alpha in I}` is an open cover for a set `B sub (S,d)` if `AA x in B, EE alpha in I "st" x in O_alpha`.

A set is compact if every open cover has a finite subcover.

Any `K sub RR` is compact iff it is bounded and closed (Heine-Borel).

Measures

Classes of sets

`Omega != O/`, `P(Omega) = {A: A sub Omega}`, `ccF sub P(Omega)`

`pi`-system
`A, B in ccF => A nn B in ccF`
If `C` is a `pi`-system, then `lambda( C ) = sigma(: C :)`.
If `C` is a `pi`-system, and `ccL` is a `lambda`-system containing `C`, then `ccL sup sigma(: C :)`.
Semi-algebra
`A, B in C => A nn B in C`
`AA A in C, A^c = uu_(i=1)^k B_i` for some finite collection of disjoint sets `B_1, ..., B_k in C`
The smallest algebra containing a semi-algebra `C` is the class of finite unions of disjoint sets from `C`.
Algebra
`Omega in ccF`
`A in ccF => A^c in ccF`
`A, B in ccF => A uu B in ccF`
An intersection of `sigma`-algebras is always a `sigma`-algebra, but the union may not even be an algebra. If `ccF sub P(Omega)` then the `sigma`-algebra generated by `ccF`, `sigma(: ccF :) = ` the intersection of all `sigma`-algebras containing `ccF`.
`sigma`-algebra
`ccF` is an algebra
closed under countable (monotone) unions
A useful class of `sigma`-algebras are those generated by the open sets of a topological space. A topological space is a pair `(S,T)` where `S` is a non-empty set and `T` is a collection of subsets of `S` st `O/, S in T` and `T` is closed under finite intersections and arbitrary unions. Elements of `T` are called open sets. The Borel `sigma`-algebra on a topological space `S` (in particular a metric or Euclidean space) is defined as the `sigma`-algebra generated by the collection of all open sets in `S`, ie `sigma(: T :)`.
`lambda`-system
`Omega in ccF`
`A, B in ccF, A sub B => B \\ A in ccF`
`A_n in ccF, A_n sub A_(n+1) AA n >= 1 => uu_(n>=1) A_n in ccF`
Every `sigma`-algebra is a `lambda`-system.
The intersection of two `lambda`-systems is a `lambda`-system.
`lambda( C )` is defined similarly to `sigma(: C :)`.

Measures

A set function is an extended real-valued function on a class of subsets of `Omega`. A set function `mu : A -> [0, oo]` is a measure if `mu(O/) = 0` and `mu` is countably additive: `mu(uu_(n>=1) A_n) = sum_(n>=1) mu(A_n)` for disjoint `{A_n} sub A` with `uu A_n in A`.

A measure is infinite if `mu(Omega) = oo`, finite otherwise. A measure with `mu(Omega) = 1` is a probability measure. A measure is `sigma`-finite if there is a countable collection of sets `{A_n}` such that `uu A_n = Omega` and `mu(A_n) < oo AA n`.

Common measures:

A measure `mu` on an algebra `F` has the following properties:

Uniqueness of measures: let `mu_1, mu_2` be two measures on a measurable space `(Omega, F)`. Let `C sub F` be a `pi`-system st `F = sigma(: C :)`. If `mu_1(B) = mu_2(B) AA B in C` then `mu_1(A) = mu_2(A) AA A in F`.

The extension theorems and Lebesgue-Stieltjes measures

A set function `mu: C -> [0, oo]` defined on a semi-algebra is a measure if:

Given a measure on a semi-algebra `C`, the outer measure induced by `mu` is the set function `mu^**` on `P(Omega)` defined as `mu^**(A) = inf{ sum mu(A_n) : {A_n}_(n>=1) sub C, A sub uu_(n>=1) A_n}` (approximate `mu(A)` from above by the total measure of countable covers). `mu^**` agrees with `mu` on `C` (and on the algebra generated by `C`). A set `A` is `mu^**`-measurable if `mu^**(E) = mu^**(E nn A) + mu^**(E nn A^c) AA E sub Omega`.

`mu^**` satisfies the following (any set function with these properties is called an outer measure): `mu^**(O/) = 0`; `A sub B => mu^**(A) <= mu^**(B)` (monotone); `mu^**(uu_(n>=1) A_n) <= sum_(n>=1) mu^**(A_n)` (countably subadditive).

Let `mu^**` be an outer measure on `Omega`. Let `ccM = ccM_{mu^**} = {A: A " is " mu^** "measurable"}`. Then:

Caratheodory's extension theorem: Let `mu` be a measure on a semi-algebra `C` and `mu^**` be defined as above. Then:

`A in ccM_(mu^**)`, `mu^**(A) < oo`; then `AA epsi > 0 EE B_1, B_2, ..., B_k in C, k < oo`, `B_i` mutually disjoint, st `mu^**(A o+ uu B_j) < epsi`, where `E_1 o+ E_2` is the symmetric difference. That is, every `mu^**`-measurable set of finite measure is nearly a finite union of disjoint elements from the semi-algebra `C`; eg any measurable set of finite measure on `RR` can be approximated by a finite union of intervals.

Lebesgue-Stieltjes measures on R

Let `F: RR -> RR` be nondecreasing. Let `C = { (a,b]: -oo <= a <= b < oo} uu { (a, oo): -oo <= a < oo}` and define `mu_F (a,b] = F(b+) - F(a+)`, then:

For `F(x) = x`, `mu_F` is known as the Lebesgue measure.
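A minimal sketch of `mu_F(a,b] = F(b+) - F(a+)` (the step cdf is my example, not from the text): `F(x) = x` measures length, while a unit step at 0 puts an atom there.

```python
# mu_F(a,b] = F(b+) - F(a+); for right-continuous F, F(x+) = F(x).
def mu_F(F, a, b):
    return F(b) - F(a)

lebesgue = lambda x: x                    # Lebesgue measure: length
step = lambda x: 1.0 if x >= 0 else 0.0   # point mass at 0

print(mu_F(lebesgue, 2, 5))   # 3.0, the length of (2,5]
print(mu_F(step, -1, 0))      # 1.0, (-1,0] contains the atom at 0
print(mu_F(step, 0, 1))       # 0.0, (0,1] misses it
```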

A Radon measure is finite on bounded intervals. `mu` is a Radon measure iff it is a Lebesgue-Stieltjes measure.

Completeness of measures

A measure space is called complete if `AA A in F, mu(A) = 0 => P(A) sub F` (every subset of a null set is measurable). The measure space generated by the extension theorem is complete.

It is always possible to complete an incomplete measure: Let `(Omega, F, mu)` be a measure space. Let `barF = {A : B_1 sup A sup B_2` for some `B_1, B_2 in F` st `mu(B_1 \\ B_2) = 0}`. For any `A in barF` set `bar mu(A) = mu(B_1) = mu(B_2)`. Then:

Integration

Measurable transformations

Let `(Omega, F, P)` be a probability space. Then `X: Omega -> RR` is called a random variable if `X^(-1)(-oo, a] = {omega: X(omega) <= a} in F AA a in RR`. This is equivalent to the apparently stronger condition that `X^(-1)(A) in F` for all `A in B(RR)`.

A mapping `T: Omega_1 -> Omega_2` between two probability spaces is measurable with respect to the `sigma`-algebras if `T^(-1)(A) in F_1 AA A in F_2`. (and will be measurable for any `barF_1 sup F_1` and `barF_2 sub F_2`).

In general, this is a difficult property to prove directly, but:

And if `AA n in NN`, `f_n : Omega -> barRR` is `(: F, B(barRR) :)` measurable, then:

Let `{ f_lambda: lambda in Lambda }` be a family of mappings from `Omega_1 -> Omega_2`, and `F_2` a `sigma`-algebra on `Omega_2`, then `sigma(: {f_lambda^{-1} (A) : A in F_2, lambda in Lambda} :)` is called the `sigma`-algebra generated by `{f_lambda: lambda in Lambda}`. This is the smallest `sigma`-algebra such that all `f_lambda` are measurable.

Let `{f_lambda: lambda in Lambda}` be an uncountable collection of maps `Omega_1 -> Omega_2`. Then `AA B in sigma(: {f_lambda: lambda in Lambda} :)` there exists a countable set `Lambda_B sub Lambda` st `B in sigma(: {f_lambda: lambda in Lambda_B} :)` (ie `B` depends on only countably many of the `f_lambda`).

Induced measures, distribution functions

Suppose `X` is an rv defined on `(Omega, F, P)`; then `P` governs the probabilities assigned to events like `X^(-1)[a,b]`. Since `X` takes values in the real line, it should be possible to express such probabilities as a function of `[a, b]`, eg `P_X(A) = P(X^(-1)(A))`. Is this a probability measure?

Let `(Omega_i, F_i)`, `i=1,2` be measurable spaces, and let `T: Omega_1 -> Omega_2` be a `(: F_1, F_2 :)` measurable function. Then for any measure `mu` on `(Omega_1, F_1)`, `mu_T = mu @ T^(-1)`, defined by `mu_T(A) = mu(T^(-1)(A))`, is a measure on `F_2`.

For a rv `X` defined on `(Omega, F, P)` the probability distribution of X is the induced measure of `X` under `P` on `RR`.

The cumulative distribution function (CDF) = `F_X(x) = P_X(-oo, x] = P{omega: X(omega) <= x}`, and has the following properties: (any function with these properties is called a cdf)

Given any cdf `F` it is possible to construct a probability space and a random variable `X` st `F` is the cdf of `X` (eg `(RR, B(RR), mu_F)`, `X(x) = x`).

A random variable is discrete if there exists a countable set `A sub RR` st `P(X in A) = 1`, or continuous if `P(X=x)=0 AA x in RR`. The decomposition theorem states that any cdf can be written as the weighted sum of a continuous and a discrete cdf.

Generalisation to higher dimensions is straightforward, eg. for `k=2`

Integration

Let `(Omega, F, mu)` be a measure space, and `f: Omega -> R` be a measurable function.

A function `f: Omega -> barRR` is called simple if there exists a finite set `{c_1, c_2, ..., c_k} sub barRR` and sets `A_1, A_2, ..., A_k in F, k in NN` st `f = sum c_i I_(A_i)`. The integral of a non-negative simple function is `int_Omega f dmu = sum c_i mu(A_i)`. Note: `0 <= int f dmu <= oo`
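A sketch of the definition for `mu` = Lebesgue measure, taking the `A_i` to be disjoint intervals (my toy representation):

```python
# Integral of a simple function f = sum c_i I_{A_i} against Lebesgue
# measure, with each A_i a bounded interval: int f dmu = sum c_i |A_i|.
def integrate_simple(pieces):
    # pieces: list of (c_i, (left_i, right_i)), intervals disjoint
    return sum(c * (right - left) for c, (left, right) in pieces)

# f = 2 on (0,1], 5 on (1,3]: integral = 2*1 + 5*2 = 12.
print(integrate_simple([(2.0, (0.0, 1.0)), (5.0, (1.0, 3.0))]))
```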

If `f` and `g` are two simple non-negative functions then:

Can extend this definition to any non-negative function by discretising and taking limits. (Need to confirm that all admissible sequences give the same limit.) ie `lim_(n->oo) int f_n dmu = int lim_(n->oo) f_n dmu = int f dmu`

The monotone convergence theorem proves that `int f dmu = lim_(n->oo) int f_n dmu`, for any increasing sequence of `f_n`. Corollaries:

Can extend to any function by defining `f = f^+ - f^-` (where `f^+ = max(0, f)`, `f^- = max(0, -f)`), or more generally `f = f_1 - f_2` for some `f_1, f_2 >= 0`. `f` is said to be integrable if `int |f| dmu < oo`.

Theorem: Hypotheses => Results
MCT: `f_n >= 0`, `f_n uarr f "ae" (mu)` => `int f_n uarr int f`
Fatou: `f_n >= 0` => `ullim int f_n >= int ullim f_n`
LDCT: `|f_n| <= g`, `g in L^1(mu)`, `f_n -> f` => `f in L^1(mu)`, `int f_n -> int f`, `int |f_n - f| -> 0`
BCT: `mu(Omega) < oo`, `|f_n| <= K`, `f_n -> f` => `f in L^1(mu)`, `int |f_n - f| -> 0`

`L^p` spaces

If `f` and `g` are integrable (ie. `in L^1(Omega, F, P)`) then:

Riemann and Lebesgue integrals

Let `f` be a real-valued function bounded on a bounded interval `[a,b]`. Let `P = {x_0, x_1, ..., x_n}` be a finite partition of `[a,b]`, and `Delta = Delta(P) = max{(x_1 - x_0), ...}` be the diameter of `P`. Let `M_i = spr{f(x): x_i <= x <= x_(i+1)}` and `m_i = inf{f(x): x_i <= x <= x_(i+1)}`.

The upper and lower Riemann sums of `f` wrt `P` are defined as:

The upper and lower Riemann integrals are defined as:

Where `sfP` is the set of all partitions.

`f` is said to be Riemann integrable if `bar int f = ul int f`. Theorem: a bounded function `f` on a bounded interval `[a,b]` is Riemann integrable iff `f` is continuous "ae" `(mu_L)` on `[a,b]`; in that case `f` is Lebesgue integrable and the two integrals are equal.
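A sketch of the upper and lower sums on a uniform partition (sampling inside each cell to approximate `M_i` and `m_i`, which is adequate for continuous `f`):

```python
import numpy as np

# Upper and lower Riemann sums of f over a uniform partition of [a,b].
def riemann_sums(f, a, b, n_cells=200, samples=50):
    edges = np.linspace(a, b, n_cells + 1)
    upper = lower = 0.0
    for left, right in zip(edges[:-1], edges[1:]):
        ys = f(np.linspace(left, right, samples))
        upper += ys.max() * (right - left)   # M_i (x_{i+1} - x_i)
        lower += ys.min() * (right - left)   # m_i (x_{i+1} - x_i)
    return lower, upper

print(riemann_sums(np.sin, 0.0, np.pi))  # both -> int_0^pi sin = 2
```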

More on convergence

Name (symbol): definition
Pointwise (`f_n -> f`): `lim_(n->oo) f_n(omega) = f(omega) AA omega in Omega`
Almost everywhere (`f_n -> f "ae" mu`): `lim_(n->oo) f_n(omega) = f(omega) AA omega in B^c, mu(B) = 0`
In measure (`f_n stackrel m -> f`): `lim_(n->oo) mu{|f_n - f| > epsi} = 0 AA epsi > 0`
In `L^p` (`f_n stackrel (L^p) -> f`): `lim_(n->oo) int |f_n - f|^p = 0`, or `{:|| f_n - f ||:}_p -> 0` where `{:|| g ||:}_p = (int |g|^p dmu)^(1/p)`
Uniformly: `lim_(n->oo) spr_(omega in Omega) |f_n(omega) - f(omega)| = 0`
Nearly (almost) uniformly: `AA epsilon > 0 EE A in F "st" mu(A) < epsilon` and on `A^c` `f_n` converges uniformly

Uniform integrability

Let `a_f(t) = int_(|f| > t) |f| dmu`. If `f in L^1` then `a_f(t) -> 0` as `t -> oo`, that is `AA epsilon > 0 EE t_epsilon "st" t >= t_epsilon => a_f(t) <= epsilon`. A collection of functions `{f_lambda : lambda in Lambda}` is uniformly integrable if `AA epsilon > 0 EE t_epsi "st" t > t_epsi => spr_(lambda in Lambda) a_(f_lambda)(t) <= epsilon`.

Let `mu(Omega) < oo` and `{f_n : n >= 1} sub L^1` st `f_n -> f "ae" (mu)`, `f` measurable. If `{f_n}_(n>=1)` is UI then `lim_(n->oo) int |f_n - f| dmu = 0`

Egorov's theorem: Let `f_n -> f "ae" (mu)`, `mu(Omega) < oo`; then `f_n -> f` almost uniformly.

`L^p` spaces

Inequalities

Markov's inequality (T3.1.1, p58): `f` a non-negative measurable function, `0 < t < oo`. Then `mu{f >= t} <= (int f dmu)/t`.
(C3.1.2, p58): `X` an RV on `(Omega, F, P)`, `t, r > 0`. Then `P(|X| >= t) <= (E|X|^r)/(t^r)`.
Chebychev's inequality (C3.1.3, p58): `X` an RV, `E[X] = mu`, `"Var"(X) = sigma^2 < oo`. Then `P(|X - mu| >= k sigma) <= 1/(k^2)`.
(C3.1.4, p58): `phi: RR^+ -> RR^+` non-decreasing. Then `P(|X| >= t) <= (E phi(|X|))/(phi(t))`.
Cramer's inequality (C3.1.5, p58): `X` an RV, `t > 0`. Then `P(|X| >= t) <= inf_(theta > 0) (E(e^(theta |X|)))/(e^(theta t))`.

A function `phi: (a,b) -> RR` is called convex if `AA 0 <= lambda <= 1, a < x <= y < b`, `phi(lambda x + (1-lambda)y) <= lambda phi(x) + (1-lambda)phi(y)` (chords rotate counterclockwise). Or equivalently `(phi(x_2) - phi(x_1))/(x_2-x_1) <= (phi(x_3) - phi(x_2))/(x_3-x_2)` for `a < x_1 < x_2 < x_3 < b`.

If `phi` is twice differentiable on `(a,b)`, and `phi''(x) >= 0` then `phi` is convex.

Holder's inequality (T3.1.11, p61): `1 < p < oo, 1/p + 1/q = 1`, `f in L^p(mu), g in L^q(mu)`. Then `{: ||fg|| :}_1 <= {: ||f|| :}_p {: ||g|| :}_q`.
Cauchy-Schwarz inequality (C3.1.12, p62): `f, g in L^2(mu)`. Then `{: ||fg|| :}_1 <= ||f||_2 ||g||_2`.
(C3.1.13, p62): `1 < p < oo, 1/p + 1/q = 1`, `{a_i}, {b_i} sub RR`, `c_i >= 0`. Then `sum |a_i b_i| c_i <= (sum |a_i|^p c_i)^(1/p) (sum |b_i|^q c_i)^(1/q)`.
Minkowski inequality (C3.1.14, p62): `1 < p < oo`, `f, g in L^p(mu)`. Then `||f + g||_p <= ||f||_p + ||g||_p`.
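A quick numerical sanity check of Holder and Minkowski for counting measure on `{1, ..., n}` (random vectors are my example):

```python
import numpy as np

# Check Holder and Minkowski for counting measure on {1,...,n}.
rng = np.random.default_rng(0)
f, g = rng.standard_normal(1000), rng.standard_normal(1000)
p = 3.0
q = p / (p - 1.0)   # conjugate exponent: 1/p + 1/q = 1

norm = lambda h, r: (np.abs(h)**r).sum() ** (1.0 / r)
print(np.abs(f * g).sum() <= norm(f, p) * norm(g, q))  # Holder: True
print(norm(f + g, p) <= norm(f, p) + norm(g, p))       # Minkowski: True
```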

`L^p` spaces

`L^p` is a vector space. The usual metric is `d_p(f,g) = ||f-g||_p`. Define `f` to be equivalent to `g` if `f = g "ae" (mu)`; this partitions `L^p(mu)`, and the resulting space forms a complete metric space.

Dual spaces

`1 <= p < oo`, `1/p + 1/q = 1`, `g in L^q(mu)`, `T_g(f) = int fg dmu`, `f in L^p(mu)`. Clearly `T_g` is linear, and such a function `L^p(mu) -> RR` is called a linear functional.

The set of all continuous linear functionals on `L^p(mu)` is called the dual space of `L^p(mu)` and denoted `L^p(mu)^**`.

Riesz representation theorem (T3.2.3, p57): `1 <= p < oo`, `1/q + 1/p = 1`, `T: L^p(mu) -> RR` linear and continuous. Then `EE g in L^q(mu) "st" T(f) = T_g(f)`. Not valid for `p = oo`.
`(L^p(mu)^**, ||T||)` is a normed linear vector space; moreover `||T_g|| = {: ||g|| :}_q AA g in L^q(mu)`, so `g |-> T_g` is an isometry of `L^q(mu)` onto `L^p(mu)^**`.

Banach and Hilbert spaces

Banach spaces

A Banach space is a complete normed vector space. All `L^p(mu)` spaces are Banach spaces. A closed subspace of a Banach space is also a Banach space.

A norm must satisfy:

A linear transformation is a function `T: V_1 -> V_2` st `alpha_1, alpha_2 in RR, x, y in V_1 => T(alpha_1 x + alpha_2 y) = alpha_1 T(x) + alpha_2 T(y)`.

Hilbert spaces

A vector space is a real inner product space if `EE f: V xx V -> RR`, denoted by `f(x,y) = (: x, y :)`, that satisfies:

A Hilbert space is a complete real inner product space. Every Hilbert space is isomorphic to an `L^2(Omega, F, mu)` space for some `(Omega, F, mu)`. It is called separable if `EE` a dense countable subset.

Orthogonal vectors: `x _|_ y iff (: x, y :) = 0`. Orthonormal if in addition `||x|| = 1`. If `B sub H` is orthogonal and `H` is separable, then `B` is at most countable. Can convert any set to an orthonormal set using Gram-Schmidt orthogonalisation.

The Fourier coefficients of a vector `x in V` wrt an orthonormal `B sub V` are `{(: x, b :): b in B}`.

Bessel’s inequality: `{b_i}_{i>=1}` ON in an IPS `(V, (: *, * :))`, `AA x in V sum (: x, b_i :)^2 <= ||x||^2`.

Following are equivalent:

Riesz representation theorem: `H` a separable Hilbert space; then every bounded linear functional `T: H -> RR` can be represented as `T = T_(x_0)`, where `T_(x_0)(y) = (: y, x_0 :)`, for some `x_0 in H`.

Differentiation

The Lebesgue-Radon-Nikodym theorem

Let `(Omega, F)` be a measurable space and let `mu` and `nu` be two measures on `(Omega, F)`. `mu` is dominated by `nu`, `mu << nu` if `nu(A) = 0 => mu(A) = 0 AA A in F`.

Let `f` be a non-negative measurable function, `mu(A) = int_A f dnu AA A in F`, then `mu` is also a measure and `mu << nu`.

`mu` is singular wrt `nu`, `mu _|_ nu`, if `EE B in F "st" mu(B) = 0` and `nu(B^c) = 0`. If `mu` and `nu` are mutually singular, then for `B` as above `mu(A) = mu(A nn B^c)` and `nu(A) = nu(A nn B)`.

Let `h` be a non-negative measurable function st `mu(A) = int_A h dnu`; then `h` is called the Radon-Nikodym derivative of `mu` wrt `nu` and is written `dmu/dnu`.

Let `nu, mu_1, mu_2, ...` be `sigma`-finite measures, then:

Signed measures

Let `mu_1, mu_2` be finite measures on MS `(Omega, F)`. Let `nu(A) = mu_1(A) - mu_2(A) AA A in F`. A finite signed measure satisfies:

Let `nu` be a finite signed measure and `|nu|(A) = spr_"all partitions" {sum |nu(A_i)|}`; then `|nu|` is a finite measure.

A set function `nu` is a finite signed measure iff `EE mu_1, mu_2 "st" nu = mu_1 - mu_2`, OR `EE mu` and `f in L^1(mu)` st `AA A in F, nu(A) = int_A f dmu`.

`A in F` is called a negative set for `nu` if for any `B sub A`, `nu(B) <= 0`.

Hahn decomposition theorem: `nu` a finite signed measure; then `EE` a positive set `Omega^+` and a negative set `Omega^-` st `Omega = Omega^+ uu Omega^-` and `Omega^+ nn Omega^- = O/`. Jordan decomposition theorem: `nu = nu^+ - nu^-`.

`S = {nu: nu "is a finite signed measure"}` is a vector space, with total variation norm `||nu|| = |nu|(Omega)`.

Functions of bounded variation

`f: [a,b] -> RR`, then for some partition `Q`:

For the unpartitioned quantities, take the supremum over all possible partitions. `f` is said to be of bounded variation if `T(f, [a,b]) < oo`. If `f in BV[a,b]` and `f_1 = N(f, [a,b]), f_2 = P(f, [a,b])` then `f_1, f_2` are non-decreasing and `f(x) = f_1(x) - f_2(x)`. `f in BV[a,b]` iff `EE` a finite signed measure `mu` on `(RR, B(RR))` st `f(x) = mu[a,x]`.

Absolutely continuous functions

A function is absolutely continuous if `AA epsi > 0 EE delta > 0 "st" I_j = [a_j, b_j] "disjoint" and sum (b_j - a_j) < delta => sum |F(b_j) - F(a_j)| < epsi`. By the mean value theorem, if `F` is differentiable and `F'` is bounded, then `F` is ac.

A function `F: [a,b] -> RR` is absolutely continuous if its extension `barF = F(a) I(x < a) + F(x) I(a <= x < b) + F(b) I(x >= b)` is ac on `RR`.

Fundamental theorem of Lebesgue integral calculus: `F: [a,b] -> RR` is ac iff there is a function `f: [a,b] -> RR` st `f` is Lebesgue measurable and integrable and `F(x) = F(a) + int_([a,x]) f d mu_L quad AA a <= x <= b`.

Product measures

Product spaces and product measures

Given two measure spaces `(Omega_i, F_i, mu_i)`, is it possible to construct a measure `mu = mu_1 xx mu_2` on the product space `Omega_1 xx Omega_2` st `mu(A xx B) = mu_1(A) * mu_2(B)` for `A in F_1, B in F_2`?

Starting with `mu` defined on `C`, `mu(A_1 xx A_2) = mu_1(A_1) * mu_2(A_2)`, we can use the extension procedure to extend `mu` to all `F_1 xx F_2`. Another approach, which allows us to calculate the values directly, proceeds as follows.

`(Omega_i, F_i, mu_i)` `mu_i` `sigma`-finite then:

The product space may not be complete, even if both original measure spaces are complete.

Fubini-Tonelli Theorems

An integral over a product space can be treated as an iterated integral if `f: Omega_1 xx Omega_2 -> RR^+` (Tonelli's theorem) or `f in L^1(mu)` (Fubini's theorem).

Tonelli: Let `(Omega_i, F_i, mu_i)` be `sigma`-finite spaces, `f` non-negative. Then:

Fubini: If `f in L^1(mu)` then there exist sets `B_i in F_i` st

Integration by parts: Let `F_1, F_2` be `uarr`, right-continuous functions on `[a,b]` with no common points of discontinuity. Then `int_((a,b]) F_1(x) dF_2(x) = F_1(b)F_2(b) - F_1(a)F_2(a) - int_((a,b]) F_2(x) dF_1(x)`. If `F_1, F_2` are ac with non-negative densities `f_1, f_2` then `int_a^b F_1(x) f_2(x) dx = F_1(b)F_2(b) - F_1(a)F_2(a) - int_a^b F_2(x) f_1(x) dx`. (Can always decompose into two non-negative functions, so this also holds for all Lebesgue integrable functions.)

Extensions to products of higher order use the extension procedure.

Convolutions

Sequences

Given `{a_n}, {b_n} in l^1`, let `c_n = (a ** b)_n = sum_(j=0)^n a_j b_(n-j)`; then:

Functions

`f, g in L^1` then:

Measures

`mu_1, mu_2` `sigma`-finite measures on `(RR, B(RR))`: `(mu_1 ** mu_2)(A) = int mu_1(A-y) mu_2(dy)`. Writing `h(x,y) = I_A(x+y) = (I_A @ phi)(x,y)` with `phi: RR^2 -> RR`, `phi(x, y) = x+y`, we get `(mu_1 ** mu_2)(A) = int int_(RR xx RR) I_A(x+y) mu_1(dx) mu_2(dy) = int_RR (int_RR I_A(x+y) mu_1(dx)) mu_2(dy)` (and vice versa, by Tonelli).
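For the sequence case, `numpy.convolve` computes exactly the coefficients `c_n = sum_(j=0)^n a_j b_(n-j)` (a sketch; the geometric sequences are my example):

```python
import numpy as np

# Convolution of sequences: c_n = sum_{j=0}^n a_j b_{n-j}.
a = 0.5 ** np.arange(6)            # 1, 1/2, 1/4, ...
b = (1.0 / 3.0) ** np.arange(6)
c = np.convolve(a, b)[:6]          # first 6 terms of a ** b
print(c)

# ||a ** b||_1 <= ||a||_1 ||b||_1 (equality for non-negative terms)
print(np.abs(c).sum() <= np.abs(a).sum() * np.abs(b).sum())
```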

Independence

Independent events and random variables

Borel-Cantelli lemmas, tail `sigma`-algebras and Kolmogorov’s zero-one law

Let `(Omega, F)` be a measurable space, and `{A_n}_(n>=1)` be a sequence of sets, then:

Called the zero-one law.

The tail `sigma`-algebra of a sequence of random variables `{X_n}` on a ps, is `tau = nnn_(n=1)^(oo) sigma(: {X_j: j >= n} :)`, any `A in tau` is called a tail event, and any `tau`-measurable rv is called a tail random variable. Tail events are determined by behaviour for large n and remain unchanged if any finite subcollection is dropped or changed.

Kolmogorov's 0-1 law: Let `{X_n}_(n>=1)` be a sequence of independent rv's on a probability space `(Omega, F, P)`, and `tau` the tail `sigma`-algebra of `{X_n}`; then `P(A) = 0 or 1 AA A in tau`.

Let `tau` be the tail `sigma`-algebra of `{X_n}_(n>=1)`, and let `X` be a tail random variable `X: Omega -> barRR` then `EE c in barRR "st" P(X = c) = 1`.

Probability spaces

Kolmogorov’s probability model

Random variables and random vectors

Change of variable formula: Let `X` be an rv on `(Omega, F, P)`, and `h: RR -> RR` be measurable. Let `Y = h(X)` then:

Moments

For any positive integer `n`, the nth moment `mu_n` of an rv `X` is defined by `mu_n = EX^n`. The moment generating function is defined as `M_X(t) = E(e^(tX)) AA t in RR`. Since `e^(tX) > 0`, `E(e^(tX))` is well defined (possibly `oo`). If `X` is a non-negative rv then `M_X(t) = sum_(n=0)^oo (t^n mu_n)/(n!)` for `t >= 0` (by MCT).

`X` an rv, `M_X(t) < oo AA |t| < epsi` for some `epsi > 0`, then:

If the mgf is finite `AA |t| < epsi, epsi > 0`, then `M_X(t)` has a power series expansion in `t` around 0, and `mu_n/(n!)` is simply the coefficient of `t^n`. Note: the MGF uniquely determines the moments, but the moments do not uniquely determine the distribution without extra conditions, eg `sum mu_(2n)^(-1/(2n)) = oo` (Carleman's condition) or the distribution has bounded support.
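A sketch with SymPy (assuming it is available): for `X ~ N(0,1)`, `M_X(t) = e^(t^2//2)`, and `mu_n` is `n!` times the coefficient of `t^n` in its expansion.

```python
import sympy as sp

# Moments of N(0,1) from its MGF M(t) = exp(t^2/2):
# mu_n = n! * [coefficient of t^n in M(t)].
t = sp.symbols('t')
M = sp.exp(t**2 / 2)
expansion = sp.series(M, t, 0, 7).removeO()
for n in range(7):
    print(n, sp.factorial(n) * expansion.coeff(t, n))
# 1, 0, 1, 0, 3, 0, 15: the standard normal moments
```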

Product moments

Let `vecX = (X_1, X_2, ..., X_k)` be a random vector. The product moment of order `r = (r_1, r_2, ..., r_k)` `mu_r = mu_(r_1, r_2, ..., r_k) = E(X_1^(r_1) X_2^(r_2) ... X_k^(r_k))`. Similar properties as above apply.

Kolmogorov’s consistency theorem

A stochastic process with index set `A` is a family `{X_alpha : alpha in A}` of random variables defined on a ps. It may also be viewed as a random real-valued function on the set `A` via the identification `omega -> f(omega, *)` where `f(omega, alpha) = X_alpha(omega)` for `alpha in A`.

The family `{mu_((alpha_1, alpha_2, ..., alpha_k)) : alpha_i in A}`, where `mu_((alpha_1, ..., alpha_k))(B) = P((X_(alpha_1), X_(alpha_2), ..., X_(alpha_k)) in B)`, of probability distributions is called the family of finite dimensional distributions (fdd) associated with `{X_alpha: alpha in A}`. It satisfies the following consistency conditions:

(If `A` is countable, and indexed with `NN`, these conditions are equivalent to: `mu_n` is a pm on `(RR^n, B(RR^n))`, and `mu_(n+1)(B xx RR) = mu_n(B), AA n in NN`)

Given a family of probability distributions `Q_A`, does there exist a real-valued stochastic process `{X_alpha: alpha in A}` st its fdds coincide with `Q_A`? Kolmogorov's consistency theorem: Let `A != O/`, `Q_A = {nu_((alpha_1, ..., alpha_k)) : alpha_i in A}` st `nu_((alpha_1, ..., alpha_k))` is a probability distribution on `(RR^k, B(RR^k))` and the consistency conditions hold. Then there exists a probability space and a stochastic process st `Q_A` is the family of fdds associated with the stochastic process.

In other words, if `Q_A` satisfies the appropriate conditions, there exists `f: A xx Omega -> RR` st `AA omega, f(*, omega)` is a function on `A`, and for each `(alpha_1, ..., alpha_k) in A^k` the vector `(f(alpha_1, omega), ..., f(alpha_k, omega))` is a random vector with probability distribution `nu_((alpha_1, ..., alpha_k))`

Sketch proof

`A = {a}`

`Q_A = {nu_a}`, a single probability distribution. Take `Omega = RR, F = B(RR), P = nu_a`, `X(omega) = omega`. Then `X` is an rv on `(Omega, F, P)` with `P X^(-1) = nu_a`.

`A = {a_1, a_2, ..., a_k}, k < oo`

`Q_A = {nu_((a_(alpha_1), ..., a_(alpha_k))): alpha_1, ..., alpha_k in {1, 2, ..., k}}`. Take `Omega = RR^k, F = B(RR^k), P = nu_((a_1, ..., a_k))`, `X(omega) = omega`.

`A = NN`

`Omega = RR^NN = {omega: omega = (x_1, x_2, ...)}`. `F = sigma(: C :)`, where `C` is the semi-algebra of fd events: `A` is a fd event if `EE n_0 < oo` and `B in B(RR^(n_0))` st `A = {omega: omega = (x_1, x_2, ...), (x_1, ..., x_(n_0)) in B}`, aka a finite dimensional cylinder set.

Use the given fd distribution to define `P` on `C`. Then apply extension theorem, checking the conditions on `(C, P)`.

Take `Omega = RR^NN, F = sigma(: C :), P`, and `X_n(omega) = x_n` if `omega = (x_1, ...)`. Then `{X_n(omega); n = 1, ...}` is a stochastic process on `(Omega, F, P)` and has fdds `Q_NN`.

`A != O/`

Let `Q_A = ` the family of fdds satisfying the consistency conditions. Want to construct a ps `(Omega, F, P)` and a family of random variables `{X_alpha: alpha in A}` with `Q_A` as its fdds.

Let `Omega = RR^A`, and `F = B(RR)^A`. Define a set function `P(D) = mu_((alpha_1, ..., alpha_k))(B)` for a `D in C` with representation `D = {omega: omega in RR^A, (omega(alpha_1), ..., omega(alpha_k)) in B}`. Now show that `P(D)` is independent of the representation of `D`, and is countably additive on `C`. Then by the extension procedure, there exists a unique extension of `P` to `F` st `(Omega, F, P)` is a ps. Defining `X_alpha(omega) = pi_alpha(omega) = omega(alpha)` for `alpha in A` yields a stochastic process with fdds `Q_A`.

Limitations

`Omega = RR^A` is rather large, and `F = B(RR)^A` is rather small: it can be shown that `F` coincides with the class of events that depend on only a countable number of coordinates of `omega`. When `A` is an interval in `RR` this can be overcome (eg) by restricting `Omega` to continuous functions.

Convergence in distribution

Definitions and basic properties

Let `{X_n}_(n>=0)` be a collection of rv, and let `F_n` denote the cdf of `X_n`. Then `{X_n}_(n>=1)` is said to converge in distribution, or weakly, written `X_n ->^d X_0` if:

This does not require that the random variables be defined on a common PS.

Prop: If `X_n ->^p X_0` then `X_n ->^d X_0`. Converse false in general, but if `X_n ->^d X_0` and `P(X_0 = c) = 1, c in RR`, then `X_n ->^p c`

Prop: If a cdf `F` is continuous on `RR` then it is uniformly continuous on `RR`.

Th: `{X_n}_(n>=0)` a collection of rv, with cdfs `{F_n}_(n>=0)`.
Then `X_n ->^d X` iff there exists a dense set `D sub R` st `lim_(n->oo) F_n(x) = F_0(x) quad AA x in D`.

Polya's th: `{X_n}_(n>=0)` a collection of rvs with cdfs `{F_n}_(n>=0)`. If `X_n ->^d X_0` and `F_0` is continuous on `RR` then `spr_(x in RR) |F_n(x) - F_0(x)| -> 0` as `n -> oo`.

Slutsky's th: `{X_n}_(n>=1), {Y_n}_(n>=1)` sequences of rv, st `(X_n, Y_n)` is defined on a PS `(Omega_n, F_n, P_n)`.
If `X_n ->^d X_0` and `Y_n ->^p a in RR` then

Asymptotic normality

A special case of `F_n -> F_0`. A seq of rv's `{X_n}_(n>=1)` is said to be asymptotically normal with asymptotic mean `mu_n` and variance `sigma_n^2 > 0` if for sufficiently large `n` (`EE n_0 > 0 "st" AA n >= n_0`) `(X_n - mu_n)/(sigma_n) ->^d N(0,1) "as" n -> oo`. Write `X_n ~ "AN"(mu_n, sigma_n^2)`.
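A Monte Carlo sketch (my example, assuming NumPy): the mean of `n` iid Exp(1) rvs is `"AN"(1, 1/n)`, so its standardised cdf should approach `Phi`.

```python
import math
import numpy as np

# Standardised sample means of Exp(1): (bar X_n - 1) * sqrt(n) ~ N(0,1).
rng = np.random.default_rng(1)
n, reps = 200, 10_000
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = (means - 1.0) * math.sqrt(n)   # (X_n - mu_n) / sigma_n

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
for x in (-1.0, 0.0, 1.0):
    print(x, (z <= x).mean(), Phi(x))   # empirical vs normal cdf
```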

Vague convergence, Helly-Bray theorems and tightness

Bolzano-Weierstrass th: If `A sub [0,1]` is infinite, then `EE {x_n}_(n>=1) sub A "st" lim_(n->oo) x_n = x` exists in `[0,1]` (but not necessarily in `A` unless `A` is closed). There is an analogue of this for sub-probability measures (ie `mu(RR) <= 1`).

`{mu_n}_(n>=1), mu` sub-probability measures on `(RR, B(RR))`. `mu_n ->^v mu` (converges vaguely) if `EE D sub RR, D "dense"`, st `mu_n(a,b] -> mu(a,b] qquad AA a, b in D`. For probability measures, `->^d <=> ->^v`.

Helly's selection th: If `A` is an infinite collection of sub-probability measures on `(RR, B(RR))`, then there exists a sequence `{mu_n}_(n>=1) sub A` and a sub-probability measure `mu` st `mu_n ->^v mu`.

Helly-Bray theorem for vague convergence: `{mu_n}_(n>=1), mu` sub-pm on `(RR, B(RR))`. Then `mu_n ->^v mu` iff `int f dmu_n -> int f dmu quad AA f in C_0(RR)`, where `C_0(RR) = {g | g: RR -> RR " is continuous and " lim_(|x| -> oo) g(x) = 0}`.

Helly-Bray theorem for weak convergence: `{mu_n}_(n>=1), mu` pm on `(RR, B(RR))`. Then `mu_n ->^d mu` iff `int f dmu_n -> int f dmu AA f in C_B(RR) = {g | g: RR -> RR " is continuous and bounded"}`.

Tightness

A sequence of pm's on `(RR, B(RR))` is called tight if `AA epsi > 0 EE M_epsi in (0, oo) "st" spr_n mu_n([-M_epsi, M_epsi]^c) < epsi`

A sequence of rv's is called tight or stochastically bdd if the sequence of probability dists is tight, ie `AA epsi > 0 EE M_epsi in (0, oo) "st" spr_n P(|X_n| > M_epsi) < epsi`. Denoted `X_n = O_p(1)`.

In general, given a stochastic quantity `T_n`, the stochastic order of `T_n - E(T_n)` is determined by the order of the variance, `sigma_n^2`, if it exists.

`P(|T_n - mu_n|/(sigma_n) > m) = P(|T_n - mu_n| > m sigma_n) <= (sigma_n^2)/(m^2 sigma_n^2) = m^(-2)` (Chebychev). So `AA epsi > 0 EE m "st" m^(-2) < epsi`, and `P(|T_n - mu_n|/(sigma_n) > m) < epsi => (T_n - mu_n)/(sigma_n) = O_p(1) => T_n - mu_n = O_p(sigma_n)`.
If `sigma_n^2 -> 0` then `T_n - mu_n = o_p(1)`.
If `mu_n = 0`, then `T_n = O_p(sigma_n)`

T1.2.8 `{X_n}, {Y_n}` sequences of rv's. `X_n = O_p(1), Y_n = o_p(1)`.
Then:

T1.2.9 Let `{mu_n}_(n>=1)` be pm's.
`{mu_n}` is tight iff it is relatively compact, ie for all subsequences `{mu_(n_i)}_(i>=1)` there exists a further subsequence `{mu_(m_i)}_(i>=1)` of `{mu_(n_i)}_(i>=1)` and pm `mu` on `(RR, B(RR))` st `mu_(m_i) ->^d mu`.

T1.2.10 `{mu_n}_(n>=1), mu`, pm's on `(RR, B(RR))`. Then `mu_n ->^d mu` iff `{mu_n}` is tight and all weakly convergent subsequences converge to `mu`.

Convergence of probability and sub-probability measures on general metric spaces

A set `A` is:

A sequence is Cauchy if `AA epsi > 0 EE N_epsi "st" AA n,m > N_epsi, d(x_n, x_m) < epsi`

A metric space `(S, d)` is:

D1.3.1 Let `{mu_n}, mu` be pm on `(S, ccS)`. If `int f dmu_n -> int f dmu quad AA f in C_B(S)` then we write `mu_n ->^d mu`.

L1.3.1 If `F` is closed in `(S, d)` then `AA epsi > 0 EE f in C_B(S)` st `f(x) = 1 if x in F`, `f(x) = 0 if d(x, F) >= epsi`, and `f(x) in [0,1] "ow"`. The `f` can be taken uniformly continuous.

T1.3.1 Let `{mu_n}_(n>=1)` be pm on `(S, ccS)`. Then the following are equivalent:

D1.3.2

D1.3.3 A family of probability measures `Pi` on `(S, ccS)` is relatively compact if every sequence of pm's in `Pi` contains a weakly convergent subsequence `{mu_(n_i)}_(i>=1)`, ie there is a pm `mu` (not necessarily in `Pi`) st `mu_(n_i) ->^d mu`.

T1.3.3 (Prohorov, direct half) For a family of pm's, tightness `=>` relative compactness. T1.3.4 (Prohorov, converse half) If `(S, d)` is Polish, then relative compactness `=>` tightness.

Skorokhod's construction and continuous mapping theorems

Let `F` be a df on `RR`, and for any `0 < p < 1` define the quantile function `F^(-1)(p) = inf{x | F(x) >= p} = spr{x | F(x) < p}`.
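This is also the device behind Skorokhod's construction below: if `U ~ "Unif"(0,1)` then `F^(-1)(U)` has df `F`. A sketch (the exponential df is my example):

```python
import numpy as np

# Inverse-CDF sampling: F^{-1}(U) has distribution function F.
# For F(x) = 1 - exp(-x), F^{-1}(p) = -log(1 - p).
rng = np.random.default_rng(2)
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u)               # F^{-1}(U)

for s in (0.5, 1.0, 2.0):          # empirical cdf vs F at a few points
    print(s, (x <= s).mean(), 1.0 - np.exp(-s))
```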

L1.4.1 Let `F` be a df; then `F^(-1)` is non-decreasing and left-continuous, also satisfying:

L1.4.2 If `F_n ->^d F` then the set `D = {t | t in [0,1], F_n^(-1)(t) !-> F^(-1)(t)}` is at most countable.

T1.4.3 (Skorokhod). Let `{X_n}_(n>=1)` and `X` be rv's on `(RR, B(RR))` st `X_n ->^d X`.
Then there exists rv's `{Y_n}_(n>=1)` and `Y` on `((0,1), B(0,1), "LM")` st `X_n =^d Y_n` and `X =^d Y` and `Y_n ->^(wp1) Y`

Continuous mapping theorems

`f: RR -> RR`, Borel measurable, st `P(X in D_f) = 0`, where `D_f` is the set of discontinuity points of `f`:

P1.4.4 If `X_n ->_("or p")^(wp1) X` then `f(X_n) ->_("or p")^(wp1) f(X)` respectively.

T1.4.5 If `X_n ->^d X` then `f(X_n) ->^d f(X)`.

Convergence of moments

`X_n ->^d X iff Ef(X_n) -> Ef(X) quad AA f in C_B(RR)`. However to ensure `E|X_n|^k -> E|X|^k` we need extra conditions.

D1.5.1 A sequence of random variables `{X_n}_(n>=1)` is uniformly integrable if `lim_(A->oo) spr_n E(|X_n| I(|X_n| > A)) = 0`, or `lim_(A->oo) int_(|X_n| > A) |X_n| dP = 0` uniformly over `n`.

L1.5.1 A sequence of random variables is u.i. iff:

L1.5.2 If `EE epsi > 0 "st" spr_n E|X_n|^(1+epsi) < oo` then `{X_n}` is u.i.

T1.5.3 If `X_n ->^d X "in" (RR, B(RR))` and `{|X_n|^r}` is u.i. for some `r > 0` then:

T1.5.4 If `X_n ->^d X` and `E|X_n|^r -> E|X|^r < oo, r > 0`, then `{|X_n|^r}` is u.i.

T1.5.5 (Frechet-Shohat). Let `{X_n}` be a sequence of random variables st `EX_n^k -> m_k < oo AA k in NN`. If the moment sequence `{m_k}` determines a unique distribution, then `X_n ->^d X` where `X` has moments `{m_k}`.

Sufficient conditions for convergence

T2.5.1 `f in C_B => AA h > 0 f_h in C_B`. `f in C_BU => f_h -> f` uniformly in `RR` as `h -> 0`.

T2.5.2 Let `{mu_n}` and `mu` be pms of `(RR, B(RR))`. If `AA f in C_B^oo` `int f dmu_n -> int f dmu` then `mu_n ->^d mu`.

Characteristic functions

Definition and basic properties

Let `X` be a random variable on `(RR, B(RR))` with probability measure `mu` and distribution function `F`.

D2.1.1 The characteristic function of `X` is: `phi_X(t) = E(e^(itX)) = int_Omega e^(itX(omega)) dP(omega) = int_RR e^(itx) dmu(x) = int_(-oo)^(oo) e^(itx) dF(x) AA t in RR`

Properties:

P2.1.2 `|phi_X(t_0)| = 1` for some `t_0 != 0 iff X "is a lattice rv"`

P2.1.3 If `F` is absolutely continuous, then `lim_(|t|->oo) phi_X(t) = 0`
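A sketch comparing the empirical cf `1/n sum_j e^(itX_j)` with the exact `phi_X(t) = e^(-t^2//2)` of `N(0,1)` (my example):

```python
import numpy as np

# Empirical characteristic function vs the exact cf of N(0,1).
rng = np.random.default_rng(3)
x = rng.standard_normal(50_000)
for t in (0.0, 0.5, 1.0, 2.0):
    emp = np.exp(1j * t * x).mean()   # (1/n) sum_j e^{i t X_j}
    print(t, round(emp.real, 3), round(np.exp(-t**2 / 2), 3))
```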

Inversion formula

T2.2.1 `mu(x_1, x_2) + 1/2(mu{x_1} + mu{x_2}) = lim_(T->oo) 1/(2pi) int_(-T)^T (e^(-itx_1) - e^(-itx_2))/(it) phi_X(t) dt`

C2.2.2 If `x_1, x_2 in C(F)` then `mu(x_1, x_2) = lim_(T->oo) 1/(2pi) int_(-T)^T (e^(-itx_1) - e^(-itx_2))/(it) phi_X(t) dt`

T2.2.3 (Uniqueness theorem) If 2 pm's `mu_1, mu_2` have the same cf's `phi_1, phi_2` then `mu_1 = mu_2`.

T2.2.4 If `phi_X in L^1` then `F` is absolutely continuous with density `f(x) = 1/(2pi) int_RR e^(-itx) phi_X(t) dt`.

T2.2.6 `AA x_0 in RR, mu{x_0} = lim_(T->oo) 1/(2T) int_(-T)^T e^(-itx_0) phi_X(t) dt`.

C2.2.7 `sum_(x in RR) mu{x}^2 = lim_(T->oo) 1/(2T) int_(-T)^T |phi_X(t)|^2 dt`

Convergence theorems and applications

Let `{X_n}_(n>=1), X` be random variables on `(RR, B(RR))` with characteristic functions `{phi_n}_(n>=1), phi` respectively.

T2.3.1 If `X_n ->^d X` then `phi_n -> phi` uniformly on any finite interval, ie `spr_(|t| <= K) |phi_n(t) - phi(t)| -> 0`

L2.3.1 `AA delta > 0, mu({x | |x| delta > 2}) <= 1/delta int_(-delta)^delta |1 - phi(t)| dt`

L2.3.2 If `phi_n(t) -> phi(t) < oo AA |t| <= delta_0`, and `phi` is continuous at 0, then `{mu_n}` is tight.

T2.3.4 (Levy-Cramer) If `lim_(n->oo) phi_n(t) = phi(t) AA t in RR` and `phi` is continuous at 0, then:

T2.3.5 If `E|X|^r < oo, r in NN`, then `phi_X` is r-times continuously differentiable and `phi_X^((r))(t) = E((iX)^r e^(itX))`. Conversely, if `phi_X^((r))(0)` exists for an even integer `r` then `X` has finite rth order (absolute) moments.

T2.3.6 If `E|X|^r < oo`, `r > 1`, then `phi_X` admits the following Taylor expansion around `t = 0`:

P2.3.7 Suppose `{c_n} sub CC` with `c_n -> c`; then `lim_(n->oo) (1 + (c_n)/n)^n = e^c`.

Characteristic functions in `RR^k`

D2.4.1 Let `X = (X_1, ..., X_k)` be a random vector on `RR^k` with pm `mu`. The characteristic function of `X` is `phi_X(vec t) = E e^(i t^T X) = int e^(i t^T x) mu(dx_1, ..., dx_k)`.

P2.4.1 A pm `mu` on `(RR^k, B(RR^k))` is determined by its values on the half-spaces `ccH = {H_(a,c) | H_(a,c) = {x in RR^k : a'x <= c}, a in RR^k, c in RR}`

T2.4.2 (Cramer-Wold). Let `{X_n}` be a sequence of random vectors and `X` a random vector on `(RR^k, B(RR^k))`; then `X_n ->^d X iff a'X_n ->^d a'X quad AA a in RR^k`.

Central limit theorems

Liapounov's theorem

D3.1.1 For each `n >= 1`, let `{X_(n1), X_(n2), ..., X_(n k_n)}` be a collection of rvs on `(Omega_n, F_n, P_n)` such that `X_(n1), ..., X_(n k_n)` are independent, where `k_n -> oo` as `n -> oo`. Then `{X_(nj), 1 <= j <= k_n}_(n>=1)` is called a double array (DA). If `k_n = n`, it is called a triangular array.

Want to establish `(S_n - a_n)/(b_n) ->^d N(0,1)` for `{a_n}, {b_n} sub RR`.

L3.1.1 Let `{theta_(nj)}` be a DA of complex numbers such that as `n -> oo`:

then `prod_(j=1)^(k_n) (1 + theta_(nj)) -> e^theta`

Proof: Use the fact that `log(1 + z) = sum_(m>=1) ((-1)^(m-1) z^m)/m`. Show that `|log(1 + theta_(nj)) - theta_(nj)| <= k|theta_(nj)|^2` uniformly (for `|theta_(nj)|` bounded by 1/2). Look at the sum, and note the first part `-> theta` and the second part `-> 0`.

T3.1.2 (Liapounov) For a DA `{X_(nj)}`, `gamma_n < oo AA n`: if `(gamma_n)/(sigma_n^3) -> 0` then `(S_n - alpha_n)/(sigma_n) ->^d N(0,1)`

Proof: Let `phi_(nj)(t)` be the cf of `(X_(nj) - alpha_(nj))/(sigma_n)`. Show `theta_(nj) = phi_(nj)(t) - 1` meets the assumptions of L3.1.1, and hence `prod_j phi_(nj)(t)` converges to `e^(-t^2/2)`.

T3.1.3 `{X_n}_(n>=1)` a sequence of rvs, `gamma_n < oo`: if `(sum gamma_j)/(sigma_n^3) -> 0` then `(S_n - sum alpha_j)/(sigma_n) ->^d N(0,1)`

C3.1.4 For a DA `{X_(nj)}`: if `|X_(nj) - alpha_(nj)| <= M_(nj)` wp1 and `lim_(n->oo) max_j M_(nj)/(sigma_n) = 0` then `(S_n - E(S_n))/(sigma_n) ->^d N(0,1)`.

D3.1.2 A null array satisfies one of the following conditions:

Lindeberg-Feller CLT

D3.2.1 A DA satisfies the Lindeberg condition (LC) if `lim_(n->oo) 1/(sigma_n^2) sum_(j=1)^(k_n) E(X_(nj)^2 I(|X_(nj)| > epsi sigma_n)) = 0 quad AA epsi > 0`

L3.2.1 Let `u(m,n): NN xx NN -> RR` st `AA m, lim_(n->oo) u(m,n) = 0`. Then there exists a `uarr` sequence `{M_n}_(n>=1) -> oo` st `lim_(n->oo) u(M_n, n) = 0`

T3.2.1 (Lindeberg-Feller). For a DA, assume `Var(X_(nj)) = sigma_(nj)^2 < oo AA n`, `alpha_(nj) = 0`. If LC holds then:

D3.2.1 A sequence of rv `{X_n}` is `m`-dependent if `AA n >= 1` and `j > m`, `X_(n+j)` is independent of `F_n = sigma{X_i : 1 <= i <= n}`. If `m = 0` then `{X_n}` is independent.

T3.2.3 Let `{X_n}` be a sequence of `m`-dependent uniformly bounded random variables st `(sigma_n)/(n^(1/3)) -> oo`; then `(S_n - alpha_n)/(sigma_n) ->^d N(0,1)`.

Proof: Split `S_n` into large and small blocks: `S_n = S_n^' + S_n^('') + S_n^(''')`. Show `(S_n^(''))/(sigma_n) "and" (S_n^('''))/(sigma_n) -> 0`, then `(sigma_n^')/(sigma_n) -> 1` and `(S_n^')/(sigma_n^') ->^d N(0,1)` as the LC condition is fulfilled.

Functional central limit theorem

D3.3.1 The Wiener measure is a probability measure on `(C, ccC)` corresponding to a stochastic process `X_t, t in [0,1]`, having two properties:

Remarks:

T3.3.1 Let `{P_n}_(n>=1)` and `P` be probability measures. If all finite dimensional distributions of `P_n` converge to `P` and `{P_n}` is tight, then `P_n ->^d P`

T3.3.2 Consider `{X_n}` and `X` continuous random functions on `(C, ccC)`; if

then `X_n ->^d X`

T3.3.3 If `{X_n(0)}_(n>=1)` is tight and `AA epsi, eta > 0 EE delta in (0,1) "and" n_0 "st" AA n >= n_0, 1/delta P(spr_(t <= s <= t + delta) |X_n(s) - X_n(t)| > epsi) < eta AA t in [0,1]`, then `{X_n}` is tight.

C3.3.4 Let `{X_n}` and `X` be random functions on `(C, ccC)`, if

then `X_n ->^d X`

Donsker's theorem: `Z_n ->^d W`. The mapping can be extended to `(C, ccC)`.

Conditional expectation and probability

Conditional expectation

D4.1.1 Let `(Omega, ccF, P)` be a ps, and `ccG sub ccF` a sub-`sigma`-field. Let `X` be a random variable with `E|X| < oo`; then the conditional expectation of `X` given `ccG` is a function `E[X|ccG] : Omega -> RR` st `E[X|ccG]` is `ccG`-measurable and `int_A E[X|ccG] dP = int_A X dP quad AA A in ccG`.

Remarks:

P4.1.1 For any rv `X` with `E|X| < oo` and a sub-`sigma`-field `ccG`, `E(X|ccG)` exists and is unique wp1.

P4.1.2 Let `X` be an rv with `E|X| < oo` and `ccG` a sub-`sigma`-field.
If `Z` is an integrable `ccG`-measurable rv st for a `pi`-class `D` with `sigma(D) = ccG` `EX = EZ` and `int_A Z dP = int_A X dP quad AA A in D` then `Z = E(X|ccG) "wp1"`.

C4.1.3 Let `ccG_1, ccG_2` be sub `sigma`-fields of `ccF` and `X, Y` be integrable rvs.

P4.1.4 (Properties of CE). Let `X, Y` be random variables on `(Omega, ccF, P)`, `ccG sub ccF` a sub-`sigma`-field. Then:

T4.1.5 `X` an rv, `E|X| < oo`, and `ccG sub ccF`. If `Y` is a finite-valued `ccG`-measurable random variable st `E|XY| < oo`, then `E(XY | ccG) = Y E(X | ccG) "wp1"`.

T4.1.6 Let `{X_n}`, `Y` be random variables with `E|Y| < oo` and `ccG sub ccF` on `(Omega, ccF, P)`; then:

T4.1.7 (Jensen). Let `Y` be `ccF`-measurable with `E|Y| < oo` and `g` be a finite convex function on `RR` with `E|g(Y)| < oo`. If `X = E(Y | ccG)` for some `sigma`-field `ccG`, then `g(X) <= E(g(Y) | ccG) "wp1"`.

D4.1.2 If `EY^2 < oo`, the conditional variance of `Y` given sub-`sigma`-field `ccG` is `Var(Y | ccG) = E(Y^2 | ccG) - E^2(Y | ccG)`

T4.1.8 Let `EY^2 < oo` then `Var(Y) = Var(E(Y| ccG)) + E(Var(Y|ccG))`
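A Monte Carlo sketch of T4.1.8 with `ccG = sigma(X)` (a hierarchical normal example of my choosing): for `Y = X + epsi`, `E(Y|X) = X` and `Var(Y|X) = 1`, so `Var(Y) = Var(X) + 1`.

```python
import numpy as np

# Var(Y) = Var(E(Y|G)) + E(Var(Y|G)) with G = sigma(X), Y = X + eps.
rng = np.random.default_rng(4)
x = rng.normal(0.0, 2.0, size=1_000_000)   # Var(X) = 4
y = x + rng.standard_normal(x.size)        # eps ~ N(0,1), Var(Y|X) = 1

print(y.var())         # ~5.0
print(x.var() + 1.0)   # Var(E(Y|X)) + E(Var(Y|X)) ~ 4 + 1
```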

Conditional probability

D4.2.1 Let `(Omega, ccF, P)` be a ps. For a `B in ccF` and sub `sigma`-field `ccG sub ccF`, the conditional probability of `B` given `ccG`, `P(B | ccG) = E(I_B | ccG)`.

Remarks:

The properties of conditional expectation lead to the following properties of conditional probability:

The above suggests that `mu_(ccG)(*) = P(* | ccG)` is countably additive wp1 for a given sequence `{A_i}`. However, the probability-1 set may change from sequence to sequence, so we may not be able to find a common probability-1 set `=> P(* | ccG)` may not be a pm on `ccF`.

D4.2.2 Let `ccF_1, ccG` be sub-`sigma`-fields of events in `(Omega, ccF, P)`. A regular conditional probability on `ccF_1 | ccG` is a function `mu: ccF_1 xx Omega -> [0,1]` satisfying:

T4.2.1 Let `P_omega(A) := P(A, omega)` be a regular conditional probability given `ccG`. If `X` is `ccF`-measurable with `E|X| < oo`, then `E(X|ccG)(omega) = int_Omega X dP_omega`.

Fundamentals of statistical inference

Basic concepts

Parametric vs non-parametric probability models:

D5.1.1 (Exponential family). A parametric family of pms `{P_theta}_(theta in Theta)` dominated by a `sigma`-finite measure `nu` on `(bbbX, ccX)` is called an exponential family iff `(dP_theta)/(dnu)(omega) = exp[ (eta(theta))^T T(omega) - zeta(theta)] h(omega)`, `omega in bbbX`, where:

Remarks:

D5.1.2 (Location-scale family) Let `P` be a known pm on `(RR^k, B(RR^k))`, `nu sub RR^k`, and `M_k = {k xx k " positive definite matrices"}`. The family of pms `{P_((mu, Sigma)) = P(Sigma^(-1/2)(* - mu)) : mu in nu, Sigma in M_k}` is called a location-scale family on `(RR^k, B(RR^k))`.

Sufficiency and completeness

D5.2.1 Given a random observable `X`, a measurable function `T: bbbX -> RR^d, d in NN^+`, is called a statistic if `T(X)` is known whenever `X` is. `sigma(T(X)) sub sigma(X)`

D5.2.2 Let `(bbbX, sfX, sfP)` be an ops of `X`, and let `sfG` be a sub-`sigma`-field. `sfG` is sufficient for `sfP` if `AA A in sfX quad P(A|sfG) = E_P(I_A | sfG)` does not depend on `P in sfP`. That is, the conditional probability of `A | sfG` is the same for all `P in sfP`.

P5.2.1 `sfG` is sufficient for `sfP` iff for any bounded `sfX`-measurable function `f: bbbX -> RR` there exists a `sfG`-measurable function `g` st `g = E_P(f | sfG) quad AA P in sfP`

L5.2.2 Suppose the ops is dominated by a `sigma`-finite measure `lambda`; then there exists a countable subset `sfP_0 sub sfP` st `sfP << sfP_0`

C5.2.3 If a family of pm on `(bbbX, sfX)` is dominated by a `sigma`-finite measure `lambda` then `sfP` is dominated by a pm `Q = sum_i c_i P_i quad c_i >0, sum c_i = 1, P_i in sfP`

T5.2.4 (Halmos-Savage) Let `(bbbX, sfX, sfP^x)` be a dominated ops and `sfB` be a sub-`sigma`-field.

T5.2.5 (Factorisation theorem). Suppose `(bbbX, sfX, sfP)` is an ops with `sfP` dominated by a `sigma`-finite `lambda`. Then `T(X)` is sufficient for `sfP` iff there exist a non-negative measurable function `h: bbbX -> RR` that does not depend on `P in sfP` and a non-negative `g_P` measurable wrt `sigma(T(X))` st `dP/dlambda (x) = g_P(T(x)) h(x) quad AA x in bbbX`

Sufficiency is very much determined by the structure of `sfP`. If `sfP` is not a proper set of models the discussion of sufficiency is quite hypothetical.

Minimal sufficient

D5.2.3 Let `(bbbX, sfX, sfP^x)` be an ops and `sfC sub sfX` be a sub-`sigma`-field. `sfC` is necessary for `sfP` if for every sufficient `sigma`-field `sfB` for `sfP` and any `C in sfC` there exists `B in sfB` st `P(B o+ C) = 0 quad AA P in sfP` (`o+` = symmetric difference).

D5.2.4 A statistic `T: (bbbX, sfX) -> (RR^d, B(RR^d))` is necessary for `sfP` if for any sufficient statistic `S: (bbbX, sfX) -> (RR^q, B(RR^q))` there exists a measurable `H: RR^q -> RR^d` st `T = H(S) "wp1" P quad AA P in sfP`, `q in NN^+, d in NN^+ uu {oo}`

A `sigma`-field is minimal sufficient if it is sufficient and necessary.

T5.2.6 Let `(bbbX, sfX, sfP)` be an ops, dominated by `sigma`-finite `lambda`. Then a minimal sufficient `sigma`-field for `sfP` exists.

T5.2.7 Let `(bbbX, sfX, sfP)` be an ops and `sfP_1 sub sfP` st `sfP << sfP_1`. If `T(X)` is sufficient for `sfP` and minimal sufficient for `sfP_1` then `T` is minimal sufficient for `sfP`

T5.2.8 Let `(bbbX, sfX, sfP)` be an ops where `sfP = {P_0, ..., P_k}` is a finite set of pms with densities `{f_0, f_1, ..., f_k}` wrt a `sigma`-finite `lambda`. Then `T(X) = ((f_1(x))/(f_0(x)), ..., (f_k(x))/(f_0(x))) I_A`, with `A = {x : f_0(x) > 0}`, is a minimal sufficient statistic for `sfP`

Completeness

D5.3.1 Let `(bbbX, sfX, sfP)` be an ops, and `sfB` be a sub-`sigma` field of `sfX`.

P5.3.1 Let `Y: bbbX -> bbbY` be measurable st `sigma(Y)` is complete for `sfP`. Let `sfP_Y = {P @ Y^(-1) : P in sfP}`; then `sfP_Y` is complete.

L5.3.2 Let `sfP = {P_eta: eta in Xi}` be a natural exponential family with density `(dP_eta)/(dnu)(x) = exp(eta^T T(x) - xi(eta)) h(x) quad x in bbbX`. Suppose `T(X) = (Y(X), U(X))` and `eta = (theta, phi)` where `Y` and `theta` have the same dimension. Then `Y` has density `f_theta(y) = exp(theta^T y - xi(theta, phi))` wrt a `sigma`-finite measure `lambda` depending on `phi`

If `T = Y` then `T(X)` has distribution in natural exponential family form.

Given an rv `X`, its moment generating function (MGF) is defined as `psi_X(t) = E(e^(t^T X))`. It has similar properties to characteristic functions, but `psi_X(t)` can take the value `oo`

L5.3.3 Let `X` and `Y` be rvs in `RR^k`. If `psi_X(t) = psi_Y(t) < oo` for `|t| < delta, delta > 0`, then `X` and `Y` have the same distribution.

T5.3.4 Let `sfP = {P_eta; eta in Xi}` be a natural exponential family of full rank with density `(dP_eta)/(dnu)(x) = exp(eta^T T(x) - xi(eta))h(x)`. Then `T(X)` is complete.

T5.3.5 Let `(bbbX, sfX, sfP)` be an ops. If `S sub sfX` is a boundedly complete and sufficient `sigma`-field for `sfP` then `S` is minimal sufficient for `sfP`

D5.3.2 Let `(bbbX, sfX, sfP)` be an ops. A `sigma`-field `sfB sub sfX` is ancillary for `sfP` if `AA sfB`-measurable statistics `V(X)` the distribution of `V` does not depend on `P in sfP`

T5.3.6 (Basu) If `sfB` is a complete and sufficient `sigma`-field and `sfC` is an ancillary `sigma`-field for `sfP`, then `sfB` and `sfC` are independent `AA P in sfP`.

Let `V(X)` and `T(X)` be two statistics of `X` on `(bbbX, sfX, sfP)`. If `V` is ancillary and `T(X)` is complete and sufficient for `sfP` then `V(X)` and `T(X)` are independent `AA P in sfP`

Decision theory

Decision rules, loss functions and risks

D6.1.1 A statistical decision problem consists of the following elements:

The goal of decision theory is to find the best decision rule given a loss function.

If `ccP` is parametric, then often write `L(theta, d(x))`

D6.1.2 A dr `d_1` is:

Let `ccT` be a class of decision rules. A dr d is `ccT`-optimal if `d` is as good as any other dr in `ccT`. If `ccT` contains all the possible dr's, then `d` is optimal if `d` is `ccT`-optimal

D6.1.2

Given a statistical decision problem, let `bbbD = {"all nrdr"}`, `bbbD_b = {"all bdr"}`. An nrdr is a degenerate bdr with `delta_d(x, A) = I_A(d(x))`, so `bbbD sub bbbD_b`

D6.1.3 A randomised decision rule (rdr) `bar delta` is a pm on `(bbbD, ccD)`. The risk is `R(P, bar delta) = int_bbbD R(P, g) d bar delta(g)`

Remarks:

Admissibility and geometry of decision rules

D6.2.1 Let `ccT` be a class of drs. A dr `delta in ccT` is `ccT`-admissible if there is no dr in `ccT` that is better than `delta`.

The notion of admissibility is a retreat from `ccT`-optimality as the latter may not exist.

T6.2.1 Suppose `bbbA sub RR^k` is convex and `delta in bbbD_b` st `int_bbbA ||a|| delta(x, da) < oo quad AA x in bbbX`. Let `d(x) = int_bbbA a delta(x, da) quad AA x in bbbX` (an nrdr). Then:

Geometry of decision rules

A helpful device to understand the basics of decision rules. Assume `ccP = {P_1, ..., P_k}`, a finite collection of pms. Given a dr `delta`, define the k-dim risk profile `y_delta = (R(P_1, delta), ..., R(P_k, delta))`. Let `ccR_(k,r) = {y_delta in RR^k : delta in bbbD}`, `ccR_(k,b) = {y_delta in RR^k : delta in bbbD_b}`

T6.2.2 `ccR_(k,r)` and `ccR_(k,b)` are convex.

D6.2.2 Let `X = (X_1, ..., X_k )` and `Y = (Y_1, ..., Y_k)`:

D6.2.3 `X in RR^k`, the lower quadrant of X is the set `Q_x = {Y in RR^k; Y <= X}`

D6.2.4

T6.2.3 `y_delta in ccR_(k,b)`. If `y_delta in lambda(ccR_(k,b))` then `delta` is admissible. The converse is true if `ccR_(k,b)` is closed.

Complete classes of decision rules

D6.3.1 Let `ccG sub bbbD_b` be a class of drs.

T6.3.1 Let `A(bbbD_b)` be the set of admissible drs in `bbbD_b`. If a MCC exists then it is `A(bbbD_b)`

T6.3.2 If `A(bbbD_b)` is complete, then it is a MCC.

T6.3.3 Suppose `ccP = {P_1, ..., P_k}` is finite. If `bbbD_b` is continuous from below, then `bbbD_0 = {delta in bbbD_b : y_delta in lambda(ccR_(k,b))}` is a MCC.

T6.3.4 If `T(X)` is sufficient for `ccP` and `delta in bbbD_b`, then `delta'(X,A) = E(delta(X,A) | T(X)) in bbbD_b` and `R(P, delta') = R(P, delta) quad AA P in ccP`

L6.3.5 Suppose `bbbA sub RR^k` is convex and `d_1, d_2 in bbbD`. Let `d(x) = (1/2)(d_1(x) + d_2(x))`; then:

C6.3.6 Suppose `bbbA sub RR^k` is convex and `d_1, d_2 in bbbD` with the same risk `AA P in ccP`. If the loss is convex in `a quad AA P in ccP` and strictly convex for one `P_0 in ccP`, and `R(P_0, d_1) = R(P_0, d_2) < oo` and `P(d_1 != d_2) > 0`, then `d_1` and `d_2` are inadmissible.

T6.3.7 (Rao-Blackwell theorem). Let `bbbA` be a convex subset of `RR^k` and `T` be a sufficient statistic for `ccP`. Let `d` be an nrdr with `E_P ||d(X)|| < oo quad AA P in ccP` and `d_0(x) = E_P(d | T)(x)` (which does not depend on `P`, by sufficiency). Then

Bayes and minimax rules

So far we have compared drs `delta_1` and `delta_2` via their risk vectors/profiles; however, this multivariate comparison may not produce a "better" rule. Bayes and minimax rules use a univariate measure of risk.

D6.4.1 Given a statistical decision problem `(bbbX, ccX, ccP)`, `(bbbA, ccA)`, `L(P,a)`, a dr `delta in bbbD` produces a risk `R(P, delta)`. The Bayes risk wrt a prior pm `Pi` on `(ccP, ccF_P)` is `R_Pi(delta) = int_ccP R(P, delta) Pi(dP)`.

Remarks:

D6.4.2 Let `ccT` be a set of drs. A dr `delta_0 in ccT` is the `ccT`-Bayes rule if `R_Pi(delta_0) = inf_(delta in ccT) R_Pi(delta)`

T6.4.1 Let `ccP = {P_1, ..., P_k}` and `ccT` a family of drs. If `delta_0` is `ccT`-Bayes wrt a prior `Pi = (pi_1, ..., pi_k)`, `pi_i > 0, sum pi_i = 1`, then `delta_0` is admissible.

T6.4.2 Let `ccP = {P_theta; theta in Theta sub RR^k}` st every open ball in `Theta` has positive `Pi`-probability, and `R(theta, delta) := R(P_theta, delta)` is continuous wrt `theta` on `Theta` for each `delta in ccT`. If:

then `delta_0` is `ccT`-admissible.

T6.4.5 If `ccP` is finite, and `delta` is `ccT`-admissible, then there exists a prior `Pi` on `(ccP, ccF_P)` st `delta` is `ccT`-Bayes wrt `Pi`.

P6.4.4 (Lehmann's theorem) Let `T: (Omega, ccF) -> (Lambda, ccG)` be measurable and `phi: (Omega, ccF) -> (RR^k, B(RR^k))` be measurable. Then `phi` is `(Omega, sigma(T)) -> (RR^k, B(RR^k))` measurable iff there exists a `psi: (Lambda, ccG) -> (RR^k, B(RR^k))` st `phi = psi @ T`

D6.4.3 The conditional expectation of `X | Y = y` for some `y in RR^k` is `E(X|Y=y) = h(y)`, where `h` satisfies `E(X|Y) = h(Y)` (such an `h` exists by P6.4.4).

P6.4.5 Let `X` and `Y` be n- and m-dimensional r. vectors. Suppose `P_((x,y))`, the pm of `(X,Y)`, is dominated by `nu xx lambda` with density `f(x,y)`, where `nu` and `lambda` are `sigma`-finite measures on `(RR^n, B(RR^n))` and `(RR^m, B(RR^m))`. Let `g(x,y): (RR^(n+m), B(RR^(n+m))) -> (RR, B(RR))` be measurable, st `E|g(X,Y)| < oo`. Then `E(g(X,Y) | Y) = (int g(x,Y) f(x,Y) nu(dx))/(int f(x,Y) nu(dx))` and `E(g(X,Y) | Y=y) = (int g(x,y) f(x,y) nu(dx))/(int f(x,y) nu(dx))`.

T6.4.6 (Existence of conditional distribution in a general case). Let `X` be an n-dim r. vec on `(Omega, ccF, P)`, and `Y: (Omega, ccF) -> (Lambda, ccG)` measurable. Then there exists a regular cond pm `P_(X|Y)(* | y)`, called the conditional distribution of `X | Y=y`, st

Furthermore, if `E|g(X,Y)| < oo` for `g: RR^n xx Lambda -> RR` measurable, then `E(g(X,Y) | Y=y) = E(g(X, y) | Y=y) = int_(RR^n) g(x,y) dP_(X|Y)(x|y) "wp1" P_Y`

Remark:

Construction of Bayes Rules

T6.4.7 (Bayes formula). Assume `ccP = {P_theta; theta in Theta}` is dominated by `sigma`-finite `nu` and `f_theta(x) = (dP_(X|theta)(x | theta))/(dnu)` is the cond density, which as a function of `(x, theta)` is measurable on `(bbbX xx Theta, sigma(ccX xx ccF_Theta))`. Let `Pi` be a prior pm on `(Theta, ccF_Theta)`. Assume `m(x) = int_Theta f_theta(x) dPi > 0`; then:

Remarks:

T6.4.8 Under the conditions of T6.4.6 and `Theta, bbbA` convex. Then

Minimax rules

D6.4.4 A dr `delta_0 in ccT` is the minimax rule if `spr_(P in ccP) R(P, delta_0) = inf_(delta in ccT) spr_(P in ccP) R(P, delta)`. A minimax rule has the smallest worst-case risk.

D6.4.5 A dr `delta` with constant risk, ie `R(P, delta) = c quad AA P in ccP`, is called an equaliser rule.

T6.4.6 If `delta` is an equaliser rule and is admissible then it is minimax.

T6.4.9 Suppose `{delta_i}_(i>=1)` is a sequence of drs, and each `delta_i` is Bayes wrt `Pi_i`. If `R_(Pi_i)(delta_i) -> c < oo` and `delta_0` is a dr with `R(P, delta_0) <= c quad AA P in ccP` then `delta_0` is minimax.

C6.4.10 If `delta_0` is an equaliser rule and is Bayes wrt `Pi_0` then it is minimax.

Remarks:

D6.4.6 A prior `Pi_0` is least favourable if `R_(Pi_0) = sup_Pi R_Pi`.

A dr `delta` that is Bayes wrt a least favourable prior has the best chance of being minimax.

C6.4.11 If `delta` is Bayes wrt `Pi` and `R(P, delta) <= R_Pi = inf_(delta' in ccT) R_Pi(delta') quad AA P in ccP`, then `delta` is minimax and `Pi` is least favourable.

Unbiased estimators and invariance drs

D6.5.1 `(bbbX, ccX, ccP = {P_theta, theta in Theta})`

T6.5.1 (Lehmann-Scheffe) Suppose that:

If there exists a UE of a parametric fn `gamma(theta)` then there exists a best UE (BUE) of `gamma(theta)`

Invariance drs

Would be nice to have decision rules that are invariant to transformations of r. obs.

D6.5.2 Let `G != O/`, and `@` be a binary operator on `G`. `(G, @)` is called a group if:

`e` is unique and is the identity element of `G`. `h` is the inverse of `g`.

D6.5.3 Let `(bbbX, ccX)` be a measurable space. `G` is a group of measurable transformations (GMT) if:

D6.5.4 Let `(bbbX, ccX, {P_theta, theta in Theta})` be a ps for r. obs `X` and `G` a GMT on `(bbbX, ccX)`; then `ccP` is invariant under `G` if `AA theta in Theta, g in G` there exists a unique `theta' in Theta` st `P_theta(g^(-1)(A)) = P_(theta')(A) quad AA A in ccX`