Convergence in distribution
Definitions and basic properties
Let `{X_n}_(n>=0)` be a collection of rvs, and let `F_n` denote the cdf of `X_n`. Then `{X_n}_(n>=1)` is said to converge in distribution, or weakly, to `X_0`, written `X_n ->^d X_0`, if:
- `lim_(n->oo) F_n(x) = F_0(x) quad AA x in C(F_0)` where `C(F_0) = {x in RR: F_0 "continuous at" x}`, or
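- `mu_n(a, b] -> mu_0(a, b]` for all `a < b` with `mu_0({a}) = mu_0({b}) = 0`, where `mu_n` denotes the distribution of `X_n`
Convergence in distribution does not require that the random variables be defined on a common probability space (PS).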
Prop: If `X_n ->^p X_0` then `X_n ->^d X_0`. Converse false in general, but if `X_n ->^d X_0` and `P(X_0 = c) = 1, c in RR`, then `X_n ->^p c`
Prop: If a cdf `F` is continuous on `RR` then it is uniformly continuous on `RR`.
Th: `{X_n}_(n>=0)` a collection of rv, with cdfs `{F_n}_(n>=0)`.
Then `X_n ->^d X_0` iff there exists a dense set `D sub RR` st `lim_(n->oo) F_n(x) = F_0(x) quad AA x in D`.
Polya's th: `{X_n}_(n>=0)` a collection of rvs, with cdfs `{F_n}_(n>=0)`. If `X_n ->^d X_0` and `F_0` is continuous on `RR` then `spr_(x in RR) |F_n(x) - F_0(x)| -> 0` as `n->oo`.
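A minimal numerical sketch of Polya's theorem, under assumptions that are not in the notes: `F_n` is taken to be the cdf of `N(0, 1 + 1/n)` and `F_0` the cdf of `N(0,1)`, and the supremum is approximated on a finite grid. The helper names `norm_cdf` and `sup_distance` are illustrative.

```python
import math

def norm_cdf(x, sigma=1.0):
    """cdf of N(0, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def sup_distance(n, half_width=10.0, steps=4001):
    """Grid approximation of sup_x |F_n(x) - F_0(x)|.

    Assumed example: F_n is the cdf of N(0, 1 + 1/n), F_0 the cdf of N(0, 1).
    """
    sigma_n = math.sqrt(1.0 + 1.0 / n)
    worst = 0.0
    for i in range(steps):
        x = -half_width + 2.0 * half_width * i / (steps - 1)
        worst = max(worst, abs(norm_cdf(x, sigma_n) - norm_cdf(x)))
    return worst

# The uniform distance shrinks as n grows, as the theorem predicts.
for n in (1, 10, 100, 1000):
    print(n, sup_distance(n))
```

Since `F_0` is continuous here, pointwise convergence of the cdfs is upgraded to uniform convergence, which is what the shrinking grid maximum reflects.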
Slutsky's th: `{X_n}_(n>=1), {Y_n}_(n>=1)` sequences of rv, st. `(X_n, Y_n)` is defined on a PS `(Omega_n, ccF_n, P_n)`.
If `X_n ->^d X_0` and `Y_n ->^p a in RR` then
- `X_n + Y_n ->^d X_0 + a`
- `X_n Y_n ->^d aX_0`
- `X_n/Y_n ->^d X_0 / a` provided `a != 0`.
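A small simulation sketch of the first two conclusions of Slutsky's theorem. The setup is an assumption made only for illustration: `X_n` is drawn exactly from `N(0,1)` and `Y_n = a + Z/n` with `Z ~ N(0,1)` independent, so that `Y_n ->^p a`. The function names are hypothetical.

```python
import math
import random

def norm_cdf(x, mean=0.0, sd=1.0):
    """cdf of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def empirical_cdfs(n, a, x, reps=20000, seed=0):
    """Empirical P(X_n + Y_n <= x) and P(X_n * Y_n <= x).

    Assumed model: X_n ~ N(0, 1), Y_n = a + Z / n with Z ~ N(0, 1) independent.
    """
    rng = random.Random(seed)
    sum_hits = 0
    prod_hits = 0
    for _ in range(reps):
        x_n = rng.gauss(0.0, 1.0)
        y_n = a + rng.gauss(0.0, 1.0) / n
        sum_hits += (x_n + y_n <= x)
        prod_hits += (x_n * y_n <= x)
    return sum_hits / reps, prod_hits / reps

a, x = 2.0, 1.0
sum_cdf, prod_cdf = empirical_cdfs(50, a, x)
# Limits suggested by the theorem: X_0 + a ~ N(a, 1) and a * X_0 ~ N(0, a^2).
print(sum_cdf, norm_cdf(x, mean=a, sd=1.0))
print(prod_cdf, norm_cdf(x, mean=0.0, sd=abs(a)))
```

The empirical values should sit close to the two limiting cdfs, up to Monte Carlo error of order `reps^(-1/2)`.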
Asymptotic normality
A special case of `F_n -> F_0`. A seq of rv's `{X_n}_(n>=1)` is said to be asymptotically normal with asymptotic mean `mu_n` and asymptotic variance `sigma_n^2` if `sigma_n^2 > 0` for sufficiently large `n` (`EE n_0 > 0 "st" sigma_n^2 > 0 AA n >= n_0`) and `(X_n - mu_n)/(sigma_n) ->^d N(0,1) "as" n->oo`. Write `X_n` is `"AN"(mu_n, sigma_n^2)`.
- `{mu_n}, {sigma_n^2}` are not necessarily the mean and variance of `X_n`, (`X_n` might not have moments)
- if `X_n` is `AN(mu_n, sigma_n^2)`, this alone does not determine what `X_n` converges to, eg. if `mu_n = mu = 0` and `sigma_n -> 0`, then `X_n ->^p 0`.
- if `X_n` is `AN(mu_n, sigma_n^2)` then `X_n` is also `AN(bar mu_n, bar sigma_n^2)` iff `(sigma_n)/(bar sigma_n) -> 1` and `(bar mu_n - mu_n)/(sigma_n) -> 0`
- if `X_n` is `AN(mu_n, sigma_n^2)`, the sequences of asymptotic means and variances are not unique
- if `X_n` is `AN(mu_n, sigma_n^2)` then `a_n X_n + b_n` is `AN(mu_n, sigma_n^2)` iff `a_n -> 1` and `(mu_n(a_n -1) + b_n)/(sigma_n) -> 0`
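As a concrete instance of asymptotic normality, the sketch below takes `X_n ~ "Binomial"(n, p)`, which is `AN(np, np(1-p))` by the central limit theorem, and computes the exact uniform distance between the cdf of `(X_n - mu_n)/(sigma_n)` and the `N(0,1)` cdf. The choice `p = 0.3` and the function names are illustrative assumptions.

```python
import math

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_kolmogorov_distance(n, p=0.3):
    """sup_x |P((X_n - mu_n)/sigma_n <= x) - Phi(x)| for X_n ~ Binomial(n, p).

    The standardized cdf is a step function, so the supremum is attained
    at a jump point, either just before the jump or at it.
    """
    mu_n = n * p
    sigma_n = math.sqrt(n * p * (1.0 - p))
    worst = 0.0
    cdf = 0.0
    for k in range(n + 1):
        phi = norm_cdf((k - mu_n) / sigma_n)
        worst = max(worst, abs(cdf - phi))  # left limit at the jump
        cdf += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        worst = max(worst, abs(cdf - phi))  # value at the jump
    return worst

for n in (10, 100, 400):
    print(n, binom_kolmogorov_distance(n))
```

The distance decays roughly like `n^(-1/2)`, in line with the Berry-Esseen bound.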
Vague convergence, Helly-Bray theorems and tightness
Bolzano-Weierstrass th: If `A sub [0,1]` is infinite, then `EE {x_n}_(n>=1) sub A` of distinct points st `lim_(n->oo) x_n = x` exists in `[0,1]` (but `x` is not necessarily in `A` unless `A` is closed). There is an analogue of this for sub-probability measures (ie. measures with `mu(RR) <= 1`).
Let `{mu_n}_(n>=1), mu` be sub-probability measures on `(RR, B(RR))`. `{mu_n}` converges vaguely to `mu`, written `mu_n ->^v mu`, if `EE D sub RR, D "dense"`, st `mu_n(a,b] -> mu(a,b] qquad AA a,b in D`. If `mu_n` and `mu` are all probability measures, `->^d <=> ->^v`.
Helly's selection th: If `A` is an infinite collection of sub-probability measures on `(RR, B(RR))`, then there exists a sequence `{mu_n}_(n>=1) sub A` and a sub-probability measure `mu` st `mu_n ->^v mu`.
Helly-Bray theorem for vague convergence: `{mu_n}_(n>=1), mu` sub-pm on `(RR, B(RR))`. Then `mu_n ->^v mu` iff `int f dmu_n -> int f dmu quad AA f in C_0(RR)`, where `C_0(RR) = {g | g: RR -> RR " is continuous and " lim_(|x| -> oo) g(x) = 0}`.
Helly-Bray theorem for weak convergence: `{mu_n}_(n>=1), mu` pm on `(RR, B(RR))`. Then `mu_n ->^d mu` iff `int f dmu_n -> int f dmu quad AA f in C_B(RR) = {g | g: RR -> RR " is continuous and bounded"}`.
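The difference between the two Helly-Bray theorems can be seen with point masses escaping to infinity. In the sketch below (an illustrative example, not from the notes) `mu_n` is the point mass at `n`, so `int f dmu_n = f(n)`. Integrals of a `C_0` function tend to 0, the integral against the zero measure, while the bounded function `f = 1` keeps integral 1, so the vague limit is not a probability measure.

```python
import math

def integral_against_point_mass(f, location):
    """Integral of f with respect to the point mass at `location`."""
    return f(location)

def f_vanishing(x):
    """A member of C_0(R): continuous and tending to 0 as |x| -> oo."""
    return math.exp(-x * x)

def f_bounded(x):
    """A member of C_B(R) that is not in C_0(R)."""
    return 1.0

# mu_n = point mass at n, for n = 1, 5, 10.
vanishing = [integral_against_point_mass(f_vanishing, n) for n in (1, 5, 10)]
bounded = [integral_against_point_mass(f_bounded, n) for n in (1, 5, 10)]
print(vanishing)  # tends to 0: consistent with vague convergence to the zero measure
print(bounded)    # stays at 1: no pm limit, so no weak convergence
```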
Tightness
A sequence of pm's `{mu_n}_(n>=1)` on `(RR, B(RR))` is called tight if `AA epsi > 0 EE M_epsi in (0, oo) "st" spr_(n>=1) mu_n([-M_epsi, M_epsi]^c) < epsi`.
A sequence of rv's `{X_n}_(n>=1)` is called tight or stochastically bdd if the sequence of probability dists is tight, ie `AA epsi > 0 EE M_epsi in (0, oo) "st" spr_(n>=1) P(|X_n| > M_epsi) < epsi`. Denoted `X_n = O_p(1)`.
- `X_n ->^p 0` called stochastically small, `X_n = o_p(1)`
- any finite collection of pms is tight
- property of tightness analogous to notion of boundedness of sequence of real numbers
- a tight sequence may not converge, but will have one or more weakly convergent subsequences
In general, given a stochastic quantity `T_n` with mean `mu_n = E(T_n)`, the stochastic order of `T_n - mu_n` is determined by the order of the variance `sigma_n^2 = "Var"(T_n)`, if it exists.
`P(|T_n - mu_n|/(sigma_n) > m) = P(|T_n - mu_n| > m sigma_n) <= (sigma_n^2)/(m^2 sigma_n^2) = m^(-2)` (Chebyshev).
`AA epsi > 0 quad EE m "st" m^(-2) < epsi`, so `spr_n P(|T_n - mu_n|/(sigma_n) > m) < epsi => |T_n - mu_n|/(sigma_n) = O_p(1) => T_n - mu_n = O_p(sigma_n)`
If `sigma_n^2 -> 0` then `T_n - mu_n = o_p(1)`.
If `mu_n = 0`, then `T_n = O_p(sigma_n)`
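The Chebyshev step can be checked exactly in a simple case. The sketch below assumes, for illustration only, that `T_n` is the mean of `n` iid Bernoulli(`p`) draws, so `mu_n = p` and `sigma_n^2 = p(1-p)/n`, and computes the tail probability from the binomial pmf. The name `tail_prob` is hypothetical.

```python
import math

def tail_prob(n, m, p=0.5):
    """Exact P(|T_n - mu_n| > m * sigma_n), T_n the mean of n Bernoulli(p) draws."""
    mu_n = p
    sigma_n = math.sqrt(p * (1.0 - p) / n)
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - mu_n) > m * sigma_n:
            total += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
    return total

# The bound m^(-2) holds for every n, which is what gives T_n - mu_n = O_p(sigma_n).
for n in (10, 100, 500):
    for m in (1.5, 2.0, 3.0):
        print(n, m, tail_prob(n, m), m ** -2)
```

The bound does not depend on `n`, which is the uniformity needed in the definition of `O_p`.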
T1.2.8 `{X_n}, {Y_n}` sequences of rv's. `X_n = O_p(1), Y_n = o_p(1)`.
Then:
- `X_n + Y_n = O_p(1)`
- `X_n Y_n = o_p(1)`
T1.2.9 Let `{mu_n}_(n>=1)` be pm's.
`{mu_n}` is tight iff it is relatively compact, ie for all subsequences `{mu_(n_i)}_(i>=1)` there exists a further subsequence `{mu_(m_i)}_(i>=1)` of `{mu_(n_i)}_(i>=1)` and pm `mu` on `(RR, B(RR))` st `mu_(m_i) ->^d mu`.
T1.2.10 `{mu_n}_(n>=1), mu`, pm's on `(RR, B(RR))`. Then `mu_n ->^d mu` iff `{mu_n}` is tight and all weakly convergent subsequences converge to `mu`.
Convergence of probability and sub-probability measures on general metric spaces
- interior: `A^@ = {x in A | EE epsi >0 "st" B(x, epsi) sub A}`
- closure: `barA = {x in S| EE {x_n}_(n>=1) sub A "st" x_n -> x}`
- boundary: `del A = barA - A^@`
- diameter: `delta(A) = spr_(x,y in A){d(x,y)}`
A set `A` is:
- bounded if `delta(A) < oo`
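- compact if every sequence in `A` has a subsequence converging to a point of `A` (in `RR^k` this is equivalent to closed and bounded)
A sequence `{x_n}_(n>=1)` is Cauchy if `AA epsi >0 EE N_epsi "st" AA n,m > N_epsi, d(x_n, x_m) < epsi`
A metric space `(S, d)` is:
- complete if every Cauchy sequence converges to a point in the space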
- separable if `EE` a countable dense set `D sub S`
- Polish if complete and separable
D1.3.1 Let `{mu_n}, mu` be pm on `(S, ccS)`. If `int f dmu_n -> int f dmu quad AA f in C_B(S)` then `mu_n ->^d mu`.
L1.3.1 If `F` is closed in `(S, d)` then `AA epsi >0 EE f in C_B(S)` st `f(x) = 1 "if" x in F, f(x) = 0 "if" d(x, F) >= epsi "and" f(x) in [0,1] "ow"`. The `f` can be chosen uniformly continuous.
T1.3.1 Let `{mu_n}_(n>=1), mu` be pm on `(S, ccS)`. Then the following are equivalent:
- `mu_n ->^d mu`
- `lim_(n->oo) int f dmu_n = int f dmu AA f "bdd and uniformly continuous"`
- `bar lim mu_n(F) <= mu(F) AA F "closed"`
- `ul lim mu_n(G) >= mu(G) AA G "open"`
- `lim_(n->oo) mu_n(B) = mu(B) AA B in ccS "st" mu(del B)=0`
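The inequalities for closed and open sets can be strict. A minimal sketch, assuming the illustrative example `mu_n = ` point mass at `1/n` and `mu = ` point mass at `0` on `RR` (so `mu_n ->^d mu`), with sets represented by indicator functions:

```python
def point_mass(location):
    """Return the set function A -> delta_location(A), sets given as indicators."""
    def measure(indicator):
        return 1.0 if indicator(location) else 0.0
    return measure

def closed_F(x):
    """F = {0}, a closed set."""
    return x == 0.0

def open_G(x):
    """G = (0, 1), an open set."""
    return 0.0 < x < 1.0

def continuity_B(x):
    """B = (-1, 2], whose boundary {-1, 2} has mu-measure 0."""
    return -1.0 < x <= 2.0

mu = point_mass(0.0)
mu_n = [point_mass(1.0 / n) for n in range(2, 50)]

print([m(closed_F) for m in mu_n][:3], mu(closed_F))          # 0, 0, 0 vs 1
print([m(open_G) for m in mu_n][:3], mu(open_G))              # 1, 1, 1 vs 0
print([m(continuity_B) for m in mu_n][:3], mu(continuity_B))  # 1, 1, 1 vs 1
```

Here `mu_n(F) = 0` while `mu(F) = 1`, and `mu_n(G) = 1` while `mu(G) = 0`, so neither closed nor open sets give equality in general. Equality holds for `B` because `mu(del B) = 0`.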
D1.3.2
- A pm `mu` on `(S, ccS)` is tight if `AA epsi > 0` there exists a compact `K` st `mu(K) > 1-epsi`.
- Let `{mu_n}_(n>=1)` be a sequence of pms on `(S, ccS)`, `{mu_n}` is tight if `AA epsi > 0 quad EE "a compact" K "st" inf_(n>=1) mu_n(K) > 1- epsi`
- A sequence of random variables is tight if the associated sequence of probability measures is tight
D1.3.3 A family of probability measures `Pi` on `(S, ccS)` is relatively compact if every sequence of pm's in `Pi` contains a weakly convergent subsequence, ie. there exist a subsequence `{mu_(n_i)}_(i>=1)` and a pm `mu` (not necessarily in `Pi`) st `mu_(n_i) ->^d mu`.
T1.3.3 (Prohorov direct half) For a family of pm's, tightness `=>` relative compactness. T1.3.4 (Prohorov converse half) If `(S, ccS)` is Polish, then relative compactness `=>` tightness.
Skorokhod's construction and continuous mapping theorems
Let `F` be a df on `RR`. For any `0 < p < 1` we define the quantile function `F^(-1)(p) = inf{x | F(x)>=p} = spr{x | F(x) < p}`.
L1.4.1 Let `F` be a df, then `F^(-1)` is non-decreasing and left-continuous, also satisfying:
- `F^(-1)(F(x)) <= x quad AA x in RR "with" 0 < F(x) < 1`
- `F(F^(-1)(t)) >= t quad AA t in (0,1)`
- `F(x) >= t iff x >= F^(-1)(t)`
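The three properties of the quantile function can be checked directly on a small discrete df. The distribution below (atoms at 0, 1, 3 with probabilities 0.2, 0.5, 0.3) and the names `ATOMS`, `CUM`, `F`, `F_inv` are assumptions chosen only for illustration.

```python
import bisect

# Illustrative discrete distribution: atoms and cumulative probabilities.
ATOMS = [0.0, 1.0, 3.0]
CUM = [0.2, 0.7, 1.0]  # F(0) = 0.2, F(1) = 0.7, F(3) = 1.0

def F(x):
    """Right-continuous df of the discrete distribution."""
    idx = bisect.bisect_right(ATOMS, x)  # number of atoms <= x
    return CUM[idx - 1] if idx > 0 else 0.0

def F_inv(t):
    """Quantile function inf{x : F(x) >= t}, for 0 < t < 1."""
    idx = bisect.bisect_left(CUM, t)  # first index with CUM[idx] >= t
    return ATOMS[idx]

xs = (-1.0, 0.0, 0.5, 1.0, 2.0, 3.0, 4.0)
ts = (0.1, 0.2, 0.5, 0.7, 0.9)
print([F_inv(t) for t in ts])
print(all((F(x) >= t) == (x >= F_inv(t)) for x in xs for t in ts))
```

On the flat stretch of `F` between the atoms 1 and 3, `F^(-1)(F(x))` drops back to 1, which is why the first property is an inequality rather than an equality.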
L1.4.2 If `F_n ->^d F` then the set `D = {t in (0,1) | F_n^(-1)(t) !-> F^(-1)(t)}` is at most countable.
T1.4.3 (Skorokhod). Let `{X_n}_(n>=1)` and `X` be rv's on `(RR, B(RR))` st `X_n ->^d X`.
Then there exist rv's `{Y_n}_(n>=1)` and `Y` on `((0,1), B(0,1), "LM")`, where LM denotes Lebesgue measure, st `X_n =^d Y_n` and `X =^d Y` and `Y_n ->^(wp1) Y`
- valid for more general spaces
- for a df `F`, if `U ~ U(0,1)` then `F^(-1)(U)` is an rv with `F` as its df
- we know `X_n ->^(wp1) X => X_n ->^p X => X_n ->^d X`, T1.4.3 is a converse of this in a sense
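The construction behind T1.4.3 takes `Y_n(u) = F_n^(-1)(u)` and `Y(u) = F^(-1)(u)` on `(0,1)`. A minimal sketch under an assumed example: `X_n ~ "Exponential"("rate" 1 + 1/n)` and `X ~ "Exponential"(1)`, whose quantile functions have the closed form `-log(1-u)/"rate"`. The function names are illustrative.

```python
import math

def exp_quantile(u, rate):
    """Quantile function of the Exponential(rate) distribution, 0 < u < 1."""
    return -math.log(1.0 - u) / rate

def Y_n(u, n):
    """Skorokhod representation of X_n: the quantile function of F_n at u."""
    return exp_quantile(u, 1.0 + 1.0 / n)

def Y(u):
    """Skorokhod representation of X: the quantile function of F at u."""
    return exp_quantile(u, 1.0)

# For each fixed u in (0, 1) the gap |Y_n(u) - Y(u)| shrinks to 0,
# ie. Y_n -> Y pointwise on (0, 1), hence wp1 under Lebesgue measure.
for u in (0.1, 0.5, 0.9):
    print(u, [abs(Y_n(u, n) - Y(u)) for n in (1, 10, 100, 1000)])
```

In this example the limit df is continuous and strictly increasing, so convergence holds at every `u`. In general it can fail on the at most countable set `D` of L1.4.2, which has Lebesgue measure 0.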
Continuous mapping theorems
Let `f: RR -> RR` be Borel measurable with discontinuity set `D_f` st `P(X in D_f) = 0`.
P1.4.4 If `X_n ->_("or p")^(wp1) X` then `f(X_n) ->_("or p")^(wp1) f(X)` respectively.
T1.4.5 If `X_n ->^d X` then `f(X_n) ->^d f(X)`.
Convergence of moments
`X_n ->^d X iff Ef(X_n) -> Ef(X) quad AA f in C_B(RR)`. However to ensure `E|X_n|^k -> E|X|^k` we need extra conditions.
D1.5.1 A sequence of random variables `{X_n}_(n>=1)` is uniformly integrable (u.i.) if `lim_(A->oo) spr_n E(|X_n| I(|X_n| > A)) = 0`, or equivalently `lim_(A->oo) int_(|X_n| > A) |X_n| dP = 0` uniformly over `n`.
L1.5.1 A sequence of random variables is u.i. iff:
- `spr_n E|X_n| < oo` and
- `AA epsi >0 EE delta_epsi >0 " st " AA E in ccF P(E) < delta => int_E |X_n| dP < epsi AA n`
L1.5.2 If `EE epsi > 0 "st" spr_n E|X_n|^(1+epsi) < oo` then `{X_n}` is u.i.
T1.5.3 If `X_n ->^d X "in" (RR, B(RR))` and `{|X_n|^r}` is u.i. for some `r > 0` then:
- `E|X|^r < oo`
- `EX_n^r -> EX^r`
- `E|X_n|^r -> E|X|^r`
T1.5.4 If `X_n ->^d X` and `E|X_n|^r -> E|X|^r < oo, r > 0` then `{|X_n|^r}` is u.i.
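A standard counterexample shows why u.i. is needed. The sketch below uses two assumed two-point sequences on `((0,1), B(0,1), "LM")`: `X_n = n I(U <= 1/n)` and `Z_n = sqrt(n) I(U <= 1/n)`. Both tend to 0 in probability, but `E X_n = 1` for all `n`, so the moments of `X_n` do not converge to those of the limit. The supremum over `n` is truncated at `n_max`, so the code only approximates D1.5.1.

```python
import math

def tail_expectation(value, prob, A):
    """E(|X| I(|X| > A)) for X equal to `value` wp `prob` and 0 otherwise."""
    return value * prob if value > A else 0.0

def sup_tail(value_fn, A, n_max=20000):
    """max over 1 <= n <= n_max of E(|X_n| I(|X_n| > A)), X_n = value_fn(n) wp 1/n."""
    return max(tail_expectation(value_fn(n), 1.0 / n, A) for n in range(1, n_max + 1))

def x_value(n):
    """X_n = n on an event of probability 1/n: not u.i."""
    return float(n)

def z_value(n):
    """Z_n = sqrt(n) on an event of probability 1/n: u.i."""
    return math.sqrt(n)

for A in (10.0, 100.0):
    print(A, sup_tail(x_value, A), sup_tail(z_value, A))
```

For `X_n` the tail expectation stays at 1 however large `A` is, while for `Z_n` it is about `1/A`. This agrees with L1.5.2, since `E|Z_n|^2 = 1` is bounded in `n`.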
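T1.5.5 (Frechet-Shohat). Let `{X_n}` be a sequence of random variables st `EX_n^k -> m_k`, with `m_k` finite, for every integer `k >= 1`. If `{m_k}` are the moments of a uniquely determined distribution `F`, then `X_n ->^d F`.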
Sufficient conditions for convergence
- `C_B^oo = {f | f "has bdd derivatives of all orders on" RR}`.
- `n_h(x) = 1/(sqrt(2pi)h) e^(-x^2/(2h^2))`
- `f_h(x) = int^oo_(-oo) f(x-y) n_h(y) dy = int_(-oo)^oo n_h(x-y) f(y) dy`
T2.5.1 `f in C_B => AA h > 0, f_h in C_B^oo`. `f in C_BU => f_h -> f` uniformly in `RR` as `h -> 0`, where `C_BU` is the set of bounded, uniformly continuous functions on `RR`.
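A numerical sketch of the uniform convergence `f_h -> f`, under assumptions made only for illustration: `f(x) = min(|x|, 1)`, which is bounded and uniformly continuous but not differentiable at `0` and `+-1`, the integral is approximated by a midpoint rule on `[-8h, 8h]`, and the supremum is taken over a finite grid.

```python
import math

def n_h(y, h):
    """Density of N(0, h^2)."""
    return math.exp(-y * y / (2.0 * h * h)) / (math.sqrt(2.0 * math.pi) * h)

def f(x):
    """Illustrative bounded, uniformly continuous function with kinks."""
    return min(abs(x), 1.0)

def f_h(x, h, steps=1000):
    """Midpoint-rule approximation of int f(x - y) n_h(y) dy over [-8h, 8h]."""
    width = 16.0 * h / steps
    total = 0.0
    for i in range(steps):
        y = -8.0 * h + (i + 0.5) * width
        total += f(x - y) * n_h(y, h) * width
    return total

def sup_error(h):
    """Grid approximation of sup_x |f_h(x) - f(x)| on [-2, 2]."""
    grid = [-2.0 + 0.05 * i for i in range(81)]
    return max(abs(f_h(x, h) - f(x)) for x in grid)

for h in (0.5, 0.1, 0.02):
    print(h, sup_error(h))
```

Because this `f` is Lipschitz with constant 1, `|f_h(x) - f(x)| <= h E|Z| = h sqrt(2/pi)` for `Z ~ N(0,1)`, so the error is at most about `0.8h`.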
T2.5.2 Let `{mu_n}` and `mu` be pms on `(RR, B(RR))`. If `int f dmu_n -> int f dmu quad AA f in C_B^oo` then `mu_n ->^d mu`.