A more in-depth treatment of many of these concepts can be found at: https://github.com/zsunberg/CU-DMUPP-Materials/blob/main/notes/STAT_219_Notes.pdf
\(X:\Omega \to E\)
\[I = \int_\Omega f(x) dx\]
\[I = \int_\Omega f(x) \mu(dx)\]
\[I \approx Q_N \equiv \frac{\int_\Omega \mu(dx)}{N} \sum_{i=1}^N f(X_i)\]
\[X_i \sim U(\Omega) \]
\[\text{E}[X] = \int_{-\infty}^{\infty}x \, p(x) \, dx\]
\[\approx \frac{1}{N} \sum_{i=1}^N X_i\]
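As a concrete sketch of the estimator above (the integrand, sampler, and function names are illustrative, not from the notes):

```python
import random

def mc_mean(sample, f, n):
    """Estimate E[f(X)] by averaging f over n i.i.d. draws from `sample`."""
    return sum(f(sample()) for _ in range(n)) / n

# Illustrative example: estimate E[X^2] for X ~ U(0, 1); the true value is 1/3.
random.seed(0)
estimate = mc_mean(random.random, lambda x: x * x, 100_000)
```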
In what sense does this converge as \(N\to\infty\)? How accurate is it?
Why are probability distributions not enough?
Consider this definition: Two random variables are equal if their probability distributions are the same.
\[A \sim \text{Bernoulli}(0.5)\]
\[B = \neg A\]
\[B \stackrel{?}{=} A\]
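The puzzle can be made concrete in code: under the distribution-only definition \(A\) and \(B\) would be "equal", yet they disagree at every outcome. A minimal sketch, treating the two outcomes as equally likely (the two-point \(\Omega\) and the variable names are illustrative):

```python
# Outcome space for one fair coin flip; both outcomes equally likely.
Omega = ["H", "T"]

A = lambda omega: 1 if omega == "H" else 0   # A ~ Bernoulli(0.5)
B = lambda omega: 1 - A(omega)               # B = not A

# Same distribution: both take the values {0, 1} with probability 1/2 each...
same_distribution = sorted(A(w) for w in Omega) == sorted(B(w) for w in Omega)
# ...but as functions on Omega they disagree at every single outcome.
pointwise_equal = any(A(w) == B(w) for w in Omega)
```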
Given a probability space \((\Omega, \mathcal{F}, P)\), and a measurable space \((E, \mathcal{E})\), an \(E\)-valued random variable is a measurable function \(X: \Omega \to E\).
\[\omega \in \Omega\]
\[X(\omega) \in E\]
What is this function?
Example: Coin World
\[\Omega = \{H, T\}\]
\[E = [0, 1]\]
\[X(\omega) = \mathbf{1}_{\{H\}}(\omega)\]
\[\Omega = \{H, T\}^n\]
\[E = [0, 1]\]
\[X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\]
\[\Omega = \{H, T\}^\infty\]
\[E = [0, 1]\]
\[X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\]
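A minimal sketch of the Coin World random variables above (encoding \(\omega\) as a tuple of flips is an illustrative choice):

```python
# Coin World with n flips: an outcome omega is a tuple in {H, T}^n,
# and X_i is the indicator that flip i came up heads.
def X(i, omega):
    return 1 if omega[i] == "H" else 0

omega = ("H", "T", "H")                       # one outcome in {H, T}^3
values = [X(i, omega) for i in range(len(omega))]
```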
Example: Many coins
\(\Omega\)
\(\mathcal{F} = \sigma(\{\quad\}) = \{\Omega,\quad, \quad, \emptyset\}\)
If \(\Omega = \{1,2,3\}\), what is \(\sigma(\{1\})\)?
If \(\Omega = \{1,2,3\}\), what is \(\sigma(\{1, 2\})\)?
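On a finite \(\Omega\), the generated \(\sigma\)-algebra can be computed by brute force, which answers both questions above. A sketch (the function name and frozenset representation are illustrative):

```python
def generated_sigma_algebra(Omega, generators):
    """Brute-force sigma(generators) on a finite Omega: start from the
    generators, then close under complements and pairwise unions."""
    Omega = frozenset(Omega)
    sets = {frozenset(g) for g in generators} | {Omega, frozenset()}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            for new in [Omega - a] + [a | b for b in list(sets)]:
                if new not in sets:
                    sets.add(new)
                    changed = True
    return sets

sigma1 = generated_sigma_algebra({1, 2, 3}, [{1}])      # sigma({1})
sigma12 = generated_sigma_algebra({1, 2, 3}, [{1, 2}])  # sigma({1, 2})
```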
The Borel \(\sigma\)-algebra for a topological space \(\Omega\) is the \(\sigma\)-field generated by all open sets in \(\Omega\).
The Borel \(\sigma\)-field on \(\mathbb{R}\) is \(\mathcal{B} = \sigma(\{(a, b): a, b \in \mathbb{R}\})\)
A probability measure \(P\) is a function \(P:\mathcal{F}\to [0,1]\) having the following properties:
\[P(\Omega) = 1\]
\[P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i) \text{ for disjoint } A_i \in \mathcal{F}\]
Given a probability space \((\Omega, \mathcal{F}, P)\), and a measurable space \((E, \mathcal{E})\), an \(E\)-valued random variable is a measurable function \(X: \Omega \to E\).
A function \(f:\Omega \to E\) is measurable if for every \(A \in \mathcal{E}\), the pre-image of \(A\) under \(f\) is in \(\mathcal{F}\). That is, for all \(A \in \mathcal{E}\)
\[f^{-1}(A) \in \mathcal{F}\]
Are there functions that are not Borel-measurable?
Yes! Example: \(\mathbf{1}_V\) where \(V\) is a Vitali Set
\(\mathcal{F} = \{\Omega, \emptyset, \{1\}, \{2, 3\}\}\)
\(X = \mathbf{1}_{\{1,2\}}\)
A function \(f:\Omega \to E\) is measurable if for every \(A \in \mathcal{E}\), the pre-image of \(A\) under \(f\) is in \(\mathcal{F}\). That is, for all \(A \in \mathcal{E}\)
\[f^{-1}(A) \in \mathcal{F}\]
Hint:
1. Use \(\sigma(\{1\})\) (what is it?)
2. What function has a preimage not in \(\sigma(\{1\})\)?
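The exercise can be checked mechanically: enumerate the target \(\sigma\)-algebra and test every pre-image. A sketch, assuming the setup above (\(\Omega = \{1,2,3\}\), \(\mathcal{F} = \sigma(\{1\})\), and \(E = \{0,1\}\) with its power set as \(\mathcal{E}\)):

```python
def preimage(f, A, Omega):
    return frozenset(w for w in Omega if f(w) in A)

def is_measurable(f, Omega, F, E_sets):
    """f is measurable iff the preimage of every set in the target
    sigma-algebra lies in F."""
    return all(preimage(f, A, Omega) in F for A in E_sets)

Omega = {1, 2, 3}
# F = sigma({1})
F = {frozenset(), frozenset({1}), frozenset({2, 3}), frozenset({1, 2, 3})}
# E = {0, 1} with its power set as the sigma-algebra
E_sets = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]

X = lambda w: 1 if w in {1, 2} else 0   # X = indicator of {1, 2}
X_measurable = is_measurable(X, Omega, F, E_sets)  # X^{-1}({1}) = {1,2} is not in F

Y = lambda w: 1 if w == 1 else 0        # Y = indicator of {1}, for comparison
Y_measurable = is_measurable(Y, Omega, F, E_sets)
```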
Review: For a (deterministic) sequence \(\{x_n\}\), we say
\[\lim_{n \to \infty} x_n = x\]
or
\[x_n \to x\]
if, for every \(\epsilon > 0\), there exists an \(N\) such that \(|x_n - x| < \epsilon\) for all \(n > N\).
In what senses can we talk about random variables converging?
\(X = Y\) if \(X(\omega) = Y(\omega) \quad \forall \, \omega \in \Omega\)
In practice, there are often unimportant \(\omega\) where this is not true.
We say that \(X\) is almost surely the same as \(Y\) if \[P(\{\omega: X(\omega) \neq Y(\omega) \}) = 0\text{.}\]
This is denoted \(X \stackrel{a.s.}{=}Y\) and the terms almost everywhere (a.e.) and with probability 1 (w.p.1) mean the same thing.
\[X_n(\omega) \to X(\omega) \quad \forall \, \omega \in \Omega\]
\(X_n \stackrel{a.s.}{\to} X\) if there exists \(A \in \mathcal{F}\) with \(P(A) = 1\) such that \(X_n(\omega) \to X(\omega)\) for each fixed \(\omega \in A\).
Does sure convergence imply almost sure convergence?
\(X_n \to_p X\) if \(P(\{\omega : |X_n(\omega) - X(\omega)| > \epsilon\}) \to 0\) for every fixed \(\epsilon > 0\).
Does \(X_n \stackrel{a.s.}{\to} X\) imply \(X_n \to_p X\)?
Yes.
Does \(X_n \to_p X\) imply \(X_n \stackrel{a.s.}{\to} X\)?
No.
But there exists a subsequence \(n_k\) such that \(X_{n_k} \stackrel{a.s.}{\to} X\).
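A standard counterexample behind this fact is the "typewriter" (moving indicator) sequence on \(\Omega = [0,1)\): it converges to 0 in probability but not almost surely, while the subsequence \(X_{2^k} = \mathbf{1}_{[0, 1/2^k)}\) does converge a.s. A sketch (not from the notes):

```python
import math

# "Typewriter" sequence on Omega = [0, 1): write n = 2^k + j with 0 <= j < 2^k;
# X_n is the indicator of the interval [j / 2^k, (j + 1) / 2^k).
def interval(n):
    k = int(math.log2(n))
    j = n - 2 ** k
    return j / 2 ** k, (j + 1) / 2 ** k

def X(n, omega):
    a, b = interval(n)
    return 1 if a <= omega < b else 0

# P(X_n != 0) = 1 / 2^k -> 0, so X_n -> 0 in probability; but every omega is
# covered once per sweep, so X_n(omega) = 1 infinitely often: no a.s. limit.
# The subsequence X_{2^k}, the indicator of [0, 1/2^k), converges a.s. to 0.
```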
Let \(F_X : \mathbb{R} \to [0,1]\) be the cumulative distribution function of real-valued random variable \(X\).
\(X_n \stackrel{D}{\to} X\) if \(F_{X_n}(\alpha) \to F_{X}(\alpha)\) for each fixed \(\alpha\) that is a continuity point of \(F_X\).
"Weak convergence", "convergence in distribution", and "convergence in law" all mean the same thing.
In what senses can we talk about random variables converging?
Let \(X_i\) be independent, identically distributed random variables with mean \(\mu\), and \(Q_N \equiv \frac{1}{N} \sum_{i=1}^N X_i\).
\[Q_N \stackrel{?}{\to} \mu \text{?}\]
\[Q_N \to \mu \text{ (sure)?}\]
\[Q_N \stackrel{a.s.}{\to} \mu \text{?}\]
\[Q_N \to_p \mu \text{?}\]
\[Q_N \stackrel{D}{\to} \mu \text{?}\]
There exists \(\omega \in \Omega\) for which you always sample the same point, so \(Q_N\) need not converge surely.
The probability that enough measurements fall off in one direction to keep \(|Q_N - \mu| > \epsilon\) decays as more samples are taken.
Weak law of large numbers: \(Q_N \to_p \mu\)
Strong law of large numbers: \(Q_N \stackrel{a.s.}{\to} \mu\)
How do you quantify \(|Q_N - \mu|\)?
Run \(M\) sets of \(N\) simulations and plot a histogram of \(Q_N^j\) for \(j \in \{1,\ldots,M\}\).
Lindeberg–Lévy CLT: If \(\text{Var}[X_i] = \sigma^2 < \infty\), then
\[\sqrt{N}(Q_N - \mu) \stackrel{D}{\to} \mathcal{N}(0, \sigma^2)\]
After many samples, \(Q_N\) starts to look distributed like \(\mathcal{N}\left(\mu, \frac{\sigma^2}{N}\right)\)
Two somewhat astounding takeaways:
1. Error decays at \(\frac{1}{\sqrt{N}}\) regardless of dimension.
2. You can estimate the "standard error" with \[SE = \frac{s}{\sqrt{N}}\]
where \(s\) is the sample standard deviation.
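A sketch putting the two takeaways together: estimate a mean along with the standard error \(SE = s/\sqrt{N}\) (the uniform sampler and function names are illustrative):

```python
import math
import random
import statistics

def mc_with_se(sample, n):
    """Monte Carlo estimate of a mean together with SE = s / sqrt(N)."""
    xs = [sample() for _ in range(n)]
    return statistics.fmean(xs), statistics.stdev(xs) / math.sqrt(n)

# Illustrative: X ~ U(0, 1), so mu = 0.5 and sigma = 1/sqrt(12) ~ 0.2887.
random.seed(0)
mean, se = mc_with_se(random.random, 10_000)
# Per the CLT, Q_N is roughly N(mu, sigma^2 / N), so the true mean should
# land within about 2 SE of the estimate roughly 95% of the time.
```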