Anatomy of a Random Variable
A more in-depth treatment of many of these concepts can be found at: https://github.com/zsunberg/CU-DMUPP-Materials/blob/main/notes/STAT_219_Notes.pdf
Outline
- A Motivating Example: Monte Carlo Integration
- Rigorous Definitions of a Random Variable
- Law of large numbers and the Central Limit Theorem
\(X:\Omega \to E\)
Monte Carlo Integration
\[I = \int_\Omega f(x) dx\]
\[I = \int_\Omega f(x) \mu(dx)\]
\[I \approx Q_N \equiv \frac{\int_\Omega \mu(dx)}{N} \sum_{i=1}^N f(X_i)\]
\[X_i \sim U(\Omega) \]
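As a rough sketch of the estimator \(Q_N\) above (assuming \(\Omega = [0,1]^d\), so \(\int_\Omega \mu(dx) = 1\), with a user-supplied \(f\); `mc_integrate` is a hypothetical helper, not from the slides):

```python
import random

def mc_integrate(f, d, n, seed=0):
    """Estimate I = ∫_Ω f(x) dx by Q_N = (∫_Ω μ(dx) / N) Σ f(X_i).

    Here Ω = [0,1]^d, so ∫_Ω μ(dx) = 1 and X_i ~ U([0,1]^d).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]  # one uniform draw X_i from Ω
        total += f(x)
    return total / n

# Example: ∫_[0,1]^2 (x0 + x1) dx = 1
est = mc_integrate(lambda x: x[0] + x[1], d=2, n=100_000)
```

For a general box-shaped \(\Omega\) the only change is multiplying by its volume \(\int_\Omega \mu(dx)\).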
A useful function: the Indicator
Monte Carlo Integration
\[\text{E}[X] = \int_{-\infty}^{\infty}x \, p(x) \, dx\]
Special Case: Expectation
\[\approx \frac{1}{N} \sum_{i=1}^N X_i\]
In what sense does this converge as \(N\to\infty\)? How accurate is it?
Random Variables
Why are probability distributions not enough?
Consider this definition: Two random variables are equal if their probability distributions are the same.
\[A \sim \text{Bernoulli}(0.5)\]
\[B = \neg A\]
\[B \stackrel{?}{=} A\]
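A quick numerical illustration of why distributional equality is not enough: \(A\) and \(B = \neg A\) have the same Bernoulli(0.5) distribution, yet as functions of \(\omega\) they are never equal (sketch, seed and sample count arbitrary):

```python
import random

rng = random.Random(1)
# One draw of ω fixes both variables: A(ω) ~ Bernoulli(0.5), B(ω) = ¬A(ω).
pairs = [(a, 1 - a) for a in (rng.randint(0, 1) for _ in range(10_000))]

mean_A = sum(a for a, _ in pairs) / len(pairs)
mean_B = sum(b for _, b in pairs) / len(pairs)
# Same distribution: both sample means are near 0.5 ...
# ... but B(ω) = 1 − A(ω) ≠ A(ω) for every single ω:
ever_equal = any(a == b for a, b in pairs)
```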
Random Variables
Given a probability space \((\Omega, \mathcal{F}, P)\), and a measurable space \((E, \mathcal{E})\), an \(E\)-valued random variable is a measurable function \(X: \Omega \to E\).
\[\omega \in \Omega\]
\[X(\omega) \in E\]
What is this function?
What is \(\omega \in \Omega\)?
Example: Coin World
\[\Omega = \{H, T\}\]
\[E = [0, 1]\]
\[X(\omega) = \mathbf{1}_{\{H\}}(\omega)\]
\[\Omega = \{H, T\}^n\]
\[E = [0, 1]\]
\[X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\]
\[\Omega = \{H, T\}^\infty\]
\[E = [0, 1]\]
\[X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\]
Example: Many coins
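The finite coin-world construction can be sketched directly (assuming \(n = 5\) flips; `X` is a hypothetical helper mirroring \(X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\)):

```python
import random

# One sample point ω ∈ Ω = {H, T}^n encodes all n coin flips at once.
rng = random.Random(0)
n = 5
omega = tuple(rng.choice("HT") for _ in range(n))

def X(i, omega):
    """X_i(ω) = 1_{H}(ω_i): each random variable just reads coordinate i of ω."""
    return 1 if omega[i] == "H" else 0

values = [X(i, omega) for i in range(n)]  # the n indicator values for this one ω
```

Note that a single \(\omega\) determines every \(X_i\) simultaneously; the randomness lives in the draw of \(\omega\), not in the functions \(X_i\).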
What is \(\omega \in \Omega\)?
- All of the randomness in the world
- The closest thing you regularly work with is a seeded RNG
- Typically not written explicitly (just \(X\) instead of \(X(\omega)\))
What is \(\mathcal{F}\) (and \(\mathcal{E}\))?
- "\(\sigma\)-algebra" or "\(\sigma\)-field"
- A set of subsets of \(\Omega\) (that is, \(\mathcal{F} \subseteq 2^\Omega\))
- Three requirements to be a \(\sigma\)-field
- \(\Omega \in \mathcal{F}\)
- If \(A \in \mathcal{F}\) then \(A^c \in \mathcal{F}\) (where \(A^c = \Omega \setminus A\))
- If \(A_i \in \mathcal{F}\) for \(i \in \mathbb{N}\) then \(\cup_{i=1}^\infty A_i \in \mathcal{F}\)
- \(\sigma(\cdot)\) creates a \(\sigma\)-field from a set of generators
\(\Omega\)
\(\mathcal{F} = \sigma(\{\quad\}) = \{\Omega,\quad, \quad, \emptyset\}\)
\(\sigma\)-algebra Examples
If \(\Omega = \{1,2,3\}\), what is \(\sigma(\{1\})\)?
If \(\Omega = \{1,2,3\}\), what is \(\sigma(\{1, 2\})\)?
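For a finite \(\Omega\), questions like these can be answered by brute-force closure under complement and union (a sketch; `sigma` is a hypothetical helper, and finite unions suffice since \(\Omega\) is finite):

```python
from itertools import combinations

def sigma(generators, omega):
    """σ-field on a finite Ω generated by `generators`:
    close under complement and (finite) union until nothing new appears."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            c = omega - a  # complement A^c = Ω \ A
            if c not in sets:
                sets.add(c); changed = True
        for a, b in combinations(list(sets), 2):
            u = a | b  # union
            if u not in sets:
                sets.add(u); changed = True
    return sets

F1 = sigma([{1}], {1, 2, 3})     # {∅, {1}, {2,3}, Ω}
F2 = sigma([{1, 2}], {1, 2, 3})  # {∅, {1,2}, {3}, Ω}
```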
Borel Sigma Algebra
The Borel \(\sigma\)-algebra for a topological space \(\Omega\) is the \(\sigma\)-field generated by all open sets in \(\Omega\).
The Borel \(\sigma\)-field on \(\mathbb{R}\) is \(\mathcal{B} = \sigma(\{(a, b): a, b \in \mathbb{R}\})\)
- Is \((1,2)\) in \(\mathcal{B}\)?
- Is \((1,\infty)\) in \(\mathcal{B}\)?
- Is \([1,2]\) in \(\mathcal{B}\)?
- Is \(\{1\}\) in \(\mathcal{B}\)?
- \(\Omega \in \mathcal{F}\)
- If \(A \in \mathcal{F}\) then \(A^c \in \mathcal{F}\) (where \(A^c = \Omega \setminus A\))
- If \(A_i \in \mathcal{F}\) for \(i \in \mathbb{N}\) then \(\cup_{i=1}^\infty A_i \in \mathcal{F}\)
What is \(P\)?
A probability measure \(P\) is a function \(P:\mathcal{F}\to [0,1]\) having the following properties:
- \(0 \leq P(A) \leq 1 \quad \forall \, A \in \mathcal{F}\).
- \(P(\Omega) = 1\).
- (Countable additivity) \(P(A) = \sum_{n=1}^\infty P(A_n)\) whenever \(A=\cup_{n=1}^\infty A_n\) is a countable union of disjoint sets \(A_n \in \mathcal{F}\)
Random Variables
Given a probability space \((\Omega, \mathcal{F}, P)\), and a measurable space \((E, \mathcal{E})\), an \(E\)-valued random variable is a measurable function \(X: \Omega \to E\).
A function \(f:\Omega \to E\) is measurable if for every \(A \in \mathcal{E}\), the pre-image of \(A\) under \(f\) is in \(\mathcal{F}\). That is, for all \(A \in \mathcal{E}\)
\[f^{-1}(A) \in \mathcal{F}\]
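For finite spaces the measurability condition can be checked exhaustively. A sketch (hypothetical helpers; here \(\mathcal{F}\) is the trivial \(\sigma\)-field on \(\Omega = \{1,2,3\}\), so only constant functions pass):

```python
def preimage(f, A, omega):
    """f⁻¹(A) = {ω ∈ Ω : f(ω) ∈ A}."""
    return frozenset(w for w in omega if f(w) in A)

def is_measurable(f, F, E, omega):
    """f is F/E-measurable iff f⁻¹(A) ∈ F for every A ∈ E."""
    return all(preimage(f, A, omega) in F for A in E)

omega = frozenset({1, 2, 3})
F = {frozenset(), omega}  # the trivial σ-field on Ω
E = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}  # 2^{0,1}

const = lambda w: 1                  # preimages are ∅ or Ω: measurable
ind = lambda w: 1 if w == 1 else 0   # preimage of {1} is {1} ∉ F: not measurable
```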
Are there functions that are not Borel-measurable?
Yes! Example: \(\mathbf{1}_V\) where \(V\) is a Vitali Set
Advantages over pdf definition
- Rigorous treatment of deterministic outcomes
- More sophisticated convergence concepts
- Better way of thinking about related random variables (personally, I think)
Break
\(\mathcal{F} = \{\Omega, \emptyset, \{1\}, \{2, 3\}\}\)
\(X = \mathbf{1}_{\{1,2\}}\)
A function \(f:\Omega \to E\) is measurable if for every \(A \in \mathcal{E}\), the pre-image of \(A\) under \(f\) is in \(\mathcal{F}\). That is, for all \(A \in \mathcal{E}\)
\[f^{-1}(A) \in \mathcal{F}\]
Hint:
1. Use \(\sigma(\{1\})\) (what is it?)
2. What function has a preimage not in \(\sigma(\{1\})\)?
Convergence
Review: For a (deterministic) sequence \(\{x_n\}\), we say
\[\lim_{n \to \infty} x_n = x\]
or
\[x_n \to x\]
if, for every \(\epsilon > 0\), there exists an \(N\) such that \(|x_n - x| < \epsilon\) for all \(n > N\).
Convergence
In what senses can we talk about random variables converging?
- Sure ("pointwise")
- Almost Sure
- In Probability
- Weak ("in distribution"/"in law")
When are two R.V.'s the same?
\(X = Y\) if \(X(\omega) = Y(\omega) \quad \forall \omega \in \Omega\)
In practice, there are often unimportant \(\omega\) where this is not true.
We say that \(X\) is almost surely the same as \(Y\) if \[P(\{\omega: X(\omega) \neq Y(\omega) \}) = 0\text{.}\]
This is denoted \(X \stackrel{a.s.}{=}Y\) and the terms almost everywhere (a.e.) and with probability 1 (w.p.1) mean the same thing.
Sure Convergence
\[X_n(\omega) \to X(\omega) \quad \forall \, \omega \in \Omega\]
Almost Sure Convergence
\(X_n \stackrel{a.s.}{\to} X\) if there exists \(A \in \mathcal{F}\) with \(P(A) = 1\) such that \(X_n(\omega) \to X(\omega)\) for each fixed \(\omega \in A\).
Does sure convergence imply almost sure convergence?
Convergence in Probability
\(X_n \to_p X\) if \(P(\{\omega : |X_n(\omega) - X(\omega) | > \epsilon\}) \to 0\) for any fixed \(\epsilon > 0\).
Does \(X_n \stackrel{a.s.}{\to} X\) imply \(X_n \to_p X\)?
Yes.
Convergence in Probability
Does \(X_n \to_p X\) imply \(X_n \stackrel{a.s.}{\to} X\)?
No.
But there exists a subsequence \(n_k\) such that \(X_{n_k} \stackrel{a.s.}{\to} X\).
Weak Convergence
Let \(F_X : \mathbb{R} \to [0,1]\) be the cumulative distribution function of real-valued random variable \(X\).
\(X_n \stackrel{D}{\to} X\) if \(F_{X_n}(\alpha) \to F_{X}(\alpha)\) for each fixed \(\alpha\) that is a continuity point of \(F_X\).
"Weak convergence", "convergence in distribution", and "convergence in law" all mean the same thing.
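The continuity-point clause matters. A minimal sketch: take \(X_n \equiv 1/n\) (deterministic) and \(X \equiv 0\); then \(F_{X_n}(\alpha) \to F_X(\alpha)\) at every \(\alpha \neq 0\), but never at the discontinuity \(\alpha = 0\):

```python
def F_n(alpha, n):
    """CDF of the deterministic variable X_n ≡ 1/n."""
    return 1.0 if alpha >= 1.0 / n else 0.0

def F(alpha):
    """CDF of X ≡ 0."""
    return 1.0 if alpha >= 0.0 else 0.0

# At a continuity point α = 0.5 of F, F_n(α) → F(α) = 1 (here for n ≥ 2):
vals_at_half = [F_n(0.5, n) for n in (2, 10, 100)]
# At the discontinuity α = 0, F_n(0) = 0 for every n while F(0) = 1,
# so pointwise CDF convergence fails there — hence the continuity-point clause.
vals_at_zero = [F_n(0.0, n) for n in (2, 10, 100)]
```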
Convergence
In what senses can we talk about random variables converging?
- Sure ("pointwise")
- Almost Sure
- In Probability
- Weak ("in distribution"/"in law")
Convergence of MC integration
Let \(X_i\) be independent, identically distributed random variables with mean \(\mu\), and \(Q_N \equiv \frac{1}{N} \sum_{i=1}^N X_i\).
\[Q_N \stackrel{?}{\to} \mu \text{?}\]
Convergence of MC integration
\[Q_N \to \mu \text{ (sure)?}\]
\[Q_N \stackrel{a.s.}{\to} \mu \text{?}\]
\[Q_N \to_p \mu \text{?}\]
\[Q_N \stackrel{D}{\to} \mu \text{?}\]
\(\exists \omega \in \Omega\) for which you always sample the same point, so sure convergence fails.
The probability that enough samples fall off in one direction to keep \(|Q_N - \mu| > \epsilon\) decays as \(N\) grows.
Weak law of large numbers
Strong law of large numbers
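A quick numerical illustration of \(Q_N \to \mu\) (assuming \(X_i \sim U(0,1)\), so \(\mu = 0.5\); seed arbitrary):

```python
import random

rng = random.Random(0)

def Q(N):
    """Sample mean Q_N = (1/N) Σ X_i with X_i ~ U(0,1), so μ = 0.5."""
    return sum(rng.random() for _ in range(N)) / N

# |Q_N − μ| shrinks (in probability / almost surely) as N grows:
errors = [abs(Q(N) - 0.5) for N in (10, 1_000, 100_000)]
```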
Convergence Rate of M.C. Integration
How do you quantify \(|Q_N - \mu|\)?
Run \(M\) sets of \(N\) simulations and plot a histogram of \(Q_N^j\) for \(j \in \{1,\ldots,M\}\).
Central Limit Theorem
Lindeberg-Levy CLT: If \(\text{Var}[X_i] = \sigma^2 < \infty\), then
\[\sqrt{N}(Q_N - \mu) \stackrel{D}{\to} \mathcal{N}(0, \sigma^2)\]
For large \(N\), \(Q_N\) is approximately distributed as \(\mathcal{N}(\mu, \frac{\sigma^2}{N})\)
Central Limit Theorem
Two somewhat astounding takeaways:
1. Error decays at \(\frac{1}{\sqrt{N}}\) regardless of dimension.
2. You can estimate the "standard error" with \[SE = \frac{s}{\sqrt{N}}\]
where \(s\) is the sample standard deviation.
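A sketch of the standard-error estimate (assuming \(X_i \sim U(0,1)\), for which the true \(\sigma = 1/\sqrt{12} \approx 0.289\)):

```python
import math
import random

rng = random.Random(0)
N = 10_000
xs = [rng.random() for _ in range(N)]  # X_i ~ U(0,1): μ = 0.5, σ = 1/√12

Q_N = sum(xs) / N
# Sample standard deviation s (with the usual N−1 denominator):
s = math.sqrt(sum((x - Q_N) ** 2 for x in xs) / (N - 1))
SE = s / math.sqrt(N)
# SE ≈ σ/√N = (1/√12)/100 ≈ 0.0029, and |Q_N − μ| is typically within ~2·SE.
```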
Random Lecture I
By Zachary Sunberg