A more in-depth treatment of many of these concepts can be found at: https://github.com/zsunberg/CU-DMUPP-Materials/blob/main/notes/STAT_219_Notes.pdf
\(X:\Omega \to E\)
\[I = \int_\Omega f(x) dx\]
\[I = \int_\Omega f(x) \mu(dx)\]
\[I \approx Q_N \equiv \frac{\int_\Omega \mu(dx)}{N} \sum_{i=1}^N f(X_i)\]
\[X_i \sim U(\Omega) \]
\[\text{E}[X] = \int_{-\infty}^{\infty}x \, p(x) \, dx\]
\[\approx \frac{1}{N} \sum_{i=1}^N X_i\]
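As a concrete sketch of the estimator above (the integrand, sampler, and function names are illustrative, not from the notes):

```python
import random

def mc_mean(sample, f, n):
    """Estimate E[f(X)] by averaging f over n i.i.d. draws from `sample`."""
    return sum(f(sample()) for _ in range(n)) / n

# Illustrative example: estimate E[X^2] for X ~ U(0, 1); the true value is 1/3.
random.seed(0)
estimate = mc_mean(random.random, lambda x: x * x, 100_000)
```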
In what sense does this converge as \(N\to\infty\)? How accurate is it?
Why are probability distributions not enough?
Consider this definition: Two random variables are equal if their probability distributions are the same.
\[A \sim \text{Bernoulli}(0.5)\]
\[B = \neg A\]
\[B \stackrel{?}{=} A\]
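The puzzle can be made concrete in code: under the distribution-only definition \(A\) and \(B\) would be "equal", yet they disagree at every outcome. A minimal sketch, treating the two outcomes as equally likely (the two-point \(\Omega\) and the variable names are illustrative):

```python
# Outcome space for one fair coin flip; both outcomes equally likely.
Omega = ["H", "T"]

A = lambda omega: 1 if omega == "H" else 0   # A ~ Bernoulli(0.5)
B = lambda omega: 1 - A(omega)               # B = not A

# Same distribution: both take the values {0, 1} with probability 1/2 each...
same_distribution = sorted(A(w) for w in Omega) == sorted(B(w) for w in Omega)
# ...but as functions on Omega they disagree at every single outcome.
pointwise_equal = any(A(w) == B(w) for w in Omega)
```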
Given a probability space \((\Omega, \mathcal{F}, P)\), and a measurable space \((E, \mathcal{E})\), an \(E\)-valued random variable is a measurable function \(X: \Omega \to E\).
\[\omega \in \Omega\]
\[X(\omega) \in E\]
What is this function?
Example: Coin World
\[\Omega = \{H, T\}\]
\[E = [0, 1]\]
\[X(\omega) = \mathbf{1}_{\{H\}}(\omega)\]
\[\Omega = \{H, T\}^n\]
\[E = [0, 1]\]
\[X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\]
\[\Omega = \{H, T\}^\infty\]
\[E = [0, 1]\]
\[X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\]
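A minimal sketch of the Coin World random variables above (encoding \(\omega\) as a tuple of flips is an illustrative choice):

```python
# Coin World with n flips: an outcome omega is a tuple in {H, T}^n,
# and X_i is the indicator that flip i came up heads.
def X(i, omega):
    return 1 if omega[i] == "H" else 0

omega = ("H", "T", "H")                       # one outcome in {H, T}^3
values = [X(i, omega) for i in range(len(omega))]
```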
Example: Many coins
\(\Omega\)
\(\mathcal{F} = \sigma(\{\quad\}) = \{\Omega,\quad, \quad, \emptyset\}\)
If \(\Omega = \{1,2,3\}\), what is \(\sigma(\{1\})\)?
If \(\Omega = \{1,2,3\}\), what is \(\sigma(\{1, 2\})\)?
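On a finite \(\Omega\), the generated \(\sigma\)-algebra can be computed by brute force, which answers both questions above. A sketch (the function name and frozenset representation are illustrative):

```python
def generated_sigma_algebra(Omega, generators):
    """Brute-force sigma(generators) on a finite Omega: start from the
    generators, then close under complements and pairwise unions."""
    Omega = frozenset(Omega)
    sets = {frozenset(g) for g in generators} | {Omega, frozenset()}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            for new in [Omega - a] + [a | b for b in list(sets)]:
                if new not in sets:
                    sets.add(new)
                    changed = True
    return sets

sigma1 = generated_sigma_algebra({1, 2, 3}, [{1}])      # sigma({1})
sigma12 = generated_sigma_algebra({1, 2, 3}, [{1, 2}])  # sigma({1, 2})
```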
The Borel \(\sigma\)-algebra for a topological space \(\Omega\) is the \(\sigma\)-field generated by all open sets in \(\Omega\).
The Borel \(\sigma\)-field on \(\mathbb{R}\) is \(\mathcal{B} = \sigma(\{(a, b): a, b \in \mathbb{R}\})\)
A probability measure \(P\) is a function \(P:\mathcal{F}\to [0,1]\) having the following properties:
\[P(\Omega) = 1\]
\[P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i) \text{ for disjoint } A_i \in \mathcal{F}\]
Given a probability space \((\Omega, \mathcal{F}, P)\), and a measurable space \((E, \mathcal{E})\), an \(E\)-valued random variable is a measurable function \(X: \Omega \to E\).
A function \(f:\Omega \to E\) is measurable if for every \(A \in \mathcal{E}\), the pre-image of \(A\) under \(f\) is in \(\mathcal{F}\). That is, for all \(A \in \mathcal{E}\)
\[f^{-1}(A) \in \mathcal{F}\]
Are there functions that are not Borel-measurable?
Yes! Example: \(\mathbf{1}_V\) where \(V\) is a Vitali Set
\(\mathcal{F} = \{\Omega, \emptyset, \{1\}, \{2, 3\}\}\)
\(X = \mathbf{1}_{\{1,2\}}\)
A function \(f:\Omega \to E\) is measurable if for every \(A \in \mathcal{E}\), the pre-image of \(A\) under \(f\) is in \(\mathcal{F}\). That is, for all \(A \in \mathcal{E}\)
\[f^{-1}(A) \in \mathcal{F}\]
Hint:
1. Use \(\sigma(\{1\})\) (what is it?)
2. What function has a preimage not in \(\sigma(\{1\})\)?
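The exercise can be checked mechanically: enumerate the target \(\sigma\)-algebra and test every pre-image. A sketch, assuming the setup above (\(\Omega = \{1,2,3\}\), \(\mathcal{F} = \sigma(\{1\})\), and \(E = \{0,1\}\) with its power set as \(\mathcal{E}\)):

```python
def preimage(f, A, Omega):
    return frozenset(w for w in Omega if f(w) in A)

def is_measurable(f, Omega, F, E_sets):
    """f is measurable iff the preimage of every set in the target
    sigma-algebra lies in F."""
    return all(preimage(f, A, Omega) in F for A in E_sets)

Omega = {1, 2, 3}
# F = sigma({1})
F = {frozenset(), frozenset({1}), frozenset({2, 3}), frozenset({1, 2, 3})}
# E = {0, 1} with its power set as the sigma-algebra
E_sets = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]

X = lambda w: 1 if w in {1, 2} else 0   # X = indicator of {1, 2}
X_measurable = is_measurable(X, Omega, F, E_sets)  # X^{-1}({1}) = {1,2} is not in F

Y = lambda w: 1 if w == 1 else 0        # Y = indicator of {1}, for comparison
Y_measurable = is_measurable(Y, Omega, F, E_sets)
```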
Review: For a (deterministic) sequence \(\{x_n\}\), we say
\[\lim_{n \to \infty} x_n = x\]
or
\[x_n \to x\]
if, for every \(\epsilon > 0\), there exists an \(N\) such that \(|x_n - x| < \epsilon\) for all \(n > N\).
In what senses can we talk about random variables converging?
\(X = Y\) if \(X(\omega) = Y(\omega) \quad \forall \, \omega \in \Omega\)
In practice, there are often unimportant \(\omega\) where this is not true.
We say that \(X\) is almost surely the same as \(Y\) if \[P(\{\omega: X(\omega) \neq Y(\omega) \}) = 0\text{.}\]
This is denoted \(X \stackrel{a.s.}{=}Y\) and the terms almost everywhere (a.e.) and with probability 1 (w.p.1) mean the same thing.
\[X_n(\omega) \to X(\omega) \quad \forall \, \omega \in \Omega\]
\(X_n \stackrel{a.s.}{\to} X\) if there exists \(A \in \mathcal{F}\) with \(P(A) = 1\) such that \(X_n(\omega) \to X(\omega)\) for each fixed \(\omega \in A\).
Does sure convergence imply almost sure convergence?
\(X_n \to_p X\) if \(P(\{\omega : |X_n(\omega) - X(\omega)| > \epsilon\}) \to 0\) for every fixed \(\epsilon > 0\).
Does \(X_n \stackrel{a.s.}{\to} X\) imply \(X_n \to_p X\)?
Yes.
Does \(X_n \to_p X\) imply \(X_n \stackrel{a.s.}{\to} X\)?
No.
But there exists a subsequence \(n_k\) such that \(X_{n_k} \stackrel{a.s.}{\to} X\).
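A standard counterexample behind this fact is the "typewriter" (moving indicator) sequence on \(\Omega = [0,1)\): it converges to 0 in probability but not almost surely, while the subsequence \(X_{2^k} = \mathbf{1}_{[0, 1/2^k)}\) does converge a.s. A sketch (not from the notes):

```python
import math

# "Typewriter" sequence on Omega = [0, 1): write n = 2^k + j with 0 <= j < 2^k;
# X_n is the indicator of the interval [j / 2^k, (j + 1) / 2^k).
def interval(n):
    k = int(math.log2(n))
    j = n - 2 ** k
    return j / 2 ** k, (j + 1) / 2 ** k

def X(n, omega):
    a, b = interval(n)
    return 1 if a <= omega < b else 0

# P(X_n != 0) = 1 / 2^k -> 0, so X_n -> 0 in probability; but every omega is
# covered once per sweep, so X_n(omega) = 1 infinitely often: no a.s. limit.
# The subsequence X_{2^k}, the indicator of [0, 1/2^k), converges a.s. to 0.
```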
Let \(F_X : \mathbb{R} \to [0,1]\) be the cumulative distribution function of real-valued random variable \(X\).
\(X_n \stackrel{D}{\to} X\) if \(F_{X_n}(\alpha) \to F_{X}(\alpha)\) for each fixed \(\alpha\) that is a continuity point of \(F_X\).
"Weak convergence", "convergence in distribution", and "convergence in law" all mean the same thing.
In what senses can we talk about random variables converging?
Let \(X_i\) be independent, identically distributed random variables with mean \(\mu\), and \(Q_N \equiv \frac{1}{N} \sum_{i=1}^N X_i\).
\[Q_N \stackrel{?}{\to} \mu \text{?}\]
\[Q_N \to \mu \text{ (sure)?}\]
\[Q_N \stackrel{a.s.}{\to} \mu \text{?}\]
\[Q_N \to_p \mu \text{?}\]
\[Q_N \stackrel{D}{\to} \mu \text{?}\]
There exists \(\omega \in \Omega\) for which you always sample the same point, so \(Q_N\) need not converge surely.
The probability that enough measurements fall off in one direction to keep \(|Q_N - \mu| > \epsilon\) decays as more samples are taken.
Weak law of large numbers: \(Q_N \to_p \mu\)
Strong law of large numbers: \(Q_N \stackrel{a.s.}{\to} \mu\)
How do you quantify \(|Q_N - \mu|\)?
Run \(M\) sets of \(N\) simulations and plot a histogram of \(Q_N^j\) for \(j \in \{1,\ldots,M\}\).
Lindeberg–Lévy CLT: If \(\text{Var}[X_i] = \sigma^2 < \infty\), then
\[\sqrt{N}(Q_N - \mu) \stackrel{D}{\to} \mathcal{N}(0, \sigma^2)\]
After many samples, \(Q_N\) starts to look distributed like \(\mathcal{N}\left(\mu, \frac{\sigma^2}{N}\right)\)
Two somewhat astounding takeaways:
1. Error decays at \(\frac{1}{\sqrt{N}}\) regardless of dimension.
2. You can estimate the "standard error" with \[SE = \frac{s}{\sqrt{N}}\]
where \(s\) is the sample standard deviation.
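A sketch putting the two takeaways together: estimate a mean along with the standard error \(SE = s/\sqrt{N}\) (the uniform sampler and function names are illustrative):

```python
import math
import random
import statistics

def mc_with_se(sample, n):
    """Monte Carlo estimate of a mean together with SE = s / sqrt(N)."""
    xs = [sample() for _ in range(n)]
    return statistics.fmean(xs), statistics.stdev(xs) / math.sqrt(n)

# Illustrative: X ~ U(0, 1), so mu = 0.5 and sigma = 1/sqrt(12) ~ 0.2887.
random.seed(0)
mean, se = mc_with_se(random.random, 10_000)
# Per the CLT, Q_N is roughly N(mu, sigma^2 / N), so the true mean should
# land within about 2 SE of the estimate roughly 95% of the time.
```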