Anatomy of a Random Variable

A more in-depth treatment of many of these concepts can be found at: https://github.com/zsunberg/CU-DMUPP-Materials/blob/main/notes/STAT_219_Notes.pdf

Outline

  • A Motivating Example: Monte Carlo Integration
  • Rigorous Definitions of a Random Variable
  • Law of large numbers and the Central Limit Theorem

\(X:\Omega \to E\)

Monte Carlo Integration

\[I = \int_\Omega f(x) dx\]

\[I = \int_\Omega f(x) \mu(dx)\]

\[I \approx Q_N \equiv \frac{\int_\Omega \mu(dx)}{N} \sum_{i=1}^N f(X_i)\]

\[X_i \sim U(\Omega) \]
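As a concrete sketch (with \(\Omega = [0,1]^2\) and an illustrative integrand, both chosen here for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_integrate(f, n, dim=2):
    """Estimate the integral of f over the unit hypercube [0, 1]^dim.

    The volume term (the integral of mu(dx) over Omega) is 1 here,
    so Q_N reduces to the sample mean of f at draws X_i ~ U(Omega).
    """
    X = rng.uniform(0.0, 1.0, size=(n, dim))
    return np.mean([f(x) for x in X])

# Illustrative integrand with known value (1 - cos(1))/2 ~ 0.2298
f = lambda x: np.sin(x[0]) * x[1]
print(mc_integrate(f, 100_000))
```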

A useful function: the Indicator

\[\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise} \end{cases}\]

For example, an integral over an irregular domain \(\Omega\) can be rewritten over a simpler enclosing set \(B \supseteq \Omega\) as \(\int_B \mathbf{1}_\Omega(x) f(x) \, dx\).

Monte Carlo Integration

Special Case: Expectation

\[\text{E}[X] = \int_{-\infty}^{\infty}x \, p(x) \, dx \approx \frac{1}{N} \sum_{i=1}^N X_i\]

where the \(X_i\) are independent samples with density \(p\).

In what sense does this converge as \(N\to\infty\)? How accurate is it?

Random Variables

Why are probability distributions not enough?

Consider this definition: Two random variables are equal if their probability distributions are the same.

\[A \sim \text{Bernoulli}(0.5)\]

\[B = \neg A\]

\[B \stackrel{?}{=} A\]
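A quick simulation makes this concrete (a minimal sketch; \(\neg A\) is written as \(1 - A\)):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.integers(0, 2, size=10_000)  # A ~ Bernoulli(0.5)
B = 1 - A                            # B = "not A"

print(A.mean(), B.mean())  # both ~0.5: identical distributions
print(np.any(A == B))      # False: A and B never agree pointwise
```

Under the distribution-only definition, \(A\) and \(B\) would be "equal" even though they always disagree.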

Random Variables

Given a probability space \((\Omega, \mathcal{F}, P)\), and a measurable space \((E, \mathcal{E})\), an \(E\)-valued random variable is a measurable function \(X: \Omega \to E\).

\[\omega \in \Omega\]

\[X(\omega) \in E\]

What is this function?

What is \(\omega \in \Omega\)?

Example: Coin World

\[\Omega = \{H, T\}\]

\[E = [0, 1]\]

\[X(\omega) = \mathbf{1}_{\{H\}}(\omega)\]

 

Example: Many coins

\[\Omega = \{H, T\}^n\]

\[E = [0, 1]\]

\[X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\]

 

\[\Omega = \{H, T\}^\infty\]

\[E = [0, 1]\]

\[X_i(\omega) = \mathbf{1}_{\{H\}}(\omega_i)\]

What is \(\omega \in \Omega\)?

  • All of the randomness in the world
  • The closest thing you regularly work with is a seeded RNG (see the sketch below)
  • Typically not written explicitly (just \(X\) instead of \(X(\omega)\))
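For instance (an illustrative sketch), fixing the seed is like fixing \(\omega\): every "random" quantity becomes a deterministic function of it:

```python
import numpy as np

def X(omega):
    """A 'random variable' as a deterministic function of omega (a seed)."""
    rng = np.random.default_rng(omega)
    return rng.normal()

print(X(7) == X(7))  # True: with omega fixed, nothing is random
print(X(8))          # a different omega gives a different realization
```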

What is \(\mathcal{F}\) (and \(\mathcal{E}\))?

  • "\(\sigma\)-algebra" or "\(\sigma\)-field"
  • A set of subsets of \(\Omega\) (that is, \(\mathcal{F} \subseteq 2^\Omega\))
  • Three requirements to be a \(\sigma\)-field
    • \(\Omega \in \mathcal{F}\)
    • If \(A \in \mathcal{F}\) then \(A^c \in \mathcal{F}\) (where \(A^c = \Omega \backslash A\))
    • If \(A_i \in \mathcal{F}\) for \(i \in \mathbb{N}\) then \(\cup_{i=1}^\infty A_i \in \mathcal{F}\)
  • \(\sigma(\cdot)\) denotes the smallest \(\sigma\)-field containing a given set of generators

For a single generator \(A \subseteq \Omega\):

\(\mathcal{F} = \sigma(\{A\}) = \{\Omega, A, A^c, \emptyset\}\)

\(\sigma\)-algebra Examples

If \(\Omega = \{1,2,3\}\), what is \(\sigma(\{1\})\)?

If \(\Omega = \{1,2,3\}\), what is \(\sigma(\{1, 2\})\)?
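For finite \(\Omega\), generated \(\sigma\)-fields can be computed mechanically, so the two questions above can be checked by code. A minimal sketch (the function name is ours; on a finite space, closure under finite unions suffices):

```python
from itertools import combinations

def generate_sigma_field(omega, generators):
    """Smallest sigma-field on a finite set omega containing the generators."""
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - A for A in F}                   # close under complement
        new |= {A | B for A, B in combinations(F, 2)}  # close under union
        if new <= F:
            return F
        F |= new

print(generate_sigma_field({1, 2, 3}, [{1}]))     # {}, {1}, {2, 3}, {1, 2, 3}
print(generate_sigma_field({1, 2, 3}, [{1, 2}]))  # {}, {1, 2}, {3}, {1, 2, 3}
```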

Borel Sigma Algebra

The Borel \(\sigma\)-algebra for a topological space \(\Omega\) is the \(\sigma\)-field generated by all open sets in \(\Omega\).

 

The Borel \(\sigma\)-field on \(\mathbb{R}\) is \(\mathcal{B} = \sigma(\{(a, b): a, b \in \mathbb{R}\})\)

  • Is \((1,2)\) in \(\mathcal{B}\)?
  • Is \((1,\infty)\) in \(\mathcal{B}\)?
  • Is \([1,2]\) in \(\mathcal{B}\)?
  • Is \(\{1\}\) in \(\mathcal{B}\)?

Recall the \(\sigma\)-field requirements:

  • \(\Omega \in \mathcal{F}\)
  • If \(A \in \mathcal{F}\) then \(A^c \in \mathcal{F}\) (where \(A^c = \Omega \backslash A\))
  • If \(A_i \in \mathcal{F}\) for \(i \in \mathbb{N}\) then \(\cup_{i=1}^\infty A_i \in \mathcal{F}\)

What is \(P\)?

A probability measure \(P\) is a function \(P:\mathcal{F}\to [0,1]\) having the following properties:

  1. \(0 \leq P(A) \leq 1 \quad \forall \, A \in \mathcal{F}\).
  2. \(P(\Omega) = 1\).
  3. (Countable additivity) \(P(A) = \sum_{n=1}^\infty P(A_n)\) whenever \(A=\cup_{n=1}^\infty A_n\) is a countable union of disjoint sets \(A_n \in \mathcal{F}\).

Random Variables

Given a probability space \((\Omega, \mathcal{F}, P)\), and a measurable space \((E, \mathcal{E})\), an \(E\)-valued random variable is a measurable function \(X: \Omega \to E\).

A function \(f:\Omega \to E\) is measurable if for every \(A \in \mathcal{E}\), the pre-image of \(A\) under \(f\) is in \(\mathcal{F}\). That is, for all \(A \in \mathcal{E}\)

\[f^{-1}(A) \in \mathcal{F}\]

Are there functions that are not Borel-measurable?

Yes! Example: \(\mathbf{1}_V\) where \(V\) is a Vitali Set

Advantages over pdf definition

  • Rigorous treatment of deterministic outcomes
  • More sophisticated convergence concepts
  • Better way of thinking about related random variables (personally, I think)

Break

Exercise: Let \(\Omega = \{1, 2, 3\}\) and \(\mathcal{F} = \{\Omega, \emptyset, \{1\}, \{2, 3\}\}\).

Is \(X = \mathbf{1}_{\{1,2\}}\) measurable (i.e., a valid random variable)?

A function \(f:\Omega \to E\) is measurable if for every \(A \in \mathcal{E}\), the pre-image of \(A\) under \(f\) is in \(\mathcal{F}\). That is, for all \(A \in \mathcal{E}\)

\[f^{-1}(A) \in \mathcal{F}\]

Hint:

1. What is \(\sigma(\{1\})\)? (Notice that \(\mathcal{F} = \sigma(\{1\})\).)

2. Which set \(A \in \mathcal{E}\) has a preimage \(X^{-1}(A)\) that is not in \(\sigma(\{1\})\)? (A numerical check follows below.)
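To verify numerically (a sketch for finite spaces; the helper names are ours), enumerate preimages and test membership in \(\mathcal{F}\):

```python
def preimage(X, omega, values):
    """{w in omega : X(w) in values}"""
    return frozenset(w for w in omega if X(w) in values)

omega = {1, 2, 3}
F = {frozenset(), frozenset({1}), frozenset({2, 3}), frozenset({1, 2, 3})}
X = lambda w: 1 if w in {1, 2} else 0  # the indicator of {1, 2}

# For an indicator it suffices to check preimages of subsets of {0, 1}
for values in [set(), {0}, {1}, {0, 1}]:
    A = preimage(X, omega, values)
    print(values, set(A), A in F)
# The preimage of {1} is {1, 2}, which is not in F, so X is NOT measurable.
```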

 


Convergence

Review: For a (deterministic) sequence \(\{x_n\}\), we say

\[\lim_{n \to \infty} x_n = x\]

or

\[x_n \to x\]

if, for every \(\epsilon > 0\), there exists an \(N\) such that \(|x_n - x| < \epsilon\) for all \(n > N\).
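For example, \(x_n = 1/n \to 0\): given \(\epsilon > 0\), choose \(N > 1/\epsilon\); then for all \(n > N\),

\[|x_n - 0| = \frac{1}{n} < \frac{1}{N} < \epsilon\text{.}\]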

Convergence

In what senses can we talk about random variables converging?

  • Sure ("pointwise")
  • Almost Sure
  • In Probability
  • Weak ("in distribution"/"in law")

When are two R.V.'s the same?

In the Bernoulli example above, \(A\) and \(B\) have identical distributions, yet \(A \neq B\): equality of distributions is not equality of random variables.

When are two R.V.'s the same?

\(X = Y\) if \(X(\omega) = Y(\omega) \quad \forall \, \omega \in \Omega\)

In practice, there are often unimportant \(\omega\) where this is not true.

We say that \(X\) is almost surely the same as \(Y\) if \[P(\{\omega: X(\omega) \neq Y(\omega) \}) = 0\text{.}\]

This is denoted \(X \stackrel{a.s.}{=}Y\) and the terms almost everywhere (a.e.) and with probability 1 (w.p.1) mean the same thing.

Sure Convergence

\[X_n(\omega) \to X(\omega) \quad \forall \, \omega \in \Omega\]

Almost Sure Convergence

\(X_n \stackrel{a.s.}{\to} X\) if there exists \(A \in \mathcal{F}\) with \(P(A) = 1\) such that \(X_n(\omega) \to X(\omega)\) for each fixed \(\omega \in A\).

Does sure convergence imply almost sure convergence? (Yes: take \(A = \Omega\), which has \(P(\Omega) = 1\).)

Convergence in Probability

\(X_n \to_p X\) if \(P(\{\omega : |X_n(\omega) - X(\omega) | > \epsilon\}) \to 0\) for every fixed \(\epsilon > 0\).

Does \(X_n \stackrel{a.s}{\to} X\) imply \(X_n \to_p X\)?

Yes.

Convergence in Probability

Does \(X_n \to_p X\) imply \(X_n \stackrel{a.s}{\to} X\)?

No.

But there exists a subsequence \(n_k\) such that \(X_{n_k} \stackrel{a.s.}{\to} X\).
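The standard counterexample is the "typewriter" sequence on \(\Omega = [0,1]\) with uniform \(P\): indicators of intervals of shrinking width that repeatedly sweep across \([0,1]\),

\[X_1 = \mathbf{1}_{[0,1]}, \quad X_2 = \mathbf{1}_{[0,\frac{1}{2}]}, \quad X_3 = \mathbf{1}_{[\frac{1}{2},1]}, \quad X_4 = \mathbf{1}_{[0,\frac{1}{4}]}, \quad \ldots\]

The interval widths shrink, so \(X_n \to_p 0\), but every fixed \(\omega\) is covered infinitely often, so \(X_n(\omega)\) converges for no \(\omega\). The subsequence \(\mathbf{1}_{[0,2^{-k}]}\), however, converges to 0 almost surely.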

Weak Convergence

Let \(F_X : \mathbb{R} \to [0,1]\) be the cumulative distribution function of real-valued random variable \(X\).

\(X_n \stackrel{D}{\to} X\) if \(F_{X_n}(\alpha) \to F_{X}(\alpha)\) for each fixed \(\alpha\) that is a continuity point of \(F_X\).

"Weak convergence", "convergence in distribution", and "convergence in law" all mean the same thing.

Convergence

In what senses can we talk about random variables converging?

  • Sure ("pointwise")
  • Almost Sure
  • In Probability
  • Weak ("in distribution"/"in law")

Convergence of MC integration

Let \(X_i\) be independent, identically distributed random variables with mean \(\mu\), and \(Q_N \equiv \frac{1}{N} \sum_{i=1}^N X_i\).

\[Q_N \stackrel{?}{\to} \mu\]

Convergence of MC integration

\[Q_N \to \mu \text{ (sure)?}\]

No: \(\exists \, \omega \in \Omega\) where you always sample the same point.

\[Q_N \stackrel{a.s.}{\to} \mu \text{?}\]

Yes: this is the strong law of large numbers.

\[Q_N \to_p \mu \text{?}\]

Yes: this is the weak law of large numbers. The probability that there are enough measurements off in one direction to keep \(|Q_N - \mu| > \epsilon\) decays with more samples.

\[Q_N \stackrel{D}{\to} \mu \text{?}\]

Yes: convergence in probability implies convergence in distribution.

Convergence Rate of M.C. Integration

How do you quantify \(|Q_N - \mu|\)?

Run \(M\) sets of \(N\) simulations and plot a histogram of \(Q_N^j\) for \(j \in \{1,\ldots,M\}\).
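A sketch of that experiment (distribution and sizes chosen for illustration; requires matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
M, N = 1000, 100

# M independent MC estimates, each the mean of N Exponential(1) samples (mu = 1)
Q = rng.exponential(1.0, size=(M, N)).mean(axis=1)

plt.hist(Q, bins=40, density=True)
plt.xlabel(r"$Q_N$")
plt.title("Sampling distribution of the MC estimate")
plt.show()
```

The histogram looks approximately Gaussian, which the Central Limit Theorem makes precise.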

Central Limit Theorem

Lindeberg-Lévy CLT: If \(\text{Var}[X_i] = \sigma^2 < \infty\), then

\[\sqrt{N}(Q_N - \mu) \stackrel{D}{\to} \mathcal{N}(0, \sigma^2)\]

After many samples, \(Q_N\) is approximately distributed as \(\mathcal{N}\left(\mu, \frac{\sigma^2}{N}\right)\).

Central Limit Theorem

Two somewhat astounding takeaways:

1. Error decays at \(\frac{1}{\sqrt{N}}\) regardless of dimension.

2. You can estimate the "standard error" with \[SE = \frac{s}{\sqrt{N}}\]

where \(s\) is the sample standard deviation.
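For a single run (a minimal sketch with illustrative samples):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=10_000)  # illustrative samples; true mean is 1

Q_N = x.mean()
SE = x.std(ddof=1) / np.sqrt(len(x))   # s / sqrt(N)
print(f"Q_N = {Q_N:.4f} +/- {SE:.4f}") # the true mean is typically within ~2 SE
```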

Random Lecture I

By Zachary Sunberg
