CS6015: Linear Algebra and Random Processes

Lecture 39: Moments, Moment generating functions: What are they and why do we care?

Learning Objectives

Slides to be made

What are moments?

first moment: $E[X]$

second moment: $E[X^2]$

third moment: $E[X^3]$

$\dots$

$n$-th moment: $E[X^n]$
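When only samples of $X$ are available, each raw moment can be estimated by averaging the corresponding power of the samples. The sketch below is illustrative only: the helper name `raw_moment` and the toy data are assumptions, not part of the lecture.

```python
def raw_moment(samples, n):
    """Estimate the n-th raw moment E[X^n] as the sample average of x**n."""
    return sum(x ** n for x in samples) / len(samples)


# Hypothetical toy data, chosen only for illustration.
data = [1.0, 2.0, 3.0, 4.0]

first = raw_moment(data, 1)   # (1 + 2 + 3 + 4) / 4
second = raw_moment(data, 2)  # (1 + 4 + 9 + 16) / 4
third = raw_moment(data, 3)   # (1 + 8 + 27 + 64) / 4
print(first, second, third)
```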

Centred and standardized moments

$E[X]$ (mean): How big is $X$ on average?

$E[X^2]$: How big is $X^2$ on average? This does not add much information: based on $E[X]$ alone, we already expect $E[X^2]$ to be greater for the red points than for the blue points.

Remove the information already contained in $E[X]$:

$E[(X-E[X])^2]$ (variance): the centred 2nd moment, which measures the spread of $X$ around the mean
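A small numerical sketch of why centring helps. The two point sets are hypothetical stand-ins for the "blue" and "red" points (same spread, different means), and the helper names are assumptions made for illustration.

```python
def raw_moment(samples, n):
    """Sample estimate of the raw moment E[X^n]."""
    return sum(x ** n for x in samples) / len(samples)


def central_moment(samples, n):
    """Sample estimate of the centred moment E[(X - E[X])^n]."""
    mu = sum(samples) / len(samples)  # sample estimate of E[X]
    return sum((x - mu) ** n for x in samples) / len(samples)


# Hypothetical toy data: identical spread, very different means.
blue = [-1.0, 0.0, 1.0]
red = [9.0, 10.0, 11.0]

for name, pts in (("blue", blue), ("red", red)):
    # The raw second moment is dominated by the mean; the centred one is not.
    print(name, raw_moment(pts, 2), central_moment(pts, 2))
```

The raw second moment differs greatly between the two sets only because their means differ, while the centred second moment (the variance) is the same for both.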

Centred and standardized moments

$E[X^3]$: third raw moment

Remove the information already contained in $E[X]$ and $\sigma = \sqrt{E[(X-E[X])^2]}$:

$E[(\frac{X-E[X]}{\sigma})^3]$ (skewness): the centred & standardized 3rd moment

Rule of thumb

$|\text{skewness}| > 1$: highly skewed

$0.5 < |\text{skewness}| < 1$: moderately skewed

$0 < |\text{skewness}| < 0.5$: $\approx$ symmetric

[Figure: a positive/right-skewed density and a negative/left-skewed density, each with $E[X]$ marked]
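The rule of thumb above can be sketched in code as follows. The function names and the toy data are assumptions for illustration, and the estimator divides by the sample size (no bias correction), which is only one of several conventions.

```python
import math


def skewness(samples):
    """Sample estimate of E[((X - E[X]) / sigma)^3], dividing by n throughout."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / n)
    return sum(((x - mu) / sigma) ** 3 for x in samples) / n


def describe_skew(value):
    """Apply the rule-of-thumb thresholds to a skewness value."""
    magnitude = abs(value)
    if magnitude > 1:
        return "highly skewed"
    if magnitude > 0.5:
        return "moderately skewed"
    return "approximately symmetric"


# Hypothetical toy data.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]   # one large value stretches the right tail
left_skewed = [-x for x in right_skewed]    # mirror image: the sign of the skewness flips

for name, pts in (("symmetric", symmetric),
                  ("right", right_skewed),
                  ("left", left_skewed)):
    s = skewness(pts)
    print(name, round(s, 3), describe_skew(s))
```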

Centred and standardized moments

$E[X^4]$: fourth raw moment

Remove the information already contained in $E[X]$ and $\sigma = \sqrt{E[(X-E[X])^2]}$:

$E[(\frac{X-E[X]}{\sigma})^4]$ (kurtosis): the centred & standardized 4th moment

Measures the heaviness of the tails

Standard normal: $\text{kurtosis} = 3$

Rule of thumb: a distribution with $\text{kurtosis} > 3$ is said to be heavy-tailed

[Figure: curves labelled "normal"]
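A rough empirical check of the kurtosis rule of thumb, comparing normal samples with a heavier-tailed alternative. The choice of the Laplace distribution (theoretical kurtosis 6), the sample size, and the seed are assumptions made for illustration.

```python
import random


def kurtosis(samples):
    """Sample estimate of E[((X - E[X]) / sigma)^4], dividing by n throughout."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    fourth = sum((x - mu) ** 4 for x in samples) / n
    return fourth / var ** 2  # same as dividing each deviation by sigma first


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The difference of two independent Exp(1) draws follows a Laplace distribution,
# which has heavier tails than the normal.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

print("normal :", kurtosis(normal))
print("laplace:", kurtosis(laplace))
```

The normal samples should give a value close to 3, and the Laplace samples a clearly larger one.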

Why do we care about them?

Raw:

$E[X]$ (centre of gravity)

$E[X^2]$

$E[X^3]$

$E[X^4]$

Centred:

$E[(X-E[X])^2]$ (spread/variance)

Centred + Standardised:

$E[(\frac{X-E[X]}{\sigma})^3]$ (skewness)

$E[(\frac{X-E[X]}{\sigma})^4]$ (kurtosis)

Moments are a good way of summarising large data
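As an illustration of this point, the sketch below reduces a large sample to four numbers. The function name `summarise`, the rate, the sample size, and the seed are all assumptions made for this example.

```python
import math
import random


def summarise(samples):
    """Reduce a sample to mean, variance, skewness and kurtosis (dividing by n)."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    sigma = math.sqrt(var)
    skew = sum(((x - mu) / sigma) ** 3 for x in samples) / n
    kurt = sum(((x - mu) / sigma) ** 4 for x in samples) / n
    return {"mean": mu, "variance": var, "skewness": skew, "kurtosis": kurt}


rng = random.Random(0)  # fixed seed so the run is repeatable
lam = 2.0               # hypothetical rate of an exponential distribution
samples = [rng.expovariate(lam) for _ in range(200_000)]

summary = summarise(samples)
print(summary)
```

For an exponential distribution with rate $\lambda$, the theoretical values are mean $1/\lambda$, variance $1/\lambda^2$, skewness 2 and kurtosis 9, so the four printed numbers should land close to those.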

How do we compute them?

Example 1: exponential distribution

$f_X(x) = \lambda e^{-\lambda x}$, $x \in [0, \infty)$

$E[X] = \int_0^{\infty} x \lambda e^{-\lambda x} dx$

$E[X^2] = \int_0^{\infty} x^2 \lambda e^{-\lambda x} dx$

These integrals are not very pleasant to compute
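For reference, both integrals can be evaluated with integration by parts. This is a worked sketch, assuming $\lambda > 0$ so that the boundary terms vanish.

```latex
\begin{align*}
E[X] &= \int_0^{\infty} x \lambda e^{-\lambda x}\,dx
      = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx
      = 0 + \frac{1}{\lambda} = \frac{1}{\lambda} \\
E[X^2] &= \int_0^{\infty} x^2 \lambda e^{-\lambda x}\,dx
        = \left[-x^2 e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} 2x e^{-\lambda x}\,dx
        = \frac{2}{\lambda} E[X] = \frac{2}{\lambda^2}
\end{align*}
```

Each higher moment needs one more round of integration by parts, which is what makes this route tedious.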

Recap: we already saw a good way of computing these for exponential families

$E[T_i(x)] = \frac{\partial A(\eta)}{\partial \eta_i}$

$Var[T_i(x)] = \frac{\partial^2 A(\eta)}{\partial \eta_i^2}$
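Applied to the exponential distribution, these identities give the first two moments without any integration. The sketch assumes the exponential-family form $f_X(x) = h(x)\exp(\eta\, T(x) - A(\eta))$ with $h(x) = 1$, which may differ in notation from the convention used in the earlier lecture.

```latex
\begin{align*}
f_X(x) &= \lambda e^{-\lambda x} = \exp\big(\eta\, T(x) - A(\eta)\big),
  \quad T(x) = x,\ \eta = -\lambda,\ A(\eta) = -\log(-\eta) \\
E[X] &= \frac{\partial A(\eta)}{\partial \eta} = -\frac{1}{\eta} = \frac{1}{\lambda} \\
Var[X] &= \frac{\partial^2 A(\eta)}{\partial \eta^2} = \frac{1}{\eta^2} = \frac{1}{\lambda^2} \\
E[X^2] &= Var[X] + (E[X])^2 = \frac{2}{\lambda^2}
\end{align*}
```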


More from Mitesh Khapra