CS6015: Linear Algebra and Random Processes

Lecture 34: Joint distribution, conditional distribution and marginal distribution of multiple random variables

Learning Objectives

What are joint, conditional and marginal pmfs?

What is conditional expectation?

What is the expectation of a function of multiple random variables?

Multiple random variables

[Figure: six binary random variables, X_1 (Salinity), X_2 (Pressure), X_3 (Temperature), X_4 (Depth), X_5 (Density) and Y (Oil); each takes the value 0 or 1]

0: High   1: Low

Multiple random variables

Questions of Interest

What is the probability that we will find oil?

P(Y=0|X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4, X_5=x_5)   (conditional probability)

What is the probability that everything will be high?

P(X_1=1,X_2=1,X_3=1,X_4=1,X_5=1,Y=1)   (joint probability)

What is the probability that density will be high?

P(X_5=1)   (marginal probability)

Understanding the notation

We have already discussed conditional probabilities of events

The "event" notation

P(Y=0|X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4, X_5=x_5)

Here Y=0 and each X_i=x_i are events

The "random variable" notation

p_{Y|X_1,X_2,X_3,X_4,X_5}(y|x_1,x_2,x_3,x_4,x_5)

Here Y, X_1, \dots, X_5 are random variables; X_1, \dots, X_5 are given, i.e., the values of these random variables are fixed

This is not a new concept - just a change of notation

Understanding the notation

p_X(x) = P(X=x)   (marginal)
p_{X,Y}(x,y) = P(X=x, Y=y)   (joint)
p_{X|Y}(x|y) = P(X=x| Y=y)   (conditional)

We will soon see that if we know the joint pmf we can compute the marginal and the conditional

Understanding the notation

P(X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4, X_5=x_5, Y=0)   (joint probability of multiple events)
p_{X_1,X_2,X_3,X_4,X_5,Y}(x_1,x_2,x_3,x_4,x_5,y)   (joint pmf: 2^n different inputs possible for n binary random variables)

P(Y=0|X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4, X_5=x_5)   (conditional probability)
p_{Y|X_1,X_2,X_3,X_4,X_5}(y|x_1,x_2,x_3,x_4,x_5)   (conditional pmf: function of y, other values fixed)

P(Y=0)   (probability of a single event)
p_{Y}(y)   (marginal pmf)

Example

Experiment: three tosses of a fair coin

X: number of heads

Y: position of first heads (-1 if no heads)

\Omega    X    Y
TTT       0   -1
TTH       1    3
THT       1    2
THH       2    2
HTT       1    1
HTH       2    1
HHT       2    1
HHH       3    1

p_{X,Y}(x,y) = P(X=x, Y=y)

         Y=-1   Y=1   Y=2   Y=3
X=0      1/8    0     0     0
X=1      0      1/8   1/8   1/8
X=2      0      2/8   1/8   0
X=3      0      1/8   0     0

Can we compute the conditional and marginal distributions from the joint pmf?
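The joint pmf in this example can be rebuilt by enumerating the sample space. The sketch below assumes a fair coin (so each of the 8 outcomes has probability 1/8); the function name `joint_pmf_three_tosses` is only illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def joint_pmf_three_tosses():
    """Joint pmf of X (number of heads) and Y (position of first head, -1 if none)."""
    pmf = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for outcome in product("TH", repeat=3):  # 8 equally likely outcomes (assumed fair coin)
        x = outcome.count("H")
        y = outcome.index("H") + 1 if "H" in outcome else -1
        pmf[(x, y)] += Fraction(1, 8)
    return dict(pmf)


joint = joint_pmf_three_tosses()
for (x, y), p in sorted(joint.items()):
    print(f"p(X={x}, Y={y}) = {p}")
```

Only the non-zero entries of the table are stored; every other (x, y) pair has probability 0.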

p_{X}(x) = \sum_{y}p_{X,Y}(x,y)

summing over all the different ways in which \(X\) can take the value \(x\)
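As a small sketch of this sum, the marginals of the three-toss example can be computed from the non-zero entries of the joint table. The helper name `marginal` and the `axis` argument are illustrative choices, not part of the lecture.

```python
from collections import defaultdict
from fractions import Fraction as F

# Non-zero entries of p_{X,Y} from the three-toss example; keys are (x, y)
joint = {
    (0, -1): F(1, 8),
    (1, 1): F(1, 8), (1, 2): F(1, 8), (1, 3): F(1, 8),
    (2, 1): F(2, 8), (2, 2): F(1, 8),
    (3, 1): F(1, 8),
}


def marginal(joint, axis):
    """Sum the joint pmf over the other variable (axis 0 keeps X, axis 1 keeps Y)."""
    out = defaultdict(F)
    for key, p in joint.items():
        out[key[axis]] += p
    return dict(out)


p_X = marginal(joint, 0)  # p_X(x) = sum_y p_{X,Y}(x, y)
p_Y = marginal(joint, 1)  # p_Y(y) = sum_x p_{X,Y}(x, y)
print(p_X)
print(p_Y)
```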

p_{X|Y}(x|y) = P(X=x|Y=y)
= \frac{P(X=x,~Y=y)}{P(Y=y)}
= \frac{p_{X,Y}(x,y)}{p_{Y}(y)}
= \frac{p_{X,Y}(x,y)}{\sum_x p_{X,Y}(x,y)}
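The last line of this derivation translates directly into code: divide each joint entry by the column sum. This is a minimal sketch on the three-toss example; `conditional_x_given_y` is a hypothetical helper name.

```python
from fractions import Fraction as F

# Non-zero entries of p_{X,Y} from the three-toss example; keys are (x, y)
joint = {
    (0, -1): F(1, 8),
    (1, 1): F(1, 8), (1, 2): F(1, 8), (1, 3): F(1, 8),
    (2, 1): F(2, 8), (2, 2): F(1, 8),
    (3, 1): F(1, 8),
}


def conditional_x_given_y(joint, y):
    """p_{X|Y}(x|y) = p_{X,Y}(x,y) / sum_x p_{X,Y}(x,y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}


p_X_given_Y1 = conditional_x_given_y(joint, 1)
print(p_X_given_Y1)
```

Given that the first head appears on toss 1, the number of heads is 1, 2 or 3 with probabilities 1/4, 1/2 and 1/4.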

Revisiting the laws

Multiplication/Chain Rule

p_{X,Y}(x,y) = p_{X|Y}(x|y) p_Y(y)
P(X=x, Y=y) = P(X=x|Y=y)P(Y=y)

Total Probability Theorem

p_{X}(x) = \sum_{y}p_{X,Y}(x,y)
= \sum_{y} p_{X|Y}(x|y) p_Y(y)
P(X=x) = \sum_i P(X=x|Y=y_i)P(Y=y_i)

Bayes' Theorem

p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}
= \frac{p_{X,Y}(x,y)}{\sum_x p_{X,Y}(x,y)}
= \frac{p_{Y|X}(y|x) p_X(x)}{\sum_x p_{Y|X}(y|x) p_X(x)}

[Figure: sample space \Omega partitioned into events A_1, \dots, A_7, together with an event B]

Revisiting the laws

Bayes' Theorem

\overbrace{p_{X|Y}(x|y)}^{\text{Posterior}} = \frac{\overbrace{p_{Y|X}(y|x)}^{\text{Likelihood}} \overbrace{p_X(x)}^{\text{Prior}}}{\sum_x p_{Y|X}(y|x) p_X(x)}
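A minimal sketch of Bayes' theorem in pmf form: multiply likelihood by prior and normalise. The prior and likelihood below are derived from the three-toss example (X: number of heads, Y: position of first head); the function name `posterior` and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction as F

# Prior p_X(x) for the three-toss example
prior = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

# Likelihood p_{Y|X}(y|x), stored as likelihood[x][y]; missing entries are 0
likelihood = {
    0: {-1: F(1)},
    1: {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)},
    2: {1: F(2, 3), 2: F(1, 3)},
    3: {1: F(1)},
}


def posterior(prior, likelihood, y):
    """p_{X|Y}(x|y) = p_{Y|X}(y|x) p_X(x) / sum_x p_{Y|X}(y|x) p_X(x)."""
    unnormalised = {x: likelihood[x].get(y, 0) * prior[x] for x in prior}
    evidence = sum(unnormalised.values())  # this is p_Y(y)
    return {x: u / evidence for x, u in unnormalised.items()}


post = posterior(prior, likelihood, 1)
print(post)
```

The denominator is the marginal p_Y(y), so the posterior sums to 1 over x.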

Revisiting the laws

\sum_{x}\sum_{y}p_{X,Y}(x,y) = 1
\sum_{x}p_{X}(x) = 1
\sum_{x}p_{X|Y}(x|y) = 1
\sum_{y}p_{X|Y}(x|y) \neq 1 (in general)

Generalising to more variables

p_{X,Y,Z}(x,y,z) = p_X(x) p_{Y|X}(y|x) p_{Z|X,Y}(z|x,y)

Joint distribution

X   Y   Z   p_{X,Y,Z}
0   0   0   1/21
0   0   1   3/21
0   1   0   1/21
0   1   1   7/21
1   0   0   2/21
1   0   1   3/21
1   1   0   2/21
1   1   1   2/21

Conditional distribution

p_{Z|X,Y}(z|x,y)

X   Y   Z=0   Z=1
0   0   1/4   3/4
0   1   1/8   7/8
1   0   2/5   3/5
1   1   1/2   1/2

Marginal distribution

p_Z(z) = \sum_x\sum_y p_{X,Y,Z}(x,y,z)

Z   p_Z(z)
0   6/21
1   15/21
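The conditional and marginal tables can be recomputed from the joint table. The sketch below hard-codes the joint pmf shown on this slide; `marginal_z` and `conditional_z` are illustrative helper names.

```python
from fractions import Fraction as F

# Joint pmf p_{X,Y,Z} from the slide; keys are (x, y, z)
joint = {
    (0, 0, 0): F(1, 21), (0, 0, 1): F(3, 21),
    (0, 1, 0): F(1, 21), (0, 1, 1): F(7, 21),
    (1, 0, 0): F(2, 21), (1, 0, 1): F(3, 21),
    (1, 1, 0): F(2, 21), (1, 1, 1): F(2, 21),
}


def marginal_z(joint):
    """p_Z(z) = sum_x sum_y p_{X,Y,Z}(x, y, z)."""
    p_z = {}
    for (_, _, z), p in joint.items():
        p_z[z] = p_z.get(z, 0) + p
    return p_z


def conditional_z(joint, x, y):
    """p_{Z|X,Y}(z|x,y) = p_{X,Y,Z}(x,y,z) / p_{X,Y}(x,y)."""
    p_xy = sum(p for (xx, yy, _), p in joint.items() if (xx, yy) == (x, y))
    return {z: p / p_xy for (xx, yy, z), p in joint.items() if (xx, yy) == (x, y)}


p_Z = marginal_z(joint)
print(p_Z)
print(conditional_z(joint, 0, 1))
```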

Independence

p_{X,Y,Z}(x,y,z) = p_X(x) p_{Y|X}(y|x) p_{Z|X,Y}(z|x,y)

\(X,Y,Z\) are independent if 

p_{X,Y,Z}(x,y,z) = p_X(x) p_{Y}(y) p_{Z}(z) \forall x,y,z

Example 1: p_{Z|X,Y}(z|x,y) = p_Z(z) for every x, y

X   Y   Z   p_{X,Y,Z}
0   0   0   1/20
0   0   1   3/20
0   1   0   2/20
0   1   1   6/20
1   0   0   1/20
1   0   1   3/20
1   1   0   1/20
1   1   1   3/20

Z   p_Z(z)
0   5/20 = 1/4
1   15/20 = 3/4

p_{Z|X,Y}(z|x,y)

X   Y   Z=0   Z=1
0   0   1/4   3/4
0   1   1/4   3/4
1   0   1/4   3/4
1   1   1/4   3/4

Example 2: same p_Z(z), but p_{Z|X,Y}(z|x,y) changes with x, y

X   Y   Z   p_{X,Y,Z}
0   0   0   1/20
0   0   1   3/20
0   1   0   2/20
0   1   1   6/20
1   0   0   2/20
1   0   1   2/20
1   1   0   0/20
1   1   1   4/20

p_{Z|X,Y}(z|x,y)

X   Y   Z=0   Z=1
0   0   1/4   3/4
0   1   1/4   3/4
1   0   1/2   1/2
1   1   0     1

Independence

\(X_1,X_2,X_3, \dots, X_n\) are independent if 

p_{X_1,X_2,X_3, \dots, X_n}(x_1,x_2,x_3, \dots, x_n) = p_{X_1}(x_1) p_{X_2}(x_2) p_{X_3}(x_3)\dots p_{X_n}(x_n)
\forall x_1,x_2,x_3, \dots, x_n
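The factorisation condition can be tested mechanically. The sketch below uses a hypothetical helper `factorises(joint, groups)` that checks whether a joint pmf equals the product of its marginals over the given groups of variable indices. It is applied to the two joint tables from the previous slide: in the first, Z is independent of the pair (X,Y); taking that table as extracted, the full three-way factorisation still fails because X and Y are themselves dependent.

```python
from fractions import Fraction as F
from itertools import product


def marginal(joint, idx):
    """Marginal pmf over the variables at positions idx."""
    out = {}
    for key, p in joint.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0) + p
    return out


def factorises(joint, groups):
    """True if joint(v) equals the product of group marginals for every value combination v."""
    margs = [marginal(joint, g) for g in groups]
    n = len(next(iter(joint)))
    support = [sorted({k[i] for k in joint}) for i in range(n)]
    for values in product(*support):
        prod = F(1)
        for g, m in zip(groups, margs):
            prod *= m.get(tuple(values[i] for i in g), 0)
        if joint.get(values, 0) != prod:
            return False
    return True


keys = list(product((0, 1), repeat=3))  # (x, y, z) in the order used on the slide
table1 = dict(zip(keys, [F(k, 20) for k in (1, 3, 2, 6, 1, 3, 1, 3)]))
table2 = dict(zip(keys, [F(k, 20) for k in (1, 3, 2, 6, 2, 2, 0, 4)]))

print(factorises(table1, [(0, 1), (2,)]))     # Z independent of (X, Y)?
print(factorises(table2, [(0, 1), (2,)]))
print(factorises(table1, [(0,), (1,), (2,)]))  # full independence of X, Y, Z?
```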

Expectation: Recap

E[X] = \sum_x xp_X(x)

If we interpret \(p_X(x)\) as the long term relative frequency then \(E[X]\) is the long term average value of \(X\)

E[g(X)] = \sum_x g(x)p_X(x)

What if we have a function of multiple random variables?

Conditional Expectation

E[X|A]

What is the expected value of the sum of two dice given that the second die shows an even number?

X: random variable indicating sum of the dice

A: event that the second die shows an even no.

What are we interested in?

E[X|A] = \sum_x xp_{X|A}(x)   (recall E[X] = \sum_x xp_X(x))

\Omega:
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

A:
(1, 2) (1, 4) (1, 6)
(2, 2) (2, 4) (2, 6)
(3, 2) (3, 4) (3, 6)
(4, 2) (4, 4) (4, 6)
(5, 2) (5, 4) (5, 6)
(6, 2) (6, 4) (6, 6)

\mathbb{R}_X = \{3,4,5,6,7,8,9,10,11,12\}
p_{X|A} = \{\frac{1}{18},\frac{1}{18},\frac{2}{18},\frac{2}{18},\frac{3}{18},\frac{3}{18},\frac{2}{18},\frac{2}{18},\frac{1}{18},\frac{1}{18}\}
E[X|A] = \sum_x xp_{X|A}(x) = 7.5
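The value 7.5 can be checked by enumerating the 18 outcomes in A. This is a sketch assuming two fair six-sided dice, so all outcomes in A are equally likely.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))         # (die 1, die 2)
A = [(d1, d2) for d1, d2 in outcomes if d2 % 2 == 0]    # second die is even

# Conditional pmf of the sum X given A (each outcome in A has probability 1/|A|)
p_X_given_A = defaultdict(Fraction)
for d1, d2 in A:
    p_X_given_A[d1 + d2] += Fraction(1, len(A))

e_X_given_A = sum(x * p for x, p in p_X_given_A.items())
print(dict(sorted(p_X_given_A.items())))
print(float(e_X_given_A))
```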

Conditional Expectation

E[X] = \sum_x xp_X(x)
E[X|A] = \sum_x xp_{X|A}(x)
E[g(X)|A] = \sum_x g(x)p_{X|A}(x)

Instead of conditioning on events we can condition on random variables

E[X|Y=y] = \sum_x xp_{X|Y}(x|y)
E[g(X)|Y=y] = \sum_x g(x)p_{X|Y}(x|y)

Total Expectation Theorem

E[X] = \sum_x xp_X(x)

[Figure: sample space \Omega partitioned into events A_1, \dots, A_7, together with an event B]

If A_1, \dots, A_n partition \Omega, then

p_{X}(x) = \sum_{i=1}^n P(A_i)p_{X|A_i}(x)

Multiply by \(x\) on both sides and sum over \(x\)

\sum_x xp_{X}(x) = \sum_x x\sum_{i=1}^n P(A_i)p_{X|A_i}(x)
= \sum_{i=1}^n P(A_i)\sum_x x p_{X|A_i}(x)

E[X] = \sum_{i=1}^n P(A_i)E[X|A_i]

Instead of conditioning on events we can also condition on random variables

E[X] = \sum_{y} p_Y(y)E[X|Y=y]
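As a numerical check of the random-variable form, the sketch below reuses the three-toss joint pmf from earlier in the lecture (X: number of heads, Y: position of first head) and compares E[X] computed directly with the weighted sum of conditional expectations. The helper name `cond_exp_x` is illustrative.

```python
from fractions import Fraction as F

# Non-zero entries of p_{X,Y} from the three-toss example; keys are (x, y)
joint = {
    (0, -1): F(1, 8),
    (1, 1): F(1, 8), (1, 2): F(1, 8), (1, 3): F(1, 8),
    (2, 1): F(2, 8), (2, 2): F(1, 8),
    (3, 1): F(1, 8),
}

# Marginal p_Y(y)
p_Y = {}
for (x, y), p in joint.items():
    p_Y[y] = p_Y.get(y, 0) + p


def cond_exp_x(y):
    """E[X | Y=y] = sum_x x p_{X,Y}(x,y) / p_Y(y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_Y[y]


direct = sum(x * p for (x, _), p in joint.items())      # E[X] from the joint pmf
via_total = sum(p_Y[y] * cond_exp_x(y) for y in p_Y)    # sum_y p_Y(y) E[X|Y=y]
print(direct, via_total)
```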

Total Expectation Theorem

E[X] = \sum_x xp_X(x)

X: time taken

[Figure: three events A_1, A_2, A_3 with probabilities 0.5, 0.3, 0.2]

E[X|A_1] = 60 mins
E[X|A_2] = 30 mins
E[X|A_3] = 45 mins
E[X] = \sum_{i=1}^{3}P(A_i)E[X|A_i] = ?
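The sum can be evaluated in a couple of lines. Note the assumption: the slide lists the probabilities 0.5, 0.3, 0.2 without tying them to specific events, so the sketch assumes they belong to A_1, A_2, A_3 in that order. A different assignment would change the result.

```python
from fractions import Fraction as F

# ASSUMPTION: P(A_1)=0.5, P(A_2)=0.3, P(A_3)=0.2 (order taken from the slide listing)
p_A = [F(5, 10), F(3, 10), F(2, 10)]
e_X_given_A = [60, 30, 45]  # E[X|A_i] in minutes

e_X = sum(p * e for p, e in zip(p_A, e_X_given_A))
print(f"E[X] = {e_X} mins")
```

Under this assignment the three terms are 30, 9 and 9 minutes.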

Expectation: Mult. rand. variables

Example: You lose INR 1 if the number on die 1 is less than that on die 2 and win INR 1 otherwise

E[g(X)] = \sum_x g(x)p_X(x)
g(X,Y) = \begin{cases} -1 & \text{if } X < Y\\ +1 & \text{if } X \geq Y \end{cases}
E[g(X,Y)] = ?

How do you compute this without computing the distribution of \(g(X,Y)\)?

Expectation: Mult. rand. variables

E[g(X)] = \sum_x g(x)p_X(x)
E[g(X,Y)] = \sum_{y} p_Y(y) E [g(X,Y)|Y=y]
= \sum_{y} p_Y(y) E [g(X,y)|Y=y]
= \sum_{y} p_Y(y) \sum_{x} g(x,y)p_{X|Y}(x|y)
= \sum_{x} \sum_{y} p_Y(y) g(x,y)p_{X|Y}(x|y)
= \sum_{x} \sum_{y} g(x,y)p_{X,Y}(x,y)

Expectation: Mult. rand. variables

Example: You lose INR 1 if the number on die 1 is less than that on die 2 and win INR 1 otherwise

g(X,Y) = \begin{cases} -1 & \text{if } X < Y\\ +1 & \text{if } X \geq Y \end{cases}
p_{X,Y}(x,y) = \frac{1}{36} \forall x,y
E[g(X,Y)] = \sum_x\sum_y g(x,y)p_{X,Y}(x,y)
= \frac{1}{36}(21 \cdot (+1) + 15 \cdot (-1))
= \frac{1}{6}
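The double sum is easy to evaluate by brute force. The sketch assumes two fair, independent dice (so every pair has probability 1/36) and also evaluates g at the means to show that it gives a different number.

```python
from fractions import Fraction as F


def g(x, y):
    """Payoff in INR: lose 1 if die 1 shows less than die 2, win 1 otherwise."""
    return -1 if x < y else 1


p_xy = F(1, 36)  # assumed: fair, independent dice
e_g = sum(g(x, y) * p_xy for x in range(1, 7) for y in range(1, 7))

mean = F(7, 2)   # E[X] = E[Y] = 3.5 for a fair die
print(e_g, g(mean, mean))
```

Here E[g(X,Y)] = 1/6 while g(E[X],E[Y]) = +1, a concrete case where the two differ.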

Expectation: Mult. rand. variables

In general, 

E[g(X,Y)] \neq g(E[X],E[Y])

Exception 1

g(X,Y) = aX + bY
E[g(X,Y)] = \sum_x\sum_y g(x,y)p_{X,Y}(x,y)
= \sum_x\sum_y (ax + by)p_{X,Y}(x,y)
= a \sum_x x \underbrace{\sum_y p_{X,Y}(x,y)}_{p_X(x)} + b \sum_y y \underbrace{\sum_x p_{X,Y}(x,y)}_{p_Y(y)}
= a \sum_x x p_X(x) + b \sum_y y p_Y(y)
= a E[X] + b E[Y]
= g(E[X], E[Y])
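The derivation never uses independence, so linearity should hold even for dependent variables. A quick check on the three-toss example (where X and Y are dependent), with arbitrarily chosen illustrative constants a = 2 and b = 3:

```python
from fractions import Fraction as F

# Non-zero entries of p_{X,Y} from the three-toss example; keys are (x, y)
joint = {
    (0, -1): F(1, 8),
    (1, 1): F(1, 8), (1, 2): F(1, 8), (1, 3): F(1, 8),
    (2, 1): F(2, 8), (2, 2): F(1, 8),
    (3, 1): F(1, 8),
}

a, b = 2, 3  # illustrative constants, not from the lecture

e_X = sum(x * p for (x, _), p in joint.items())
e_Y = sum(y * p for (_, y), p in joint.items())
lhs = sum((a * x + b * y) * p for (x, y), p in joint.items())  # E[aX + bY]
rhs = a * e_X + b * e_Y                                         # aE[X] + bE[Y]
print(lhs, rhs)
```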

Expectation: Mult. rand. variables

In general, 

E[g(X,Y)] \neq g(E[X],E[Y])

Exception 2

g(X,Y) = XY, where \(X,Y\) are independent
E[g(X,Y)] = \sum_x\sum_y g(x,y)p_{X,Y}(x,y)
= \sum_x\sum_y xyp_X(x)p_Y(y)   (independence: p_{X,Y}(x,y) = p_X(x)p_Y(y))
= \underbrace{\sum_x xp_X(x)}_{E[X]} \underbrace{\sum_y yp_Y(y)}_{E[Y]}
= E[X]E[Y]
= g(E[X],E[Y])
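To see that independence matters here, the sketch below compares E[XY] with E[X]E[Y] for two cases: two fair independent dice (an assumed example) and the dependent pair from the three-toss example. The helper `expectation` is an illustrative name.

```python
from fractions import Fraction as F


def expectation(joint, g):
    """E[g(X,Y)] = sum_x sum_y g(x,y) p_{X,Y}(x,y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


# Independent case: two fair dice
dice = {(x, y): F(1, 36) for x in range(1, 7) for y in range(1, 7)}

# Dependent case: three-toss example (X: number of heads, Y: position of first head)
coin = {
    (0, -1): F(1, 8),
    (1, 1): F(1, 8), (1, 2): F(1, 8), (1, 3): F(1, 8),
    (2, 1): F(2, 8), (2, 2): F(1, 8),
    (3, 1): F(1, 8),
}

for name, joint in (("dice", dice), ("coin", coin)):
    e_xy = expectation(joint, lambda x, y: x * y)
    e_x = expectation(joint, lambda x, y: x)
    e_y = expectation(joint, lambda x, y: y)
    print(name, e_xy, e_x * e_y, e_xy == e_x * e_y)
```

For the dice the two quantities agree; for the coin example they do not.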

Variances: Mult. rand. variables

Recap, 

Var(aX) = a^2 Var(X)
Var(X+a) = Var(X)
E[g(X,Y)] = \sum_x\sum_y g(x,y)p_{X,Y}(x,y)

In general, 

Var(X+Y) \neq Var(X) + Var(Y)
Examples: X = Y gives Var(X+Y) = Var(2X) = 4Var(X); X = -Y gives Var(X+Y) = 0

Exception: If \(X\) and \(Y\) are independent

Var(X+Y) = E [(X+Y)^2] - (E[X+Y])^2 = Var(X) + Var(Y)

Variances: Mult. rand. variables

Proof: (given: \(X\) and \(Y\) are independent)

E[g(X,Y)] = \sum_x\sum_y g(x,y)p_{X,Y}(x,y)
Var(X+Y) = E [(X+Y)^2] - (E[X+Y])^2
= E [X^2 + 2XY + Y^2] - (E[X] + E[Y])^2
= E [X^2] + 2E[XY] + E[Y^2] - (E[X]^2 + 2E[X]E[Y] + E[Y]^2)
= E [X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2
= E [X^2] - E[X]^2 + E[Y^2] - E[Y]^2
= Var(X) + Var(Y)

Where did we use the independence property?
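A numerical sketch of the result, assuming fair dice: for two independent dice the variance of the sum equals the sum of the variances, whereas for the fully dependent case Y = X it does not. In the proof, the step that relies on independence is the replacement of E[XY] by E[X]E[Y].

```python
from fractions import Fraction as F


def var(pmf):
    """Var(V) = E[(V - E[V])^2] for a pmf given as {value: probability}."""
    mean = sum(v * p for v, p in pmf.items())
    return sum((v - mean) ** 2 * p for v, p in pmf.items())


die = {v: F(1, 6) for v in range(1, 7)}

# Independent X, Y: pmf of X + Y from the product p_X(x) p_Y(y)
sum_indep = {}
for x, px in die.items():
    for y, py in die.items():
        sum_indep[x + y] = sum_indep.get(x + y, 0) + px * py

# Fully dependent case Y = X: X + Y = 2X
sum_same = {2 * x: p for x, p in die.items()}

print(var(die), var(sum_indep), var(sum_same))
```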

Summary of main results

X

E[X] = \sum_x xp_X(x)   ("long term" average)
E[g(X)] = \sum_x g(x)p_X(x)   (function of RV)
E[a X + b] = a E[X] + b   (linearity of expectation)
Var(X) = E[(X - E[X])^2]   (spread in the data)
Var(a X + b) = a^2 Var(X)

X|Y

E[X|A] = \sum_x xp_{X|A}(x)   (conditioned on event)
E[X|Y=y] = \sum_x xp_{X|Y}(x|y)   (conditioned on RV)
E[g(X)|A] = \sum_x g(x)p_{X|A}(x)
E[g(X)|Y=y] = \sum_x g(x)p_{X|Y}(x|y)
E[X] = \sum_{i=1}^n P(A_i)E[X|A_i]   (total expectation theorem)
E[X] = \sum_{y} p_Y(y)E[X|Y=y]   (total expectation theorem)

X, Y

E[g(X,Y)] = \sum_x\sum_y g(x,y)p_{X,Y}(x,y)   (function of multiple RVs)
E[g(X,Y)] \neq g(E[X],E[Y])   (in general, not equal but)
E[aX+bY] = aE[X] + b E[Y]
E[XY] = E[X]E[Y]   (if \(X\) and \(Y\) are independent)
Var(X+Y) = Var(X) + Var(Y)   (if \(X\) and \(Y\) are independent)