# CS6015: Linear Algebra and Random Processes

## Lecture 34:  Joint distribution, conditional distribution and marginal distribution of multiple random variables

Random variables in the running example: $$X_1, X_2, X_3, X_4, X_5$$ and $$Y$$.

### Questions of Interest

Conditional probability:

$$P(Y=0 \mid X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4, X_5=x_5)$$

### What is the probability that everything will be high?

Joint probability:

$$P(X_1=1,X_2=1,X_3=1,X_4=1,X_5=1,Y=1)$$

### What is the probability that density will be high?

Marginal probability:

$$P(X_5=1)$$

### Understanding the notation

$$P(Y=0 \mid X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4, X_5=x_5)$$

Each of $$Y=0$$, $$X_1=x_1$$, $$\dots$$, $$X_5=x_5$$ is an event.

### The "random variable" notation

$$p_{Y|X_1,X_2,X_3,X_4,X_5}(y|x_1,x_2,x_3,x_4,x_5)$$

### Understanding the notation

Marginal pmf: $$p_X(x) = P(X=x)$$

Joint pmf: $$p_{X,Y}(x,y) = P(X=x, Y=y)$$

Conditional pmf: $$p_{X|Y}(x|y) = P(X=x \mid Y=y)$$

### Understanding the notation

| Probability of events | pmf notation |
|---|---|
| $$P(X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4, X_5=x_5, Y=0)$$: joint probability of multiple events | $$p_{X_1,X_2,X_3,X_4,X_5,Y}(x_1,x_2,x_3,x_4,x_5,y)$$: joint pmf, $$2^n$$ different inputs possible for $$n$$ binary random variables |
| $$P(Y=0 \mid X_1=x_1, X_2=x_2, X_3=x_3, X_4=x_4, X_5=x_5)$$: conditional probability | $$p_{Y \mid X_1,X_2,X_3,X_4,X_5}(y \mid x_1,x_2,x_3,x_4,x_5)$$: conditional pmf, a function of $$y$$ with the other values fixed |
| $$P(Y=0)$$: probability of a single event | $$p_{Y}(y)$$: marginal pmf |

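### Example

Three coin tosses: the sample space $$\Omega$$ has 8 equally likely outcomes, and two random variables $$X$$ and $$Y$$ are defined on it. The values are consistent with $$X$$ counting the heads and $$Y$$ giving the position of the first head ($$-1$$ if there is none).

| $$\Omega$$ | $$X$$ | $$Y$$ |
|---|---|---|
| TTT | 0 | -1 |
| TTH | 1 | 3 |
| THT | 1 | 2 |
| THH | 2 | 2 |
| HTT | 1 | 1 |
| HTH | 2 | 1 |
| HHT | 2 | 1 |
| HHH | 3 | 1 |

Joint pmf $$p_{X,Y}(x,y) = P(X=x, Y=y)$$, with rows indexed by $$x$$ and columns by $$y$$:

| $$x \backslash y$$ | -1 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1/8 | 0 | 0 | 0 |
| 1 | 0 | 1/8 | 1/8 | 1/8 |
| 2 | 0 | 2/8 | 1/8 | 0 |
| 3 | 0 | 1/8 | 0 | 0 |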
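The joint pmf table can be rebuilt by enumerating the sample space. The sketch below assumes that $$X$$ is the number of heads and $$Y$$ the position of the first head ($$-1$$ if no head appears), which is the reading that matches the row and column labels of the table; the helper names `num_heads` and `first_head` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three coin tosses; each outcome is assumed to have probability 1/8.
omega = ["".join(t) for t in product("TH", repeat=3)]

def num_heads(outcome):
    """X (assumed): number of heads in the outcome."""
    return outcome.count("H")

def first_head(outcome):
    """Y (assumed): 1-based position of the first head, or -1 if there is none."""
    return outcome.find("H") + 1 if "H" in outcome else -1

# Count how many outcomes map to each (x, y) pair and normalise.
counts = Counter((num_heads(w), first_head(w)) for w in omega)
joint = {xy: Fraction(c, len(omega)) for xy, c in counts.items()}

for (x, y), p in sorted(joint.items()):
    print(f"p_XY({x}, {y}) = {p}")
```

Pairs that never occur, such as $$(3, 2)$$, are simply absent from `joint`; they correspond to the zero entries of the table.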

### Can we compute the conditional and marginal distributions from the joint pmf?

Marginal pmf: sum the joint pmf over the other variable.

$$p_{X}(x) = \sum_{y}p_{X,Y}(x,y)$$

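Conditional pmf: divide the joint pmf by the marginal of the conditioning variable (the summation index is written $$x'$$ to keep it apart from the fixed value $$x$$).

$$\begin{aligned}
p_{X|Y}(x|y) &= P(X=x \mid Y=y) \\
&= \frac{P(X=x,~Y=y)}{P(Y=y)} \\
&= \frac{p_{X,Y}(x,y)}{p_{Y}(y)} \\
&= \frac{p_{X,Y}(x,y)}{\sum_{x'} p_{X,Y}(x',y)}
\end{aligned}$$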
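As a minimal sketch of these two formulas, the code below computes both marginals and one conditional pmf from the coin-toss joint table. The dictionary is keyed by `(x, y)` with `x` the row label and `y` the column label of that table; the function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf from the coin-toss example, keyed by (x, y); zero entries are omitted.
eighth = Fraction(1, 8)
joint = {
    (0, -1): eighth,
    (1, 1): eighth, (1, 2): eighth, (1, 3): eighth,
    (2, 1): 2 * eighth, (2, 2): eighth,
    (3, 1): eighth,
}

def marginals(joint):
    """Return (p_X, p_Y) by summing the joint pmf over the other variable."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

def conditional_x_given_y(joint, y):
    """Return p_{X|Y}(. | y) = p_{X,Y}(., y) / p_Y(y) as a dict over x."""
    _, p_y = marginals(joint)
    return {x: p / p_y[y] for (x, yy), p in joint.items() if yy == y}

p_x, p_y = marginals(joint)
print("p_X:", p_x)
print("p_Y:", p_y)
print("p_X|Y(. | 1):", conditional_x_given_y(joint, 1))
```

Using `Fraction` keeps every value exact, so the results can be compared directly with the fractions in the table.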

### Multiplication/Chain Rule

$$P(X=x, Y=y) = P(X=x \mid Y=y)\,P(Y=y)$$

$$p_{X,Y}(x,y) = p_{X|Y}(x|y)\, p_Y(y)$$

### Total Probability Theorem

$$P(X=x) = \sum_i P(X=x \mid Y=y_i)\,P(Y=y_i)$$

$$p_{X}(x) = \sum_{y} p_{X|Y}(x|y)\, p_Y(y) = \sum_{y}p_{X,Y}(x,y)$$

### Bayes' Theorem

(Figure: the sample space $$\Omega$$ partitioned into events $$A_1, \dots, A_7$$, together with an event $$B$$.)

$$\begin{aligned}
p_{X|Y}(x|y) &= \frac{p_{X,Y}(x,y)}{p_Y(y)} \\
&= \frac{p_{X,Y}(x,y)}{\sum_{x'} p_{X,Y}(x',y)} \\
&= \frac{p_{Y|X}(y|x)\, p_X(x)}{\sum_{x'} p_{Y|X}(y|x')\, p_X(x')}
\end{aligned}$$

In the usual terminology, $$p_{X|Y}(x|y)$$ is the posterior, $$p_{Y|X}(y|x)$$ the likelihood and $$p_X(x)$$ the prior.
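To see that the last form of Bayes' theorem agrees with the direct definition, the sketch below computes $$p_{X|Y}$$ in both ways on the coin-toss joint pmf (keyed by `(x, y)`, row label then column label). The names `likelihood` and `posterior` are illustrative choices.

```python
from fractions import Fraction

eighth = Fraction(1, 8)
joint = {
    (0, -1): eighth,
    (1, 1): eighth, (1, 2): eighth, (1, 3): eighth,
    (2, 1): 2 * eighth, (2, 2): eighth,
    (3, 1): eighth,
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

def p(x, y):
    """Joint pmf with explicit zeros for pairs that never occur."""
    return joint.get((x, y), Fraction(0))

p_x = {x: sum(p(x, y) for y in ys) for x in xs}   # prior
p_y = {y: sum(p(x, y) for x in xs) for y in ys}

def likelihood(y, x):
    """p_{Y|X}(y|x) = p_{X,Y}(x,y) / p_X(x)."""
    return p(x, y) / p_x[x]

def posterior(x, y):
    """p_{X|Y}(x|y) from likelihood and prior, normalised over all x'."""
    numerator = likelihood(y, x) * p_x[x]
    denominator = sum(likelihood(y, xp) * p_x[xp] for xp in xs)
    return numerator / denominator

for y in ys:
    for x in xs:
        # Both routes to the conditional pmf must give the same value.
        assert posterior(x, y) == p(x, y) / p_y[y]
print(posterior(2, 1))
```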

### Revisiting the laws

$$\sum_{x}\sum_{y}p_{X,Y}(x,y) = 1$$

$$\sum_{x}p_{X}(x) = 1$$

$$\sum_{x}p_{X|Y}(x|y) = 1$$

$$\sum_{y}p_{X|Y}(x|y) \neq 1$$ in general: summing over the conditioning variable does not give a pmf.

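### Generalising to more variables

$$p_{X,Y,Z}(x,y,z) = p_X(x)\, p_{Y|X}(y|x)\, p_{Z|X,Y}(z|x,y)$$

$$p_Z(z) = \sum_x\sum_y p_{X,Y,Z}(x,y,z)$$

Joint pmf $$p_{X,Y,Z}$$:

| $$X$$ | $$Y$$ | $$Z$$ | $$p_{X,Y,Z}$$ |
|---|---|---|---|
| 0 | 0 | 0 | 1/21 |
| 0 | 0 | 1 | 3/21 |
| 0 | 1 | 0 | 1/21 |
| 0 | 1 | 1 | 7/21 |
| 1 | 0 | 0 | 2/21 |
| 1 | 0 | 1 | 3/21 |
| 1 | 1 | 0 | 2/21 |
| 1 | 1 | 1 | 2/21 |

Conditional pmf $$p_{Z|X,Y}(z|x,y)$$:

| $$X$$ | $$Y$$ | $$Z=0$$ | $$Z=1$$ |
|---|---|---|---|
| 0 | 0 | 1/4 | 3/4 |
| 0 | 1 | 1/8 | 7/8 |
| 1 | 0 | 2/5 | 3/5 |
| 1 | 1 | 1/2 | 1/2 |

Marginal pmf $$p_Z(z)$$:

| $$Z$$ | $$p_Z(z)$$ |
|---|---|
| 0 | 6/21 |
| 1 | 15/21 |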
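The conditional and marginal tables follow mechanically from the joint table. The sketch below recomputes them and also checks the three-variable chain rule; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Joint pmf p_{X,Y,Z} from the table, in units of 1/21, keyed by (x, y, z).
weights = {
    (0, 0, 0): 1, (0, 0, 1): 3,
    (0, 1, 0): 1, (0, 1, 1): 7,
    (1, 0, 0): 2, (1, 0, 1): 3,
    (1, 1, 0): 2, (1, 1, 1): 2,
}
joint = {key: Fraction(w, 21) for key, w in weights.items()}
bits = (0, 1)

def p_xy(x, y):
    """p_{X,Y}(x,y): sum the joint pmf over z."""
    return sum(joint[(x, y, z)] for z in bits)

def p_x(x):
    """p_X(x): sum the joint pmf over y and z."""
    return sum(p_xy(x, y) for y in bits)

def p_z_given_xy(z, x, y):
    """p_{Z|X,Y}(z|x,y) = p_{X,Y,Z}(x,y,z) / p_{X,Y}(x,y)."""
    return joint[(x, y, z)] / p_xy(x, y)

# Marginal of Z: sum over x and y.
p_z = {z: sum(joint[(x, y, z)] for x, y in product(bits, repeat=2)) for z in bits}

# Chain rule: p_X(x) * p_{Y|X}(y|x) * p_{Z|X,Y}(z|x,y) recovers the joint pmf.
for x, y, z in product(bits, repeat=3):
    p_y_given_x = p_xy(x, y) / p_x(x)
    assert p_x(x) * p_y_given_x * p_z_given_xy(z, x, y) == joint[(x, y, z)]

print(p_z)
print(p_z_given_xy(0, 0, 1), p_z_given_xy(1, 0, 1))
```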

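### Independence

The chain rule holds for any three random variables:

$$p_{X,Y,Z}(x,y,z) = p_X(x)\, p_{Y|X}(y|x)\, p_{Z|X,Y}(z|x,y)$$

### $$X,Y,Z$$ are independent if

$$p_{X,Y,Z}(x,y,z) = p_X(x)\, p_{Y}(y)\, p_{Z}(z) \quad \forall x,y,z$$

First table: joint pmf and the conditional pmf of $$Z$$ computed from it.

| $$X$$ | $$Y$$ | $$Z$$ | $$p_{X,Y,Z}$$ |
|---|---|---|---|
| 0 | 0 | 0 | 1/20 |
| 0 | 0 | 1 | 3/20 |
| 0 | 1 | 0 | 2/20 |
| 0 | 1 | 1 | 6/20 |
| 1 | 0 | 0 | 1/20 |
| 1 | 0 | 1 | 3/20 |
| 1 | 1 | 0 | 1/20 |
| 1 | 1 | 1 | 3/20 |

| $$X$$ | $$Y$$ | $$p_{Z \mid X,Y}(0 \mid x,y)$$ | $$p_{Z \mid X,Y}(1 \mid x,y)$$ |
|---|---|---|---|
| 0 | 0 | 1/4 | 3/4 |
| 0 | 1 | 1/4 | 3/4 |
| 1 | 0 | 1/4 | 3/4 |
| 1 | 1 | 1/4 | 3/4 |

| $$Z$$ | $$p_Z(z)$$ |
|---|---|
| 0 | 5/20 |
| 1 | 15/20 |

Here $$p_{Z|X,Y}(z|x,y) = p_Z(z)$$ for every $$(x,y)$$, so the last factor of the chain rule reduces to $$p_Z(z)$$: $$Z$$ is independent of $$(X,Y)$$. The full product condition above also requires $$p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$$, which this table does not satisfy, since $$p_{X,Y}(0,0) = 4/20$$ while $$p_X(0)\,p_Y(0) = \frac{12}{20}\cdot\frac{8}{20}$$.

Second table: the rows with $$X=1$$ are changed.

| $$X$$ | $$Y$$ | $$Z$$ | $$p_{X,Y,Z}$$ |
|---|---|---|---|
| 0 | 0 | 0 | 1/20 |
| 0 | 0 | 1 | 3/20 |
| 0 | 1 | 0 | 2/20 |
| 0 | 1 | 1 | 6/20 |
| 1 | 0 | 0 | 2/20 |
| 1 | 0 | 1 | 2/20 |
| 1 | 1 | 0 | 0/20 |
| 1 | 1 | 1 | 4/20 |

| $$X$$ | $$Y$$ | $$p_{Z \mid X,Y}(0 \mid x,y)$$ | $$p_{Z \mid X,Y}(1 \mid x,y)$$ |
|---|---|---|---|
| 0 | 0 | 1/4 | 3/4 |
| 0 | 1 | 1/4 | 3/4 |
| 1 | 0 | 1/2 | 1/2 |
| 1 | 1 | 0 | 1 |

The marginal is unchanged, $$p_Z(0) = 1/4$$ and $$p_Z(1) = 3/4$$, but the conditional pmf now depends on $$(x,y)$$, so $$Z$$ is not independent of $$(X,Y)$$.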
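A minimal sketch of how the two joint tables with denominator 20 can be checked, assuming the eight spilled values 1/20, 3/20, 2/20, 6/20, 2/20, 2/20, 0/20, 4/20 form the second joint table in the same row order as the first. The helper names are illustrative. Two checks are made: the full product condition, and the weaker statement that $$Z$$ is independent of the pair $$(X,Y)$$.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)

def table(weights):
    """Build a joint pmf keyed by (x, y, z) from weights in units of 1/20."""
    keys = product(BITS, repeat=3)  # (0,0,0), (0,0,1), (0,1,0), ... as in the tables
    return {key: Fraction(w, 20) for key, w in zip(keys, weights)}

def marginal(joint, axis):
    """Marginal pmf of the variable at position `axis` (0 = X, 1 = Y, 2 = Z)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], Fraction(0)) + p
    return out

def fully_independent(joint):
    """True if p_{X,Y,Z} = p_X * p_Y * p_Z for every (x, y, z)."""
    p_x, p_y, p_z = (marginal(joint, i) for i in range(3))
    return all(joint[(x, y, z)] == p_x[x] * p_y[y] * p_z[z]
               for x, y, z in product(BITS, repeat=3))

def z_independent_of_xy(joint):
    """True if p_{X,Y,Z}(x,y,z) = p_{X,Y}(x,y) * p_Z(z) for every (x, y, z)."""
    p_z = marginal(joint, 2)
    return all(joint[(x, y, z)] == (joint[(x, y, 0)] + joint[(x, y, 1)]) * p_z[z]
               for x, y, z in product(BITS, repeat=3))

first = table([1, 3, 2, 6, 1, 3, 1, 3])
second = table([1, 3, 2, 6, 2, 2, 0, 4])

for name, joint in [("first", first), ("second", second)]:
    print(name, marginal(joint, 2), z_independent_of_xy(joint), fully_independent(joint))
```

On the first table the check reports that $$Z$$ is independent of $$(X,Y)$$ while the full three-way product condition fails, because $$X$$ and $$Y$$ are themselves dependent there. On the second table both checks fail even though $$p_Z$$ is the same.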

### $$X_1,X_2,X_3, \dots, X_n$$ are independent if

$$p_{X_1,X_2,X_3, \dots, X_n}(x_1,x_2,x_3, \dots, x_n) = p_{X_1}(x_1)\, p_{X_2}(x_2)\, p_{X_3}(x_3)\dots p_{X_n}(x_n) \quad \forall x_1,x_2,x_3, \dots, x_n$$

### Expectation: Recap

$$E[X] = \sum_x x\,p_X(x)$$

### If we interpret $$p_X(x)$$ as the long-term relative frequency then $$E[X]$$ is the long-term average value of $$X$$

$$E[g(X)] = \sum_x g(x)\,p_X(x)$$

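$$E[X|A]$$

$$X$$: the sum of the numbers on two dice (the reading consistent with the values listed below)

$$A$$: the event that the second die shows an even number (the outcomes listed below)

### What are we interested in?

$$E[X] = \sum_x x\,p_X(x)$$

$$E[X|A] = \sum_x x\,p_{X|A}(x)$$

Sample space $$\Omega$$ (36 outcomes):

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

Event $$A$$ (18 outcomes):

(1, 2) (1, 4) (1, 6)
(2, 2) (2, 4) (2, 6)
(3, 2) (3, 4) (3, 6)
(4, 2) (4, 4) (4, 6)
(5, 2) (5, 4) (5, 6)
(6, 2) (6, 4) (6, 6)

Values of $$X$$ on $$A$$ and their conditional probabilities, in the same order:

$$\mathbb{R}_X = \{3,4,5,6,7,8,9,10,11,12\}$$

$$p_{X|A} = \left\{\frac{1}{18},\frac{1}{18},\frac{2}{18},\frac{2}{18},\frac{3}{18},\frac{3}{18},\frac{2}{18},\frac{2}{18},\frac{1}{18},\frac{1}{18}\right\}$$

$$E[X|A] = \sum_x x\,p_{X|A}(x) = \frac{135}{18} = 7.5$$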
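The value 7.5 can be reproduced by enumeration. This sketch assumes, as the listed outcomes and probabilities suggest, that $$X$$ is the sum of the two dice and $$A$$ is the event that the second die is even.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# 36 equally likely outcomes (die 1, die 2).
omega = list(product(range(1, 7), repeat=2))

# Event A (assumed): the second die shows an even number.
event_a = [(d1, d2) for d1, d2 in omega if d2 % 2 == 0]

# Conditional pmf of X = d1 + d2 given A: count outcomes in A for each sum.
counts = Counter(d1 + d2 for d1, d2 in event_a)
p_x_given_a = {x: Fraction(c, len(event_a)) for x, c in sorted(counts.items())}

# E[X|A] = sum over x of x * p_{X|A}(x).
e_x_given_a = sum(x * p for x, p in p_x_given_a.items())

print(p_x_given_a)
print(e_x_given_a)  # 15/2
```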

### Conditional Expectation

$$E[X] = \sum_x x\,p_X(x)$$

$$E[X|A] = \sum_x x\,p_{X|A}(x)$$

$$E[g(X)|A] = \sum_x g(x)\,p_{X|A}(x)$$

### Instead of conditioning on events we can condition on random variables

$$E[X|Y=y] = \sum_x x\,p_{X|Y}(x|y)$$

$$E[g(X)|Y=y] = \sum_x g(x)\,p_{X|Y}(x|y)$$

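### Total Expectation Theorem

$$E[X] = \sum_x x\,p_X(x)$$

(Figure: the sample space $$\Omega$$ partitioned into events $$A_1, \dots, A_7$$, together with an event $$B$$.)

For a partition $$A_1, \dots, A_n$$ of $$\Omega$$, the total probability theorem gives

$$p_{X}(x) = \sum_{i=1}^n P(A_i)\,p_{X|A_i}(x)$$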

### Multiply by $$x$$ on both sides and sum over $$x$$

$$\begin{aligned}
E[X] = \sum_x x\,p_{X}(x) &= \sum_x x\sum_{i=1}^n P(A_i)\,p_{X|A_i}(x) \\
&= \sum_{i=1}^n P(A_i)\sum_x x\, p_{X|A_i}(x) \\
&= \sum_{i=1}^n P(A_i)\,E[X|A_i]
\end{aligned}$$

### Instead of conditioning on events we can also condition on random variables

$$E[X] = \sum_{y} p_Y(y)\,E[X|Y=y]$$

### Total Expectation Theorem

$$E[X] = \sum_x x\,p_X(x)$$

$$X$$: time taken

(Figure: events $$A_1, A_2, A_3$$; the labels 0.5, 0.3 and 0.2 survive from the figure.)

$$E[X|A_1] = 60$$ mins

$$E[X|A_2] = 30$$ mins

$$E[X|A_3] = 45$$ mins

$$E[X] = \sum_{i=1}^{3}P(A_i)\,E[X|A_i]$$
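A minimal sketch of the computation, under the assumption that the labels 0.5, 0.3 and 0.2 are $$P(A_1)$$, $$P(A_2)$$ and $$P(A_3)$$ in that order (the figure that would confirm the pairing is not available). The function name `total_expectation` is illustrative.

```python
from fractions import Fraction

def total_expectation(probs, cond_means):
    """E[X] = sum_i P(A_i) * E[X | A_i] for a partition A_1, ..., A_n."""
    if sum(probs) != 1:
        raise ValueError("the P(A_i) must sum to 1 for a partition")
    return sum(p * m for p, m in zip(probs, cond_means))

# Assumed pairing: P(A_1) = 0.5, P(A_2) = 0.3, P(A_3) = 0.2.
probs = [Fraction("0.5"), Fraction("0.3"), Fraction("0.2")]
cond_means = [60, 30, 45]  # E[X | A_i] in minutes

e_x = total_expectation(probs, cond_means)
print(e_x)  # 48
```

Under that pairing, $$E[X] = 0.5 \cdot 60 + 0.3 \cdot 30 + 0.2 \cdot 45 = 48$$ minutes; a different pairing of probabilities and events would change the result.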

### Example: You lose INR 1 if the number on die 1 is less than that on die 2 and win INR 1 otherwise

$$E[g(X)] = \sum_x g(x)\,p_X(x)$$

$$g(X,Y) = \begin{cases} -1 & \text{if } X < Y\\ +1 & \text{if } X \geq Y \end{cases}$$

$$E[g(X,Y)] = ?$$

### Expectation: Multiple random variables

$$E[g(X)] = \sum_x g(x)\,p_X(x)$$

$$\begin{aligned}
E[g(X,Y)] &= \sum_{y} p_Y(y)\, E[g(X,Y) \mid Y=y] \\
&= \sum_{y} p_Y(y)\, E[g(X,y) \mid Y=y] \\
&= \sum_{y} p_Y(y) \sum_{x} g(x,y)\,p_{X|Y}(x|y) \\
&= \sum_{x} \sum_{y} p_Y(y)\, g(x,y)\,p_{X|Y}(x|y) \\
&= \sum_{x} \sum_{y} g(x,y)\,p_{X,Y}(x,y)
\end{aligned}$$

### Example: You lose INR 1 if the number on die 1 is less than that on die 2 and win INR 1 otherwise

$$g(X,Y) = \begin{cases} -1 & \text{if } X < Y\\ +1 & \text{if } X \geq Y \end{cases}$$

$$p_{X,Y}(x,y) = \frac{1}{36} \quad \forall x,y$$

Of the 36 pairs, 21 have $$x \geq y$$ and 15 have $$x < y$$, so

$$E[g(X,Y)] = \sum_x\sum_y g(x,y)\,p_{X,Y}(x,y) = \frac{21 - 15}{36} = \frac{1}{6}$$
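The double sum can be evaluated by brute force over all 36 pairs. This is a small sketch with $$X$$ and $$Y$$ taken as the numbers on die 1 and die 2 of two fair dice.

```python
from fractions import Fraction
from itertools import product

def g(x, y):
    """Payoff in INR: lose 1 if die 1 shows less than die 2, win 1 otherwise."""
    return -1 if x < y else 1

pairs = list(product(range(1, 7), repeat=2))
p_xy = Fraction(1, len(pairs))  # uniform joint pmf, 1/36 for every pair

# E[g(X,Y)] = sum over x, y of g(x,y) * p_{X,Y}(x,y).
expected_payoff = sum(g(x, y) * p_xy for x, y in pairs)

wins = sum(1 for x, y in pairs if x >= y)
losses = len(pairs) - wins
print(wins, losses, expected_payoff)  # 21 15 1/6
```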

### In general,

$$E[g(X,Y)] \neq g(E[X],E[Y])$$

### Exception 1

$$g(X,Y) = aX + bY$$

$$\begin{aligned}
E[g(X,Y)] &= \sum_x\sum_y g(x,y)\,p_{X,Y}(x,y) \\
&= \sum_x\sum_y (ax + by)\,p_{X,Y}(x,y) \\
&= a \sum_x x \underbrace{\sum_y p_{X,Y}(x,y)}_{p_X(x)} + b \sum_y y \underbrace{\sum_x p_{X,Y}(x,y)}_{p_Y(y)} \\
&= a \sum_x x\, p_X(x) + b \sum_y y\, p_Y(y) \\
&= a E[X] + b E[Y] \\
&= g(E[X], E[Y])
\end{aligned}$$
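The derivation never uses independence, and that can be checked numerically on a dependent pair. The sketch below reuses the joint pmf from the coin-toss table earlier in the lecture (keyed by row label, then column label); the coefficients `a = 2` and `b = 3` are arbitrary illustrative values.

```python
from fractions import Fraction

eighth = Fraction(1, 8)
# Joint pmf of a dependent pair (X, Y); zero entries are omitted.
joint = {
    (0, -1): eighth,
    (1, 1): eighth, (1, 2): eighth, (1, 3): eighth,
    (2, 1): 2 * eighth, (2, 2): eighth,
    (3, 1): eighth,
}

def expect(g):
    """E[g(X,Y)] = sum over x, y of g(x,y) * p_{X,Y}(x,y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

a, b = 2, 3  # illustrative coefficients
e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
lhs = expect(lambda x, y: a * x + b * y)  # E[aX + bY]
rhs = a * e_x + b * e_y                   # aE[X] + bE[Y]
print(e_x, e_y, lhs, rhs)
```

Both sides come out equal even though $$X$$ and $$Y$$ are dependent here.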

### Exception 2

$$g(X,Y) = XY$$, where $$X,Y$$ are independent, so that $$p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$$:

$$\begin{aligned}
E[g(X,Y)] &= \sum_x\sum_y g(x,y)\,p_{X,Y}(x,y) \\
&= \sum_x\sum_y xy\,p_X(x)\,p_Y(y) \\
&= \underbrace{\sum_x x\,p_X(x)}_{E[X]}\; \underbrace{\sum_y y\,p_Y(y)}_{E[Y]} \\
&= E[X]E[Y] \\
&= g(E[X],E[Y])
\end{aligned}$$
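A short numerical check of this exception, using two illustrative joint pmfs that are not from the lecture: two independent fair dice, and a single fair die with $$Y = X$$ as a dependent case where the product rule fails.

```python
from fractions import Fraction
from itertools import product

def expect(joint, g):
    """E[g(X,Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

faces = range(1, 7)
# Independent: two fair dice, p_{X,Y}(x,y) = p_X(x) * p_Y(y) = 1/36.
independent = {(x, y): Fraction(1, 36) for x, y in product(faces, repeat=2)}
# Dependent: the second variable copies the first, Y = X.
copied = {(x, x): Fraction(1, 6) for x in faces}

results = {}
for name, joint in [("independent", independent), ("copied", copied)]:
    e_xy = expect(joint, lambda x, y: x * y)
    e_x = expect(joint, lambda x, y: x)
    e_y = expect(joint, lambda x, y: y)
    results[name] = (e_xy, e_x * e_y)
    print(name, e_xy, e_x * e_y)
```

For the independent dice both numbers are $$49/4$$; for $$Y = X$$ the left side is $$E[X^2] = 91/6$$, which differs from $$E[X]^2 = 49/4$$.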

### Recap,

$$\mathrm{Var}(aX) = a^2\, \mathrm{Var}(X)$$

$$\mathrm{Var}(X+a) = \mathrm{Var}(X)$$

$$E[g(X,Y)] = \sum_x\sum_y g(x,y)\,p_{X,Y}(x,y)$$

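### In general,

$$\mathrm{Var}(X+Y) \neq \mathrm{Var}(X) + \mathrm{Var}(Y)$$

Examples: if $$X = Y$$, then $$\mathrm{Var}(X+Y) = \mathrm{Var}(2X) = 4\,\mathrm{Var}(X)$$, not $$2\,\mathrm{Var}(X)$$; if $$X = -Y$$, then $$\mathrm{Var}(X+Y) = \mathrm{Var}(0) = 0$$.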

### Exception: If $$X$$ and $$Y$$ are independent

$$\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$

### Proof: (given: $$X$$ and $$Y$$ are independent)

Independence is used in the fourth line, where $$E[XY] = E[X]E[Y]$$.

$$\begin{aligned}
\mathrm{Var}(X+Y) &= E [(X+Y)^2] - (E[X+Y])^2 \\
&= E [X^2 + 2XY + Y^2] - (E[X] + E[Y])^2 \\
&= E [X^2] + 2E[XY] + E[Y^2] - (E[X]^2 + 2E[X]E[Y] + E[Y]^2) \\
&= E [X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2 \\
&= E [X^2] - E[X]^2 + E[Y^2] - E[Y]^2 \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y)
\end{aligned}$$
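The result and the two counterexamples $$X = Y$$ and $$X = -Y$$ can be compared side by side. The sketch below uses fair dice as illustrative distributions (not from the lecture) and computes every variance as $$E[g^2] - (E[g])^2$$.

```python
from fractions import Fraction
from itertools import product

def expect(joint, g):
    """E[g(X,Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def variance(joint, g):
    """Var(g(X,Y)) = E[g(X,Y)^2] - (E[g(X,Y)])^2."""
    mean = expect(joint, g)
    return expect(joint, lambda x, y: g(x, y) ** 2) - mean ** 2

faces = range(1, 7)
cases = {
    "independent": {(x, y): Fraction(1, 36) for x, y in product(faces, repeat=2)},
    "X = Y": {(x, x): Fraction(1, 6) for x in faces},
    "X = -Y": {(x, -x): Fraction(1, 6) for x in faces},
}

results = {}
for name, joint in cases.items():
    var_sum = variance(joint, lambda x, y: x + y)
    sum_var = variance(joint, lambda x, y: x) + variance(joint, lambda x, y: y)
    results[name] = (var_sum, sum_var)
    print(f"{name}: Var(X+Y) = {var_sum}, Var(X) + Var(Y) = {sum_var}")
```

Only the independent case gives matching values ($$35/6$$ on both sides); for $$X = Y$$ the variance of the sum doubles to $$35/3$$, and for $$X = -Y$$ it is $$0$$.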

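### Summary of main results

Single random variable $$X$$:

| Result | Remark |
|---|---|
| $$E[X] = \sum_x x\,p_X(x)$$ | "long term" average |
| $$E[g(X)] = \sum_x g(x)\,p_X(x)$$ | function of RV |
| $$E[aX + b] = aE[X] + b$$ | linearity of expectation |
| $$\mathrm{Var}(X) = E[(X - E[X])^2]$$ | |
| $$\mathrm{Var}(aX + b) = a^2\, \mathrm{Var}(X)$$ | |

Conditioning, $$X \mid Y$$:

| Result | Remark |
|---|---|
| $$E[X \mid A] = \sum_x x\,p_{X \mid A}(x)$$ | conditioned on event |
| $$E[X \mid Y=y] = \sum_x x\,p_{X \mid Y}(x \mid y)$$ | conditioned on RV |
| $$E[g(X) \mid A] = \sum_x g(x)\,p_{X \mid A}(x)$$ | |
| $$E[g(X) \mid Y=y] = \sum_x g(x)\,p_{X \mid Y}(x \mid y)$$ | |
| $$E[X] = \sum_{i=1}^n P(A_i)\,E[X \mid A_i]$$ | total expectation theorem |
| $$E[X] = \sum_{y} p_Y(y)\,E[X \mid Y=y]$$ | total expectation theorem |

Multiple random variables $$X, Y$$:

| Result | Remark |
|---|---|
| $$E[g(X,Y)] = \sum_x\sum_y g(x,y)\,p_{X,Y}(x,y)$$ | function of multiple RVs |
| $$E[g(X,Y)] \neq g(E[X],E[Y])$$ | in general, not equal but |
| $$E[aX+bY] = aE[X] + bE[Y]$$ | always |
| $$E[XY] = E[X]E[Y]$$ | if $$X$$ and $$Y$$ are independent |
| $$\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$$ | if $$X$$ and $$Y$$ are independent |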