Recap: List of Topics

Descriptive Statistics: different types of data; different types of plots; measures of centrality and spread

Probability Theory: counting, sample spaces, events; discrete and continuous RVs; Bernoulli, Uniform, Normal distributions

Inferential Statistics: sampling strategies; interval estimators; hypothesis testing (z-test, t-test); ANOVA, chi-square test; linear regression

Learning Objectives

What are sets, and what are some of their properties?

What are experiments, sample spaces, outcomes and events?

What are the axioms of probability?

What are some simple ways of defining a probability function?

What are some important theorems: multiplication rule, total probability theorem and Bayes' theorem?

What are independent events?

The element of chance (Nothing in life is certain)

Randomness everywhere!

What is the chance that he would get infected if he went to the market?

No definite answer!

Why?

Due to the random nature of the world around us

Randomness everywhere!

What is the mode of transport?

Is a private car always safer than public transport?

How good is his immune system?

Does he have co-morbidities?

How many infections in the neighbourhood?

The study of this chance is the subject matter of Probability Theory!

Set theory

Experiments, sample spaces, events

Axioms of Probability

Random Variables

Distributions

Expectation

A brief overview of Set Theory

Set: A collection of elements

S = \{a,e,i,o,u\}
E = \{0,2,4,\dots,96,98,100\}
E = \{x: x~is~an~integer,~0 \leq x \leq 100,~x \bmod 2 = 0\}

Compact (set-builder) notation: useful for large sets

x \in S

means x belongs to the set S

2 \in E,~~3 \notin E

Subsets and equal sets

\mathbb{I}: set~of~all~integers
S = \{x: x \in \mathbb{I}, x < 0 \}

Every element of S is contained in \mathbb{I}, so S is a subset of \mathbb{I}:

S \subset \mathbb{I}

Two sets are equal if each is a subset of the other:

A = B~~iff~~A \subset B~and~B \subset A

Universal Set

\Omega = set~of~all~52~cards

Every set of interest is a subset of the universal set

A: set~of~all~aces
H: set~of~all~hearts
B: set~of~all~black~cards
F: set~of~all~face~cards
A \subset \Omega,~~H \subset \Omega,~~B \subset \Omega,~~F \subset \Omega

Empty Set

\phi = \{\}

Set with no elements (null set)

Set Operations

Complement

A^\mathsf{c} = \{x: x \in \Omega~and~x\notin A\}

Union (2 sets)

A \cup B = \{x: x \in A~or~x\in B\}

(black, gray and white in the image)

Intersection (2 sets)

A \cap B = \{x: x \in A~and~x\in B\}

(gray area in the image)

Set Operations (n sets)

Union (n sets)

x \in A_1 \cup A_2 \cup A_3 \cdots \cup A_n ~iff~x \in A_i~for~some~i

Intersection (n sets)

x \in A_1 \cap A_2 \cap A_3 \cdots \cap A_n ~iff~x \in A_i~\forall~i

Properties of Set operations

Commutativity

A \cup B = B \cup A
A \cap B = B \cap A

Associativity

A \cup (B \cup C) = (A \cup B) \cup C
A \cap (B \cap C) = (A \cap B) \cap C

Properties of Set operations

Distributive Laws

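A \cap (B \cup C) = (A \cap B) \cup (A \cap C)
A \cup (B \cap C) = (A \cup B) \cap (A \cup C)

Proof (first law, forward direction)

x \in A\cap (B \cup C)
\implies x \in A~and~x \in (B \cup C)
\implies (x \in A~and~x \in B)~or~(x \in A~and~x \in C)
\implies x \in (A \cap B) \cup (A \cap C)

Reading the same steps backwards gives the reverse inclusion, so the two sets are equal.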

Properties of Set operations

DeMorgan's Laws

(A \cup B)^\mathsf{c} = A^\mathsf{c} \cap B^\mathsf{c}
(A \cap B)^\mathsf{c} = A^\mathsf{c} \cup B^\mathsf{c}

Proof (second law, forward direction)

x \in (A \cap B)^\mathsf{c}
\implies x \notin A \cap B
\implies x\notin A~or~x\notin B
\implies x \in A^\mathsf{c} ~or~x \in B^\mathsf{c}
\implies x \in A^\mathsf{c} \cup B^\mathsf{c}
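The distributive and De Morgan's laws above can be spot-checked on small finite sets. The sketch below is only an illustration, not a proof; the universal set `omega` and the sets `a`, `b`, `c` are arbitrary choices that do not come from the slides.

```python
# Arbitrary illustrative sets (assumptions, not from the slides).
omega = set(range(10))
a = {0, 1, 2, 3}
b = {2, 3, 4, 5}
c = {3, 5, 7, 9}


def complement(s):
    """Complement of s relative to the universal set omega."""
    return omega - s


# Distributive laws: & is intersection, | is union.
distributive_1 = (a & (b | c)) == ((a & b) | (a & c))
distributive_2 = (a | (b & c)) == ((a | b) & (a | c))

# De Morgan's laws.
de_morgan_1 = complement(a | b) == (complement(a) & complement(b))
de_morgan_2 = complement(a & b) == (complement(a) | complement(b))

print(distributive_1, distributive_2, de_morgan_1, de_morgan_2)
```

A check on one choice of sets cannot replace the element-wise proofs, but it is a quick way to catch a mis-stated identity.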

Countable v/s Uncountable Infinite Sets

\mathbb{I}: set of all integers has infinitely many elements (countable)

\mathbb{R}: set of all real numbers has infinitely many elements (uncountable)

An infinite set is said to be countable if there is a 1-1 correspondence between the elements of this set and the set of positive integers

Countable Infinite Sets

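\mathbb{I} = \{\dots, -3, -2, -1, 0, 1, 2, 3, \dots\}
= \{0,1,-1,2,-2,3,-3, \dots \}

(written in the second order, every integer gets a position 1, 2, 3, ... in the list)

\mathbb{P}: set of all positive rational numbers (the listing below shows those between 0 and 1, ordered by denominator)

\mathbb{P} = \{\frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{2}{4}, \frac{3}{4}, \dots\}

Position in the list: 1, 2, 3, 4, 5, 6, 7, ...

Position after skipping repeated values (e.g. 2/4 = 1/2): 1, 2, 3, 4, 5, ...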

Uncountable Infinite Sets

\mathbb{R}: set of all real numbers

\mathbb{Q} = [0,1]: the real numbers between 0 and 1 (here \mathbb{Q} names this interval, not the rationals)

Experiments and Sample Spaces

Experiment: Bowling a ball

Outcome: {0, 1, 2, 3, 4, 5, 6} runs

Experiment: Going to the mall

Outcome: {infected, not_infected}

Experiment: Blood Test

Outcome: {positive, negative}

Experiment: Writing an exam

Outcome: {A, B, C, D, E, F}

The outcome in every trial is uncertain but the set of outcomes is certain

An experiment or trial is any procedure that can be repeated any number of times and has a well-defined set of outcomes

The set of all possible outcomes of an experiment is called the sample space. The elements in a sample space are mutually exclusive and collectively exhaustive

Experiments involving coin tosses

1 coin: \Omega = \{H, T\},~~|\Omega| = 2
2 coins: \Omega = \{HH, HT, TH, TT\},~~|\Omega| = 4
3 coins: \Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\},~~|\Omega| = 8
\dots
n coins: |\Omega| = 2^n

Experiments involving fair dice

1 die: \Omega = \{1, 2, 3, 4, 5, 6\},~~|\Omega| = 6
2 dice: \Omega = \{ (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), \newline (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),\newline(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),\newline (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),\newline (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),\newline (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) \},~~|\Omega| = 36
\dots
n dice: |\Omega| = 6^n
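The coin and dice sample spaces above can be enumerated mechanically, which makes the 2^n and 6^n counts easy to confirm. This is a minimal sketch; the helper names `coin_sample_space` and `dice_sample_space` are made up for illustration.

```python
from itertools import product


def coin_sample_space(n):
    """All outcomes of n coin tosses, as strings such as 'HT'."""
    return ["".join(tosses) for tosses in product("HT", repeat=n)]


def dice_sample_space(n):
    """All outcomes of n dice rolls, as tuples such as (2, 4)."""
    return list(product(range(1, 7), repeat=n))


print(coin_sample_space(2))          # ['HH', 'HT', 'TH', 'TT']
print(len(coin_sample_space(3)))     # 8
print(len(dice_sample_space(2)))     # 36
```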

Experiments involving cards

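1 card: |\Omega| = 52
2 cards: |\Omega| = 52^2
3 cards: |\Omega| = 52^3
4 cards: |\Omega| = 52^4

(these counts hold when every card is drawn from a full deck, i.e. with replacement, and the order of the cards matters)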

The more outcomes there are, the lower the probability of any single outcome (assuming all outcomes are equally likely)

Experiments: continuous outcomes

\Omega = \{(x,y)~s.t.~0\leq x,y\leq 1\}

Events of an experiment

\Omega = \{HH, HT, TH, TT\}

An event is a set of outcomes of an experiment. This set is a subset of the sample space

A = \{HH, HT\}

(the event that the first toss results in a head)

B = \{TT\}

(the event that both the tosses result in tails)

C: the event that a hand of 3 cards contains exactly 2 aces

|C| = {4 \choose 2} \cdot {48 \choose 1} = 288
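The count of 288 can be checked both with the formula and by brute force. The sketch below assumes an unordered hand of 3 cards (which is what the factor 48 choose 1 implies) and encodes the deck as the integers 0 to 51 with 0 to 3 standing for the aces; that encoding is an illustrative choice only.

```python
from itertools import combinations
from math import comb

# Formula: choose 2 of the 4 aces and 1 of the 48 other cards.
by_formula = comb(4, 2) * comb(48, 1)

# Brute force over all 3-card hands; cards 0..3 play the role of the aces.
aces = {0, 1, 2, 3}
by_enumeration = sum(
    1
    for hand in combinations(range(52), 3)
    if len(aces.intersection(hand)) == 2
)

print(by_formula, by_enumeration)  # 288 288
```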

We say that event A has occurred if the outcome of the experiment lies in the set A


Union of events

A = \{(2,1), (2,2), (2,3), (2,4), (2,5), (2,6)\}

(the event that the first die shows a 2)

B = \{(1,4), (2,4), (3,4), (4,4), (5,4), (6,4)\}

(the event that the second die shows a 4)

C = A\cup B
= \{(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),\\~~~~~~~~ (1,4), (3,4), (4,4), (5,4), (6,4)\}

(the event that the first die shows a 2 or the second die shows a 4; the outcome (2,4) lies in both A and B but is listed only once)

Intersection of events

A = \{(2,1), (2,2), (2,3), (2,4), (2,5), (2,6)\}

(the event that the first die shows a 2)

B = \{(1,4), (2,4), (3,4), (4,4), (5,4), (6,4)\}

(the event that the second die shows a 4)

D = A \cap B
= \{(2,4)\}

(the event that the first die shows a 2 and the second die shows a 4)

Complement of events

A = \{(2,1), (2,2), (2,3), (2,4), (2,5), (2,6)\}

(the event that the first die shows a 2)

E = A^\mathsf{c}

(the event that the first die does not show a 2; it contains the remaining 30 outcomes)
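Since events are just subsets of the sample space, the union, intersection and complement of the two-dice events above map directly onto set operations. A minimal sketch, with variable names chosen to mirror the slides:

```python
from itertools import product

# Sample space for two dice: all ordered pairs (first, second).
omega = set(product(range(1, 7), repeat=2))

A = {o for o in omega if o[0] == 2}   # first die shows a 2
B = {o for o in omega if o[1] == 4}   # second die shows a 4

C = A | B        # first die shows a 2 or second die shows a 4
D = A & B        # first die shows a 2 and second die shows a 4
E = omega - A    # first die does not show a 2

print(len(C), D, len(E))  # 11 {(2, 4)} 30
```

Note that `len(C)` is 11 rather than 12 because (2, 4) belongs to both A and B.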

Multiple events

A: the hand contains the ace of spades

B: the hand contains the ace of clubs

C: the hand contains the ace of hearts

A \cup B \cup C: the hand contains at least one of these three aces
A \cap B \cap C: the hand contains all three of these aces

Disjoint events

A: the event that the first die shows a 1

B: the event that the first die shows a 2

Two events A and B are said to be disjoint if they cannot occur simultaneously, i.e.,

A\cap B= \phi

Here A and B are disjoint but do not cover the whole sample space:

A\cup B \neq \Omega

An event and its complement are always disjoint and together cover the sample space:

A\cap A^\mathsf{c}= \phi
A\cup A^\mathsf{c}= \Omega

Disjoint events

The events A_1, A_2, \dots, A_n are said to be mutually disjoint or pairwise disjoint if

A_i\cap A_j= \phi~\forall~i,j~s.t.~i\neq j

Example (two coin tosses):

A = \{HH\},~~B=\{TT\},~~C=\{HT,TH\}
A\cap B= \phi,~~B\cap C = \phi,~~A\cap C = \phi

Partition of the sample space

If the events A_1, A_2, \dots, A_n are mutually disjoint and A_1 \cup A_2 \cup \dots \cup A_n = \Omega then A_1, A_2, \dots, A_n are said to partition the sample space

Example (two coin tosses):

A = \{HH\},~~B=\{TT\},~~C=\{HT,TH\}
A\cap B= \phi,~~B\cap C = \phi,~~A\cap C = \phi
A\cup B\cup C = \Omega

Axioms of Probability

Recap

Experiments

Sample Space

Events

What is the chance of an event?

Goal: Assign a number to each event such that this number reflects the chance of the experiment resulting in that event

The probability function

What are the conditions that such a probability function must satisfy?

(Axioms of Probability)

P(A) = ?

(P is the probability function, A is an event)

The axioms of probability

Axiom 1 (non-negativity)

P(A) \geq 0~\forall A

Axiom 2 (normalisation)

P(\Omega) = 1

Axiom 3 (finite additivity)

If the events A_1, A_2, \dots, A_n are mutually disjoint then

P(A_1\cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^n P(A_i)

The axioms of probability

Axiom 3 (finite additivity)

P(A_1\cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^n P(A_i)

Smallest possible event = one outcome

Compute probabilities of larger events from smaller events

The axioms of probability

Given P(A_1),P(A_2),P(A_3),P(A_4),P(A_5),P(A_6), where A_i is the event that a die shows i, we can compute other probabilities

B: the event that the outcome is an odd no.

P(B) = P(A_1)+P(A_3)+P(A_5)

C: the event that the outcome is \geq 5

P(C) = P(A_5)+P(A_6)

D: the event that the outcome is a mult. of 3

P(D) = P(A_3)+P(A_6)

The probability of an event can be computed as the sum of the probabilities of the disjoint outcomes contained in the event

Some properties of probability

Property 1: P(A) = 1 - P(A^\mathsf{c})

A and A^\mathsf{c} are disjoint and A \cup A^\mathsf{c} = \Omega
1 = P(\Omega) = P(A \cup A^\mathsf{c}) = P(A) + P(A^\mathsf{c})
\therefore P(A) = 1 - P(A^\mathsf{c})

Property 2: P(A) \leq 1

P(A) = 1 - P(A^\mathsf{c})~and~P(A^\mathsf{c}) \geq 0~(Axiom~1)
\therefore P(A) \leq 1

Some properties of probability

Property 3: P(A \cup B) = P(A) + P(B) - P(A \cap B)

A \cup B = A \cup (B \cap A^\mathsf{c}),~where~A~and~B \cap A^\mathsf{c}~are~disjoint
P(A \cup B) = P(A \cup (B \cap A^\mathsf{c}))
= P(A) + P(B \cap A^\mathsf{c})
P(B) = P((B \cap A^\mathsf{c}) \cup (B \cap A))
= P(B\cap A^\mathsf{c}) + P(B\cap A)
\therefore P(B\cap A^\mathsf{c}) = P(B) - P(B\cap A)
\therefore P(A \cup B) = P(A) + P(B) - P(B\cap A)

Some properties of probability

Property 4: the sum of the probabilities of all outcomes is equal to 1

\Omega = A_1 \cup A_2 \cup \dots \cup A_n~~(A_i:~the~individual~outcomes,~which~are~disjoint)
P(\Omega) = P(A_1 \cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^n P(A_i)
\therefore \sum_{i=1}^n P(A_i) = 1

Property 5: P(\phi) = 0

\Omega~and~\phi~are~disjoint~and~\Omega \cup \phi = \Omega
\therefore 1 = P(\Omega) = P(\Omega \cup \phi) = P(\Omega) + P(\phi)
\therefore 1 = 1 + P(\phi) \implies P(\phi) = 0

Examples

Outcomes: {0,   1,   2,   3,   4,   5,   6}

A_i: the event that i runs are scored, i = 0, 1, \dots, 6

P(A_0)=0.3, P(A_1)=0.45, P(A_2)=0.12, P(A_3)=0.02, \\ P(A_4)=0.07, P(A_5)=0.01, P(A_6)=0.03

What is the prob. of scoring an even no. of runs?

X = A_0 \cup A_2 \cup A_4 \cup A_6
P(X) = P(A_0 \cup A_2 \cup A_4 \cup A_6)
= P(A_0) + P(A_2) + P(A_4) + P(A_6)
= 0.3 + 0.12 + 0.07 + 0.03 = 0.52

Examples

Outcomes: {0,   1,   2,   3,   4,   5,   6}

A_i: the event that i runs are scored, i = 0, 1, \dots, 6

P(A_0)=0.3, P(A_1)=0.45, P(A_2)=0.12, P(A_3)=0.02, \\ P(A_4)=0.07, P(A_5)=0.01, P(A_6)=0.03

What is the prob. of scoring less than 5 runs?

X = A_0 \cup A_1 \cup A_2 \cup A_3 \cup A_4
P(X) = P(A_0 \cup A_1 \cup A_2 \cup A_3 \cup A_4)
= P(A_0) + P(A_1) + P(A_2) + P(A_3) + P(A_4)
= 0.3 + 0.45 + 0.12 + 0.02 + 0.07 = 0.96

Examples

Outcomes: {0,   1,   2,   3,   4,   5,   6}

A_i: the event that i runs are scored, i = 0, 1, \dots, 6

P(A_0)=0.3, P(A_1)=0.45, P(A_2)=0.12, P(A_3)=0.02, \\ P(A_4)=0.07, P(A_5)=0.01, P(A_6)=0.03

What is the prob. that the runs will be div. by 2 or 3?

X_1 = \{0, 2, 4, 6\} = A_0 \cup A_2 \cup A_4 \cup A_6
X_2 = \{0, 3, 6\} = A_0 \cup A_3 \cup A_6
X_1 \cap X_2 = \{0, 6\} = A_0 \cup A_6
P(X_1) = 0.3 + 0.12 + 0.07 + 0.03 = 0.52
P(X_2) = 0.3 + 0.02 + 0.03 = 0.35
P(X_1 \cap X_2) = 0.3 + 0.03 = 0.33
P(X_1 \cup X_2) = P(X_1) + P(X_2) - P(X_1 \cap X_2)
\therefore P (X_1 \cup X_2) = 0.52 + 0.35 - 0.33 = 0.54
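The runs example can be reproduced by storing the outcome probabilities in a dictionary and summing over the outcomes in each event. The sketch below uses exact fractions to avoid floating-point rounding; the helper name `prob` is an illustrative choice.

```python
from fractions import Fraction

# Outcome probabilities from the slide, keyed by runs scored.
p = {
    0: Fraction("0.3"), 1: Fraction("0.45"), 2: Fraction("0.12"),
    3: Fraction("0.02"), 4: Fraction("0.07"), 5: Fraction("0.01"),
    6: Fraction("0.03"),
}


def prob(event):
    """Probability of an event = sum over the disjoint outcomes in it."""
    return sum(p[outcome] for outcome in event)


x1 = {r for r in p if r % 2 == 0}   # runs divisible by 2
x2 = {r for r in p if r % 3 == 0}   # runs divisible by 3

direct = prob(x1 | x2)
inclusion_exclusion = prob(x1) + prob(x2) - prob(x1 & x2)

print(float(direct), float(inclusion_exclusion))  # 0.54 0.54
```

Summing directly over the union and applying Property 3 give the same value, as they should.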

Examples

The ball bearings manufactured by a factory have two types of defects. Suppose the probability of having Type 1 defect is 0.01, having Type 2 defect is 0.02 and having both is 0.005

P(A_1)=0.01, P(A_2)=0.02, P(A_1 \cap A_2) = 0.005

What is the prob. that the bearing will have type 1 or type 2 defect?

P(A_1 \cup A_2)= P(A_1) + P(A_2) - P(A_1 \cap A_2)
= 0.01 + 0.02 - 0.005 = 0.025

Examples

The ball bearings manufactured by a factory have two types of defects. Suppose the probability of having Type 1 defect is 0.01, having Type 2 defect is 0.02 and having both is 0.005

P(A_1)=0.01, P(A_2)=0.02, P(A_1 \cap A_2) = 0.005

What is the prob. that the bearing will have neither type 1 nor type 2 defect?

P((A_1 \cup A_2)^\mathsf{c})= 1 - P(A_1 \cup A_2)
= 1 - 0.025 = 0.975

Designing Probability Functions

(probability as relative frequency)

Recap

Goal: Assign a number to each event such that this number reflects the chance of the experiment resulting in that event

Required: The probability function must satisfy the axioms of probability

Probability as relative frequency

Karl Pearson tossed a coin 24000 times and observed that the number of heads was 12012

P(H) = \frac{12012}{24000} = 0.5005

We can think of the probability of an event as the fraction of times the event occurs when an experiment is repeated a large number of times
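The relative-frequency idea can be tried out with a simulated coin. This is only a sketch under stated assumptions: the coin is simulated as fair with Python's pseudo-random generator, the function name and the seed are arbitrary, and the exact fractions printed depend on that seed.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Fraction of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The fraction of heads tends to settle near 0.5 as the number of tosses grows.
for n in (10, 100, 1000, 24000):
    print(n, relative_frequency_of_heads(n))
```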

Probability as relative frequency

P(A_i) = \frac{no.~of~times~the~outcome~is~in~A_i}{total~no.~of~times~the~experiment~was~repeated}

But does such a P() satisfy the axioms of probability?

Probability as relative frequency

Does P() satisfy the axioms?

P(A_i) \geq 0~?

Yes: ratio of a non-negative count to a positive count

P(\Omega)=1~?

Yes: every outcome lies in \Omega

P(\Omega) = \frac{no.~of~times~the~outcome~is~in~\Omega}{total~no.~of~times~the~experiment~was~repeated} = 1

P(A_1 \cup A_2) = P(A_1)+ P(A_2)~?

Yes: if the disjoint events A_1 and A_2 occur k_1 and k_2 times in k trials then

P(A_1 \cup A_2) = \frac{k_1 + k_2}{k} = \frac{k_1}{k} + \frac{k_2}{k}
= P(A_1)+ P(A_2)

Probability as relative frequency

Does P() satisfy the axioms? (axioms are about events)

|\Omega| = n \implies 2^n~subsets \implies 2^n~events

Suppose A_1, A_2, A_3, \dots, A_n are the outcomes

Every event is a union of these outcomes

If the frequencies of A_1, A_2, A_3, \dots, A_n are known then the probability of any event can be computed

Examples

A dataset contains images of beaches (60000), mountains (25000) and forests (15000)

What is the probability that a randomly picked image would be of a forest?

Experiment: Select an image

Number of trials: 100000

Frequency of the event "forest": 15000

P(forest) = \frac{15000}{100000} = 0.15

Examples

A country tests 20 million randomly selected people and finds that 1 million are infected

What is the probability that a randomly picked person would be infected?

Experiment: Perform a test

Number of trials: 20 million

Frequency of the event "infected": 1 million

P(infected) = \frac{1000000}{20000000} = 0.05

Examples

By May-10-2020, India had tested 1673688 samples of which 67176 were found to be positive. Does this mean that the probability of a randomly selected person being infected is 0.04?

No: testing in India was not random but only for people with flu-like symptoms

A subtle point: the sample from which the probabilities were estimated should be drawn from the same population on which we are interested in making inferences

Designing Probability Functions

(the case of equally likely outcomes)

Equally likely outcomes

\Omega = \{H, T\}
P(H) = P(T) = k
\Omega = H \cup T
P(\Omega) = P(H \cup T)
= P(H) + P(T)
= 2k
= 1
\therefore P(H) = P(T) = k = \frac{1}{2}

We can now compute the probability of all 4 subsets of \Omega:

\phi, \{H\}, \{T\}, \{H, T\}

Equally likely outcomes

\Omega = \{1, 2, 3, 4, 5, 6\}

A_i: event that the outcome is i

A_1, A_2, A_3, A_4, A_5, A_6 partition \Omega

P(A_1) = P(A_2) = P(A_3) = P(A_4) = P(A_5) = P(A_6) = k
P(\Omega) = \sum_{i=1}^6 P(A_i) = 6k = 1
\therefore P(A_i) = \frac{1}{6}

We can now compute the probability of all subsets of \Omega

Equally likely outcomes

We can now compute the probability of all subsets of \Omega

\Omega = \{1, 2, 3, 4, 5, 6\}

E: outcome is even

O: outcome is odd

D: outcome is divisible by 3

Equally likely outcomes

Can we derive a formula for computing the probability of events of an experiment with n equally likely outcomes?

E: any event with k outcomes, each of probability \frac{1}{n}

(the outcomes are of course disjoint)

P(E) = \sum_{i=1}^{k} \frac{1}{n} = \frac{k}{n}

In general, for any event X:

P(X) = \frac{number~of~outcomes~in~X}{number~of~outcomes~in~\Omega}
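With equally likely outcomes the formula reduces to counting, which is straightforward to sketch in code. The example below applies it to a fair die and the events "even", "odd" and "divisible by 3"; the function name `prob` is an illustrative choice.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # outcomes of a fair die, all equally likely


def prob(event):
    """P(X) = (number of outcomes in X) / (number of outcomes in omega)."""
    return Fraction(len(event & omega), len(omega))


even = {o for o in omega if o % 2 == 0}
odd = omega - even
div_by_3 = {o for o in omega if o % 3 == 0}

print(prob(even), prob(odd), prob(div_by_3))  # 1/2 1/2 1/3
```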

Equally likely outcomes

Are the axioms of probability satisfied?

P(X) = \frac{number~of~outcomes~in~X}{number~of~outcomes~in~\Omega}

P(A_i) \geq 0~?

Yes: ratio of a non-negative count to a positive count

P(\Omega)=1~?

Yes: \Omega contains all outcomes

P(A_1 \cup A_2) = P(A_1)+ P(A_2)~?

Yes: if the disjoint events A_1 and A_2 contain k_1 and k_2 outcomes then

P(A_1 \cup A_2) = \frac{k_1 + k_2}{n} = \frac{k_1}{n} + \frac{k_2}{n}
= P(A_1) + P(A_2)

Examples

P(X) = \frac{number~of~outcomes~in~X}{number~of~outcomes~in~\Omega}

What is the probability of getting a black card?

P(B) = \frac{26}{52}

What is the probability of getting 3 aces when 3 cards are drawn?

n = {52 \choose 3} = 22100
P(A) = \frac{{4 \choose 3}}{22100} = \frac{4}{22100}

Examples

What is the probability of hitting the red circle at the centre?

R: radius of dartboard

r: radius of red circle

P(C) = \frac{\pi r^2}{\pi R^2} = (\frac{r}{R})^2

(assuming the dart is equally likely to land anywhere on the board)

Summary

 

Learning Objectives

What are sets, and what are some of their properties?

What are experiments, sample spaces, outcomes and events?

What are the axioms of probability?

What are some simple ways of defining a probability function?

What are some important theorems: multiplication rule, total probability theorem and Bayes' theorem?

What are independent events?

Set Theory

Finite, Countably infinite, Uncountably infinite

Intersection, Union, Complement

Properties of set operations

Disjoint sets

Axioms of Probability

Designing Probability Functions

Goal: Assign a number to each event such that this number reflects the chance of the experiment resulting in that event

Required: The probability function must satisfy the axioms of probability

Probability as long term relative frequency

Equally likely outcomes

Conditional Probabilities

Change in belief

(assume fair playing conditions & equally good teams)

Before start of play: What is the chance of India winning? 0.5

India scores 395 batting first: What is the chance of India winning? > 0.5

Change in belief

What exactly happened here?

(assume fair playing conditions & equally good teams)

A: event that India will win

B: India scores 395 runs

P(A) changes once we know that event B has occurred

P(A|B) \neq P(A)

Change in belief

What is the probability that a randomly selected person is healthy (not infected)?

10% of the population is infected

A: event that a person is healthy

B: event that the person has COVID-19 symptoms

P(A) = 0.9
P(A|B) \neq P(A)

P(A|B) is called the conditional probability of the event A given the event B

The definition of P(A|B)

Two dice are rolled. What is the probability that the sum is 8?

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

A: sum is 8

P(A) = \frac{5}{36}

The definition of P(A|B)

What is the probability that the sum is 8 given that the first die shows a 4?

A: sum is 8

B: first die shows a 4

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

P(A|B) = \frac{1}{6}

(of the 6 outcomes in which the first die shows a 4, only (4, 4) has sum 8)

The definition of P(A|B)

A: sum is 8

B: first die shows a 4

A = \{(2,6), (3,5), (4,4), (5,3), (6,2)\}
B = \{(4,1), (4,2), (4,3), (4,4), (4,5), (4,6)\}
A \cap B = \{(4,4)\}
P(B) = \frac{6}{36}
P(A\cap B) = \frac{1}{36}
P(A|B) = \frac{P(A\cap B)}{P(B)} = \frac{\frac{1}{36}}{\frac{6}{36}} = \frac{1}{6}

P(A|B) is called the conditional probability of the event A given the event B

P(A|B) = \frac{P(A\cap B)}{P(B)}

Left-hand side: conditional probability

Right-hand side: regular probabilities (we already know how to compute these)
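The two-dice example can be checked by enumerating the 36 equally likely outcomes and applying the definition of conditional probability. A minimal sketch; the helper names `prob` and `cond_prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # 36 equally likely outcomes


def prob(event):
    return Fraction(len(event), len(omega))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); only defined when P(b) > 0."""
    return prob(a & b) / prob(b)


A = {o for o in omega if sum(o) == 8}   # sum is 8
B = {o for o in omega if o[0] == 4}     # first die shows a 4

print(prob(A), cond_prob(A, B))  # 5/36 1/6
```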

Examples

I am thinking of a two digit number. Suppose I tell you that at least one of the two digits is even. What is the probability that both are even?

10, 11, 12, 13, 14, 15, ..., 94, 95, 96, 97, 98, 99

(all equally likely)

A: event that both digits are even

B: event that at least one digit is even

P(A) = \frac{20}{90} = \frac{2}{9}

but we are interested in P(A|B)

Examples

I am thinking of a two digit number. Suppose I tell you that at least one of the two digits is even. What is the probability that both are even?

10, 11, 12, 13, 14, 15, ..., 94, 95, 96, 97, 98, 99

(all equally likely)

A: event that both digits are even

B: event that at least one digit is even

A \subset B~so~P(A \cap B) = P(A) = \frac{20}{90}
P(B) = 1 - P(both~digits~odd) = 1 - \frac{25}{90} = \frac{65}{90}
P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{\frac{20}{90}}{\frac{65}{90}} = \frac{4}{13}
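The counts 20 and 65 in the two-digit example can be confirmed by listing the numbers 10 to 99. A short sketch, with variable names chosen for illustration:

```python
from fractions import Fraction

numbers = range(10, 100)  # the 90 two-digit numbers, all equally likely


def digits(n):
    """Tens and units digit of a two-digit number."""
    return n // 10, n % 10


A = {n for n in numbers if all(d % 2 == 0 for d in digits(n))}  # both even
B = {n for n in numbers if any(d % 2 == 0 for d in digits(n))}  # at least one even

p_a_given_b = Fraction(len(A & B), len(B))
print(len(A), len(B), p_a_given_b)  # 20 65 4/13
```

Because all outcomes are equally likely, P(A|B) reduces to the ratio of the two counts.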

Examples

60% of the students in a class opt for ML. 20% of the students opt for both ML and DL. Given that a student has opted for ML what is the probability that she has also opted for DL?

A: event that student has opted for DL

B: event that the student has opted for ML

P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.6} = \frac{1}{3}

Do conditional probabilities satisfy the axioms of probability?

Axiom 1 (non-negativity)

P(A|B) = \frac{P(A\cap B)}{P(B)} \geq 0

: ratio of two probabilities (with P(B) > 0)

Axiom 2 (normalisation)

P(\Omega|B)= \frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1

Axiom 3 (finite additivity), for disjoint A_1 and A_2

P(A_1 \cup A_2|B) = \frac{P((A_1 \cup A_2) \cap B)}{P(B)}
= \frac{P((A_1 \cap B) \cup (A_2 \cap B))}{P(B)}
= \frac{P(A_1 \cap B)}{P(B)} + \frac{P(A_2 \cap B)}{P(B)}
= P(A_1|B) + P(A_2|B)

The multiplication principle

The chain rule of probability

P(A|B) = \frac{P(A\cap B)}{P(B)}
P(B|A) = \frac{P(B\cap A)}{P(A)}
\therefore P(A\cap B) = P(A|B)\cdot P(B)
\therefore P(B\cap A) = P(B|A)\cdot P(A)
\therefore P(A\cap B) = P(A|B)\cdot P(B) = P(B|A)\cdot P(A)

The chain rule of probability

A: event that a person is infected

B: event that the test result is positive

A and B split \Omega into four disjoint regions:

A\cap B
A\cap B^\mathsf{c}
A^\mathsf{c}\cap B
A^\mathsf{c}\cap B^\mathsf{c}

The chain rule of probability

Facts:

roughly 10% of the population is infected

P(A) = 0.1

for an infected person the test shows a negative result in 1% of the cases

P(B^\mathsf{c}|A) = 0.01 \implies P(B|A) = 0.99

for a healthy person the test shows a positive result in 5% of the cases

P(B|A^\mathsf{c}) = 0.05 \implies P(B^\mathsf{c}|A^\mathsf{c}) = 0.95

The chain rule of probability

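A:  infected

B: test positive

Facts:

P(A) = 0.1
P(B^\mathsf{c}|A) = 0.01 \implies P(B|A) = 0.99
P(B|A^\mathsf{c}) = 0.05 \implies P(B^\mathsf{c}|A^\mathsf{c}) = 0.95

Multiplying along each branch of the probability tree (first A or A^\mathsf{c}, then the test result):

P(A \cap B) = P(A)\cdot P(B|A) = 0.1 \cdot 0.99 = 0.099
P(A \cap B^\mathsf{c}) = P(A)\cdot P(B^\mathsf{c}|A) = 0.1 \cdot 0.01 = 0.001
P(A^\mathsf{c} \cap B) = P(A^\mathsf{c})\cdot P(B|A^\mathsf{c}) = 0.9 \cdot 0.05 = 0.045
P(A^\mathsf{c} \cap B^\mathsf{c}) = P(A^\mathsf{c})\cdot P(B^\mathsf{c}|A^\mathsf{c}) = 0.9 \cdot 0.95 = 0.855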

The chain rule of probability

for n events

P(A \cap B \cap C) = P((A \cap B) \cap C)
Let~(A \cap B)=X
\therefore P(A \cap B \cap C) = P(X \cap C)
\therefore P(A \cap B \cap C) = P(X)\cdot P(C|X)
\therefore P(A \cap B \cap C) = P(A\cap B)\cdot P(C|A \cap B)
\therefore P(A \cap B \cap C) = P(A)\cdot P(B|A)\cdot P(C|A \cap B)

The chain rule of probability

for n events

P(A \cap B \cap C \cap D)
= P(A)\cdot P(B|A)\cdot P(C|A \cap B)\cdot P(D|A \cap B \cap C)

In general:

P(A_1 \cap A_2 \cap \dots \cap A_n)
= P(A_1)\prod_{i=2}^{n} P(A_i|A_1 \cap \dots \cap A_{i-1})

Examples

Suppose you draw 3 cards one by one without replacement. What is the probability that all the 3 cards are aces?

Using counting principles:

p = \frac{4 \choose 3}{52 \choose 3} = \frac{\frac{4!}{1!~3!}}{\frac{52!}{49!~3!}} = \frac{4*3*2}{52*51*50}

Examples

Suppose you draw 3 cards one by one without replacement. What is the probability that all the 3 cards are aces?

Using chain rule:

A_i: the event that the i-th card is an ace

P(A_1\cap A_2\cap A_3)
= P(A_1)\cdot P(A_2|A_1)\cdot P(A_3|A_1 \cap A_2)

P(A_1) = \frac{4}{52}
P(A_2|A_1) = \frac{3}{51}
P(A_3|A_1\cap A_2) = \frac{2}{50}

\therefore P(A_1\cap A_2\cap A_3) = \frac{4*3*2}{52*51*50}
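The counting argument and the chain rule should give the same probability of drawing three aces. The sketch below computes both with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Counting: favourable 3-card hands over all 3-card hands.
by_counting = Fraction(comb(4, 3), comb(52, 3))

# Chain rule: P(A1) * P(A2 | A1) * P(A3 | A1 and A2).
by_chain_rule = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

print(by_counting, by_chain_rule)  # 1/5525 1/5525
```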

Total Probability Theorem

A_1, A_2, \cdots, A_n partition \Omega:

A_1 \cup A_2 \cup \dots \cup A_n = \Omega
A_i \cap A_j = \phi~\forall i \neq j

For any event B:

B = (B \cap A_1) \cup (B \cap A_2) \cup \dots \cup (B \cap A_n)
P(B) = P(B \cap A_1) + P(B \cap A_2) + \dots + P(B \cap A_n)
P(B) = P(A_1)\cdot P(B|A_1) + P(A_2)\cdot P(B|A_2) + \dots + P(A_n)\cdot P(B|A_n)
P(B) = \sum_{i=1}^{n} P (A_i) \cdot P(B | A_i)

Examples

A:  infected

B: test positive

Facts:

P(A) = 0.1
P(B^\mathsf{c}|A) = 0.01 \implies P(B|A) = 0.99
P(B|A^\mathsf{c}) = 0.05 \implies P(B^\mathsf{c}|A^\mathsf{c}) = 0.95

P(B) = ?
P(B) = P(A)P(B|A) + P(A^\mathsf{c})P(B|A^\mathsf{c})
\therefore P(B) = 0.1*0.99 + 0.9*0.05 =0.144
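The total probability theorem is a weighted sum, so it fits in a few lines of code. The sketch below applies it to the test example; the function name `total_probability` and its list-based signature are illustrative assumptions.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(B) = sum_i P(A_i) * P(B | A_i) for a partition A_1, ..., A_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))


# Partition: A (infected) and its complement (healthy).
priors = [Fraction("0.1"), Fraction("0.9")]          # P(A), P(A^c)
likelihoods = [Fraction("0.99"), Fraction("0.05")]   # P(B|A), P(B|A^c)

p_positive = total_probability(priors, likelihoods)
print(float(p_positive))  # 0.144
```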

Examples

What is the probability that he will come out alive?

A_i: i-th path taken

B: monster encountered

P(A_1)=P(A_2)=P(A_3) = \frac{1}{3}
P(B|A_1) = 0.3
P(B|A_2) = 0.6
P(B|A_3) = 0.75

He comes out alive if no monster is encountered:

P(B^\mathsf{c}) = ?
P(B^\mathsf{c}) = P(A_1)P(B^\mathsf{c}|A_1) + P(A_2)P(B^\mathsf{c}|A_2) + P(A_3)P(B^\mathsf{c}|A_3)
= \frac{1}{3}\cdot0.7 + \frac{1}{3}\cdot0.4 + \frac{1}{3}\cdot0.25 = 0.45

Bayes' Theorem

Examples

If he does not come out alive what is the probability that he took path A1?

A_i: i-th path taken

B: monster encountered

P(A_1)=P(A_2)=P(A_3) = \frac{1}{3}
P(B|A_1) = 0.3
P(B|A_2) = 0.6
P(B|A_3) = 0.75

P(A_1|B) = ?
P(A_1|B) = \frac{P(A_1 \cap B)}{P(B)}
P(A_1 \cap B) = P(A_1)P(B|A_1) = P(B)P(A_1|B)
P(A_1|B) = \frac{P(A_1 \cap B)}{P(A_1)\cdot P(B|A_1) + P(A_2)\cdot P(B|A_2) + P(A_3)\cdot P(B|A_3)}

Examples

If he does not come out alive what is the probability that he took path A1?

A_i: i-th path taken

B: monster encountered

P(A_1)=P(A_2)=P(A_3) = \frac{1}{3}
P(B|A_1) = 0.3
P(B|A_2) = 0.6
P(B|A_3) = 0.75

P(A_1|B) = ?
P(A_1|B) = \frac{P(A_1 \cap B)}{P(B)}
P(A_1 \cap B) = P(A_1)P(B|A_1) = P(B)P(A_1|B)
P(A_1|B) = \frac{P(A_1)P(B|A_1)}{P(A_1)\cdot P(B|A_1) + P(A_2)\cdot P(B|A_2) + P(A_3)\cdot P(B|A_3)}
= \frac{\frac{1}{3}\cdot 0.3}{\frac{1}{3}\cdot 0.3 + \frac{1}{3}\cdot 0.6 + \frac{1}{3}\cdot 0.75} = \frac{0.3}{1.65} \approx 0.182

Examples

If he does not come out alive what is the probability that he took path A3?

A_i: i-th path taken

B: monster encountered

P(A_1)=P(A_2)=P(A_3) = \frac{1}{3}
P(B|A_1) = 0.3
P(B|A_2) = 0.6
P(B|A_3) = 0.75

P(A_3|B) = ?
P(A_3|B) = \frac{P(A_3 \cap B)}{P(B)}
P(A_3 \cap B) = P(A_3)P(B|A_3) = P(B)P(A_3|B)
P(A_3|B) = \frac{P(A_3)P(B|A_3)}{P(A_1)\cdot P(B|A_1) + P(A_2)\cdot P(B|A_2) + P(A_3)\cdot P(B|A_3)}
= \frac{\frac{1}{3}\cdot 0.75}{\frac{1}{3}\cdot 0.3 + \frac{1}{3}\cdot 0.6 + \frac{1}{3}\cdot 0.75} = \frac{0.75}{1.65} \approx 0.455
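The three-path example can be computed for all paths at once: multiply each prior by its likelihood, then divide by the total. The sketch below is one way to write that; the function name `posterior` is an illustrative choice.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' theorem over a partition: P(A_i | B) for every i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i and B)
    total = sum(joint)                                    # P(B), total probability
    return [j / total for j in joint]


priors = [Fraction(1, 3)] * 3                                       # P(A_1..A_3)
likelihoods = [Fraction("0.3"), Fraction("0.6"), Fraction("0.75")]  # P(B | A_i)

post = posterior(priors, likelihoods)
print([str(x) for x in post])            # ['2/11', '4/11', '5/11']
print([round(float(x), 3) for x in post])  # [0.182, 0.364, 0.455]
```

The posteriors sum to 1, as they must, since the paths partition the sample space.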

Breaking it down

Exploit Multiplication Rule

P(A_1)P(B|A_1) = P(B)P(A_1|B)

Exploit Total Probability Theorem

P(B) = P(A_1)\cdot P(B|A_1) + P(A_2)\cdot P(B|A_2) + P(A_3)\cdot P(B|A_3)

Exploit the known probabilities

P(A_1|B) = \frac{P(A_1)\cdot P(B|A_1) }{\sum_{i=1}^nP(A_i)P(B|A_i)}

Bayes' Theorem

Breaking it down: Example 1

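A: Ship 1 sends a signal 1

B: Ship 2 receives a signal 1

P(A) = 0.01
P(B|A) = 0.95
P(B|A^\mathsf{c}) = 0.05
P(A|B) = ?
P(A|B) = \frac{P(A)P(B|A)}{P(A)P(B|A) + P(A^\mathsf{c})P(B|A^\mathsf{c})} = \frac{0.0095}{0.0095 + 0.0495} \approx 0.16

In terms of counts: out of 10000 transmissions, 100 are a signal 1 (A) and 9900 are not (A^\mathsf{c}). A signal 1 is received in 95 of the former and 495 of the latter, so P(A|B) = \frac{95}{95 + 495} \approx 0.16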

Breaking it down: Example 2

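A:  infected

B: test positive

Facts:

P(A) = 0.1
P(B^\mathsf{c}|A) = 0.01 \implies P(B|A) = 0.99
P(B|A^\mathsf{c}) = 0.05 \implies P(B^\mathsf{c}|A^\mathsf{c}) = 0.95

P(A|B) = ?
P(A|B) = \frac{P(A)P(B|A)}{P(A)P(B|A) + P(A^\mathsf{c})P(B|A^\mathsf{c})} = \frac{0.099}{0.144} = 0.6875

In terms of counts: out of 10000 people, 1000 are infected (A) and 9000 are not (A^\mathsf{c}). The test is positive for 990 of the former and 450 of the latter, so P(A|B) = \frac{990}{990 + 450} = 0.6875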

Bayes' Theorem: 3 forms

P(A|B) = \frac{P(A \cap B)}{P(B)}
P(A|B) = \frac{P(A)\cdot P(B|A) }{P(B)}
P(A_i|B) = \frac{P(A_i)\cdot P(B|A_i) }{\sum_{j=1}^{n}P(A_j)P(B|A_j)}

Independent Events

Do we always update our beliefs?

A: I had a sandwich for breakfast

B: It will rain today

If A occurs will you update your belief about B? No

What do you call such events? Independent events

Example

50 girls and 70 boys in a class. Of these, 35 girls and 49 boys are good at Maths. If I tell you that a student is good at Maths what is the probability that she is a girl?

A:  student is a girl

B: student is good at Maths

Facts:

P(A) = \frac{50}{50+70} = \frac{5}{12}
P(A^\mathsf{c}) = \frac{7}{12}
P(B|A) = \frac{35}{50} = \frac{7}{10}
P(B|A^\mathsf{c}) = \frac{49}{70} = \frac{7}{10}

P(A|B) = ?
P(B) = P(B|A)P(A) + P(B|A^\mathsf{c})P(A^\mathsf{c}) = \frac{7}{10}
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
= \frac{\frac{7}{10}\frac{5}{12}}{\frac{7}{10}} = \frac{5}{12} = P(A)

Two events A and B are independent if

P(A|B) = P(A)

Since P(A \cap B) = P(B)\cdot P(A|B), this is equivalent to:

Two events A and B are independent if

P(A\cap B) = P(A)\cdot P(B)

Example

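A coin is tossed 3 times

A:  first toss results in a head

B: exactly 2 tosses result in heads

Are A and B independent?

Outcome    A    B    A \cap B
H H H      *
H H T      *    *    *
H T H      *    *    *
H T T      *
T H H           *
T H T
T T H
T T T

P(A) = \frac{4}{8}
P(B) = \frac{3}{8}
P(A \cap B) = \frac{2}{8}
P(A)P(B) = \frac{12}{64} = \frac{3}{16} \neq \frac{2}{8}
\therefore P(A \cap B) \neq P(A)P(B),~so~A~and~B~are~not~independent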

Example

Are A and B independent?

A: sum is 7

B: second die shows an even number

A = \{ (1,6), (2,5), (3,4), (4,3), (5,2), (6,1) \}
B = \{(1,2), (1,4), (1,6), (2,2), (2,4), (2,6),(3,2), (3,4), (3,6), \newline(4,2), (4,4), (4,6), (5,2), (5,4), (5,6), (6,2), (6,4), (6,6)\}
A \cap B = \{(1,6), (3,4), (5,2) \}
P(A) = \frac{6}{36} = \frac{1}{6}
P(B) = \frac{18}{36} = \frac{1}{2}
P(A \cap B) = \frac{3}{36}= \frac{1}{12}
\therefore P(A \cap B) = P(A)\cdot P(B),~so~A~and~B~are~independent
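Both independence examples (three coin tosses, two dice) can be checked by enumeration using the product rule P(A ∩ B) = P(A)·P(B). The sketch below assumes equally likely outcomes; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def prob(event, omega):
    return Fraction(len(event), len(omega))


def independent(a, b, omega):
    """True when P(a and b) == P(a) * P(b) under equally likely outcomes."""
    return prob(a & b, omega) == prob(a, omega) * prob(b, omega)


# Three coin tosses: first toss is a head vs. exactly two heads.
coins = set(product("HT", repeat=3))
first_head = {o for o in coins if o[0] == "H"}
two_heads = {o for o in coins if o.count("H") == 2}

# Two dice: sum is 7 vs. second die shows an even number.
dice = set(product(range(1, 7), repeat=2))
sum_is_7 = {o for o in dice if sum(o) == 7}
second_even = {o for o in dice if o[1] % 2 == 0}

print(independent(first_head, two_heads, coins))   # False
print(independent(sum_is_7, second_even, dice))    # True
```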

Example

A quiz has two multiple choice Qs. The first Q has 4 choices of which 1 is correct and the second Q has 3 choices of which 1 is correct. If a student randomly guesses the answers what is the probability that he will answer both Qs correctly?

A: first answer is correct

B: second answer is correct

P(A) = \frac{1}{4}
P(B) = \frac{1}{3}

The two guesses are independent, so

P(A \cap B) = P(A)\cdot P(B) = \frac{1}{12}

Independence: n events

We say that events A_1, A_2, A_3, \dots, A_n are pairwise independent if

P(A_i \cap A_j) = P(A_i)\cdot P(A_j)~\forall~i\neq j

We say that events A_1, A_2, A_3, \dots, A_n are mutually independent or independent if for all subsets I \subset \{1,2,3,\dots,n\} with at least two elements

P(\cap_{i \in I} A_i) = \prod_{i \in I}P(A_i)

Example: n = 3, index set \{1,2,3\}, subsets to check: \{1,2\}, \{1, 3\}, \{2,3\}, \{1, 2, 3\}

P(A_1 \cap A_2 ) = P(A_1)\cdot P(A_2)
P(A_1 \cap A_3 ) = P(A_1)\cdot P(A_3)
P(A_2 \cap A_3 ) = P(A_2)\cdot P(A_3)
P(A_1 \cap A_2 \cap A_3 ) = P(A_1)\cdot P(A_2)\cdot P(A_3)
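Pairwise independence does not imply mutual independence. The sketch below uses a standard textbook example that is not taken from the slides: two fair coin tosses with the events "first toss is a head", "second toss is a head" and "both tosses agree". The helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

omega = set(product("HT", repeat=2))  # two fair coin tosses

events = [
    {o for o in omega if o[0] == "H"},    # A_1: first toss is a head
    {o for o in omega if o[1] == "H"},    # A_2: second toss is a head
    {o for o in omega if o[0] == o[1]},   # A_3: both tosses agree
]


def prob(event):
    return Fraction(len(event), len(omega))


def product_rule_holds(subset):
    """Check P(intersection of subset) == product of the individual P's."""
    return prob(set.intersection(*subset)) == prod(prob(e) for e in subset)


pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
mutual = all(
    product_rule_holds(subset)
    for size in range(2, len(events) + 1)
    for subset in combinations(events, size)
)

print(pairwise, mutual)  # True False
```

Each pair satisfies the product rule (1/4 = 1/2 · 1/2), but the triple intersection {HH} has probability 1/4 rather than 1/8, so the subset {1, 2, 3} fails the check.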

Summary

Putting it all together

Conditional Probability

Compute~P(B|A)~using~P(A\cap B)~and~P(A)

Multiplication Rule

Compute~P(A\cap B)~using~P(A)~and~P(B|A)

Total Probability Theorem

Compute~P(B)~using~P(A_1), P(A_2), \dots, P(A_n)
and~P(B|A_1), P(B|A_2), \dots, P(B|A_n)

Bayes' Theorem

Compute~P(A|B)~using~P(A),~P(B|A)~and~P(B)

Independent events

P(\cap_{i \in I} A_i) = \prod_{i \in I}P(A_i)

FDS_Intro_To_Probability

By One Fourth Labs

PadhAI One: FDS Week 3 (MK)
