Bayes' Theorem (2.6.3)

Step 1: Understand Conditional Probability

Definition

P(A | B) is the probability of A happening given that B has already happened.

Formula by definition:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Intuitive Meaning

  • P(A ∩ B): probability that both A and B happen together
  • P(B): probability that B happens
  • When we know B has occurred, the sample space shrinks to only B
  • So we divide by P(B) to "normalize" to this new smaller space

Illustrative Example

  • Suppose there are 100 students: 30 female, 70 male
  • Among 30 females: 20 like math

$$P(\text{likes math} \mid \text{female}) = \frac{20}{30} = \frac{P(\text{female and likes math})}{P(\text{female})}$$
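The counting argument above can be checked directly; a minimal sketch of the (hypothetical) 100-student example, with probabilities taken as count / total:

```python
# Conditional probability from raw counts (hypothetical 100-student example).
total = 100
female = 30
female_and_likes_math = 20

# P(A | B) = P(A ∩ B) / P(B), with each probability a count over the total.
p_female = female / total
p_female_and_math = female_and_likes_math / total
p_math_given_female = p_female_and_math / p_female

print(round(p_math_given_female, 4))  # 0.6667, i.e. 20/30
```

Note that the `total` cancels: dividing by P(female) is exactly the "shrink the sample space to B" step described above.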

Step 2: Reverse Direction

Similarly, we can write the probability of B given A:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Explanation:

  • This is also the definition of conditional probability, just swapping roles of A and B
  • P(B | A) = probability of B happening given that A has already happened

Step 3: Find P(A ∩ B)

From Step 2, multiply both sides by P(A):

$$P(B \mid A) \cdot P(A) = P(A \cap B)$$

Rewritten as:

$$P(A \cap B) = P(B \mid A) \cdot P(A)$$

This is called the Chain Rule (or Product Rule) in probability.

Step 4: Derive Bayes' Theorem

From Step 1: P(A | B) = P(A ∩ B) / P(B)

Substitute P(A ∩ B) = P(B | A) · P(A) from Step 3:

$$\boxed{P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}}$$

Meaning of Bayes' Theorem

| Component | Name | Meaning |
|---|---|---|
| P(A \| B) | Posterior | Probability of A after observing B |
| P(A) | Prior | Initial probability of A |
| P(B \| A) | Likelihood | How likely we are to observe B if A is true |
| P(B) | Evidence | Total probability of observing B |
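These four components map directly onto a one-line function; a minimal sketch (the function name and parameter names are my own):

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Sanity check: if B occurs with certainty whenever A holds and P(B) = P(A),
# observing B makes A certain.
print(bayes_posterior(prior=0.2, likelihood=1.0, evidence=0.2))  # 1.0
```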

Concrete Example: Email Spam Filter

Problem: You receive an email containing the word "FREE". What's the probability it's spam?

Given:

  • P(Spam) = 0.30 (30% of all emails are spam)
  • P(Not Spam) = 0.70
  • P("FREE" | Spam) = 0.80 (80% of spam emails contain "FREE")
  • P("FREE" | Not Spam) = 0.10 (10% of legitimate emails contain "FREE")

Find: P(Spam | "FREE") = ?

Solution using Bayes:

Step 1: Calculate P("FREE") using Law of Total Probability

$$P(\text{"FREE"}) = P(\text{"FREE"} \mid \text{Spam}) \cdot P(\text{Spam}) + P(\text{"FREE"} \mid \text{Not Spam}) \cdot P(\text{Not Spam})$$

$$P(\text{"FREE"}) = 0.80 \times 0.30 + 0.10 \times 0.70 = 0.24 + 0.07 = 0.31$$

Step 2: Apply Bayes' Theorem

$$P(\text{Spam} \mid \text{"FREE"}) = \frac{P(\text{"FREE"} \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{"FREE"})}$$

$$P(\text{Spam} \mid \text{"FREE"}) = \frac{0.80 \times 0.30}{0.31} = \frac{0.24}{0.31} \approx 0.774 = 77.4\%$$

Conclusion: If an email contains "FREE", there's a 77.4% chance it's spam!
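The two steps of the spam calculation can be reproduced in a few lines of Python (variable names are my own):

```python
# Given quantities from the problem statement.
p_spam = 0.30
p_free_given_spam = 0.80
p_free_given_ham = 0.10

# Step 1: evidence term via the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Step 2: Bayes' theorem.
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(round(p_free, 2), round(p_spam_given_free, 3))  # 0.31 0.774
```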

Exercise 7: Problem Setup

Problem (D2L 2.6.5): A disease is screened with two tests whose outcomes are not conditionally independent for healthy patients.

Given:

  • P(D = 1 | H = 0) = 0.10 (false positive = 10%)
  • P(D = 0 | H = 1) = 0.01 (false negative = 1%)
  • Sensitivity: P(D = 1 | H = 1) = 0.99 (99%)
  • For infected (H = 1): tests are conditionally independent
  • For healthy (H = 0): tests are coupled with P(D₁ = D₂ = 1 | H = 0) = 0.02
  • Baseline: P(H = 1) = 0.0015

Part 1: Joint Probability Table (H = 0)

Step 1: Marginal probabilities for D₁ given H = 0:

  • P(D₁ = 1 | H = 0) = 0.10
  • P(D₁ = 0 | H = 0) = 0.90

Step 2: We know P(D₁ = 1, D₂ = 1 | H = 0) = 0.02 (given)

Step 3: Find remaining joint probabilities:

P(D₁ = 1, D₂ = 0 | H = 0) = P(D₁ = 1 | H = 0) - P(D₁ = 1, D₂ = 1 | H = 0) = 0.10 - 0.02 = 0.08

P(D₂ = 1 | H = 0) = 0.10 (by symmetry, same false positive rate)

P(D₁ = 0, D₂ = 1 | H = 0) = P(D₂ = 1 | H = 0) - P(D₁ = 1, D₂ = 1 | H = 0) = 0.10 - 0.02 = 0.08

P(D₁ = 0, D₂ = 0 | H = 0) = 1 - 0.02 - 0.08 - 0.08 = 0.82

Joint Probability Table:

| | D₂ = 0 | D₂ = 1 | Marginal |
|---|---|---|---|
| D₁ = 0 | 0.82 | 0.08 | 0.90 |
| D₁ = 1 | 0.08 | 0.02 | 0.10 |
| Marginal | 0.90 | 0.10 | 1.00 |
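The table entries follow mechanically from the 0.10 marginals and the given 0.02 joint; a quick sketch to verify that they are consistent and sum to 1:

```python
# Reconstruct the H = 0 joint table from the two given facts:
# per-test false-positive rate 0.10, and coupled joint
# P(D1 = 1, D2 = 1 | H = 0) = 0.02.
fp = 0.10
p11 = 0.02
p10 = fp - p11           # D1 = 1, D2 = 0 (marginal minus the overlap)
p01 = fp - p11           # D1 = 0, D2 = 1 (same rate by symmetry)
p00 = 1 - p11 - p10 - p01

print(round(p00, 2), round(p01, 2), round(p10, 2), round(p11, 2))  # 0.82 0.08 0.08 0.02
```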

Part 2: P(H = 1 | D₁ = 1)

Using Bayes' Theorem:

$$P(H = 1 \mid D_1 = 1) = \frac{P(D_1 = 1 \mid H = 1) \cdot P(H = 1)}{P(D_1 = 1)}$$

Calculate P(D₁ = 1):

$$P(D_1 = 1) = P(D_1 = 1 \mid H = 1) \cdot P(H = 1) + P(D_1 = 1 \mid H = 0) \cdot P(H = 0)$$

$$P(D_1 = 1) = 0.99 \times 0.0015 + 0.10 \times 0.9985 = 0.001485 + 0.09985 = 0.101335$$

Apply Bayes:

$$P(H = 1 \mid D_1 = 1) = \frac{0.99 \times 0.0015}{0.101335} = \frac{0.001485}{0.101335} \approx 0.0147 = 1.47\%$$

Result: P(H = 1 | D₁ = 1) ≈ 1.47%
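As a check, the same two-step calculation (evidence term, then Bayes) in code, with my own variable names:

```python
prior = 0.0015   # P(H = 1)
sens = 0.99      # P(D1 = 1 | H = 1)
fp = 0.10        # P(D1 = 1 | H = 0)

# Law of total probability for the evidence term.
p_d1 = sens * prior + fp * (1 - prior)

# Bayes' theorem.
posterior = sens * prior / p_d1

print(round(p_d1, 6), round(posterior, 4))  # 0.101335 0.0147
```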

Part 3: P(H = 1 | D₁ = 1, D₂ = 1)

For H = 1 (infected, tests independent):

$$P(D_1 = 1, D_2 = 1 \mid H = 1) = P(D_1 = 1 \mid H = 1) \times P(D_2 = 1 \mid H = 1) = 0.99 \times 0.99 = 0.9801$$

For H = 0 (healthy, from table):

$$P(D_1 = 1, D_2 = 1 \mid H = 0) = 0.02$$

Calculate P(D₁ = 1, D₂ = 1):

$$P(D_1 = 1, D_2 = 1) = 0.9801 \times 0.0015 + 0.02 \times 0.9985$$

$$= 0.00147015 + 0.01997 = 0.02144015$$

Apply Bayes:

$$P(H = 1 \mid D_1 = 1, D_2 = 1) = \frac{0.9801 \times 0.0015}{0.02144015} = \frac{0.00147015}{0.02144015} \approx 0.0686 = 6.86\%$$

Result: P(H = 1 | D₁ = 1, D₂ = 1) ≈ 6.86%
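The same computation in code, combining the conditionally independent H = 1 branch with the coupled H = 0 joint from the table:

```python
prior = 0.0015   # P(H = 1)
sens = 0.99      # P(D = 1 | H = 1), per test

# Infected: tests conditionally independent, so the joint factors.
p_both_given_h1 = sens * sens            # 0.9801
# Healthy: coupled joint, given directly in the problem.
p_both_given_h0 = 0.02

# Evidence term, then Bayes' theorem.
p_both = p_both_given_h1 * prior + p_both_given_h0 * (1 - prior)
posterior = p_both_given_h1 * prior / p_both

print(round(p_both, 8), round(posterior, 4))  # 0.02144015 0.0686
```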

Comparison: Independent vs Coupled

| Scenario | P(Disease \| Evidence) |
|---|---|
| One positive test | 1.47% |
| Both positive (coupled, as given) | 6.86% |
| Both positive (if independent: joint false positive 0.10 × 0.10 = 0.01) | ≈ 12.8% |

Key takeaway: Conditional dependence reduces the evidential value of the second test! With the given coupling, two positives yield a 6.86% posterior, roughly half the ≈ 12.8% that independent tests would give.
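The coupled and independent scenarios can be compared numerically; a minimal sketch, where the independent case assumes the healthy-patient joint false positive factors as 0.10 × 0.10:

```python
prior = 0.0015            # P(H = 1)
sens = 0.99               # P(D = 1 | H = 1), per test
p_both_h1 = sens * sens   # tests conditionally independent when infected

# Coupled case (given): P(D1 = 1, D2 = 1 | H = 0) = 0.02.
coupled = p_both_h1 * prior / (p_both_h1 * prior + 0.02 * (1 - prior))

# Hypothetical independent case: joint false positive 0.10 * 0.10 = 0.01.
independent = p_both_h1 * prior / (p_both_h1 * prior + 0.01 * (1 - prior))

print(round(coupled, 3), round(independent, 3))  # 0.069 0.128
```

The coupling raises the probability that a healthy patient triggers both alarms (0.02 vs 0.01), which is exactly why the second positive carries less information.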

Limitations of Bayes' Theorem

  1. Prior dependency - Results heavily depend on P(A)
  2. Independence assumption - Often assumes conditional independence
  3. Base rate neglect - People often ignore P(B) in denominator
  4. Requires accurate likelihoods - Need accurate P(B|A) from data
  5. Doesn't capture complex dependencies - Real-world relationships can be more complex


By Tú Nguyễn Thị Cẩm