Bayes' Theorem (2.6.3)
Step 1: Understand Conditional Probability
Definition
P(A | B) is the probability of A happening given that B has already happened.
Formula by definition:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Intuitive Meaning
- P(A ∩ B): probability that both A and B happen together
- P(B): probability that B happens
- When we know B has occurred, the sample space shrinks to only B
- So we divide by P(B) to "normalize" to this new smaller space
Illustrative Example
- Suppose there are 100 students: 30 female, 70 male
- Among 30 females: 20 like math
$$P(\text{likes math} \mid \text{female}) = \frac{20}{30} = \frac{P(\text{female and likes math})}{P(\text{female})}$$
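The counting argument can be checked directly in Python; a small sketch using the 100-student numbers from the example:

```python
# Counts from the example: 100 students, 30 female, 20 of whom like math
total = 100
female = 30
female_and_math = 20

# Joint and marginal probabilities over the whole class
p_female = female / total                    # P(female) = 0.30
p_female_and_math = female_and_math / total  # P(female and likes math) = 0.20

# Definition of conditional probability: P(A | B) = P(A ∩ B) / P(B)
p_math_given_female = p_female_and_math / p_female

print(round(p_math_given_female, 4))  # 0.6667, i.e. 20/30
```

Dividing by P(female) is exactly the "shrink the sample space to B" step: the 20 math-liking females are renormalized against the 30 females, not the 100 students.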
Step 2: Reverse Direction
Similarly, we can write the probability of B given A:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
Explanation:
- This is also the definition of conditional probability, just swapping roles of A and B
- P(B | A) = probability of B happening given that A has already happened
Step 3: Find P(A ∩ B)
From Step 2, multiply both sides by P(A):
$$P(B \mid A) \cdot P(A) = P(A \cap B)$$
Rewritten as:
$$P(A \cap B) = P(B \mid A) \cdot P(A)$$
This is called the Chain Rule (or Product Rule) in probability.
Step 4: Derive Bayes' Theorem
From Step 1: P(A | B) = P(A ∩ B) / P(B)
Substitute P(A ∩ B) = P(B | A) · P(A) from Step 3:
$$\boxed{P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}}$$
Meaning of Bayes' Theorem
| Component | Name | Meaning |
|---|---|---|
| P(A \| B) | Posterior | Probability of A after observing B |
| P(A) | Prior | Initial probability of A |
| P(B \| A) | Likelihood | How likely B is to be observed if A is true |
| P(B) | Evidence | Total probability of observing B |
Concrete Example: Email Spam Filter
Problem: You receive an email containing the word "FREE". What's the probability it's spam?
Given:
- P(Spam) = 0.30 (30% of all emails are spam)
- P(Not Spam) = 0.70
- P("FREE" | Spam) = 0.80 (80% of spam emails contain "FREE")
- P("FREE" | Not Spam) = 0.10 (10% of legitimate emails contain "FREE")
Find: P(Spam | "FREE") = ?
Solution using Bayes:
Step 1: Calculate P("FREE") using Law of Total Probability
$$P(\text{"FREE"}) = P(\text{"FREE"} \mid \text{Spam}) \cdot P(\text{Spam}) + P(\text{"FREE"} \mid \text{Not Spam}) \cdot P(\text{Not Spam})$$
$$P(\text{"FREE"}) = 0.80 \times 0.30 + 0.10 \times 0.70 = 0.24 + 0.07 = 0.31$$
Step 2: Apply Bayes' Theorem
$$P(\text{Spam} \mid \text{"FREE"}) = \frac{P(\text{"FREE"} \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{"FREE"})}$$
$$P(\text{Spam} \mid \text{"FREE"}) = \frac{0.80 \times 0.30}{0.31} = \frac{0.24}{0.31} \approx 0.774 = 77.4\%$$
Conclusion: If an email contains "FREE", there's a 77.4% chance it's spam!
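The two steps above can be reproduced in a short Python sketch, with all probabilities taken from the Given list:

```python
# Given quantities from the spam-filter example
p_spam = 0.30            # P(Spam), the prior
p_free_given_spam = 0.80  # P("FREE" | Spam), the likelihood
p_free_given_ham = 0.10   # P("FREE" | Not Spam)

# Step 1: Law of Total Probability for the evidence P("FREE")
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Step 2: Bayes' theorem for the posterior P(Spam | "FREE")
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(round(p_free, 2), round(p_spam_given_free, 3))  # 0.31 0.774
```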
Exercise 7: Problem Setup
Problem (D2L 2.6.5): Assume the two tests are not independent.
Given:
- P(D = 1 | H = 0) = 0.10 (false positive = 10%)
- P(D = 0 | H = 1) = 0.01 (false negative = 1%)
- Sensitivity: P(D = 1 | H = 1) = 0.99 (99%)
- For infected (H = 1): tests are conditionally independent
- For healthy (H = 0): tests are coupled with P(D₁ = D₂ = 1 | H = 0) = 0.02
- Baseline: P(H = 1) = 0.0015
Part 1: Joint Probability Table (H = 0)
Step 1: Marginal probabilities for D₁ given H = 0:
- P(D₁ = 1 | H = 0) = 0.10
- P(D₁ = 0 | H = 0) = 0.90
Step 2: We know P(D₁ = 1, D₂ = 1 | H = 0) = 0.02 (given)
Step 3: Find remaining joint probabilities:
P(D₁ = 1, D₂ = 0 | H = 0) = P(D₁ = 1 | H = 0) - P(D₁ = 1, D₂ = 1 | H = 0) = 0.10 - 0.02 = 0.08
P(D₂ = 1 | H = 0) = 0.10 (by symmetry, same false positive rate)
P(D₁ = 0, D₂ = 1 | H = 0) = P(D₂ = 1 | H = 0) - P(D₁ = 1, D₂ = 1 | H = 0) = 0.10 - 0.02 = 0.08
P(D₁ = 0, D₂ = 0 | H = 0) = 1 - 0.02 - 0.08 - 0.08 = 0.82
Joint Probability Table:
| | D₂ = 0 | D₂ = 1 | Marginal |
|---|---|---|---|
| D₁ = 0 | 0.82 | 0.08 | 0.90 |
| D₁ = 1 | 0.08 | 0.02 | 0.10 |
| Marginal | 0.90 | 0.10 | 1.00 |
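The table can be reconstructed and sanity-checked in Python, following Steps 1–3 above:

```python
# Joint distribution of (D1, D2) given H = 0
fpr = 0.10          # false-positive rate of each test
p11 = 0.02          # given coupling: P(D1=1, D2=1 | H=0)
p10 = fpr - p11     # P(D1=1, D2=0 | H=0) = 0.10 - 0.02
p01 = fpr - p11     # P(D1=0, D2=1 | H=0), by symmetry
p00 = 1 - p11 - p10 - p01  # remaining probability mass

# Sanity checks: marginals recover the 10% false-positive rate, cells sum to 1
assert abs((p11 + p10) - fpr) < 1e-12   # P(D1 = 1 | H = 0)
assert abs((p11 + p01) - fpr) < 1e-12   # P(D2 = 1 | H = 0)
assert abs((p00 + p01 + p10 + p11) - 1.0) < 1e-12

print(round(p00, 2), round(p01, 2), round(p10, 2), round(p11, 2))  # 0.82 0.08 0.08 0.02
```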
Part 2: P(H = 1 | D₁ = 1)
Using Bayes' Theorem:
$$P(H = 1 \mid D_1 = 1) = \frac{P(D_1 = 1 \mid H = 1) \cdot P(H = 1)}{P(D_1 = 1)}$$
Calculate P(D₁ = 1):
$$P(D_1 = 1) = P(D_1 = 1 \mid H = 1) \cdot P(H = 1) + P(D_1 = 1 \mid H = 0) \cdot P(H = 0)$$
$$P(D_1 = 1) = 0.99 \times 0.0015 + 0.10 \times 0.9985 = 0.001485 + 0.09985 = 0.101335$$
Apply Bayes:
$$P(H = 1 \mid D_1 = 1) = \frac{0.99 \times 0.0015}{0.101335} = \frac{0.001485}{0.101335} \approx 0.0147 = 1.47\%$$
Result: P(H = 1 | D₁ = 1) ≈ 1.47%
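A minimal Python check of this calculation, with parameters as given in the problem setup:

```python
p_h = 0.0015   # prior P(H = 1)
sens = 0.99    # sensitivity, P(D1 = 1 | H = 1)
fpr = 0.10     # false-positive rate, P(D1 = 1 | H = 0)

# Evidence via the Law of Total Probability
p_d1 = sens * p_h + fpr * (1 - p_h)

# Posterior via Bayes' theorem
posterior = sens * p_h / p_d1

print(round(posterior, 4))  # 0.0147
```

Despite the positive test, the posterior stays below 2% because the prior P(H = 1) = 0.0015 is so small; this is the base-rate effect noted in the limitations section.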
Part 3: P(H = 1 | D₁ = 1, D₂ = 1)
For H = 1 (infected, tests independent):
$$P(D_1 = 1, D_2 = 1 \mid H = 1) = P(D_1 = 1 \mid H = 1) \times P(D_2 = 1 \mid H = 1) = 0.99 \times 0.99 = 0.9801$$
For H = 0 (healthy, from table):
$$P(D_1 = 1, D_2 = 1 \mid H = 0) = 0.02$$
Calculate P(D₁ = 1, D₂ = 1):
$$P(D_1 = 1, D_2 = 1) = 0.9801 \times 0.0015 + 0.02 \times 0.9985$$
$$= 0.00147015 + 0.01997 = 0.02144015$$
Apply Bayes:
$$P(H = 1 \mid D_1 = 1, D_2 = 1) = \frac{0.9801 \times 0.0015}{0.02144015} = \frac{0.00147015}{0.02144015} \approx 0.0686 = 6.86\%$$
Result: P(H = 1 | D₁ = 1, D₂ = 1) ≈ 6.86%
Comparison: Independent vs Coupled
| Scenario | P(Disease \| Evidence) |
|---|---|
| One positive test | 1.47% |
| Both positive (COUPLED) | 6.86% |
| Both positive (INDEPENDENT) | ~12.8% |
Key takeaway: Conditional dependence reduces the evidential value of the second positive test. If the tests were wrongly assumed independent, P(D₁ = 1, D₂ = 1 | H = 0) would be 0.10 × 0.10 = 0.01 and the posterior would be ~12.8%; the coupling (0.02) roughly halves it to ~6.86%.
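The coupled and independent cases can be compared side by side in Python; the independent case assumes P(D₁ = 1, D₂ = 1 | H = 0) = 0.10 × 0.10 = 0.01:

```python
p_h, sens, fpr = 0.0015, 0.99, 0.10

# Infected case: tests conditionally independent, so P(both positive | H=1) = 0.99²
num = sens * sens * p_h

# Healthy case: coupled joint (given) vs. the naive independence assumption fpr²
posteriors = {}
for name, p_both_h0 in [("coupled", 0.02), ("independent", fpr * fpr)]:
    posteriors[name] = num / (num + p_both_h0 * (1 - p_h))

print({k: round(v, 3) for k, v in posteriors.items()})
# {'coupled': 0.069, 'independent': 0.128}
```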
Limitations of Bayes' Theorem
- Prior dependency - Results heavily depend on P(A)
- Independence assumption - Often assumes conditional independence
- Base rate neglect - People often ignore P(B) in denominator
- Requires accurate likelihoods - Need accurate P(B|A) from data
- Doesn't capture complex dependencies - Real-world relationships can be more complex
By Tú Nguyễn Thị Cẩm