Bayes' Theorem (2.6.3)

Step 1: Understand Conditional Probability

Definition

P(A | B) is the probability of A happening given that B has already happened.

Formula by definition:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Intuitive Meaning

  • P(A ∩ B): probability that both A and B happen together
  • P(B): probability that B happens
  • When we know B has occurred, the sample space shrinks to only B
  • So we divide by P(B) to "normalize" to this new smaller space

Illustrative Example

  • Suppose there are 100 students: 30 female, 70 male
  • Among 30 females: 20 like math

$$P(\text{likes math} \mid \text{female}) = \frac{20}{30} = \frac{P(\text{female and likes math})}{P(\text{female})}$$
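The counting argument above can be checked directly; a minimal sketch of the (hypothetical) 100-student example, with probabilities taken as count / total:

```python
# Conditional probability from raw counts (hypothetical 100-student example).
total = 100
female = 30
female_and_likes_math = 20

# P(A | B) = P(A ∩ B) / P(B), with each probability a count over the total.
p_female = female / total
p_female_and_math = female_and_likes_math / total
p_math_given_female = p_female_and_math / p_female

print(round(p_math_given_female, 4))  # 0.6667, i.e. 20/30
```

Note that the `total` cancels: dividing by P(female) is exactly the "shrink the sample space to B" step described above.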

Step 2: Reverse Direction

Similarly, we can write the probability of B given A:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Explanation:

  • This is also the definition of conditional probability, just swapping roles of A and B
  • P(B | A) = probability of B happening given that A has already happened

Step 3: Find P(A ∩ B)

From Step 2, multiply both sides by P(A):

$$P(B \mid A) \cdot P(A) = P(A \cap B)$$

Rewritten as:

$$P(A \cap B) = P(B \mid A) \cdot P(A)$$

This is called the Chain Rule (or Product Rule) in probability.

Step 4: Derive Bayes' Theorem

From Step 1: P(A | B) = P(A ∩ B) / P(B)

Substitute P(A ∩ B) = P(B | A) · P(A) from Step 3:

$$\boxed{P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}}$$

Meaning of Bayes' Theorem

| Component | Name | Meaning |
|---|---|---|
| P(A \| B) | Posterior | Probability of A after observing B |
| P(A) | Prior | Initial probability of A |
| P(B \| A) | Likelihood | How likely we are to observe B if A is true |
| P(B) | Evidence | Total probability of observing B |
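These four components map directly onto a one-line function; a minimal sketch (the function name and parameter names are my own):

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Sanity check: if B occurs with certainty whenever A holds and P(B) = P(A),
# observing B makes A certain.
print(bayes_posterior(prior=0.2, likelihood=1.0, evidence=0.2))  # 1.0
```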

Concrete Example: Email Spam Filter

Problem: You receive an email containing the word "FREE". What's the probability it's spam?

Given:

  • P(Spam) = 0.30 (30% of all emails are spam)
  • P(Not Spam) = 0.70
  • P("FREE" | Spam) = 0.80 (80% of spam emails contain "FREE")
  • P("FREE" | Not Spam) = 0.10 (10% of legitimate emails contain "FREE")

Find: P(Spam | "FREE") = ?

Solution using Bayes:

Step 1: Calculate P("FREE") using Law of Total Probability

$$P(\text{"FREE"}) = P(\text{"FREE"} \mid \text{Spam}) \cdot P(\text{Spam}) + P(\text{"FREE"} \mid \text{Not Spam}) \cdot P(\text{Not Spam})$$

$$P(\text{"FREE"}) = 0.80 \times 0.30 + 0.10 \times 0.70 = 0.24 + 0.07 = 0.31$$

Step 2: Apply Bayes' Theorem

$$P(\text{Spam} \mid \text{"FREE"}) = \frac{P(\text{"FREE"} \mid \text{Spam}) \cdot P(\text{Spam})}{P(\text{"FREE"})}$$

$$P(\text{Spam} \mid \text{"FREE"}) = \frac{0.80 \times 0.30}{0.31} = \frac{0.24}{0.31} \approx 0.774 = 77.4\%$$

Conclusion: If an email contains "FREE", there's a 77.4% chance it's spam!
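The two steps of the spam calculation can be reproduced in a few lines of Python (variable names are my own):

```python
# Given quantities from the problem statement.
p_spam = 0.30
p_free_given_spam = 0.80
p_free_given_ham = 0.10

# Step 1: evidence term via the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Step 2: Bayes' theorem.
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(round(p_free, 2), round(p_spam_given_free, 3))  # 0.31 0.774
```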

Exercise 7: Problem Setup

Problem (D2L 2.6.5): A disease is screened with two tests whose outcomes are not conditionally independent for healthy patients.

Given:

  • P(D = 1 | H = 0) = 0.10 (false positive = 10%)
  • P(D = 0 | H = 1) = 0.01 (false negative = 1%)
  • Sensitivity: P(D = 1 | H = 1) = 0.99 (99%)
  • For infected (H = 1): tests are conditionally independent
  • For healthy (H = 0): tests are coupled with P(D₁ = D₂ = 1 | H = 0) = 0.02
  • Baseline: P(H = 1) = 0.0015

Part 1: Joint Probability Table (H = 0)

Step 1: Marginal probabilities for D₁ given H = 0:

  • P(D₁ = 1 | H = 0) = 0.10
  • P(D₁ = 0 | H = 0) = 0.90

Step 2: We know P(D₁ = 1, D₂ = 1 | H = 0) = 0.02 (given)

Step 3: Find remaining joint probabilities:

P(D₁ = 1, D₂ = 0 | H = 0) = P(D₁ = 1 | H = 0) - P(D₁ = 1, D₂ = 1 | H = 0) = 0.10 - 0.02 = 0.08

P(D₂ = 1 | H = 0) = 0.10 (by symmetry, same false positive rate)

P(D₁ = 0, D₂ = 1 | H = 0) = P(D₂ = 1 | H = 0) - P(D₁ = 1, D₂ = 1 | H = 0) = 0.10 - 0.02 = 0.08

P(D₁ = 0, D₂ = 0 | H = 0) = 1 - 0.02 - 0.08 - 0.08 = 0.82

Joint Probability Table:

| | D₂ = 0 | D₂ = 1 | Marginal |
|---|---|---|---|
| D₁ = 0 | 0.82 | 0.08 | 0.90 |
| D₁ = 1 | 0.08 | 0.02 | 0.10 |
| Marginal | 0.90 | 0.10 | 1.00 |
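The table entries follow mechanically from the 0.10 marginals and the given 0.02 joint; a quick sketch to verify that they are consistent and sum to 1:

```python
# Reconstruct the H = 0 joint table from the two given facts:
# per-test false-positive rate 0.10, and coupled joint
# P(D1 = 1, D2 = 1 | H = 0) = 0.02.
fp = 0.10
p11 = 0.02
p10 = fp - p11           # D1 = 1, D2 = 0 (marginal minus the overlap)
p01 = fp - p11           # D1 = 0, D2 = 1 (same rate by symmetry)
p00 = 1 - p11 - p10 - p01

print(round(p00, 2), round(p01, 2), round(p10, 2), round(p11, 2))  # 0.82 0.08 0.08 0.02
```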

Part 2: P(H = 1 | D₁ = 1)

Using Bayes' Theorem:

$$P(H = 1 \mid D_1 = 1) = \frac{P(D_1 = 1 \mid H = 1) \cdot P(H = 1)}{P(D_1 = 1)}$$

Calculate P(D₁ = 1):

$$P(D_1 = 1) = P(D_1 = 1 \mid H = 1) \cdot P(H = 1) + P(D_1 = 1 \mid H = 0) \cdot P(H = 0)$$

$$P(D_1 = 1) = 0.99 \times 0.0015 + 0.10 \times 0.9985 = 0.001485 + 0.09985 = 0.101335$$

Apply Bayes:

$$P(H = 1 \mid D_1 = 1) = \frac{0.99 \times 0.0015}{0.101335} = \frac{0.001485}{0.101335} \approx 0.0147 = 1.47\%$$

Result: P(H = 1 | D₁ = 1) ≈ 1.47%
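As a check, the same two-step calculation (evidence term, then Bayes) in code, with my own variable names:

```python
prior = 0.0015   # P(H = 1)
sens = 0.99      # P(D1 = 1 | H = 1)
fp = 0.10        # P(D1 = 1 | H = 0)

# Law of total probability for the evidence term.
p_d1 = sens * prior + fp * (1 - prior)

# Bayes' theorem.
posterior = sens * prior / p_d1

print(round(p_d1, 6), round(posterior, 4))  # 0.101335 0.0147
```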

Part 3: P(H = 1 | D₁ = 1, D₂ = 1)

For H = 1 (infected, tests independent):

$$P(D_1 = 1, D_2 = 1 \mid H = 1) = P(D_1 = 1 \mid H = 1) \times P(D_2 = 1 \mid H = 1) = 0.99 \times 0.99 = 0.9801$$

For H = 0 (healthy, from table):

$$P(D_1 = 1, D_2 = 1 \mid H = 0) = 0.02$$

Calculate P(D₁ = 1, D₂ = 1):

$$P(D_1 = 1, D_2 = 1) = 0.9801 \times 0.0015 + 0.02 \times 0.9985$$

$$= 0.00147015 + 0.01997 = 0.02144015$$

Apply Bayes:

$$P(H = 1 \mid D_1 = 1, D_2 = 1) = \frac{0.9801 \times 0.0015}{0.02144015} = \frac{0.00147015}{0.02144015} \approx 0.0686 = 6.86\%$$

Result: P(H = 1 | D₁ = 1, D₂ = 1) ≈ 6.86%
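The same computation in code, combining the conditionally independent H = 1 branch with the coupled H = 0 joint from the table:

```python
prior = 0.0015   # P(H = 1)
sens = 0.99      # P(D = 1 | H = 1), per test

# Infected: tests conditionally independent, so the joint factors.
p_both_given_h1 = sens * sens            # 0.9801
# Healthy: coupled joint, given directly in the problem.
p_both_given_h0 = 0.02

# Evidence term, then Bayes' theorem.
p_both = p_both_given_h1 * prior + p_both_given_h0 * (1 - prior)
posterior = p_both_given_h1 * prior / p_both

print(round(p_both, 8), round(posterior, 4))  # 0.02144015 0.0686
```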

Comparison: Independent vs Coupled

| Scenario | P(Disease \| Evidence) |
|---|---|
| One positive test | 1.47% |
| Both positive (coupled, as given) | 6.86% |
| Both positive (if independent: joint false positive 0.10 × 0.10 = 0.01) | ≈ 12.8% |

Key takeaway: Conditional dependence reduces the evidential value of the second test! With the given coupling, two positives yield a 6.86% posterior, roughly half the ≈ 12.8% that independent tests would give.
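The coupled and independent scenarios can be compared numerically; a minimal sketch, where the independent case assumes the healthy-patient joint false positive factors as 0.10 × 0.10:

```python
prior = 0.0015            # P(H = 1)
sens = 0.99               # P(D = 1 | H = 1), per test
p_both_h1 = sens * sens   # tests conditionally independent when infected

# Coupled case (given): P(D1 = 1, D2 = 1 | H = 0) = 0.02.
coupled = p_both_h1 * prior / (p_both_h1 * prior + 0.02 * (1 - prior))

# Hypothetical independent case: joint false positive 0.10 * 0.10 = 0.01.
independent = p_both_h1 * prior / (p_both_h1 * prior + 0.01 * (1 - prior))

print(round(coupled, 3), round(independent, 3))  # 0.069 0.128
```

The coupling raises the probability that a healthy patient triggers both alarms (0.02 vs 0.01), which is exactly why the second positive carries less information.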

Limitations of Bayes' Theorem

  1. Prior dependency - Results heavily depend on P(A)
  2. Independence assumption - Often assumes conditional independence
  3. Base rate neglect - People often ignore P(B) in denominator
  4. Requires accurate likelihoods - Need accurate P(B|A) from data
  5. Doesn't capture complex dependencies - Real-world relationships can be more complex


By Tú Nguyễn Thị Cẩm