Bayesian Math

Probability

What is the probability of blindly grabbing a blue marble?

P(blue) = \frac{blue}{all} = \frac{4}{10} = 0.4
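The count-based definition above can be checked with a few lines of code (a minimal sketch; the 4-blue / 6-other split matches the slide's numbers):

```python
# A bag of 10 marbles, 4 of them blue (the slide's example).
marbles = ["blue"] * 4 + ["green"] * 6

# P(blue) = (# blue marbles) / (# all marbles)
p_blue = marbles.count("blue") / len(marbles)
print(p_blue)  # 0.4
```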


Probability Given Data

What is the probability of grabbing a blue marble given the marble has an 'X' inscribed on it?

(Marble diagram: 4 marbles inscribed 'X', 6 inscribed 'Z')

P(blue\ \vert\ x) = \frac{P(x\ \vert\ blue) \times P(blue)}{P(x)}

\frac{\frac{2}{4} \times \frac{4}{10}}{\frac{4}{10}} = 0.5
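The same calculation, done by counting. The exact assignment of X's to colors is an assumption consistent with the slide's numbers: 2 of the 4 blue marbles and 2 of the 6 green marbles carry an 'X', giving P(x | blue) = 2/4 and P(x) = 4/10.

```python
# Each marble is (color, inscription); 10 marbles total.
marbles = [("blue", "X")] * 2 + [("blue", "Z")] * 2 + \
          [("green", "X")] * 2 + [("green", "Z")] * 4

n = len(marbles)
p_x = sum(1 for c, i in marbles if i == "X") / n        # P(x)    = 4/10
p_blue = sum(1 for c, _ in marbles if c == "blue") / n  # P(blue) = 4/10
p_x_given_blue = (
    sum(1 for c, i in marbles if c == "blue" and i == "X")
    / sum(1 for c, _ in marbles if c == "blue")         # P(x|blue) = 2/4
)

# Bayes' rule: P(blue | x) = P(x | blue) * P(blue) / P(x)
p_blue_given_x = p_x_given_blue * p_blue / p_x
print(p_blue_given_x)  # 0.5
```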


Probability of n-Features

What is the probability of a blue marble given the marble has an 'X' inscribed on it and a white border?

(Marble diagram: 4 marbles inscribed 'X', 6 inscribed 'Z'; some marbles also have white borders)

P(blue\ \vert\ x\ \cap\ white)


\frac{P(x\ \vert\ blue) \times P(white\ \vert\ blue) \times P(blue)} {P(x\ \vert\ blue) \times P(white\ \vert\ blue) \times P(blue)\ +\ P(x\ \vert\ green) \times P(white\ \vert\ green) \times P(green)}

*Confused myself at this point.*

Here's Bayes Rule

P(c_1\ \vert\ x_1, \ldots, x_p) = \frac {P(c_1)[P(x_1 | c_1)\ \times\ \ldots\ \times\ P(x_p | c_1)]} {P(c_1)[P(x_1 | c_1)\ \times\ \ldots\ \times\ P(x_p | c_1)]\ +\ \ldots\ +\ P(c_n)[P(x_1 | c_n)\ \times\ \ldots\ \times\ P(x_p | c_n)]}

Applied to our marbles, P(blue\ \vert\ x \cap\ white) takes exactly this form, with c_1 = blue and c_2 = green.
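A sketch of the rule in code. The slide gives no counts for the white-border feature, so the probabilities below are illustrative placeholders, not numbers from the marble example:

```python
# Naive-Bayes combination of two features. All numbers here are
# made up for illustration -- the slide doesn't supply them.
p = {
    "blue":  {"prior": 0.4, "x": 0.5,   "white": 0.75},
    "green": {"prior": 0.6, "x": 1 / 3, "white": 0.5},
}

# Numerator for each class: P(c) * P(x | c) * P(white | c)
num = {c: v["prior"] * v["x"] * v["white"] for c, v in p.items()}

# Denominator: the numerators summed over all classes
denom = sum(num.values())

p_blue_given_both = num["blue"] / denom
print(round(p_blue_given_both, 3))  # 0.6
```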

So what does this mean?

We need the probability of each class

# of rows with class / # of rows total

 

We need the probability of each feature (word), given each class

# of rows with word & class / # of rows with class (for each class)

 

We need the numerator of the equation computed for every class and summed before we can finish our full calculation; that sum is the denominator.

Gotchas to watch out for

If a single feature does not exist for a single class, the probability gets nuked to zero (zero times anything is zero)
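A common workaround for the zero-probability gotcha is add-one (Laplace) smoothing: pretend every feature was seen once with every class, so an unseen feature/class pair never zeroes the whole product. A minimal sketch (the function name and numbers are hypothetical):

```python
def smoothed_prob(word_class_count, class_count, vocab_size):
    """P(word | class) with add-one (Laplace) smoothing."""
    return (word_class_count + 1) / (class_count + vocab_size)

# Unsmoothed, 0 / 8 = 0 would nuke the whole product;
# smoothed, it becomes 1 / (8 + 5) = 1/13 instead.
print(smoothed_prob(0, 8, 5))
```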

 

Not all features are independent, nor are all features relevant
