The Data Saturation Problem

Or How I Learned to Stop Worrying About Entropy and Enjoy the Bomb

The Fabric of Information

What is Entropy?

The unpredictability of the information contained within a message


Quantifying Disorder

Side 1

Side 2

Probability of flipping heads?

Probability of flipping heads and then tails?
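The two quiz questions above have short numeric answers; here is a minimal sketch (illustrative Python, not part of the deck) using exact fractions: a fair coin gives heads with probability 1/2, and independent flips multiply.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)                 # one of two equally likely sides
p_heads_then_tails = p_heads * p_heads   # independent flips multiply

print(p_heads)              # 1/2
print(p_heads_then_tails)   # 1/4
```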

Equations 

H = -\sum_{i} p_i \log_{2}(p_i), where p_i = probability of outcome i
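The entropy formula translates directly into a few lines of code. A minimal sketch (illustrative Python, not part of the deck; the `entropy` helper is hypothetical):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin -> 1.0 bit
print(entropy([0.9, 0.1]))  # biased coin -> less than 1 bit
print(entropy([1.0]))       # certain outcome -> 0.0 bits
```

Skipping zero-probability terms avoids `log2(0)`; by convention 0·log₂(0) = 0.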

Entropy and probability are directly connected: the less predictable the outcomes, the higher the entropy. Entropy is maximized when all outcomes are equally likely.

Entropy of a fair coin toss?

Side 1: p = 0.5

Side 2: p = 0.5

H = -(0.5·log₂(0.5) + 0.5·log₂(0.5)) = 1 bit

Information Gain

Information gain tells us which features in a data set give us the most information.

Last Numbers

IGain = E_{parent} - weighted average(E_{children})

Example:

E_parent = 0.918
average(E_children) = 0.874

IGain = 0.918 - 0.874 ≈ 0.043
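The same calculation can be done from raw class labels instead of precomputed entropies. A minimal sketch (illustrative Python; the toy split is hypothetical and not the deck's data, though its parent entropy happens to match the 0.918 on the slide):

```python
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for x in labels:
        counts[x] = counts.get(x, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(parent, children):
    """IGain = E_parent - weighted average of child entropies."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Hypothetical toy split: 4 'A' and 2 'B' in the parent node
parent = ["A", "A", "A", "A", "B", "B"]
children = [["A", "A", "A"], ["A", "B", "B"]]
print(round(entropy(parent), 3))                 # 0.918
print(round(information_gain(parent, children), 3))  # 0.459
```

The children's entropies are weighted by the fraction of samples each child holds, so a pure child of 3 samples pulls the average down more than a pure child of 1.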

The FSelector package in R will do all of this for us.

library(FSelector)  # provides information.gain(), cutoff.k(), as.simple.formula()

data(iris)
# Information gain of each feature with respect to Species
weights <- information.gain(Species ~ ., iris)
print(weights)
# Keep the two highest-scoring features
subset <- cutoff.k(weights, 2)
# Build a formula: Species ~ <top two features>
f <- as.simple.formula(subset, "Species")
print(f)


By Matthew FancyPants Getch