CS6015: Linear Algebra and Random Processes
Lecture 42: Information Theory, Entropy, Cross Entropy, KL Divergence
A prediction game (with certainty)
Compute the difference between the true distribution and the predicted distribution
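(As a preview, and as the lecture title suggests, the measure we will build up to is the KL divergence between the true distribution p and the predicted distribution q:

    D_{KL}(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)} = H(p, q) - H(p)

where H(p, q) is the cross entropy and H(p) is the entropy of p, both defined along the way.)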
We will take a detour and then return to this goal
Is there any information gain?
Can you relate it to probabilities?
More surprise = more information gain
Low probability = more information gain
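Both slogans are captured by defining the information content (the "surprise") of an outcome x as

    I(x) = -\log_2 p(x) \text{ bits}

so a certain event (p = 1) carries 0 bits, a fair coin flip (p = 1/2) carries 1 bit, and an outcome with p = 1/8 carries 3 bits.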
Entropy and number of bits
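Entropy is the expected information content, i.e. the average number of bits needed to encode outcomes drawn from p:

    H(X) = \mathbb{E}[I(X)] = -\sum_x p(x) \log_2 p(x)

For a fair coin H = 1 bit; for a biased coin with p = 0.9, H ≈ 0.47 bits.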
Example: Binomial, Poisson, Normal
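A minimal sketch of these examples (assuming scipy is available; the parameter values n = 10, p = 0.5, λ = 4, and σ = 1 are illustrative, not from the lecture):

    import numpy as np
    from scipy import stats

    # scipy's entropy() returns nats; divide by ln(2) to convert to bits.
    binom_H = stats.binom(n=10, p=0.5).entropy() / np.log(2)
    poisson_H = stats.poisson(mu=4.0).entropy() / np.log(2)

    # For a continuous distribution this is the differential entropy,
    # which for a Normal is (1/2) * log2(2 * pi * e * sigma^2).
    normal_H = stats.norm(loc=0.0, scale=1.0).entropy() / np.log(2)

    print(f"Binomial(10, 0.5): {binom_H:.3f} bits")
    print(f"Poisson(4):        {poisson_H:.3f} bits")
    print(f"Normal(0, 1):      {normal_H:.3f} bits (differential)")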