# Information Theory

## KL Divergence

Before introducing KL divergence, let's look at a simpler concept: entropy.

### Entropy

Entropy is the expected information contained in a distribution; it measures uncertainty. $H(X) = \sum_x p(x) I(x)$, where $I(x) = -\log p(x)$ is called the information content of $x$. "If an event is very probable, it is no surprise (and generally uninteresting) when that event happens as expected. However, if an event is unlikely to occur, it is much more informative to learn that the event happened or will happen."
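As a minimal sketch of the formula above, here is a small helper (the function name `entropy` is my own) that computes $H(X) = -\sum_x p(x)\log p(x)$ for a discrete distribution given as a list of probabilities:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum_x p(x) log p(x) of a discrete distribution.

    Terms with p(x) = 0 contribute nothing, since p log p -> 0 as p -> 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin is maximally uncertain: 1 bit of entropy.
print(entropy([0.5, 0.5]))  # → 1.0

# A biased coin is more predictable, so its entropy is lower.
print(entropy([0.9, 0.1]))
```

With base 2 the result is measured in bits; using the natural log instead gives nats.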