Entropy
- is a measure of how close a system is to thermodynamic equilibrium (higher entropy means closer to equilibrium, where energy is spread most evenly)
Information Entropy
- is a measure of the amount of uncertainty (disorder/stochasticity/noise) in a distribution
- lower entropy implies the distribution's mass/density is concentrated on a few outcomes
- higher entropy implies the distribution's mass/density is spread more evenly (closer to a uniform distribution, which has the maximum entropy over a finite set of outcomes)
- is the MINIMAL number of bits needed, on average, to encode each outcome (with $\log_2$; Shannon's source coding theorem) produced by a:
  - stochastic source of data
  - stochastic/probability distribution
  - random variable
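A minimal sketch of the "low vs. high entropy" intuition above, assuming discrete distributions given as plain lists of probabilities (the example distributions are made up for illustration). The peaked distribution puts almost all of its mass on one outcome and gets a low entropy; the uniform one reaches the maximum of $\log_2 4 = 2$ bits for four outcomes.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution given as a list of probabilities."""
    # Terms with p_i == 0 contribute 0 by convention (p log p -> 0 as p -> 0).
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

peaked = [0.97, 0.01, 0.01, 0.01]   # mass concentrated on one outcome
uniform = [0.25, 0.25, 0.25, 0.25]  # mass spread evenly over four outcomes

print(f"peaked:  {entropy_bits(peaked):.3f} bits")
print(f"uniform: {entropy_bits(uniform):.3f} bits")
```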
Univariate Entropy (Information Content - Entropy - Cross Entropy - KL Divergence)
Information Content $h(X=x) = -\log_2 P(X=x)$ - a measure of the information content (surprise) of an outcome $x$; the (optimal) minimal number of bits needed to encode outcome $x$
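A small sketch of information content in bits, assuming the outcome's probability is already known. It is written as $\log_2(1/P(x))$, which is the same quantity as $-\log_2 P(x)$ and avoids printing `-0.0` for a certain outcome. Rarer outcomes carry more bits: a fair coin flip is 1 bit, a 1-in-4 event is 2 bits.

```python
import math

def information_content(p_x):
    """Information content (surprisal) h(x) = log2(1 / P(X=x)), in bits."""
    if not 0 < p_x <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / p_x)

# Rarer outcomes carry more information.
for p in (1.0, 0.5, 0.25, 0.01):
    print(f"P(x) = {p:<5} -> h(x) = {information_content(p):.3f} bits")
```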
Entropy $H_P(P)$ or $H(P)$, where $H(P) = \sum_x P(x)\,h(x) = -\sum_x P(x)\log_2 P(x)$ - the expected value of the information content under a probability distribution $P$; the average length of communicating an event from a distribution $P$ using the optimal code for that same distribution $P$
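The "average length with the optimal code" reading can be checked on a small hypothetical example. The sketch below assumes a dyadic distribution (every probability is a power of 1/2), for which a prefix-free code with codeword lengths $\log_2(1/P(x))$ exists and is optimal; the symbols and codewords are made up for illustration. Both the entropy and the code's average length come out to 1.75 bits.

```python
import math

# Hypothetical dyadic distribution over four symbols.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

def entropy(dist):
    """H(P) = sum_x P(x) * log2(1 / P(x)): expected information content, in bits."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

# Hypothetical prefix-free code whose codeword lengths equal log2(1 / P(x)).
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

avg_len = sum(P[x] * len(code[x]) for x in P)
print(f"H(P)                    = {entropy(P)} bits")
print(f"average codeword length = {avg_len} bits")
```

For non-dyadic distributions the optimal code's average length is only guaranteed to lie within 1 bit above $H(P)$ per symbol, so the equality here is a property of the chosen example.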
Cross Entropy $H_Q(P) = -\sum_x P(x)\log_2 Q(x)$ - the average length of communicating an event from one distribution $P$ using the optimal code for another distribution $Q$
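A minimal sketch of cross entropy in bits, assuming both distributions are lists over the same outcomes in the same order (the example $P$ and $Q$ are hypothetical). Encoding $P$ with its own code costs $H_P(P) = H(P)$; using $Q$'s code costs more, and swapping the roles of $P$ and $Q$ gives a different value, so cross entropy is not symmetric.

```python
import math

def cross_entropy(p, q):
    """H_Q(P) = sum_x P(x) * log2(1 / Q(x)), in bits.

    Average code length when events come from P but are encoded with the
    code that is optimal for Q. Infinite if Q(x) = 0 for some x with P(x) > 0.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # outcomes P never produces cost nothing
        if qx == 0:
            return math.inf  # Q's code has no codeword for an outcome P produces
        total += px * math.log2(1 / qx)
    return total

P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]

print(cross_entropy(P, P))  # H_P(P) = H(P)
print(cross_entropy(P, Q))  # H_Q(P)
print(cross_entropy(Q, P))  # H_P(Q)
```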
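Relative Entropy or Kullback-Leibler (KL) Divergence $D_{KL}(P\|Q)$ or $D_Q(P)$, where $D_{KL}(P\|Q) = H_Q(P) - H(P) = \sum_x P(x)\log_2\frac{P(x)}{Q(x)}$ - measures the “distance” between 2 distributions as the extra bits paid when events from $P$ are encoded with the optimal code for $Q$; it is always $\geq 0$ (zero iff $P = Q$) but is not symmetric, so it is not a true distance (see: divergence)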
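A minimal sketch of KL divergence in bits, assuming list-valued distributions over the same ordered outcomes; the two binary distributions are hypothetical. It shows the properties listed above: zero for identical distributions, positive otherwise, and a different value when the arguments are swapped.

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) = sum_x P(x) * log2(P(x) / Q(x)), in bits."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # 0 * log(0 / q) = 0 by convention
        if qx == 0:
            return math.inf  # P puts mass where Q has none
        total += px * math.log2(px / qx)
    return total

P = [0.5, 0.5]
Q = [0.75, 0.25]

print(kl_divergence(P, P))  # identical distributions
print(kl_divergence(P, Q))  # about 0.208 bits
print(kl_divergence(Q, P))  # about 0.189 bits: not symmetric
```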
Bringing It All Together
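The four quantities fit together as $H_Q(P) = H(P) + D_{KL}(P\|Q)$: the cost of using the wrong code is the optimal cost plus the extra bits. A short sketch checking this numerically, assuming hypothetical distributions $P$ and $Q$ with $Q(x) > 0$ wherever $P(x) > 0$:

```python
import math

def entropy(p):
    """H(P) in bits."""
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H_Q(P) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(1 / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(P||Q) in bits; same support assumption as cross_entropy."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]

h_p = entropy(P)              # optimal average length for events from P
h_q_p = cross_entropy(P, Q)   # average length for events from P using Q's code
d_kl = kl_divergence(P, Q)    # extra bits paid for using the wrong code

print(h_p, h_q_p, d_kl)
assert math.isclose(h_q_p, h_p + d_kl)
```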
Multivariate Entropy
(see: Information Gain - Variation of Information)
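A rough sketch of the two multivariate quantities, assuming a hypothetical joint distribution over two binary variables. Information gain, as used for decision trees, equals the mutual information $I(X;Y) = H(X) + H(Y) - H(X,Y)$; the variation of information is $VI(X;Y) = H(X,Y) - I(X;Y) = H(X|Y) + H(Y|X)$, which, unlike KL divergence, is a true metric.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Hypothetical joint distribution P(X, Y) with X, Y in {0, 1}.
joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginals P(X) and P(Y), obtained by summing out the other variable.
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

h_xy = entropy(joint.values())
h_x = entropy(px.values())
h_y = entropy(py.values())

mutual_info = h_x + h_y - h_xy            # I(X;Y): information gain about X from observing Y
variation_of_info = h_xy - mutual_info    # VI(X;Y) = H(X|Y) + H(Y|X)

print(f"I(X;Y)  = {mutual_info:.4f} bits")
print(f"VI(X;Y) = {variation_of_info:.4f} bits")
```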