Entropy
  • is a measure of how close a system is to equilibrium (higher entropy means closer to equilibrium; an isolated system reaches maximum entropy at equilibrium)
Information Entropy
  • is a measure of the amount of disorder/stochasticity/noise in the distribution
    • lower entropy implies the distribution's mass/density is concentrated on a few outcomes
    • higher entropy implies the distribution's mass/density is spread more evenly (closer to a uniform distribution)
  • is the MINIMAL number of bits needed, on average, to encode the information produced by a:
    • stochastic source of data
    • stochastic/probability distribution
    • random variable
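A minimal sketch of the "lower vs. higher entropy" bullets above, assuming a discrete distribution given as a plain list of probabilities and base-2 logarithms (so results are in bits). The helper name `entropy_bits` is hypothetical.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    # Zero-probability outcomes contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

peaked = [0.97, 0.01, 0.01, 0.01]   # mass concentrated on one outcome
uniform = [0.25, 0.25, 0.25, 0.25]  # mass spread evenly over 4 outcomes

print(entropy_bits(peaked))   # about 0.24 bits: low entropy
print(entropy_bits(uniform))  # 2.0 bits: the maximum possible for 4 outcomes
```

The uniform case needs 2 bits per outcome on average, while the peaked case can be encoded far more cheaply because one outcome almost always occurs.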

Univariate Entropy (Information Content - Entropy - Cross Entropy - KL Divergence)

Information Content $h(X=x)$ - a measure of the information content of a single outcome $x$: the (optimal) minimal number of bits needed to encode outcome $x$, $h(X=x) = -\log_2 P(X=x)$
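A small sketch of the information-content definition, assuming base-2 logs so the answer is in bits; the function name `information_content_bits` is hypothetical. It shows that rarer outcomes carry more information.

```python
import math

def information_content_bits(p):
    """h(x) = -log2 P(X=x), in bits, for an outcome with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)

print(information_content_bits(0.5))    # 1.0: one fair coin flip needs one bit
print(information_content_bits(0.25))   # 2.0: one of 4 equally likely outcomes
print(information_content_bits(0.125))  # 3.0: one of 8 equally likely outcomes
```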

Entropy $H_P(P)$ or $H(P)$ - the expected value of the information content over a probability distribution $P$, i.e. the average length of communicating an event drawn from $P$ using the optimal code for that same distribution $P$: $H(P) = \mathbb{E}_{x \sim P}[h(x)] = -\sum_x P(x) \log_2 P(x)$
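To illustrate "average length with the optimal code", here is a sketch assuming a dyadic distribution (every probability a power of 1/2), where an optimal prefix code can spend exactly $-\log_2 P(x)$ bits per outcome. The particular codewords are one illustrative choice, not the only optimal code.

```python
import math

def entropy_bits(p):
    """H(P): expected information content -log2 P(x) under P, in bits."""
    return sum(px * -math.log2(px) for px in p if px > 0)

# Dyadic distribution, so code lengths 1, 2, 3, 3 match -log2 P(x) exactly.
P = [0.5, 0.25, 0.125, 0.125]
code = ["0", "10", "110", "111"]  # an illustrative optimal prefix code for P

# Average number of bits sent per event when events are drawn from P.
avg_code_length = sum(px * len(cw) for px, cw in zip(P, code))

print(entropy_bits(P))   # 1.75
print(avg_code_length)   # 1.75
```

For non-dyadic distributions, integer-length codes can only approach $H(P)$ on average (e.g. by coding long blocks of events), which is why entropy is stated as a lower bound.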

Cross Entropy $H_Q(P)$ - the average length of communicating an event drawn from one distribution $P$ using the optimal code for another distribution $Q$: $H_Q(P) = -\sum_x P(x) \log_2 Q(x)$
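A sketch of cross entropy for discrete distributions, assuming base-2 logs and that both lists index the same outcomes; `cross_entropy_bits` is a hypothetical name. Coding events from $P$ with the code built for a uniform $Q$ costs more than using $P$'s own code.

```python
import math

def cross_entropy_bits(p, q):
    """H_Q(P) = -sum_x P(x) log2 Q(x): average bits to code events from P with Q's optimal code."""
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                # Q's code has no codeword for an event that P can produce.
                return math.inf
            total -= px * math.log2(qx)
    return total

P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]  # uniform: its optimal code spends 2 bits on every outcome

print(cross_entropy_bits(P, P))  # 1.75, equal to H(P)
print(cross_entropy_bits(P, Q))  # 2.0, more than H(P)
```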

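Relative Entropy or Kullback-Leibler (KL) Divergence $D_{KL}(P \| Q)$ or $D_Q(P)$ - measures the “distance” between two distributions (see: divergence): $D_{KL}(P \| Q) = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)}$. It is not a true distance metric, since it is asymmetric ($D_{KL}(P \| Q) \neq D_{KL}(Q \| P)$ in general) and does not satisfy the triangle inequality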
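A sketch of KL divergence for two-outcome distributions, assuming base-2 logs; `kl_divergence_bits` is a hypothetical name. Swapping the arguments gives a different value, which is why the "distance" above is in quotes.

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(P || Q) = sum_x P(x) log2(P(x) / Q(x)), in bits."""
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf  # P puts mass where Q has none
            total += px * math.log2(px / qx)
    return total

P = [0.5, 0.5]
Q = [0.75, 0.25]

print(kl_divergence_bits(P, P))  # 0.0: a distribution has zero divergence from itself
print(kl_divergence_bits(P, Q))  # about 0.208
print(kl_divergence_bits(Q, P))  # about 0.189: not symmetric
```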

Bringing It All Together
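One way to tie these definitions together, assuming discrete distributions and base-2 logs: splitting the logarithm in the KL definition shows that KL divergence is exactly the extra average code length paid for using $Q$'s optimal code instead of $P$'s own.

```latex
\begin{aligned}
D_{KL}(P \| Q) &= \sum_x P(x) \log_2 \frac{P(x)}{Q(x)} \\
               &= -\sum_x P(x) \log_2 Q(x) \;-\; \Big(-\sum_x P(x) \log_2 P(x)\Big) \\
               &= H_Q(P) - H_P(P)
\end{aligned}
```

Since $D_{KL}(P \| Q) \ge 0$ (Gibbs' inequality), cross entropy is never smaller than entropy, with equality exactly when $Q = P$. For example, with $P = (1/2, 1/4, 1/8, 1/8)$ and uniform $Q$, $H_Q(P) = 2$ bits and $H_P(P) = 1.75$ bits, so $D_{KL}(P \| Q) = 0.25$ bits.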

Multivariate Entropy

see: Information Gain, Variation of Information

Subpages

Resources