see: Univariate Entropy (Information Content - Entropy - Cross Entropy - KL Divergence)

Entropy 𝐻(𝑋) = 𝐻𝑃(𝑋) = - 𝛴𝑥∊𝑋[ 𝑃(𝑋=𝑥) 𝑙𝑔 𝑃(𝑋=𝑥) ]
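The entropy sum above can be sketched in a few lines of Python. This is a minimal illustration, assuming the distribution is given as a list of probabilities and that 𝑙𝑔 means log base 2 (so the result is in bits); the function name `entropy` and the coin distributions are made up for the example.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    # Outcomes with p == 0 are skipped: the limit of p * lg(p) as p -> 0 is 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]      # hypothetical example distribution
biased_coin = [0.9, 0.1]    # hypothetical example distribution

print(entropy(fair_coin))    # 1.0
print(entropy(biased_coin))  # less than 1 bit: the outcome is more predictable
```

The fair coin reaches the maximum of one bit for two outcomes, while the biased coin carries less uncertainty.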

Joint Entropy 𝐻(𝑋,𝑌) = 𝐻𝑃(𝑋,𝑌) = - 𝛴𝑥∊𝑋 𝛴𝑦∊𝑌[ 𝑃(𝑋=𝑥,𝑌=𝑦) 𝑙𝑔 𝑃(𝑋=𝑥,𝑌=𝑦) ]

Conditional Entropy 𝐻(𝑋|𝑌) = 𝐻𝑃(𝑋|𝑌) = - 𝛴𝑥∊𝑋 𝛴𝑦∊𝑌[ 𝑃(𝑋=𝑥,𝑌=𝑦) 𝑙𝑔 𝑃(𝑋=𝑥|𝑌=𝑦) ]
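As a rough sketch of joint and conditional entropy, the code below assumes the joint distribution is stored as a dictionary mapping (𝑥, 𝑦) pairs to probabilities. The function names and the weather/umbrella table are hypothetical, chosen only for illustration. It uses the standard definitions, with 𝑃(𝑋=𝑥|𝑌=𝑦) computed as 𝑃(𝑋=𝑥,𝑌=𝑦) / 𝑃(𝑌=𝑦).

```python
import math
from collections import defaultdict

def joint_entropy(joint):
    """H(X,Y) in bits for a joint table {(x, y): probability}."""
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)

def conditional_entropy(joint):
    """H(X|Y) = -sum over (x, y) of P(x,y) * lg( P(x,y) / P(y) ), in bits."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p  # marginalise out x to get P(Y=y)
    return -sum(p * math.log2(p / p_y[y])
                for (_, y), p in joint.items() if p > 0)

# Hypothetical joint distribution over (weather, carrying an umbrella).
joint = {
    ("rain", "umbrella"): 0.4,
    ("rain", "none"): 0.1,
    ("dry", "umbrella"): 0.1,
    ("dry", "none"): 0.4,
}

h_xy = joint_entropy(joint)
h_x_given_y = conditional_entropy(joint)
print(h_xy, h_x_given_y)
# In this table Y is uniform over two values, so H(Y) = 1 bit,
# and the chain rule H(X,Y) = H(Y) + H(X|Y) can be checked numerically.
print(h_xy - h_x_given_y)
```

The dictionary representation keeps the example short; for larger tables a 2-D array of probabilities would be the more usual choice.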

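Mutual Information or Information Gain 𝐼(𝑋,𝑌) = 𝐼𝑃(𝑋,𝑌) = 𝐻𝑃(𝑋) - 𝐻𝑃(𝑋|𝑌) - the information shared between the two variables, i.e. how much knowing one reduces the uncertainty about the other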

Pointwise Mutual Information 𝐼𝑃(𝑋=𝑥,𝑌=𝑦) = 𝑙𝑔[ 𝑃(𝑋=𝑥,𝑌=𝑦) / (𝑃(𝑋=𝑥) 𝑃(𝑌=𝑦)) ] - measures how much more often events 𝑥 and 𝑦 co-occur than they would if they were independent
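A small sketch may help connect the pointwise and the averaged quantities: mutual information is the expected value of pointwise mutual information under the joint distribution. The code assumes a joint table stored as a dictionary of (𝑥, 𝑦) pairs to probabilities; the helper names and the example table are hypothetical.

```python
import math
from collections import defaultdict

def marginals(joint):
    """Return the marginals P(X) and P(Y) of a joint table {(x, y): probability}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return p_x, p_y

def pmi(joint, x, y):
    """Pointwise mutual information lg[ P(x,y) / (P(x) P(y)) ] in bits.

    Assumes P(x,y) > 0; the logarithm is undefined for a zero-probability pair.
    """
    p_x, p_y = marginals(joint)
    return math.log2(joint[(x, y)] / (p_x[x] * p_y[y]))

def mutual_information(joint):
    """Mutual information as the expected PMI under the joint distribution."""
    return sum(p * pmi(joint, x, y) for (x, y), p in joint.items() if p > 0)

# Hypothetical joint distribution over (weather, carrying an umbrella).
joint = {
    ("rain", "umbrella"): 0.4,
    ("rain", "none"): 0.1,
    ("dry", "umbrella"): 0.1,
    ("dry", "none"): 0.4,
}

print(pmi(joint, "rain", "umbrella"))  # positive: co-occur more than if independent
print(pmi(joint, "rain", "none"))      # negative: co-occur less than if independent
print(mutual_information(joint))       # non-negative average over all pairs
```

Individual PMI values can be negative, but their probability-weighted average, the mutual information, never is.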

Self Information - mutual information of a variable with itself, 𝐼𝑃(𝑋,𝑋) = 𝐻𝑃(𝑋)

Variation of Information 𝑉𝑃(𝑋,𝑌) = 𝐻𝑃(𝑋|𝑌) + 𝐻𝑃(𝑌|𝑋) - a metric, i.e. a notion of distance, between variables. The variation of information between two variables is zero when knowing the value of either one tells you the value of the other, and it increases as the variables become more independent.
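The two extremes described above can be checked with a short sketch. It uses the standard identity 𝑉(𝑋,𝑌) = 𝐻(𝑋|𝑌) + 𝐻(𝑌|𝑋) = 2𝐻(𝑋,𝑌) - 𝐻(𝑋) - 𝐻(𝑌) and assumes a joint table stored as a dictionary of (𝑥, 𝑦) pairs to probabilities; the function names and both example tables are hypothetical.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def variation_of_information(joint):
    """VI(X,Y) = 2 H(X,Y) - H(X) - H(Y), in bits, for a table {(x, y): probability}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    h_xy = entropy(joint.values())
    return 2 * h_xy - entropy(p_x.values()) - entropy(p_y.values())

# Each variable fully determines the other: distance is zero.
determined = {(0, 0): 0.5, (1, 1): 0.5}
# Two independent fair bits: distance is H(X) + H(Y).
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

print(variation_of_information(determined))   # 0.0
print(variation_of_information(independent))  # 2.0
```

For independent variables the conditional entropies equal the marginal entropies, which is why the second value is the sum of two one-bit entropies.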

Relationship between Entropy, Joint Entropy, & Conditional Entropy
KL-Divergence vs Variation of Information
Bringing it All Together

Resources