Divergences 𝐷(·,·) 𝐷(·||·)
- is a kind of statistical distance
- measures the "separation" or "distance" between two probability distributions
- is a binary function that gives the separation from one probability distribution to another on a statistical manifold; it need not be symmetric, so 𝐷(𝑝,𝑞) and 𝐷(𝑞,𝑝) can differ
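To make the "from one distribution to another" wording concrete, here is a minimal sketch using the Kullback-Leibler divergence as one example of a divergence. The helper name `kl_divergence` and the two distributions `p` and `q` are illustrative assumptions, not part of the notes; the sketch also assumes that `q` is strictly positive wherever `p` is.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.

    Assumes p and q are sequences of probabilities over the same outcomes
    and that q[i] > 0 wherever p[i] > 0 (illustrative helper, hypothetical name).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two made-up distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

d_pq = kl_divergence(p, q)  # separation from p to q
d_qp = kl_divergence(q, p)  # separation from q to p
d_pp = kl_divergence(p, p)  # separation from p to itself

print(d_pq, d_qp, d_pp)
```

Both directed values are positive but not equal, and the divergence of `p` from itself is zero, which is why the order of the arguments matters.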
Divergence - Definition
Given a differentiable manifold 𝑀 of dimension 𝑛, a divergence on 𝑀 is a 𝐶²-function 𝐷: 𝑀 × 𝑀 → [0,∞) satisfying:
- 𝐷(𝑝,𝑞) ≥ 0 for all 𝑝,𝑞∈𝑀 (non-negativity)
- 𝐷(𝑝,𝑞) = 0 if and only if 𝑝=𝑞 (positivity)
- At every point, 𝑝∈𝑀, 𝐷(𝑝,𝑝+𝑑𝑝) is a positive-definite quadratic form for infinitesimal displacements 𝑑𝑝 from 𝑝
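The last condition can be written out in local coordinates. The following is a sketch under assumed notation: the coordinates 𝑝 = (𝑝¹, …, 𝑝ⁿ) and the symbol 𝑔 are introduced here for illustration and do not appear in the notes.

```latex
D(p, p + dp) = \frac{1}{2} \sum_{i,j=1}^{n} g_{ij}(p)\, dp^i\, dp^j + O(\lVert dp \rVert^3),
\qquad
g_{ij}(p) = \left. \frac{\partial^2 D(p, q)}{\partial q^i\, \partial q^j} \right|_{q = p}
```

The zeroth- and first-order terms vanish because 𝐷(𝑝,𝑞) attains its minimum value 0 at 𝑞 = 𝑝, and the condition requires the matrix 𝑔(𝑝) to be positive definite at every 𝑝. For the Kullback-Leibler divergence on a parametric family, 𝑔 is the Fisher information matrix.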
Statistics and probability generally require only the first two conditions. In information geometry, the third condition is also required.
As an example, the total variation distance, a commonly used statistical divergence, does not satisfy the third condition.
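A small numerical check can illustrate why the total variation distance fails the quadratic-form condition. The sketch below assumes a Bernoulli family with parameter `theta` and compares total variation with the Kullback-Leibler divergence; the helper names, the value of `theta`, and the step size `eps` are all illustrative choices. Halving the displacement should roughly quarter a quadratic quantity but only halve a linear one.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q[i] > 0 where p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def bernoulli(theta):
    """Bernoulli distribution as a two-element probability vector."""
    return [theta, 1.0 - theta]

theta = 0.3   # base point (arbitrary choice)
eps = 1e-3    # small displacement of the parameter
p = bernoulli(theta)

# Ratio of the divergence at displacement eps to the divergence at eps / 2.
kl_ratio = kl(p, bernoulli(theta + eps)) / kl(p, bernoulli(theta + eps / 2))
tv_ratio = (total_variation(p, bernoulli(theta + eps))
            / total_variation(p, bernoulli(theta + eps / 2)))

# Second-order prediction for KL: eps^2 / (2 * theta * (1 - theta)),
# i.e. one half of the Fisher information of the Bernoulli family times eps^2.
quadratic_estimate = eps ** 2 / (2 * theta * (1 - theta))
kl_value = kl(p, bernoulli(theta + eps))

print(kl_ratio, tv_ratio, kl_value / quadratic_estimate)
```

The Kullback-Leibler ratio comes out close to 4 (quadratic growth), while the total variation ratio comes out close to 2 (linear growth), so total variation has no second-order quadratic form at 𝑝 in the sense of the third condition.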