Maximum Entropy (Maxent) Models
- applying Optimization where the objective function is the entropy formula 𝐻𝐏(𝐏) and subjected with any additional constraints
Maxent Model - Example
let’s consider a discrete random variable 𝐶 with 2 outcomes: ℎ and 𝑡
- 𝐏(𝐶=ℎ) = probability of seeing heads
- 𝐏(𝐶=𝑡) = probability of seeing tails
below is the formula for univariate entropy, in which we want to maximize 𝐻𝐏(𝐏) with respect to the constraints of the model
- 𝐻𝐏(𝐏) = 𝛴𝑥∊𝐶[ - 𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) ]
below are 3 different models
|
Model With No Constraints |
Model With 1 Constraint |
Model With 2 Constraints |
|---|---|---|
|
NONE |
𝐏(𝐶=ℎ) + 𝐏(𝐶=𝑡) = 1 |
𝐏(𝐶=ℎ) + 𝐏(𝐶=𝑡) = 1 |
|
thus there is a 2D plane of possible candidates |
thus there is a 1D line of possible candidates |
thus there is a single 1D point as the possible candidate |
|
𝐻𝐏(𝐏) is maximized when: this is because the max of -𝐏(𝐶=𝑥)𝑙𝑛𝐏(𝐶=𝑥) is 1/𝑒 |
𝐻𝐏(𝐏) is maximized when: |
𝐻𝐏(𝐏) is maximized when: which is the only candidate point |
|
|
|
|
Why Find Maximum Entropy Model?
maximizing entropy in effect helps us find an estimated distribution model 𝐏ˆ that:
- minimizes commitment (which is another way of saying maximizes entropy)
- resembles some reference to the true population distribution (actually empirical distribution)
this is what we want in the estimated distribution model 𝐏ˆ
Solution
is to maximize entropy 𝐻, subject to feature-based constraints:
- 𝐄𝐏[𝑓𝑖] = 𝐄𝐏ˆ[𝑓𝑖] ↔ 𝛴𝑥∊𝑓𝑖𝐏𝑥 = 𝐶𝑖
adding constraints/features:
- lowers maximum entropy
- raises the maximum likelihood of data
- brings the distribution model further from the uniform distribution
- brings the distribution model closer to the empirical distribution
Maxent - Properties
maximum entropy models are convex
a model 𝐹 is convex when:
- 𝐹(𝛴𝑖𝑤𝑖𝑥𝑖) ≥ 𝛴𝑖𝑤𝑖𝐹(𝑥𝑖) where 𝛴𝑖𝑤𝑖 = 1
convexity guarantees a single, global maximum because any higher points are greedily reachable
maximum entropy models 𝐻𝐏(𝐏) = 𝛴𝑥∊𝐶[ - 𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) ] are convex
- 𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) is convex
- 𝛴𝑥∊𝐶[ - 𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) ] is convex (sum of convex functions is convex)
- the feasible-region of constrained 𝐻𝐏(𝐏) is a linear subspace that is convex
- the constrained entropy surface is therefore convex
the Maximum Likelihood Estimation (MLE) exponential model formulation is also convex (dual)
/maximum-entropy-(maxent)-models/maximum-entropy-maxent-inuition-no-constraints.png)
/maximum-entropy-(maxent)-models/maximum-entropy-maxent-inuition-1-constraint-as-probability-distribution.png)
/maximum-entropy-(maxent)-models/maximum-entropy-maxent-inuition-2-constraints.png)