Maximum Entropy (Maxent) Models

Maxent Model - Example

let’s consider a discrete random variable 𝐶 with 2 outcomes: ℎ and 𝑡

  • 𝐏(𝐶=ℎ) = probability of seeing heads
  • 𝐏(𝐶=𝑡) = probability of seeing tails

below is the formula for univariate entropy, in which we want to maximize 𝐻𝐏(𝐏) with respect to the constraints of the model

  • 𝐻𝐏(𝐏) = 𝛴𝑥∊𝐶[ - 𝐏(𝐶=𝑥) 𝑙𝑛 𝐏(𝐶=𝑥) ]

below are 3 different models

Model With No Constraints

Model With 1 Constraint

Model With 2 Constraints

NONE
here 𝐏(𝐶) is allowed to be an un-normalized distributioni.e. 𝐏(𝐶) does not have to be a probability distribution

𝐏(𝐶=ℎ) + 𝐏(𝐶=𝑡) = 1
this constrains 𝐏(𝐶) to be a normalized distribution
i.e. 𝐏(𝐶) is a probability distribution

𝐏(𝐶=ℎ) + 𝐏(𝐶=𝑡) = 1
𝐏(𝐶=ℎ) = 0.3

thus there is a 2D plane of possible candidates

thus there is a 1D line of possible candidates

thus there is a single 1D point as the possible candidate

𝐻𝐏(𝐏) is maximized when:
𝐏(𝐶=ℎ) = 1/𝑒
𝐏(𝐶=𝑡) = 1/𝑒

this is because the max of -𝐏(𝐶=𝑥)𝑙𝑛𝐏(𝐶=𝑥) is 1/𝑒

𝐻𝐏(𝐏) is maximized when:
𝐏(𝐶=ℎ) = 1/2
𝐏(𝐶=𝑡) = 1/2

𝐻𝐏(𝐏) is maximized when:
𝐏(𝐶=ℎ) = 0.3
𝐏(𝐶=𝑡) = 0.7

which is the only candidate point

Why Find Maximum Entropy Model?

maximizing entropy in effect helps us find an estimated distribution model 𝐏ˆ that:

this is what we want in the estimated distribution model 𝐏ˆ

Solution

is to maximize entropy 𝐻, subject to feature-based constraints:

  • 𝐄𝐏[𝑓𝑖] = 𝐄𝐏ˆ[𝑓𝑖] ↔ 𝛴𝑥∊𝑓𝑖𝐏𝑥 = 𝐶𝑖

adding constraints/features:

  • lowers maximum entropy
  • raises the maximum likelihood of data
  • brings the distribution model further from the uniform distribution
  • brings the distribution model closer to the empirical distribution

Maxent - Properties

the Maximum Likelihood Estimation (MLE) exponential model formulation is also convex (dual)

Subpages

Resources