let's say we have the following event space:
NN NNS NNP NNPS VBZ VBD
and the following empirical distribution (raw counts, 36 observations in total):
3 5 11 13 3 1
maximize entropy H (un-normalized distribution):
1/e 1/e 1/e 1/e 1/e 1/e
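A short derivation of why every cell takes the same value when there is no normalization constraint. The symbol x_i for the un-normalized value of cell i is notation assumed here, and the logarithm is taken to be natural:

```latex
H(x) = -\sum_{i=1}^{6} x_i \log x_i,
\qquad
\frac{\partial H}{\partial x_i} = -\log x_i - 1 = 0
\;\Longrightarrow\;
x_i = \frac{1}{e},
\qquad
\frac{\partial^2 H}{\partial x_i^2} = -\frac{1}{x_i} < 0 .
```

The negative second derivative confirms that this stationary point is a maximum. Note that the six values sum to 6/e, not 1, so this is not yet a probability distribution.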
maximize entropy H with respect to a normalized probability distribution. let's add a constraint feature f0 = {NN, NNS, NNP, NNPS, VBZ, VBD} with E[f0] = 1:
1/6 1/6 1/6 1/6 1/6 1/6
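The uniform result can be checked with a Lagrange multiplier for the normalization constraint. This is a sketch; p_i and \lambda are notation assumed here:

```latex
\mathcal{L}(p, \lambda) = -\sum_{i=1}^{6} p_i \log p_i + \lambda \Big( \sum_{i=1}^{6} p_i - 1 \Big),
\qquad
\frac{\partial \mathcal{L}}{\partial p_i} = -\log p_i - 1 + \lambda = 0
\;\Longrightarrow\;
p_i = e^{\lambda - 1} .
```

Since every p_i equals the same constant and the six values must sum to 1, each one is 1/6.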
from the empirical distribution we see that N* tags (nouns) are more common than V* tags (verbs). let's add another constraint feature f1 = {NN, NNS, NNP, NNPS} with E[f1] = 32/36:
8/36 8/36 8/36 8/36 2/36 2/36
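With a single indicator constraint, the maximum-entropy solution spreads the constrained mass evenly over the tags inside the set and the remaining mass evenly over the tags outside it. A minimal sketch of that arithmetic follows; the function name maxent_one_feature is hypothetical, chosen only for this illustration:

```python
from fractions import Fraction

TAGS = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]
NOUNS = {"NN", "NNS", "NNP", "NNPS"}


def maxent_one_feature(tags, members, expectation):
    """Max-entropy distribution when `members` must carry total mass `expectation`.

    Hypothetical helper: uniform inside the set, uniform outside it.
    """
    inside = [t for t in tags if t in members]
    outside = [t for t in tags if t not in members]
    p = {}
    for t in inside:
        p[t] = expectation / len(inside)
    for t in outside:
        p[t] = (1 - expectation) / len(outside)
    return p


p1 = maxent_one_feature(TAGS, NOUNS, Fraction(32, 36))
for tag in TAGS:
    # Fractions print in lowest terms, e.g. 8/36 appears as 2/9.
    print(tag, p1[tag])
```

Exact fractions are used so the result can be compared against the table without rounding.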
we also see that proper nouns are more frequent than common nouns. let's add another constraint feature f2 = {NNP, NNPS} with E[f2] = 24/36 = 2/3:
4/36 4/36 12/36 12/36 2/36 2/36
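Once several constraints interact, one way to find the distribution numerically is iterative proportional fitting: start from the uniform distribution and repeatedly rescale the mass inside and outside each feature's set until every expectation matches. This is a sketch under that assumption, not necessarily the fitting method the source has in mind; fit_maxent and its parameters are hypothetical names.

```python
TAGS = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]


def fit_maxent(tags, constraints, sweeps=200):
    """Iterative proportional fitting from a uniform start.

    constraints: list of (set_of_tags, target_mass) pairs, one per
    indicator feature. Normalization is preserved by each update, so the
    all-tags feature with expectation 1 does not need to be listed.
    """
    p = {t: 1.0 / len(tags) for t in tags}
    for _ in range(sweeps):
        for members, target in constraints:
            inside = sum(p[t] for t in tags if t in members)
            for t in tags:
                if t in members:
                    # Rescale so the set carries exactly `target` mass.
                    p[t] *= target / inside
                else:
                    # Rescale the complement to carry the rest.
                    p[t] *= (1.0 - target) / (1.0 - inside)
    return p


constraints = [
    ({"NN", "NNS", "NNP", "NNPS"}, 32 / 36),  # nouns
    ({"NNP", "NNPS"}, 24 / 36),               # proper nouns
]
p = fit_maxent(TAGS, constraints)
for tag in TAGS:
    print(f"{tag:5s} {p[tag] * 36:.2f}/36")
```

Run with both constraints, this should converge to the 4/36, 4/36, 12/36, 12/36, 2/36, 2/36 row above; further features can be tried by appending more (set, target) pairs to the list.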
we could keep refining the model (e.g. by adding a feature to distinguish singular vs. plural nouns, or verb types).