let's say we have the following event space:

  • NN NNS NNP NNPS VBZ VBD

and the following empirical distribution (raw counts, 36 observations in total):

  • 3 5 11 13 3 1

maximize entropy 𝐻 (un-normalized distribution):

  • 1/𝑒 1/𝑒 1/𝑒 1/𝑒 1/𝑒 1/𝑒
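
the value 1/𝑒 comes from setting the gradient of the entropy to zero when nothing forces the values to sum to one. a short derivation, assuming the natural logarithm (the notes do not state the base):

```latex
\begin{align*}
H(p) &= -\sum_{x} p(x)\,\log p(x) \\
\frac{\partial H}{\partial p(x)} &= -\log p(x) - 1 = 0 \\
\Rightarrow\quad p(x) &= e^{-1} = \frac{1}{e} \quad \text{for every } x
\end{align*}
```

the six values sum to 6/𝑒 ≈ 2.2, so this is not yet a probability distribution.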

maximize entropy 𝐻 with respect to a normalized probability distribution. let's add a constraint feature 𝑓₀ = {NN, NNS, NNP, NNPS, VBZ, VBD} with 𝐄[𝑓₀] = 1:

  • 1/6 1/6 1/6 1/6 1/6 1/6
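
the uniform answer can be checked with a Lagrange multiplier for the normalization constraint. a sketch, where the multiplier name \lambda is introduced here for illustration and the logarithm is again assumed natural:

```latex
\begin{align*}
\Lambda(p, \lambda) &= -\sum_{x} p(x)\,\log p(x) + \lambda\Big(\sum_{x} p(x) - 1\Big) \\
\frac{\partial \Lambda}{\partial p(x)} &= -\log p(x) - 1 + \lambda = 0
  \;\Rightarrow\; p(x) = e^{\lambda - 1} \quad \text{(the same for every } x\text{)} \\
\sum_{x} p(x) = 1 &\;\Rightarrow\; 6\,e^{\lambda - 1} = 1
  \;\Rightarrow\; p(x) = \tfrac{1}{6}
\end{align*}
```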

from the empirical distribution we see that 𝑁* are more common than 𝑉*. let's add another constraint feature 𝑓₁ = {NN, NNS, NNP, NNPS} with 𝐄[𝑓₁] = 32/36:

  • 8/36 8/36 8/36 8/36 2/36 2/36
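
since the noun feature is an indicator on a subset of the events, the maximum entropy solution spreads the constrained mass evenly inside the subset and the remaining mass evenly outside it. a minimal sketch of that arithmetic; the helper names `expectation` and `maxent_single_feature` are made up for this example:

```python
from fractions import Fraction

TAGS = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]
COUNTS = dict(zip(TAGS, [3, 5, 11, 13, 3, 1]))


def expectation(feature, counts):
    """Empirical expectation of an indicator feature (a set of tags)."""
    total = sum(counts.values())
    return Fraction(sum(counts[t] for t in feature), total)


def maxent_single_feature(feature, target, tags):
    """Max-entropy distribution under normalization plus one indicator
    constraint: uniform inside the feature set, uniform outside it."""
    inside = [t for t in tags if t in feature]
    outside = [t for t in tags if t not in feature]
    dist = {t: target / len(inside) for t in inside}
    dist.update({t: (1 - target) / len(outside) for t in outside})
    return dist


f1 = {"NN", "NNS", "NNP", "NNPS"}
e_f1 = expectation(f1, COUNTS)          # 32/36, stored reduced as 8/9
model = maxent_single_feature(f1, e_f1, TAGS)

for tag in TAGS:
    print(tag, model[tag])              # Fraction prints in lowest terms
```

`Fraction` reduces its values, so 8/36 is printed as 2/9 and 2/36 as 1/18.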

we also see that proper nouns are more frequent than common nouns. let's add another constraint feature 𝑓₂ = {NNP, NNPS} with 𝐄[𝑓₂] = 2/3 (= 24/36):

  • 4/36 4/36 12/36 12/36 2/36 2/36
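
with several constraints the same distribution can be found numerically. the sketch below assumes the usual exponential form p(x) ∝ exp(Σ λᵢ 𝑓ᵢ(x)) and runs plain gradient ascent on the weights until the model expectations match the targets. this is one simple fitting method chosen for illustration, not necessarily the one intended by these notes; the function names, step size, and iteration count are all assumptions.

```python
import math

TAGS = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]


def distribution(tags, features, lambdas):
    """p(x) proportional to exp(sum of weights of the features containing x)."""
    scores = {
        t: math.exp(sum(l for l, f in zip(lambdas, features) if t in f))
        for t in tags
    }
    z = sum(scores.values())            # normalizer, plays the role of f0
    return {t: s / z for t, s in scores.items()}


def fit_maxent(tags, features, targets, step=1.0, iters=2000):
    """Gradient ascent: move each weight by (target - model expectation)."""
    lambdas = [0.0] * len(features)
    for _ in range(iters):
        probs = distribution(tags, features, lambdas)
        for i, feat in enumerate(features):
            expected = sum(p for t, p in probs.items() if t in feat)
            lambdas[i] += step * (targets[i] - expected)
    return distribution(tags, features, lambdas)


f1 = {"NN", "NNS", "NNP", "NNPS"}       # nouns
f2 = {"NNP", "NNPS"}                    # proper nouns
model = fit_maxent(TAGS, [f1, f2], [32 / 36, 2 / 3])

for tag in TAGS:
    print(f"{tag:5s} {model[tag] * 36:.3f}/36")
```

normalization is handled by dividing by `z`, so 𝑓₀ needs no weight of its own.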

we could keep refining the model (e.g., by adding a feature to distinguish singular vs. plural nouns, or verb types).
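
one way to see what each refinement buys is to compare the entropy of each model above and its KL divergence from the empirical distribution. a small sketch using the numbers from this example; natural logarithms and the dictionary labels are assumptions made here:

```python
import math

counts = [3, 5, 11, 13, 3, 1]           # NN NNS NNP NNPS VBZ VBD
total = sum(counts)
empirical = [c / total for c in counts]

# The three models from the example, in the same tag order.
models = {
    "f0": [1 / 6] * 6,
    "f0+f1": [8 / 36] * 4 + [2 / 36] * 2,
    "f0+f1+f2": [4 / 36] * 2 + [12 / 36] * 2 + [2 / 36] * 2,
}


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl(p, q):
    """KL divergence D(p || q) in nats; q must be positive where p is."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


report = {name: (entropy(q), kl(empirical, q)) for name, q in models.items()}

for name, (h, d) in report.items():
    print(f"{name:10s} H = {h:.4f}  KL(empirical || model) = {d:.4f}")
print(f"{'empirical':10s} H = {entropy(empirical):.4f}")
```

each added constraint lowers the entropy and moves the model closer to the empirical distribution, which is the trade-off behind deciding when to stop adding features.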