given a sentence of words {𝑀1, …, 𝑀𝑛}, we want to assign a PoS tag to each word, {𝑑1, …, 𝑑𝑛}, such that the probability 𝐏(𝑑1, …, 𝑑𝑛|𝑀1, …, 𝑀𝑛) is maximized:

  • 𝑑̂1, …, 𝑑̂𝑛 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑑1, …, 𝑑𝑛[𝐏(𝑑1, …, 𝑑𝑛|𝑀1, …, 𝑀𝑛)]
  • 𝑑̂1, …, 𝑑̂𝑛 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑑1, …, 𝑑𝑛[𝐏(𝑀1, …, 𝑀𝑛|𝑑1, …, 𝑑𝑛)𝐏(𝑑1, …, 𝑑𝑛)/𝐏(𝑀1, …, 𝑀𝑛)] # via Bayes Rule
  • 𝑑̂1, …, 𝑑̂𝑛 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑑1, …, 𝑑𝑛[𝐏(𝑀1, …, 𝑀𝑛|𝑑1, …, 𝑑𝑛)𝐏(𝑑1, …, 𝑑𝑛)] # 𝐏(𝑀1, …, 𝑀𝑛) is a constant w.r.t. the argmax values
  • 𝑑̂1, …, 𝑑̂𝑛 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑑1, …, 𝑑𝑛[𝛱1≤𝑖≤𝑛𝐏(𝑀𝑖|𝑑𝑖) · 𝐏(𝑑1) · [𝛱2≤𝑖≤𝑛𝐏(𝑑𝑖|𝑑𝑖-1)]]
    • 𝐏(𝑀1, …, 𝑀𝑛|𝑑1, …, 𝑑𝑛) ≈ 𝛱1≤𝑖≤𝑛𝐏(𝑀𝑖|𝑑𝑖) # 𝑀𝑖 is conditionally independent from all else when given 𝑑𝑖
    • 𝐏(𝑑1, …, 𝑑𝑛) ≈ 𝐏(𝑑1) · [𝛱2≤𝑖≤𝑛𝐏(𝑑𝑖|𝑑𝑖-1)] # 𝑑𝑖 is conditionally independent from all else when given 𝑑𝑖-1
  • 𝑑̂1, …, 𝑑̂𝑛 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑑1, …, 𝑑𝑛[𝐏(𝑑1)𝐏(𝑀1|𝑑1) · [𝛱2≤𝑖≤𝑛𝐏(𝑑𝑖|𝑑𝑖-1)𝐏(𝑀𝑖|𝑑𝑖)]]
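
The final factorization can be evaluated directly for any candidate tag sequence. The sketch below is only an illustration, not a reference implementation: the tag set, the tables `initial`, `transition` and `emission`, and the function names are hypothetical toy values invented for this example. It finds the argmax by brute-force enumeration, which is exponential in the sentence length and only meant to make the formula concrete.

```python
from itertools import product

# Hypothetical toy parameters (assumptions for illustration only).
TAGS = ["N", "V"]
initial = {"N": 0.7, "V": 0.3}                       # P(d1)
transition = {("N", "N"): 0.3, ("N", "V"): 0.7,      # P(di | di-1), keyed (prev, cur)
              ("V", "N"): 0.8, ("V", "V"): 0.2}
emission = {("N", "fish"): 0.6, ("N", "swim"): 0.4,  # P(Mi | di), keyed (tag, word)
            ("V", "fish"): 0.3, ("V", "swim"): 0.7}


def sequence_score(words, tags):
    """P(d1) P(M1|d1) * product over i >= 2 of P(di|di-1) P(Mi|di).

    Assumes a non-empty sentence; unseen (tag, word) pairs score 0.
    """
    score = initial[tags[0]] * emission.get((tags[0], words[0]), 0.0)
    for i in range(1, len(words)):
        score *= transition[(tags[i - 1], tags[i])]
        score *= emission.get((tags[i], words[i]), 0.0)
    return score


def best_tags(words):
    """Brute-force argmax over all len(TAGS) ** n tag sequences."""
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: sequence_score(words, tags))
```

With these toy numbers, `best_tags(["fish", "swim"])` selects `("N", "V")`. In practice the same argmax is usually computed with dynamic programming (the Viterbi algorithm) rather than by enumeration.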

with respect to Hidden Markov Models (HMM):

  • 𝐏(𝑑𝑖|𝑑𝑖-1) - are transition probabilities (in our case tag transition probabilities)
  • 𝐏(𝑀𝑖|𝑑𝑖) - are emission probabilities (in our case word emission probabilities)

Learning Transition & Emission Probabilities From Training Corpus

see: Learning/Training Section of Hidden Markov Models (HMM)
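
As a rough sketch of what such training can look like, the code below estimates the probabilities by relative-frequency counting over a tagged corpus (maximum-likelihood estimates without smoothing). This is an assumption about the approach, not a summary of the referenced section; the corpus format (a list of sentences, each a list of `(word, tag)` pairs) and the function name `estimate_hmm` are hypothetical.

```python
from collections import Counter


def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates of P(d1), P(di | di-1) and P(Mi | di).

    Assumed input: a list of sentences, each a list of (word, tag) pairs.
    No smoothing is applied, so unseen events get no entry at all.
    """
    initial_counts = Counter()     # tags that start a sentence
    transition_counts = Counter()  # (prev_tag, tag) pairs
    emission_counts = Counter()    # (tag, word) pairs
    tag_counts = Counter()         # every tag occurrence
    prev_counts = Counter()        # tag occurrences that have a successor

    for sentence in tagged_sentences:
        if not sentence:
            continue
        tags = [tag for _, tag in sentence]
        initial_counts[tags[0]] += 1
        for word, tag in sentence:
            emission_counts[(tag, word)] += 1
            tag_counts[tag] += 1
        for prev, cur in zip(tags, tags[1:]):
            transition_counts[(prev, cur)] += 1
            prev_counts[prev] += 1

    n_sentences = sum(initial_counts.values())
    initial = {t: c / n_sentences for t, c in initial_counts.items()}
    transition = {(p, t): c / prev_counts[p]
                  for (p, t), c in transition_counts.items()}
    emission = {(t, w): c / tag_counts[t]
                for (t, w), c in emission_counts.items()}
    return initial, transition, emission
```

Real corpora contain words and tag pairs that never occur in training, so some form of smoothing is normally added on top of these raw counts.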