Text Classification - Boolean Multinomial Naive Bayes

calculate 𝐏(𝐶=𝑐ⱼ) prior probabilities:

for each 𝑐ⱼ in 𝐶:
	docsⱼ = all docs with class 𝑐ⱼ
	𝐏(𝐶=𝑐ⱼ) = |docsⱼ| / |total # documents|
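
A minimal Python sketch of the prior calculation above. The function name class_priors and the toy labels are hypothetical, chosen only for illustration; the input is assumed to be one class label per training document.

```python
from collections import Counter

def class_priors(labels):
    """Return P(C = c_j) = |docs_j| / |total # documents| for each class."""
    counts = Counter(labels)   # |docs_j| for every class c_j
    total = len(labels)        # total number of training documents
    return {c: n / total for c, n in counts.items()}

# Hypothetical toy labels, one per training document.
priors = class_priors(["pos", "pos", "neg", "pos"])
```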

calculate 𝐏(𝑤ᵢ|𝑐ⱼ) likelihoods:

for each 𝑐ⱼ in 𝐶:
	in each doc of docsⱼ remove duplicates of each word type (i.e. retain only a single instance of each word)
	corpusⱼ = single doc containing all docs in docsⱼ
	𝑛 = # of word tokens in corpusⱼ

	for each word 𝑤ᵢ in vocabulary:
		𝑛ᵢ = # of occurrences of 𝑤ᵢ in corpusⱼ
		𝐏(𝑤ᵢ|𝑐ⱼ) = (𝑛ᵢ + 𝛼) / (𝑛 + 𝛼|vocabulary|)
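
A possible Python sketch of the likelihood step, assuming the counts are taken separately for each class 𝑐ⱼ and that documents arrive as lists of tokens. The function name boolean_likelihoods, the default alpha=1.0, and the toy documents are assumptions made for illustration.

```python
from collections import Counter

def boolean_likelihoods(docs, labels, alpha=1.0):
    """Return {class: {word: P(word | class)}} using per-document deduplication."""
    vocab = {w for doc in docs for w in doc}
    likelihoods = {}
    for c in set(labels):
        counts = Counter()
        for doc, label in zip(docs, labels):
            if label == c:
                # set(doc) keeps a single instance of each word type,
                # so each document adds at most 1 to a word's count.
                counts.update(set(doc))
        n = sum(counts.values())        # size of the deduplicated class corpus
        denom = n + alpha * len(vocab)  # n + alpha * |vocabulary|
        likelihoods[c] = {w: (counts[w] + alpha) / denom for w in vocab}
    return likelihoods

# Hypothetical toy training set.
docs = [["good", "good", "film"], ["bad", "film", "bad"], ["good", "plot"]]
labels = ["pos", "neg", "pos"]
lik = boolean_likelihoods(docs, labels)
```

With these toy documents, "good" is counted twice for class "pos" (once per document that contains it) rather than three times, which is the only difference from the ordinary multinomial counts.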

to classify a test document 𝑑 = [𝑤₁, …, 𝑤ᵣ] (first remove duplicate words from 𝑑, as in training):

𝑎𝑟𝑔𝑚𝑎𝑥_{𝑐ⱼ∊𝐶} [ 𝐏(𝑐ⱼ) · 𝛱_{𝑖∊𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛𝑠}[𝐏(𝑤ᵢ|𝑐ⱼ)] ]
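
A rough Python sketch of the argmax above. Two choices here are assumptions, not part of the notes: the product is computed as a sum of logs to avoid floating-point underflow, and words missing from the vocabulary are skipped. The function name classify and the probability tables are hypothetical.

```python
import math

def classify(doc, priors, likelihoods):
    """Return the class maximising P(c) * product of P(w | c) over the words of doc."""
    words = set(doc)  # Boolean variant: each word type counts once
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for w in words:
            if w in likelihoods[c]:  # assumption: ignore out-of-vocabulary words
                score += math.log(likelihoods[c][w])
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical probability tables for a two-class problem.
priors = {"pos": 2 / 3, "neg": 1 / 3}
likelihoods = {
    "pos": {"good": 0.375, "film": 0.25, "plot": 0.25, "bad": 0.125},
    "neg": {"good": 1 / 6, "film": 1 / 3, "plot": 1 / 6, "bad": 1 / 3},
}
prediction = classify(["good", "good", "film", "unseen"], priors, likelihoods)
```

Taking logs does not change which class wins, since log is monotonically increasing.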

Normal Naive Bayes	Boolean Multinomial Naive Bayes

／var／log marcus chiu