Probability Rule

By the chain rule, 𝐏(𝐴𝐵|𝑌) = 𝐏(𝐴|𝐵𝑌)𝐏(𝐵|𝑌). If 𝐴 and 𝐵 are conditionally independent given 𝑌, then 𝐏(𝐴|𝐵𝑌) = 𝐏(𝐴|𝑌), so 𝐏(𝐴𝐵|𝑌) = 𝐏(𝐴|𝑌)𝐏(𝐵|𝑌)

Therefore, given input values 𝑋 = 𝑥, we calculate 𝐏(𝑌=𝑦|𝑋=𝑥) for each class 𝑦 in 𝑌, and the class 𝑦 with the highest probability is “assigned” to 𝑋
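
The assignment rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes the posterior is scored as 𝐏(𝑌=𝑦) times the product of the per-feature terms 𝐏(𝑋_𝑖=𝑥_𝑖|𝑌=𝑦), which is what conditional independence permits. The function name `classify`, the dictionary layouts, and the numbers in the toy data are all hypothetical. Summing logarithms instead of multiplying probabilities is a choice made here to avoid numeric underflow.

```python
import math

def classify(x, priors, likelihoods):
    """Return the class y with the highest score P(Y=y) * prod_i P(X_i=x_i | Y=y).

    priors:      {y: P(Y=y)}
    likelihoods: {y: [ {value: P(X_i=value | Y=y)} for each feature i ]}
    Both layouts are hypothetical, chosen only for illustration.
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        # Work in log space: a sum of logs replaces the product of probabilities.
        score = math.log(prior)
        for i, value in enumerate(x):
            p = likelihoods[y][i].get(value, 0.0)
            # log(0) is undefined, so a zero probability rules the class out.
            score += math.log(p) if p > 0 else -math.inf
        if score > best_score:
            best_class, best_score = y, score
    return best_class  # None if every class has probability 0

# Toy data (made-up numbers): two classes, two discrete features.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"yes": 0.8, "no": 0.2}, {"yes": 0.3, "no": 0.7}],
    "ham":  [{"yes": 0.1, "no": 0.9}, {"yes": 0.5, "no": 0.5}],
}
print(classify(("yes", "no"), priors, likelihoods))
print(classify(("no", "yes"), priors, likelihoods))
```

For the first input the unnormalized scores are 0.4 * 0.8 * 0.7 = 0.224 for "spam" and 0.6 * 0.1 * 0.5 = 0.03 for "ham", so "spam" is assigned. The normalizing constant 𝐏(𝑋=𝑥) is the same for every class and can be skipped when only the best class is needed.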

Learning From Training Set

When input feature 𝑋_𝑖 is a discrete variable (either Bernoulli or multinoulli)

𝐏(𝑋_𝑖=𝑥_𝑖|𝑌=𝑦) = 𝑐𝑜𝑢𝑛𝑡(𝑋_𝑖=𝑥_𝑖,𝑌=𝑦) / 𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦)
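This estimate of 𝐏(𝑋_𝑖=𝑥_𝑖|𝑌=𝑦) may equal 0 when the value 𝑥_𝑖 never occurs together with class 𝑦 in the training set, so we should smooth it:
𝐏(𝑋_𝑖=𝑥_𝑖|𝑌=𝑦) = [𝑐𝑜𝑢𝑛𝑡(𝑋_𝑖=𝑥_𝑖,𝑌=𝑦) + 𝑙] / [𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦) + (𝑙 * 𝑛𝑢𝑚-𝑐𝑙𝑎𝑠𝑠𝑒𝑠-𝑜𝑓-𝑋_𝑖)]
where 𝑙 > 0 is the smoothing strength (𝑙 = 1 gives Laplace smoothing) and 𝑛𝑢𝑚-𝑐𝑙𝑎𝑠𝑠𝑒𝑠-𝑜𝑓-𝑋_𝑖 is the number of distinct values 𝑋_𝑖 can take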
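
The smoothed count estimate can be tried out with a short Python sketch. The function name `smoothed_conditional`, its arguments, and the toy samples are hypothetical and chosen only for illustration. The sketch assumes the full set of values that 𝑋_𝑖 can take is known in advance, since the denominator needs their number.

```python
from collections import Counter

def smoothed_conditional(samples, feature_values, l=1.0):
    """Estimate P(X_i = v | Y = y) with additive smoothing.

    samples:        list of (x_i, y) pairs for a single feature X_i
    feature_values: every value X_i can take (assumed known up front)
    l:              smoothing strength; l = 0 gives the raw count ratio
    """
    pair_counts = Counter(samples)                   # count(X_i = v, Y = y)
    class_counts = Counter(y for _, y in samples)    # count(Y = y)
    k = len(feature_values)                          # number of values of X_i
    return {
        (v, y): (pair_counts[(v, y)] + l) / (class_counts[y] + l * k)
        for y in class_counts
        for v in feature_values
    }

# Toy training data: "green" never appears with class "a".
samples = [("red", "a"), ("red", "a"), ("blue", "a"), ("blue", "b")]
values = ["red", "blue", "green"]
probs = smoothed_conditional(samples, values, l=1.0)
print(probs[("green", "a")])   # (0 + 1) / (3 + 1 * 3)
```

With 𝑙 = 1 the unseen pair ("green", "a") gets (0 + 1) / (3 + 3) = 1/6 instead of 0, and the three estimates for class "a" still sum to 1.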

When input feature 𝑋_𝑖 is a continuous variable with a Gaussian distribution

𝐏(𝑋_𝑖=𝑥_𝑖|𝑌=𝑦) = 1/[𝜎√(2𝜋)] * 𝑒^{-(𝑥_𝑖-𝜇)²/(2*𝜎²)}
- 𝜇 = 1/𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦) * 𝛴 𝑥_𝑖, summed over the training examples whose class 𝑌 = 𝑦
- 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = 1/[𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦) - 1] * 𝛴(𝑥_𝑖 - 𝜇)², summed over the training examples whose class 𝑌 = 𝑦
- 𝜎 = √𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒
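
The Gaussian case can be sketched the same way in Python. The helper names `fit_gaussian` and `gaussian_density` and the sample values are hypothetical. The sketch assumes it is called once per feature and class, with at least two training values that are not all equal, so that the sample variance is defined and non-zero.

```python
import math

def fit_gaussian(values):
    """Estimate mu and sigma from the values of X_i seen with one class y."""
    n = len(values)                      # count(Y = y)
    mu = sum(values) / n
    # Sample variance with the n - 1 denominator, as in the formula above.
    variance = sum((v - mu) ** 2 for v in values) / (n - 1)
    return mu, math.sqrt(variance)

def gaussian_density(x, mu, sigma):
    """Gaussian density at x; used in place of P(X_i = x_i | Y = y)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Toy values of one feature for the training examples of one class.
mu, sigma = fit_gaussian([1.0, 2.0, 3.0])
print(mu, sigma)
print(gaussian_density(2.0, mu, sigma))
```

For the values 1, 2, 3 the mean is 2 and the variance is (1 + 0 + 1) / 2 = 1, so the density at the mean is 1/√(2𝜋), roughly 0.399. Because this is a density and not a probability, it can exceed 1 when 𝜎 is small, which is fine as long as classes are only compared with each other.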