• 𝑌 - set of possible classes

  • 𝑋 - vector of input attributes 𝑋𝑖

  • 𝑋𝑖 - the input attribute of 𝑋 at index 𝑖 (discrete or continuous)

  • 𝑦 - class value

  • 𝑥 - vector of input attribute values 𝑥𝑖

  • 𝑥𝑖 - the input attribute value of 𝑥 at index 𝑖

Probability Rule

  • 𝐏(𝑌=𝑦|𝑋=𝑥) = 𝐏(𝑌=𝑦)𝐏(𝑋=𝑥|𝑌=𝑦) / 𝐏(𝑋=𝑥)
  • 𝐏(𝑌=𝑦|𝑋=𝑥) ∝ 𝐏(𝑌=𝑦)𝐏(𝑋=𝑥|𝑌=𝑦), because 𝐏(𝑋=𝑥) does not depend on 𝑦
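
As a small worked illustration of the rule above, the following sketch normalizes the products 𝐏(𝑌=𝑦)𝐏(𝑋=𝑥|𝑌=𝑦) so they sum to 1. The function name `posterior`, the class names, and all numbers are made up for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(Y=y | X=x) for each class y via Bayes' rule.

    priors[y]      = P(Y=y)
    likelihoods[y] = P(X=x | Y=y) for one fixed input x
    """
    # Unnormalized scores: P(Y=y) * P(X=x | Y=y)
    scores = {y: priors[y] * likelihoods[y] for y in priors}
    # P(X=x) is the sum of the scores over all classes
    evidence = sum(scores.values())
    return {y: s / evidence for y, s in scores.items()}


# Hypothetical numbers, chosen only to make the arithmetic easy to follow
priors = {"spam": 0.25, "ham": 0.75}
likelihoods = {"spam": 0.5, "ham": 0.125}
post = posterior(priors, likelihoods)
```

Here the scores are 0.125 and 0.09375, so `post["spam"]` is 4/7. Dividing by the evidence changes the values but not their ranking, which is why the proportional form is enough for classification.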

Conditional independence of 𝐴 and 𝐵 given 𝑌 states that 𝐏(𝐴,𝐵|𝑌) = 𝐏(𝐴|𝐵,𝑌)𝐏(𝐵|𝑌) = 𝐏(𝐴|𝑌)𝐏(𝐵|𝑌). Assuming the input attributes are conditionally independent given the class:

  • 𝐏(𝑌=𝑦|𝑋=𝑥) ∝ 𝐏(𝑌=𝑦) 𝛱𝑖 𝐏(𝑋𝑖=𝑥𝑖|𝑌=𝑦)

Therefore, given input values 𝑥, we calculate 𝐏(𝑌=𝑦|𝑋=𝑥) up to the constant factor for each 𝑦 in 𝑌, and the class 𝑦 with the highest value is assigned to 𝑥.

  • 𝑌 ← 𝑎𝑟𝑔𝑚𝑎𝑥𝑦 [ 𝐏(𝑌=𝑦) 𝛱𝑖 𝐏(𝑋𝑖=𝑥𝑖|𝑌=𝑦) ]
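
The argmax step can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `predict`, the nested-dictionary layout of `cond_probs`, and the example probabilities are all assumptions. It also sums log-probabilities instead of multiplying probabilities, which is an addition not in the notes above; the logarithm is monotonic, so the winning class is the same, and long products no longer underflow. All probabilities must be nonzero.

```python
import math


def predict(x, priors, cond_probs):
    """Return the class y maximizing P(Y=y) * prod_i P(X_i=x_i | Y=y).

    priors[y]           = P(Y=y)
    cond_probs[y][i][v] = P(X_i=v | Y=y)   (hypothetical table layout)
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        # Sum of logs equals the log of the product
        score = math.log(prior)
        for i, value in enumerate(x):
            score += math.log(cond_probs[y][i][value])
        if score > best_score:
            best_class, best_score = y, score
    return best_class


# Hypothetical, already-estimated probabilities for two attributes
priors = {"yes": 0.6, "no": 0.4}
cond_probs = {
    "yes": [{"sunny": 0.2, "rainy": 0.8}, {"hot": 0.3, "cool": 0.7}],
    "no": [{"sunny": 0.7, "rainy": 0.3}, {"hot": 0.6, "cool": 0.4}],
}
label_a = predict(("sunny", "hot"), priors, cond_probs)
label_b = predict(("rainy", "cool"), priors, cond_probs)
```

For `("sunny", "hot")` the products are 0.036 for "yes" and 0.168 for "no", so "no" wins; for `("rainy", "cool")` they are 0.336 and 0.048, so "yes" wins.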

Learning From Training Set

  • estimate 𝐏(𝑌=𝑦) for each possible 𝑦
  • estimate 𝐏(𝑋𝑖=𝑥𝑖|𝑌=𝑦) for each attribute 𝑋𝑖, each of its values 𝑥𝑖, and each possible 𝑦
Estimating 𝐏(𝑌=𝑦)
  • 𝐏(𝑌=𝑦) = 𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦) / total-training-examples
  • this estimate equals 0 when class 𝑦 never appears in the training set, so we smooth it with a constant 𝑙 > 0 (𝑙 = 1 gives Laplace smoothing):
  • 𝐏(𝑌=𝑦) = [𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦) + 𝑙] / [total-training-examples + (𝑙 * 𝑛𝑢𝑚-𝑐𝑙𝑎𝑠𝑠𝑒𝑠-𝑜𝑓-𝑌)]
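
The smoothed prior estimate can be sketched as below. The function name `estimate_priors`, its parameters, and the toy labels are hypothetical; the list of classes is passed in explicitly so that a class absent from the training labels still gets a nonzero estimate.

```python
from collections import Counter


def estimate_priors(labels, classes, l=1.0):
    """Smoothed P(Y=y) = (count(Y=y) + l) / (total + l * number of classes)."""
    counts = Counter(labels)  # a missing class counts as 0
    total = len(labels)
    return {y: (counts[y] + l) / (total + l * len(classes)) for y in classes}


# Toy training labels; class "c" never occurs
labels = ["a", "a", "a", "b"]
priors = estimate_priors(labels, classes=["a", "b", "c"], l=1.0)
```

With these labels the estimates are 4/7, 2/7, and 1/7, which still sum to 1 even though "c" was never observed.
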
Estimating 𝐏(𝑋𝑖=𝑥𝑖|𝑌=𝑦)

when input feature 𝑋𝑖 is a discrete variable (either Bernoulli or multinoulli)

  • 𝐏(𝑋𝑖=𝑥𝑖|𝑌=𝑦) = 𝑐𝑜𝑢𝑛𝑡(𝑋𝑖=𝑥𝑖,𝑌=𝑦) / 𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦)
  • this estimate equals 0 when value 𝑥𝑖 never occurs together with class 𝑦 in the training set, so we smooth it:
  • 𝐏(𝑋𝑖=𝑥𝑖|𝑌=𝑦) = [𝑐𝑜𝑢𝑛𝑡(𝑋𝑖=𝑥𝑖,𝑌=𝑦) + 𝑙] / [𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦) + (𝑙 * 𝑛𝑢𝑚-𝑣𝑎𝑙𝑢𝑒𝑠-𝑜𝑓-𝑋𝑖)]
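
For a single discrete attribute, the smoothed conditional estimate might look like this. It is a sketch under assumptions: the function name `estimate_conditional`, the `domain` parameter (the list of values 𝑋𝑖 can take), and the toy data are all hypothetical.

```python
from collections import Counter


def estimate_conditional(values, labels, domain, classes, l=1.0):
    """Smoothed P(X_i=v | Y=y) for one discrete attribute X_i.

    values[k] is the attribute value and labels[k] the class of example k.
    """
    joint = Counter(zip(values, labels))  # count(X_i=v, Y=y)
    class_counts = Counter(labels)        # count(Y=y)
    return {
        y: {
            v: (joint[(v, y)] + l) / (class_counts[y] + l * len(domain))
            for v in domain
        }
        for y in classes
    }


# Toy data: one attribute, four training examples
values = ["sunny", "sunny", "rainy", "sunny"]
labels = ["no", "no", "yes", "yes"]
table = estimate_conditional(
    values, labels, domain=["sunny", "rainy"], classes=["yes", "no"]
)
```

Class "no" never co-occurs with "rainy", yet `table["no"]["rainy"]` is (0 + 1) / (2 + 2) = 0.25 rather than 0, and each class's row still sums to 1.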

when input feature 𝑋𝑖 is a continuous variable having a Gaussian distribution (the density value is used in place of a probability)

  • 𝐏(𝑋𝑖=𝑥𝑖|𝑌=𝑦) = 1/[𝜎√(2𝜋)] * 𝑒^[-(𝑥𝑖-𝜇)²/(2*𝜎²)], where 𝜇 and 𝜎 are estimated separately for each attribute 𝑋𝑖 and class 𝑦
    • 𝜇 = 1/𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦) * 𝛴 𝑥𝑖, summed over the training examples whose class is 𝑦
    • 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = 1/[𝑐𝑜𝑢𝑛𝑡(𝑌=𝑦) - 1] * 𝛴(𝑥𝑖 - 𝜇)², summed over the training examples whose class is 𝑦
    • 𝜎 = √𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒
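
The Gaussian case can be sketched for one attribute and one class as follows. The function names `fit_gaussian` and `gaussian_density` and the sample values are hypothetical. The sketch assumes at least two training examples for the class and a nonzero variance; otherwise the divisions fail.

```python
import math


def fit_gaussian(values):
    """Return (mu, sigma) for the values of one attribute within one class."""
    n = len(values)
    mu = sum(values) / n
    # Sample variance with the count(Y=y) - 1 denominator
    variance = sum((v - mu) ** 2 for v in values) / (n - 1)
    return mu, math.sqrt(variance)


def gaussian_density(x, mu, sigma):
    """Gaussian density at x, used as P(X_i=x | Y=y)."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


# Toy attribute values of the training examples in a single class
values_for_class = [1.0, 2.0, 3.0]
mu, sigma = fit_gaussian(values_for_class)
density_at_mean = gaussian_density(2.0, mu, sigma)
```

Here 𝜇 = 2 and the variance is (1 + 0 + 1) / 2 = 1, so the density at the mean is 1/√(2𝜋), roughly 0.399.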