Multinomial/N-ary Logistic Regression (MLR)

One-vs-All Algorithm

Given 𝑦 ∊ {0, 1, …, 𝑐−1} (i.e. 𝑐 classes), we divide the problem into 𝑐 binary (binomial) logistic regression problems.

In each one, we predict the probability that 𝑦 belongs to one particular class rather than to any of the other classes.

  • 𝑦 ∊ {0, 1, …, 𝑐−1} - output class label
  • 𝒚 - output class as a one-hot unit vector (e.g. when 𝑐=3 then 𝒚 ∊ {[1,0,0]ᵀ, [0,1,0]ᵀ, [0,0,1]ᵀ})
  • 𝒙 - input attribute values vector (i.e. 𝒙 = [𝑥1, …, 𝑥𝑘])
  • 𝜽 - weight/parameter vector (i.e. 𝜽 = [𝜃1, …, 𝜃𝑘]); one-vs-all learns one such vector for each of the 𝑐 binary classifiers

Given 𝑛 training samples:

  • (𝒚(1), 𝒙(1)) = (𝒚(1), 𝑥1(1), 𝑥2(1), …, 𝑥𝑘(1)) # sample 1
  • (𝒚(2), 𝒙(2)) = (𝒚(2), 𝑥1(2), 𝑥2(2), …, 𝑥𝑘(2)) # sample 2
  • …
  • (𝒚(𝑛), 𝒙(𝑛)) = (𝒚(𝑛), 𝑥1(𝑛), 𝑥2(𝑛), …, 𝑥𝑘(𝑛)) # sample 𝑛
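As a minimal sketch in plain Python, assuming labels are 0-indexed integers, the one-hot vectors 𝒚 and the 𝑐 binary problems of one-vs-all can be derived from the raw labels as follows. The helper names `one_hot` and `binary_targets` are hypothetical, not taken from any library.

```python
from typing import List

def one_hot(label: int, c: int) -> List[int]:
    """Encode a class label y in {0, ..., c-1} as a c-dimensional unit vector."""
    if not 0 <= label < c:
        raise ValueError(f"label {label} outside 0..{c - 1}")
    vec = [0] * c
    vec[label] = 1
    return vec

def binary_targets(labels: List[int], c: int) -> List[List[int]]:
    """Split multiclass labels into c one-vs-all target lists.

    targets[i][s] is 1 if sample s belongs to class i, else 0, i.e. the
    0/1 labels for the i-th binary logistic regression problem.
    """
    encoded = [one_hot(y, c) for y in labels]                # n one-hot rows
    return [[row[i] for row in encoded] for i in range(c)]   # transpose to c lists

# Example: n = 4 samples, c = 3 classes
labels = [0, 2, 1, 2]
print(one_hot(2, 3))              # [0, 0, 1]
print(binary_targets(labels, 3))  # [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1]]
```

Entry 𝑖 of the result is the 0/1 label list that the 𝑖-th binary classifier is trained on.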

ℎ𝜽(𝒙) outputs a 𝑐-dimensional vector where each entry is a scalar between 0 and 1 (e.g. when 𝑐=3 then [0.99, 0.02, 0.45]ᵀ is a possible output). Since each entry comes from a separate binary classifier, the entries need not sum to 1.

Compute each hypothesis, where entry 𝑖 is the sigmoid output of the binary classifier trained for class 𝑖:

  • ℎ𝜽(𝒙)[0] = 𝐏(𝑌=0|𝒙;𝜽)
  • ℎ𝜽(𝒙)[1] = 𝐏(𝑌=1|𝒙;𝜽)
  • …
  • ℎ𝜽(𝒙)[𝑐−1] = 𝐏(𝑌=𝑐−1|𝒙;𝜽)
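A plain-Python sketch of computing ℎ𝜽(𝒙) under one-vs-all, assuming each class 𝑖 has its own parameter vector `thetas[i]` and that each entry is the sigmoid of that vector's dot product with 𝒙 (no separate bias term; a constant attribute equal to 1 can be included in 𝒙 if one is wanted). The function names and the example weights are hypothetical.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e^z cannot overflow
    return e / (1.0 + e)

def hypothesis(thetas: List[List[float]], x: List[float]) -> List[float]:
    """Return h(x) as a list of c probabilities.

    thetas[i] is the parameter vector of the binary classifier for class i;
    entry i of the result is that classifier's estimate of P(Y = i | x).
    """
    h = []
    for theta in thetas:
        if len(theta) != len(x):
            raise ValueError("each parameter vector must have k entries, like x")
        z = sum(t * xj for t, xj in zip(theta, x))  # linear score theta^T x
        h.append(sigmoid(z))
    return h

# c = 3 classes, k = 2 attributes; the weights are made up for illustration.
thetas = [[2.0, -1.0], [-1.0, 0.5], [0.0, 0.0]]
print(hypothesis(thetas, [1.0, 0.5]))  # ≈ [0.818, 0.321, 0.5]; the sum exceeds 1
```

Because every classifier is evaluated independently, nothing forces the outputs to form a probability distribution over the 𝑐 classes.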

Predict the class whose classifier reports the highest probability:

𝑦 = argmaxᵢ ℎ𝜽(𝒙)[𝑖]
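The argmax step needs nothing beyond the hypothesis vector. A minimal sketch, applied to the example output [0.99, 0.02, 0.45]ᵀ given earlier; the name `predict` is hypothetical, and ties are assumed to go to the lowest class index.

```python
from typing import List

def predict(h: List[float]) -> int:
    """Return the index i that maximizes h[i] (ties go to the lowest index)."""
    if not h:
        raise ValueError("hypothesis vector is empty")
    # max() returns the first maximal element, so ties resolve to the lowest i
    return max(range(len(h)), key=lambda i: h[i])

print(predict([0.99, 0.02, 0.45]))  # 0
```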

MLR - Cost Function

MLR - Cost Function With Regularization

MLR - Learning 𝜃s With Gradient Descent

To minimize the cost function 𝐽(𝜽), we take its partial derivative with respect to each 𝜃𝑗:

  • TODO
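Assuming 𝐽(𝜽) is the usual binary cross-entropy cost, averaged over the 𝑛 samples and minimized separately for each one-vs-all classifier, the partial derivative takes the standard form ∂𝐽/∂𝜃𝑗 = (1/𝑛) Σₛ (ℎ𝜽(𝒙(𝑠)) − 𝑦(𝑠)) 𝑥𝑗(𝑠), where 𝑦(𝑠) is the 0/1 target of sample 𝑠 for that classifier; an optional L2 penalty (λ/2𝑛) Σⱼ 𝜃𝑗² adds (λ/𝑛)𝜃𝑗 to it. The sketch below runs batch gradient descent with that gradient in plain Python. The function names, learning rate, epoch count, and toy data are all hypothetical choices.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    """Numerically stable logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(theta: List[float], x: List[float]) -> float:
    return sum(w * xj for w, xj in zip(theta, x))

def train_binary(X: List[List[float]], t: List[int], lr: float = 0.4,
                 epochs: int = 2000, lam: float = 0.0) -> List[float]:
    """Batch gradient descent on one binary cross-entropy cost J(theta).

    X holds n samples of k attributes, t the 0/1 targets for this classifier.
    lam is an assumed L2 penalty (lam / (2n)) * sum_j theta_j**2 that, in this
    sketch, applies to every theta_j (including a bias weight); 0 disables it.
    """
    n, k = len(X), len(X[0])
    theta = [0.0] * k
    for _ in range(epochs):
        # errors[s] = h(x^(s)) - t^(s) for every sample s
        errors = [sigmoid(dot(theta, x)) - ts for x, ts in zip(X, t)]
        # dJ/dtheta_j = (1/n) * sum_s errors[s] * x_j^(s) + (lam/n) * theta_j
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / n + lam / n * theta[j]
                for j in range(k)]
        # Simultaneous update of every theta_j
        theta = [w - lr * g for w, g in zip(theta, grad)]
    return theta

def train_one_vs_all(X: List[List[float]], labels: List[int], c: int,
                     **kwargs) -> List[List[float]]:
    """Train c classifiers; classifier i relabels class i as 1 and all others as 0."""
    return [train_binary(X, [1 if y == i else 0 for y in labels], **kwargs)
            for i in range(c)]

def predict(thetas: List[List[float]], x: List[float]) -> int:
    """argmax_i of the c sigmoid outputs."""
    scores = [sigmoid(dot(theta, x)) for theta in thetas]
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical toy data: x = [1, x1, x2], the leading 1 acting as a bias term.
X = [[1.0, 2.0, 0.0], [1.0, 3.0, 0.5],      # class 0
     [1.0, 0.0, 2.0], [1.0, 0.5, 3.0],      # class 1
     [1.0, -2.0, -2.0], [1.0, -3.0, -2.5]]  # class 2
labels = [0, 0, 1, 1, 2, 2]
thetas = train_one_vs_all(X, labels, 3)
print([predict(thetas, x) for x in X])  # [0, 0, 1, 1, 2, 2]
```

With `lam` left at 0 this is the unregularized update; a positive `lam` shrinks every 𝜃𝑗 toward 0 on each step.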

Resources