One-vs-All Algorithm

Given 𝑦 ∊ {1, …, 𝑐}, we divide our problem into 𝑐 binomial logistic regression problems.

In each one, we predict the probability that 𝑦 is a member of one particular class rather than any of the others.

𝑦 ∊ {1, …, 𝑐} - output class label
𝒚 - output class values unit vector (e.g. when 𝑐=3 then 𝒚 ∊ {[1,0,0]^𝑇, [0,1,0]^𝑇, [0,0,1]^𝑇})
𝒙 - input attribute values vector (i.e. 𝒙 = [𝑥₁, …, 𝑥_𝑘])
𝜽 - weight/parameter vector (i.e. 𝜽 = [𝜃₁, …, 𝜃_𝑘]), one for each binomial problem

Given 𝑛 training samples:

(𝒚⁽¹⁾, 𝒙⁽¹⁾) = (𝒚⁽¹⁾, 𝑥₁⁽¹⁾, 𝑥₂⁽¹⁾, …, 𝑥_𝑘⁽¹⁾) # sample 1
(𝒚⁽²⁾, 𝒙⁽²⁾) = (𝒚⁽²⁾, 𝑥₁⁽²⁾, 𝑥₂⁽²⁾, …, 𝑥_𝑘⁽²⁾) # sample 2
…
(𝒚^(𝑛), 𝒙^(𝑛)) = (𝒚^(𝑛), 𝑥₁^(𝑛), 𝑥₂^(𝑛), …, 𝑥_𝑘^(𝑛)) # sample 𝑛

ℎ_𝜽(𝒙) outputs a 𝑐-dimensional vector where each entry is a scalar value between 0 and 1 (e.g. when 𝑐=3 then [0.99,0.02,0.45]^𝑇 is a possible output). Each entry comes from a separate binomial problem, so the entries need not sum to 1.

Compute each hypothesis:

ℎ_𝜽(𝒙)[1] = 𝐏(𝑌=1|𝒙;𝜽)
ℎ_𝜽(𝒙)[2] = 𝐏(𝑌=2|𝒙;𝜽)
…
ℎ_𝜽(𝒙)[𝑐] = 𝐏(𝑌=𝑐|𝒙;𝜽)

Then predict the class whose hypothesis gives the highest probability:

𝑦 = 𝑎𝑟𝑔𝑚𝑎𝑥_𝑖 ℎ_𝜽(𝒙)[𝑖]
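
The prediction step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each binomial problem is taken to use a sigmoid of 𝜽·𝒙, classes are numbered from 1, and the names sigmoid, hypothesis, predict and the weight values are hypothetical rather than taken from the lecture.

```python
import math

def sigmoid(z):
    """Logistic function, mapping a real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(thetas, x):
    """Return one score per class: [P(Y=1|x), ..., P(Y=c|x)].

    thetas: list of c weight vectors, one per binomial classifier
    x:      list of k input attribute values
    """
    return [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]

def predict(thetas, x):
    """Pick the class (numbered from 1) whose classifier scores highest."""
    scores = hypothesis(thetas, x)
    return scores.index(max(scores)) + 1

# Hypothetical weights for c = 3 classes and k = 2 attributes.
thetas = [[2.0, -1.0], [-1.0, 2.0], [-1.0, -1.0]]
x = [3.0, 0.5]
scores = hypothesis(thetas, x)
label = predict(thetas, x)
```

Because the three scores come from independent classifiers, only their ranking matters for the prediction, not their sum.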

MLR - Cost Function

The cost function for multinomial logistic regression is a generalization of the cost function used for binomial logistic regression: the binomial cost is summed over all 𝑐 binomial problems.

Binomial Logistic Regression’s Cost Function

𝐽(𝜽) = -(1/𝑛)·[𝛴_{1≤𝑖≤𝑛}[(𝑦^(𝑖))·𝑙𝑜𝑔(ℎ_𝜽(𝒙^(𝑖))) + (1-𝑦^(𝑖))·𝑙𝑜𝑔(1-ℎ_𝜽(𝒙^(𝑖)))]]

Multinomial Logistic Regression’s Cost Function

𝐽(𝜽) = -(1/𝑛)·[𝛴_{1≤𝑖≤𝑛}[𝛴_{1≤𝑗≤𝑐}[(𝒚^(𝑖)[𝑗])·𝑙𝑜𝑔(ℎ_𝜽(𝒙^(𝑖))[𝑗]) + (1-𝒚^(𝑖)[𝑗])·𝑙𝑜𝑔(1-ℎ_𝜽(𝒙^(𝑖))[𝑗])]]]

where:

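𝒚^(𝑖)[𝑗] - the 𝑗^𝑡ℎ entry of the output class unit vector 𝒚^(𝑖) (1 if sample 𝑖 belongs to class 𝑗, otherwise 0)

ℎ_𝜽(𝒙^(𝑖))[𝑗] - the 𝑗^𝑡ℎ entry of the hypothesis output vector, i.e. the predicted probability that sample 𝑖 belongs to class 𝑗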
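
As a rough check on the multinomial formula, the double sum can be written out directly. The sketch below assumes the average is taken over the 𝑛 training samples and that every hypothesis entry lies strictly between 0 and 1 (so the logarithms are defined); the function name mlr_cost and the sample values are hypothetical.

```python
import math

def mlr_cost(Y, H):
    """Unregularized multinomial (one-vs-all) cost.

    Y: list of n one-hot label vectors, each of length c
    H: list of n hypothesis output vectors, each of length c,
       with every entry strictly between 0 and 1
    """
    n = len(Y)
    total = 0.0
    for y, h in zip(Y, H):            # sum over samples i
        for y_j, h_j in zip(y, h):    # sum over classes j
            total += y_j * math.log(h_j) + (1 - y_j) * math.log(1 - h_j)
    return -total / n

Y = [[1, 0, 0], [0, 1, 0]]
H_good = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.1]]  # confident and correct
H_bad = [[0.1, 0.9, 0.1], [0.9, 0.1, 0.1]]   # confident and wrong
cost_good = mlr_cost(Y, H_good)
cost_bad = mlr_cost(Y, H_bad)
```

Confident correct outputs give a cost near 0, while confident wrong outputs are penalized heavily, which is the behavior the two logarithm terms are designed to produce.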

MLR - Cost Function With Regularization

Binomial Logistic Regression’s Cost Function with regularization of 𝜃s

𝐽(𝜽) = -(1/𝑛)·[𝛴_{1≤𝑖≤𝑛}[(𝑦^(𝑖))·𝑙𝑜𝑔(ℎ_𝜽(𝒙^(𝑖))) + (1-𝑦^(𝑖))·𝑙𝑜𝑔(1-ℎ_𝜽(𝒙^(𝑖)))]] + (𝜆/(2𝑛))·[𝛴_{1≤𝑖≤𝑘}(𝜃_𝑖)²]

Multinomial Logistic Regression’s Cost Function with regularization of 𝜃s

𝐽(𝜽) = -(1/𝑛)·[𝛴_{1≤𝑖≤𝑛}[𝛴_{1≤𝑗≤𝑐}[(𝒚^(𝑖)[𝑗])·𝑙𝑜𝑔(ℎ_𝜽(𝒙^(𝑖))[𝑗]) + (1-𝒚^(𝑖)[𝑗])·𝑙𝑜𝑔(1-ℎ_𝜽(𝒙^(𝑖))[𝑗])]]] + (𝜆/(2𝑛))·[𝛴_{1≤𝑖≤𝑘}𝛴_{1≤𝑗≤𝑐}(𝜽[𝑖,𝑗])²]

where:

𝜽[𝑖,𝑗] - the coefficient 𝜃 connecting input 𝑥_𝑖 to the 𝑗^𝑡ℎ binomial logistic regression unit

𝜆 - the regularization strength (a larger 𝜆 penalizes large coefficients more heavily)
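
The regularization term is just a scaled sum of squared coefficients, which the following sketch computes on its own. It assumes 𝜽 is stored as a 𝑘 × 𝑐 nested list and that the scaling factor is 𝜆/(2𝑛); the names l2_penalty and regularized_cost and all numeric values are hypothetical.

```python
def l2_penalty(theta, lam, n):
    """(lam / (2n)) times the sum of squares of every entry of theta.

    theta: k x c nested list, theta[i][j] connects input i to classifier j
    lam:   regularization strength (lambda)
    n:     number of training samples
    """
    return lam / (2 * n) * sum(w * w for row in theta for w in row)

def regularized_cost(data_cost, theta, lam, n):
    """Add the penalty to an already computed unregularized cost."""
    return data_cost + l2_penalty(theta, lam, n)

# Hypothetical weights for k = 2 inputs and c = 3 classes.
theta = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]
penalty = l2_penalty(theta, lam=0.1, n=10)
total = regularized_cost(0.5, theta, lam=0.1, n=10)
```

Setting 𝜆 to 0 recovers the unregularized cost. If a separate bias weight is added to the model, it is conventionally left out of the penalty; the formula here has no bias term, so every entry is penalized.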

MLR - Learning 𝜃s With Gradient Descent

To minimize the cost function 𝐽(𝜽), we take its partial derivative with respect to each coefficient 𝜽[𝑖,𝑗]:

TODO
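
Until the derivation is written up, here is a minimal sketch of one batch gradient descent update. It is not the author's derivation: it assumes a sigmoid hypothesis and uses the standard textbook gradient for that case, ∂𝐽/∂𝜽[𝑖,𝑗] = (1/𝑛)·𝛴 over samples of (ℎ_𝜽(𝒙)[𝑗] - 𝒚[𝑗])·𝑥_𝑖, plus (𝜆/𝑛)·𝜽[𝑖,𝑗] when regularizing. The learning rate alpha, the function name gradient_step and the toy data are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, mapping a real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, X, Y, alpha, lam=0.0):
    """Return theta after one batch gradient descent update.

    theta: k x c nested list of weights
    X:     n input vectors of length k
    Y:     n one-hot label vectors of length c
    alpha: learning rate (assumed, not defined in the text above)
    lam:   regularization strength lambda
    """
    n, k, c = len(X), len(theta), len(theta[0])
    # H[s][j] = sigmoid(sum_i theta[i][j] * X[s][i]) for every sample s
    H = [[sigmoid(sum(theta[i][j] * x[i] for i in range(k)))
          for j in range(c)] for x in X]
    new_theta = [row[:] for row in theta]
    for i in range(k):
        for j in range(c):
            grad = sum((H[s][j] - Y[s][j]) * X[s][i] for s in range(n)) / n
            grad += lam / n * theta[i][j]
            new_theta[i][j] = theta[i][j] - alpha * grad
    return new_theta

# Toy data: two samples, k = 2 inputs, c = 2 classes.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1, 0], [0, 1]]
theta = [[0.0, 0.0], [0.0, 0.0]]
first = gradient_step(theta, X, Y, alpha=1.0)
trained = theta
for _ in range(100):
    trained = gradient_step(trained, X, Y, alpha=1.0)
```

On this toy data the weights linking each input to its own class grow positive and the cross weights grow negative, so each classifier becomes more confident about the sample it should accept.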

Resources

Andrew Ng’s Video Lecture

／var／log marcus chiu
