Multinomial/N-nary Logistic Regression (MLR)
- generalizes binomial logistic regression to problems whose dependent variable is nominal with more than two classes
One-vs-All Algorithm
Given 𝑦 ∊ {1, 2, …, 𝑐}, we divide our problem into 𝑐 binomial logistic regression problems
In each one, we predict the probability that 𝑦 is a member of one particular class
- 𝑦 ∊ {1, 2, …, 𝑐}
- 𝒚 - output class encoded as a unit (one-hot) vector (e.g. when 𝑐=3 then 𝒚 ∊ {[1,0,0]𝑇, [0,1,0]𝑇, [0,0,1]𝑇})
- 𝒙 - input attribute values vector (i.e. 𝒙 = [𝑥1, …, 𝑥𝑘])
- 𝜽 - weight/parameter vector (i.e. 𝜽 = [𝜃1, …, 𝜃𝑘])
given 𝑛 sample/training data:
- (𝒚(1), 𝒙(1)) = (𝒚(1), 𝑥1(1), 𝑥2(1), …, 𝑥𝑘(1)) # sample 1
- (𝒚(2), 𝒙(2)) = (𝒚(2), 𝑥1(2), 𝑥2(2), …, 𝑥𝑘(2)) # sample 2
- …
- (𝒚(𝑛), 𝒙(𝑛)) = (𝒚(𝑛), 𝑥1(𝑛), 𝑥2(𝑛), …, 𝑥𝑘(𝑛)) # sample 𝑛
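As an illustration of this data layout, here is a minimal sketch that turns class labels into unit vectors. It assumes classes are numbered 1 to 𝑐; the helper name `one_hot` and the sample values are hypothetical, not part of these notes.

```python
def one_hot(label, num_classes):
    """Return the unit vector for a class label in 1..num_classes."""
    if not 1 <= label <= num_classes:
        raise ValueError("label out of range")
    return [1 if j == label else 0 for j in range(1, num_classes + 1)]

# Hypothetical training set: n = 3 samples, k = 2 attributes, c = 3 classes.
labels = [1, 3, 2]
X = [[0.5, 1.2], [2.0, 0.1], [1.1, 1.1]]  # X[s] is the attribute vector of sample s
Y = [one_hot(label, 3) for label in labels]  # Y[s] is the unit vector of sample s
```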
ℎ𝜽(𝒙) outputs a 𝑐-dimensional vector where each entry is a scalar value between 0 and 1 inclusive (e.g. when 𝑐=3 then [0.99,0.02,0.45]𝑇 is a possible output; the entries come from separate binomial models, so they need not sum to 1)
Compute each hypothesis:
- ℎ𝜽(𝒙)[1] = 𝐏(𝑌=1|𝒙;𝜽)
- ℎ𝜽(𝒙)[2] = 𝐏(𝑌=2|𝒙;𝜽)
- …
- ℎ𝜽(𝒙)[𝑐] = 𝐏(𝑌=𝑐|𝒙;𝜽)
Predict the class whose hypothesis gives the highest probability: 𝑦 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑖 ℎ𝜽(𝒙)[𝑖]
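The one-vs-all prediction step can be sketched as follows. This is only an illustration under stated assumptions: each binomial unit is taken to be a sigmoid applied to a weighted sum with no bias term, and the names `sigmoid`, `hypothesis`, `predict` and the weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(thetas, x):
    """One probability per class; thetas[j] is the weight vector of the j-th binomial unit."""
    return [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]

def predict(thetas, x):
    """Return the 0-based position of the largest entry of the hypothesis vector."""
    h = hypothesis(thetas, x)
    return max(range(len(h)), key=lambda j: h[j])

# Hypothetical weights for c = 3 classes and k = 2 attributes.
thetas = [[2.0, -1.0], [-1.0, 2.0], [0.5, 0.5]]
print(predict(thetas, [1.0, 0.0]))  # weighted sums are 2.0, -1.0, 0.5, so position 0 wins
```

Python lists are 0-based, so position 0 corresponds to the first class in the notation used here.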
MLR - Cost Function
The cost function of multinomial logistic regression is a generalization of the cost function used for binomial logistic regression: the binomial cost is summed over all 𝑐 class outputs
Binomial Logistic Regression’s Cost Function
- 𝐽(𝜽) = -(1/𝑛)·[𝛴1≤𝑖≤𝑛[(𝑦(𝑖))·𝑙𝑜𝑔(ℎ𝜽(𝒙(𝑖))) + (1-𝑦(𝑖))·𝑙𝑜𝑔(1-ℎ𝜽(𝒙(𝑖)))]]
Multinomial Logistic Regression’s Cost Function
- 𝐽(𝜽) = -(1/𝑛)·[𝛴1≤𝑖≤𝑛[𝛴1≤𝑗≤𝑐[(𝒚(𝑖)[𝑗])·𝑙𝑜𝑔(ℎ𝜽(𝒙(𝑖))[𝑗]) + (1-𝒚(𝑖)[𝑗])·𝑙𝑜𝑔(1-ℎ𝜽(𝒙(𝑖))[𝑗])]]]
where:
- 𝒚[𝑗] - the 𝑗𝑡ℎ entry of the output class unit vector 𝒚
- ℎ𝜽(𝒙)[𝑗] - the 𝑗𝑡ℎ entry of the hypothesis output vector ℎ𝜽(𝒙)
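A minimal sketch of the multinomial cost, averaging over the 𝑛 samples. It assumes the hypothesis outputs have already been computed and lie strictly between 0 and 1 so that the logarithms are defined; the function name `mlr_cost` and the argument layout are hypothetical.

```python
import math

def mlr_cost(H, Y):
    """H[i][j]: predicted probability of class j for sample i.
    Y[i][j]: j-th entry of the unit vector for sample i."""
    n = len(Y)
    total = 0.0
    for h_i, y_i in zip(H, Y):
        for h, y in zip(h_i, y_i):
            # Binomial cross-entropy term for one class output of one sample.
            total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / n

# One sample, two classes, both outputs at 0.5: the cost is 2*ln(2).
print(mlr_cost([[0.5, 0.5]], [[1, 0]]))
```

With a single class column this reduces to the binomial cost, which is the sense in which the multinomial form generalizes it.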
MLR - Cost Function With Regularization
Binomial Logistic Regression’s Cost Function with regularization of 𝜃s
- 𝐽(𝜽) = -(1/𝑛)·[𝛴1≤𝑖≤𝑛[(𝑦(𝑖))·𝑙𝑜𝑔(ℎ𝜽(𝒙(𝑖))) + (1-𝑦(𝑖))·𝑙𝑜𝑔(1-ℎ𝜽(𝒙(𝑖)))]] + (𝜆/2𝑛)·[𝛴1≤𝑖≤𝑘(𝜃𝑖)²]
Multinomial Logistic Regression’s Cost Function with regularization of 𝜃s
- 𝐽(𝜽) = -(1/𝑛)·[𝛴1≤𝑖≤𝑛[𝛴1≤𝑗≤𝑐[(𝒚(𝑖)[𝑗])·𝑙𝑜𝑔(ℎ𝜽(𝒙(𝑖))[𝑗]) + (1-𝒚(𝑖)[𝑗])·𝑙𝑜𝑔(1-ℎ𝜽(𝒙(𝑖))[𝑗])]]] + (𝜆/2𝑛)·[𝛴1≤𝑖≤𝑘𝛴1≤𝑗≤𝑐(𝜽[𝑖,𝑗])²]
where:
- 𝜽[𝑖,𝑗] - the coefficient 𝜃 connecting input 𝑥𝑖 to the 𝑗𝑡ℎ binomial logistic regression unit
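The regularized multinomial cost can be sketched as the data term plus the (𝜆/2𝑛) penalty on every squared coefficient. This is an illustration only: the name `regularized_cost`, the nested-list layout of `theta` (𝑘 rows, 𝑐 columns), and the sample numbers are assumptions.

```python
import math

def regularized_cost(H, Y, theta, lam):
    """H[i][j], Y[i][j]: hypothesis and unit-vector entries for sample i, class j.
    theta[i][j]: coefficient linking input i to class unit j. lam: regularization strength."""
    n = len(Y)
    data_term = -sum(
        y * math.log(h) + (1 - y) * math.log(1 - h)
        for h_i, y_i in zip(H, Y)
        for h, y in zip(h_i, y_i)
    ) / n
    # Sum of squares over every entry of the k-by-c coefficient matrix.
    penalty = (lam / (2 * n)) * sum(t * t for row in theta for t in row)
    return data_term + penalty

# Hypothetical values: one sample, two classes, one input attribute.
cost = regularized_cost([[0.5, 0.5]], [[1, 0]], [[1.0, 2.0]], lam=2.0)
```

Setting `lam` to 0 recovers the unregularized cost; larger values penalize large coefficients more heavily.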
MLR - Learning 𝜃s With Gradient Descent
To minimize the cost function 𝐽(𝜽), we take its partial derivative with respect to each coefficient 𝜽[𝑖,𝑗]:
- TODO
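A minimal sketch of one batch gradient descent update. It is not derived in these notes; it assumes sigmoid units with no bias term and the standard logistic regression gradient ∂𝐽/∂𝜽[𝑖,𝑗] = (1/𝑛)·𝛴 over samples of (ℎ𝜽(𝒙)[𝑗] - 𝒚[𝑗])·𝑥𝑖, plus (𝜆/𝑛)·𝜽[𝑖,𝑗] from the regularization term. The learning rate `alpha`, the function names, and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, X, Y, alpha, lam=0.0):
    """Return updated coefficients after one batch update.
    theta[i][j] links input i to class unit j; X[s] and Y[s] describe sample s."""
    n, k, c = len(X), len(theta), len(theta[0])
    grad = [[0.0] * c for _ in range(k)]
    for x, y in zip(X, Y):
        for j in range(c):
            z = sum(theta[i][j] * x[i] for i in range(k))
            err = sigmoid(z) - y[j]  # h(x)[j] - y[j]
            for i in range(k):
                grad[i][j] += err * x[i] / n
    return [[theta[i][j] - alpha * (grad[i][j] + (lam / n) * theta[i][j])
             for j in range(c)] for i in range(k)]

# Toy data (assumed): two samples, two attributes, two classes.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1, 0], [0, 1]]
theta = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    theta = gradient_step(theta, X, Y, alpha=0.5)
```

Each class column of `theta` is updated independently, which mirrors the one-vs-all split into separate binomial problems.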