Linear SVM (SVM Without Kernel)
- capacity limited to only linear models (for non-linear models see Non-Linear SVM)
- like logistic regression in that it is driven by a linear function 𝜽𝑇𝒙
- unlike logistic regression in that it does not provide probabilities; it outputs only a class label
Linear SVM - Representation
same as binomial logistic regression
- 𝒙 - input attribute values vector (i.e. 𝒙 = [𝑥0, …, 𝑥𝑘]) # 𝑥0 = 1 is the bias input, so 𝜃0 acts as the bias weight
- 𝜽 - weight/parameter vector (i.e. 𝜽 = [𝜃0, …, 𝜃𝑘])
- 𝑦 - binary output value
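A minimal sketch of this representation, assuming NumPy and a hypothetical helper name add_bias_column (neither appears in the notes). It prepends 𝑥0 = 1 to each input row so that 𝜽𝑇𝒙 can be computed as a single dot product:

```python
import numpy as np

def add_bias_column(X):
    """Prepend x0 = 1 to every row so that theta[0] acts as the bias weight."""
    X = np.asarray(X, dtype=float)
    return np.hstack([np.ones((X.shape[0], 1)), X])

# Two examples with k = 2 attributes each (made-up values for illustration).
X_raw = np.array([[2.0, 3.0], [-1.0, 0.5]])
X = add_bias_column(X_raw)           # shape (2, 3): columns are x0, x1, x2
theta = np.array([0.5, 1.0, -1.0])   # [theta0, theta1, theta2]
scores = X @ theta                   # theta^T x for every row
```

Each entry of scores is the linear function 𝜽𝑇𝒙 for one example; its sign is what the classifier later thresholds.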
Linear SVM - Cost Function With Regularization
here is the binomial logistic regression’s regularized cost function:
- 𝐽(𝜽) = (1/𝑛)·[ -𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑙𝑜𝑔(ℎ𝜽(𝒙(𝑖))) + (1-𝑦(𝑖))·𝑙𝑜𝑔(1-ℎ𝜽(𝒙(𝑖)))] + (𝜆/2)·[𝛴1≤𝑗≤𝑘(𝜃𝑗)2] ] # the minus sign applies only to the log-likelihood sum; the penalty term is added
let’s represent it differently:
- 𝐽(𝜽) = (1/𝑛)·[ -𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑙𝑜𝑔(ℎ𝜽(𝒙(𝑖))) + (1-𝑦(𝑖))·𝑙𝑜𝑔(1-ℎ𝜽(𝒙(𝑖)))] + (𝜆/2)·[𝛴1≤𝑗≤𝑘(𝜃𝑗)2] ]
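- 𝐽(𝜽) = (1/𝑛)·[ 𝛴1≤𝑖≤𝑛[𝑦(𝑖)·(-𝑙𝑜𝑔(ℎ𝜽(𝒙(𝑖)))) + (1-𝑦(𝑖))·(-𝑙𝑜𝑔(1-ℎ𝜽(𝒙(𝑖))))] + (𝜆/2)·[𝛴1≤𝑗≤𝑘(𝜃𝑗)2] ] # move the minus sign inside the sum
- 𝐽(𝜽) = (1/𝑛)·[ 𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑐𝑜𝑠𝑡1(𝜽𝑇𝒙(𝑖)) + (1-𝑦(𝑖))·𝑐𝑜𝑠𝑡0(𝜽𝑇𝒙(𝑖))] + (𝜆/2)·[𝛴1≤𝑗≤𝑘(𝜃𝑗)2] ] # 𝑐𝑜𝑠𝑡1 and 𝑐𝑜𝑠𝑡0 are piecewise-linear (hinge) stand-ins for -𝑙𝑜𝑔(ℎ𝜽(𝒙)) and -𝑙𝑜𝑔(1-ℎ𝜽(𝒙)); this is the step where the SVM departs from logistic regression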
- 𝐽(𝜽) = [ 𝐶·𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑐𝑜𝑠𝑡1(𝜽𝑇𝒙(𝑖)) + (1-𝑦(𝑖))·𝑐𝑜𝑠𝑡0(𝜽𝑇𝒙(𝑖))] + (1/2)·[𝛴1≤𝑗≤𝑘(𝜃𝑗)2] ] # drop the constant (1/𝑛), divide by 𝜆, and set 𝐶 = (1/𝜆); rescaling by a positive constant does not change the minimizing 𝜽
- 𝐽(𝜽) = [ 𝐶·𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑐𝑜𝑠𝑡1(𝜽𝑇𝒙(𝑖)) + (1-𝑦(𝑖))·𝑐𝑜𝑠𝑡0(𝜽𝑇𝒙(𝑖))] + (1/2)·(𝜽𝑇𝜽) ] # here 𝜽𝑇𝜽 is taken over 𝜃1, …, 𝜃𝑘 only; the bias weight 𝜃0 is not penalized
where:
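- 𝐶 - regularization parameter (inverse of the regularization strength, since 𝐶 = 1/𝜆)
- large 𝐶: weak regularization, high variance & low bias # when VERY large it behaves like a Hard-Margin Classifier
- small 𝐶: strong regularization, low variance & high bias # margin violations are tolerated more, i.e. it behaves like a Soft-Margin Classifier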
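The final form of 𝐽(𝜽) can be sketched in NumPy as follows. The notes do not define 𝑐𝑜𝑠𝑡1 and 𝑐𝑜𝑠𝑡0, so this sketch assumes the common hinge form 𝑐𝑜𝑠𝑡1(𝑧) = max(0, 1-𝑧) and 𝑐𝑜𝑠𝑡0(𝑧) = max(0, 1+𝑧); the function name svm_cost and the toy data are illustrative only:

```python
import numpy as np

def cost1(z):
    """Assumed hinge cost for y = 1: zero once z >= 1, linear below that."""
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    """Assumed hinge cost for y = 0: zero once z <= -1, linear above that."""
    return np.maximum(0.0, 1.0 + z)

def svm_cost(theta, X, y, C):
    """J(theta) = C * sum(y*cost1(z) + (1-y)*cost0(z)) + 0.5 * sum(theta_j^2, j >= 1)."""
    z = X @ theta                            # theta^T x for every example
    data_term = np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)  # bias weight theta[0] is not penalized
    return C * data_term + reg_term

# Toy data: first column is the bias input x0 = 1.
X = np.array([[1.0, 2.0], [1.0, -2.0]])
y = np.array([1, 0])

J_wide = svm_cost(np.array([0.0, 1.0]), X, y, C=1.0)     # both points outside the margin
J_narrow = svm_cost(np.array([0.0, 0.25]), X, y, C=1.0)  # both points inside the margin
```

With 𝜽 = [0, 1] both examples satisfy the margin, so only the penalty term contributes; with 𝜽 = [0, 0.25] the penalty shrinks but each example pays a hinge cost, which 𝐶 scales.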
Linear SVM - Learning 𝜽s
goal: find the values of 𝜽 that minimize the cost function 𝐽(𝜽)
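The notes do not say which optimizer is used. As one possible illustration, here is a minimal batch subgradient-descent sketch, again assuming the hinge costs max(0, 1-𝑧) for 𝑦 = 1 and max(0, 1+𝑧) for 𝑦 = 0. The function name fit_linear_svm, the learning rate lr, and the iteration count n_iters are hypothetical choices; practical solvers typically use more specialized methods:

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, lr=0.01, n_iters=1000):
    """Minimize J(theta) by batch subgradient descent (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        z = X @ theta
        # An example contributes to the subgradient only while it violates its margin.
        pos_active = (y == 1) & (z < 1)    # cost1 = 1 - z  -> subgradient -x
        neg_active = (y == 0) & (z > -1)   # cost0 = 1 + z  -> subgradient +x
        grad = C * (X[neg_active].sum(axis=0) - X[pos_active].sum(axis=0))
        grad[1:] += theta[1:]              # penalty gradient; bias weight is excluded
        theta -= lr * grad
    return theta

# Linearly separable toy data; first column is the bias input x0 = 1.
X_toy = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, -2.0], [1.0, -3.0]])
y_toy = np.array([1, 1, 0, 0])

theta_fit = fit_linear_svm(X_toy, y_toy)
labels = (X_toy @ theta_fit >= 0).astype(int)
```

On this toy set the learned 𝜽 separates the two classes; with a fixed step size the iterates hover around the minimizer rather than converging exactly.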
Linear SVM - Hypothesis
given 𝒙 and the optimized values of 𝜽, the assigned output value is defined as (i.e. hypothesis):
- ℎ𝜽(𝒙) = 1, if 𝜽𝑇𝒙 ≥ 0
- ℎ𝜽(𝒙) = 0, otherwise
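The decision rule above can be sketched as a short NumPy function. The name predict and the example values are illustrative, and 𝑿 is assumed to already contain the bias input 𝑥0 = 1 as its first column:

```python
import numpy as np

def predict(theta, X):
    """Return 1 where theta^T x >= 0 and 0 otherwise, one label per row of X."""
    return (X @ theta >= 0).astype(int)

theta = np.array([-1.0, 2.0])                 # [theta0, theta1], made-up values
X = np.array([[1.0, 1.0],                     # theta^T x =  1 -> class 1
              [1.0, 0.0],                     # theta^T x = -1 -> class 0
              [1.0, 0.5]])                    # theta^T x =  0 -> class 1 (boundary)
labels = predict(theta, X)
```

An example lying exactly on the decision boundary (𝜽𝑇𝒙 = 0) is assigned class 1, matching the ≥ in the hypothesis.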