capacity limited to linear decision boundaries (for non-linear models, see Non-Linear SVM)
like logistic regression in that it is driven by a linear function 𝜽^𝑇𝒙
unlike logistic regression in that it does not output probabilities, only a class label
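A minimal sketch of the contrast above, assuming labels in {0, 1}, a bias feature 𝑥₀ = 1, and the usual SVM convention of predicting 1 when 𝜽^𝑇𝒙 ≥ 0. Both models compute the same linear score; logistic regression squashes it into a probability, while the SVM keeps only its sign. The function names and parameter values are hypothetical.

```python
import math

def dot(theta, x):
    # linear score theta^T x
    return sum(t * v for t, v in zip(theta, x))

def logistic_predict_proba(theta, x):
    # logistic regression: P(y = 1 | x) = sigmoid(theta^T x)
    return 1.0 / (1.0 + math.exp(-dot(theta, x)))

def linear_svm_predict(theta, x):
    # linear SVM: only a class label, taken from the sign of theta^T x
    return 1 if dot(theta, x) >= 0 else 0

theta = [-1.0, 2.0]  # hypothetical parameters; x[0] = 1 is the bias feature
x = [1.0, 0.75]      # theta^T x = -1 + 1.5 = 0.5
print(logistic_predict_proba(theta, x))  # sigmoid(0.5) ≈ 0.6225
print(linear_svm_predict(theta, x))      # 1
```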

Linear SVM - Representation

Linear SVM - Cost Function With Regularization

here is the binomial logistic regression’s regularized cost function:

𝐽(𝜽) = (1/𝑛)·[ -𝛴_{1≤𝑖≤𝑛}[𝑦^(𝑖)·𝑙𝑜𝑔(ℎ_𝜽(𝒙^(𝑖))) + (1-𝑦^(𝑖))·𝑙𝑜𝑔(1-ℎ_𝜽(𝒙^(𝑖)))] + (𝜆/2)·[𝛴_{1≤𝑗≤𝑘}(𝜃_𝑗)²] ]
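The cost above can be sketched in plain Python as follows, assuming each row of `X` starts with the bias feature 1 and that 𝜃₀ is left out of the penalty (the sum runs over 1 ≤ 𝑗 ≤ 𝑘). `logistic_cost` is a hypothetical helper name, and the sketch does not guard against `log(0)` for very large scores.

```python
import math

def logistic_cost(theta, X, y, lam):
    """Regularized binomial logistic-regression cost J(theta).

    X: list of feature vectors, each starting with the bias feature 1.
    y: labels in {0, 1}. theta[0] (the bias) is not regularized.
    """
    n = len(X)
    data_term = 0.0
    for xi, yi in zip(X, y):
        h = 1.0 / (1.0 + math.exp(-sum(t * v for t, v in zip(theta, xi))))
        data_term += yi * -math.log(h) + (1 - yi) * -math.log(1 - h)
    reg_term = (lam / 2) * sum(t * t for t in theta[1:])  # j = 1..k only
    return (data_term + reg_term) / n

# hypothetical toy data: the first entry of every row is the bias feature 1
X = [[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]]
y = [1, 0, 1]
print(logistic_cost([0.0, 0.0], X, y, lam=1.0))  # every h = 0.5, so J = log(2) ≈ 0.6931
```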

now rewrite it step by step into the SVM's form:

𝐽(𝜽) = (1/𝑛)·[ -𝛴_{1≤𝑖≤𝑛}[𝑦^(𝑖)·𝑙𝑜𝑔(ℎ_𝜽(𝒙^(𝑖))) + (1-𝑦^(𝑖))·𝑙𝑜𝑔(1-ℎ_𝜽(𝒙^(𝑖)))] + (𝜆/2)·[𝛴_{1≤𝑗≤𝑘}(𝜃_𝑗)²] ]
𝐽(𝜽) = (1/𝑛)·[ 𝛴_{1≤𝑖≤𝑛}[𝑦^(𝑖)·-𝑙𝑜𝑔(ℎ_𝜽(𝒙^(𝑖))) + (1-𝑦^(𝑖))·-𝑙𝑜𝑔(1-ℎ_𝜽(𝒙^(𝑖)))] + (𝜆/2)·[𝛴_{1≤𝑗≤𝑘}(𝜃_𝑗)²] ]
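𝐽(𝜽) = (1/𝑛)·[ 𝛴_{1≤𝑖≤𝑛}[𝑦^(𝑖)·𝑐𝑜𝑠𝑡₁(𝜽^𝑇𝒙^(𝑖)) + (1-𝑦^(𝑖))·𝑐𝑜𝑠𝑡₀(𝜽^𝑇𝒙^(𝑖))] + (𝜆/2)·[𝛴_{1≤𝑗≤𝑘}(𝜃_𝑗)²] ] # SVM swaps -𝑙𝑜𝑔(ℎ_𝜽) and -𝑙𝑜𝑔(1-ℎ_𝜽) for piecewise-linear 𝑐𝑜𝑠𝑡₁ and 𝑐𝑜𝑠𝑡₀ (an approximation, not an identity)
𝐽(𝜽) = [ 𝐶·𝛴_{1≤𝑖≤𝑛}[𝑦^(𝑖)·𝑐𝑜𝑠𝑡₁(𝜽^𝑇𝒙^(𝑖)) + (1-𝑦^(𝑖))·𝑐𝑜𝑠𝑡₀(𝜽^𝑇𝒙^(𝑖))] + (1/2)·[𝛴_{1≤𝑗≤𝑘}(𝜃_𝑗)²] ] # set 𝐶 = 1/𝜆 and multiply through by 𝑛/𝜆; a positive constant factor does not change the minimizing 𝜽
𝐽(𝜽) = [ 𝐶·𝛴_{1≤𝑖≤𝑛}[𝑦^(𝑖)·𝑐𝑜𝑠𝑡₁(𝜽^𝑇𝒙^(𝑖)) + (1-𝑦^(𝑖))·𝑐𝑜𝑠𝑡₀(𝜽^𝑇𝒙^(𝑖))] + (1/2)·(𝜽^𝑇𝜽) ] # 𝜽^𝑇𝜽 here excludes the unpenalized bias 𝜃₀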
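A small numeric check of the last steps, assuming the common hinge choices 𝑐𝑜𝑠𝑡₁(𝑧) = max(0, 1 − 𝑧) and 𝑐𝑜𝑠𝑡₀(𝑧) = max(0, 1 + 𝑧), with 𝜃₀ excluded from the penalty. It evaluates the (1/𝑛), 𝜆 form and the 𝐶 form and shows that scaling the former by 𝑛/𝜆 gives the latter when 𝐶 = 1/𝜆. All helper names and values are hypothetical.

```python
def dot(theta, x):
    return sum(t * v for t, v in zip(theta, x))

def cost1(z):
    # used when y = 1: zero once z >= 1, linear penalty below that (hinge loss)
    return max(0.0, 1.0 - z)

def cost0(z):
    # used when y = 0: zero once z <= -1, linear penalty above that
    return max(0.0, 1.0 + z)

def data_term(theta, X, y):
    # sum_i [ y_i * cost1(theta^T x_i) + (1 - y_i) * cost0(theta^T x_i) ]
    return sum(yi * cost1(dot(theta, xi)) + (1 - yi) * cost0(dot(theta, xi))
               for xi, yi in zip(X, y))

def penalty(theta):
    # sum_{j>=1} theta_j^2: the bias theta[0] is not penalized
    return sum(t * t for t in theta[1:])

def svm_cost_lambda(theta, X, y, lam):
    # (1/n) * [ data term + (lam/2) * penalty ]
    return (data_term(theta, X, y) + (lam / 2) * penalty(theta)) / len(X)

def svm_cost_C(theta, X, y, C):
    # C * data term + (1/2) * penalty
    return C * data_term(theta, X, y) + 0.5 * penalty(theta)

theta = [0.5, 1.0]                          # hypothetical parameters
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.2]]   # first entry is the bias feature
y = [1, 0, 1]
lam = 0.5
C = 1 / lam
print(svm_cost_C(theta, X, y, C))                        # ≈ 2.1
print(len(X) / lam * svm_cost_lambda(theta, X, y, lam))  # ≈ 2.1, same value
```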

where:

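𝑐𝑜𝑠𝑡₁, 𝑐𝑜𝑠𝑡₀ - piecewise-linear stand-ins for -𝑙𝑜𝑔(ℎ_𝜽(𝒙)) and -𝑙𝑜𝑔(1-ℎ_𝜽(𝒙)); commonly the hinge losses 𝑐𝑜𝑠𝑡₁(𝑧) = max(0, 1-𝑧) and 𝑐𝑜𝑠𝑡₀(𝑧) = max(0, 1+𝑧)
𝐶 - regularization parameter (𝐶 = 1/𝜆, so it weights the data term rather than the penalty)
- large 𝐶: high variance & low bias # margin violations are penalized heavily; when VERY large it behaves like a hard-margin classifier
- small 𝐶: low variance & high bias # more margin violations are tolerated in exchange for a wider margin (a softer soft-margin classifier)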

goal: find the values of 𝜽 that minimize the cost function 𝐽(𝜽)
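One way to see the goal and the role of 𝐶 together is a minimal sketch that minimizes the 𝐶-form cost with plain subgradient descent. This is an illustrative choice, not how production SVM solvers work, and it assumes the hinge forms of 𝑐𝑜𝑠𝑡₁ and 𝑐𝑜𝑠𝑡₀, a step size 𝜂₀/√(𝑡+1), and tracking of the best iterate (subgradient steps do not always decrease 𝐽). The margin width between the lines 𝜽^𝑇𝒙 = 1 and 𝜽^𝑇𝒙 = -1 is 2/‖(𝜃₁, …, 𝜃_𝑘)‖. All names and the toy data are hypothetical.

```python
import math

def dot(theta, x):
    return sum(t * v for t, v in zip(theta, x))

def svm_cost(theta, X, y, C):
    # C * sum_i [y cost1 + (1 - y) cost0] + (1/2) * sum_{j>=1} theta_j^2, hinge costs
    data = sum(yi * max(0.0, 1.0 - dot(theta, xi)) + (1 - yi) * max(0.0, 1.0 + dot(theta, xi))
               for xi, yi in zip(X, y))
    return C * data + 0.5 * sum(t * t for t in theta[1:])

def train_linear_svm(X, y, C, epochs=500, eta0=1.0):
    theta = [0.0] * len(X[0])
    best_theta, best_cost = list(theta), svm_cost(theta, X, y, C)
    for t in range(epochs):
        grad = [0.0] + theta[1:]  # gradient of the penalty; bias not penalized
        for xi, yi in zip(X, y):
            z = dot(theta, xi)
            if yi == 1 and z < 1:     # cost1 active: subgradient of max(0, 1 - z) is -x
                grad = [g - C * v for g, v in zip(grad, xi)]
            elif yi == 0 and z > -1:  # cost0 active: subgradient of max(0, 1 + z) is +x
                grad = [g + C * v for g, v in zip(grad, xi)]
        lr = eta0 / math.sqrt(t + 1)  # diminishing step size
        theta = [th - lr * g for th, g in zip(theta, grad)]
        cost = svm_cost(theta, X, y, C)
        if cost < best_cost:          # keep the best iterate seen so far
            best_theta, best_cost = list(theta), cost
    return best_theta

def margin_width(theta):
    # distance between theta^T x = 1 and theta^T x = -1
    return 2.0 / math.sqrt(sum(t * t for t in theta[1:]))

# hypothetical separable toy data; the first column is the bias feature
X = [[1.0, 2.0], [1.0, 3.0], [1.0, -2.0], [1.0, -3.0]]
y = [1, 1, 0, 0]
for C in (0.01, 10.0):
    theta = train_linear_svm(X, y, C)
    print(C, theta, margin_width(theta))
```

On this toy data the small 𝐶 run should settle on a much wider margin than the large 𝐶 run, which ends up close to the hard-margin solution.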

given 𝒙 and the optimized values of 𝜽, the predicted output (i.e. the hypothesis) is defined as: