Non-Linear SVM (SVM With Kernels)
- is an extension of Linear SVM that uses kernel functions to expand its model capacity to include complex non-linear models
- a kernel method learning algorithm
- capacity extended to complex non-linear models
- there are various kernels to use (e.g. gaussian kernel, polynomial kernel, etc)
Non-Linear SVM - Representation
- 𝒙 - input attribute values vector (i.e. 𝒙 = [𝑥1, …, 𝑥𝑘])
- 𝑦 - binary output value (0 or 1)
- 𝜽 - weight/parameter vector (i.e. 𝜽 = [𝜃0, …, 𝜃𝑛]) # 𝜽 ∊ ℝ𝑛+1while in Linear SVM 𝜽 ∊ ℝ𝑘
given 𝑛 training examples:
- (𝒙(1),𝑦(1))
- …
- (𝒙(𝑛),𝑦(𝑛))
Non-Linear SVM - Cost Function With Regularization
here is the Linear SVM’s regularized cost function:
- 𝐽(𝜽) = [ 𝐶·𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑐𝑜𝑠𝑡1(𝜽𝑇𝒙(𝑖)) + (1-𝑦(𝑖))·𝑐𝑜𝑠𝑡0(𝜽𝑇𝒙(𝑖))] + (1/2)·(𝜽𝑇𝜽) ]
instead of 𝒙(𝑖)we replace it with a feature function 𝑓(𝑖):
- 𝐽(𝜽) = [ 𝐶·𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑐𝑜𝑠𝑡1(𝜽𝑇𝒇(𝑖)) + (1-𝑦(𝑖))·𝑐𝑜𝑠𝑡0(𝜽𝑇𝒇(𝑖))] + (1/2)·(𝜽𝑇𝑀𝜽) ]
where:
- 𝑛 - number of training examples
- 𝜽 is now a vector in ℝ𝑛+1
- 𝒇(𝑖)is now a vector in ℝ𝑛+1
- 𝒇(𝑖)= [𝑓(𝑖)0, 𝑓(𝑖)1, …, 𝑓(𝑖)𝑛]
- 𝑓(𝑖)0= 𝑘(𝒙(𝑖), 𝒙(0))
- …
- 𝑓(𝑖)𝑛= 𝑘(𝒙(𝑖), 𝒙(𝑛))
- 𝑓(𝑖)𝑗is a scalar value computed by 𝑘(𝒙(𝑖),𝒙(𝑗)) which is a kernel function of your choice
- 𝒇(𝑖)= [𝑓(𝑖)0, 𝑓(𝑖)1, …, 𝑓(𝑖)𝑛]
- hyperparameters:
- 𝐶 - regularization parameter
- large 𝐶: high variance & low bias
- small 𝐶: low variance & high bias
- 𝑀 - distance matrix of your choice
- 𝐶 - regularization parameter
Non-Linear SVM - Learning 𝜽‘s
goal: optimize values of 𝜽 wrt cost function 𝐽(𝜽)
Non-Linear SVM - Hypothesis
given 𝒙 and the learned parameters 𝜽, the assigned output value is defined as (i.e. hypothesis):
- ℎ𝜽(𝒙) = 1, if 𝜽𝑇𝒇 ≥ 0
- ℎ𝜽(𝒙) = 0, otherwise
Non-Linear SVM - Processes
- perform feature scaling on 𝒙
- choose hyperparameter 𝐶
- choose distance matrix 𝑀
- choose kernel function type (see: kernel functions)
- construct & minimize the cost function