Non-Linear SVM - Representation

𝒙 - vector of input attribute values (i.e. 𝒙 = [𝑥₁, …, 𝑥_𝑘])
𝑦 - binary output value (0 or 1)
𝜽 - weight/parameter vector (i.e. 𝜽 = [𝜃₀, …, 𝜃_𝑛]) # 𝜽 ∊ ℝ^(𝑛+1), while in Linear SVM 𝜽 ∊ ℝ^𝑘

given 𝑛 training examples: (𝒙^(1), 𝑦^(1)), …, (𝒙^(𝑛), 𝑦^(𝑛))

Non-Linear SVM - Cost Function With Regularization

here is the Linear SVM’s regularized cost function:

𝐽(𝜽) = [ 𝐶·𝛴_{1≤𝑖≤𝑛}[𝑦^(𝑖)·𝑐𝑜𝑠𝑡₁(𝜽^𝑇𝒙^(𝑖)) + (1-𝑦^(𝑖))·𝑐𝑜𝑠𝑡₀(𝜽^𝑇𝒙^(𝑖))] + (1/2)·(𝜽^𝑇𝜽) ]
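A minimal NumPy sketch of this cost, assuming the usual hinge shapes 𝑐𝑜𝑠𝑡₁(𝑧) = max(0, 1 − 𝑧) and 𝑐𝑜𝑠𝑡₀(𝑧) = max(0, 1 + 𝑧) (the notes do not fix their exact form). The names `cost1`, `cost0`, and `linear_svm_cost` are illustrative, and the rows of `X` are the 𝒙^(𝑖):

```python
import numpy as np

def cost1(z):
    # Assumed hinge form of cost_1: zero once z >= 1, linear penalty below
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Assumed hinge form of cost_0: zero once z <= -1, linear penalty above
    return np.maximum(0.0, 1.0 + z)

def linear_svm_cost(theta, X, y, C):
    """Regularized linear SVM cost J(theta).

    X: (n, k) matrix whose rows are x^(i); y: (n,) labels in {0, 1}.
    """
    z = X @ theta  # theta^T x^(i) for every training example
    data_term = np.sum(y * cost1(z) + (1 - y) * cost0(z))
    return C * data_term + 0.5 * theta @ theta  # (1/2) theta^T theta
```

With 𝜽 = 0 every 𝜽^𝑇𝒙^(𝑖) is 0, so each example contributes exactly 1 and 𝐽 = 𝐶·𝑛.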

instead of 𝒙^(𝑖), we use a feature vector 𝒇^(𝑖) computed from 𝒙^(𝑖) with a kernel function:

𝐽(𝜽) = [ 𝐶·𝛴_{1≤𝑖≤𝑛}[𝑦^(𝑖)·𝑐𝑜𝑠𝑡₁(𝜽^𝑇𝒇^(𝑖)) + (1-𝑦^(𝑖))·𝑐𝑜𝑠𝑡₀(𝜽^𝑇𝒇^(𝑖))] + (1/2)·(𝜽^𝑇𝑀𝜽) ]

where:

𝑛 - number of training examples
𝜽 is now a vector in ℝ^(𝑛+1)
𝒇^(𝑖) is now a vector in ℝ^(𝑛+1)
- 𝒇^(𝑖) = [𝑓^(𝑖)₀, 𝑓^(𝑖)₁, …, 𝑓^(𝑖)_𝑛]
  - 𝑓^(𝑖)₀ = 1 (intercept term, paired with 𝜃₀)
  - 𝑓^(𝑖)₁ = 𝑘(𝒙^(𝑖), 𝒙^(1))
  - …
  - 𝑓^(𝑖)_𝑛 = 𝑘(𝒙^(𝑖), 𝒙^(𝑛))
- for 𝑗 ≥ 1, 𝑓^(𝑖)_𝑗 is a scalar computed by 𝑘(𝒙^(𝑖), 𝒙^(𝑗)), where 𝑘 is a kernel function of your choice (each training example 𝒙^(𝑗) serves as a landmark)
hyperparameters:
- 𝐶 - regularization parameter
  - large 𝐶: high variance & low bias
  - small 𝐶: low variance & high bias
- 𝑀 - a kernel-dependent weighting matrix for the regularization term; 𝑀 = 𝐼 recovers the 𝜽^𝑇𝜽 penalty of the Linear SVM
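The feature mapping and kernelized cost can be sketched in NumPy as below. This is illustrative only: it assumes a Gaussian kernel 𝑘(𝒂, 𝒃) = exp(−‖𝒂 − 𝒃‖² / (2𝜎²)) as the "kernel of your choice", an intercept feature 𝑓₀ = 1 paired with 𝜃₀, hinge-shaped 𝑐𝑜𝑠𝑡₁/𝑐𝑜𝑠𝑡₀, and 𝑀 = 𝐼 when no matrix is given. All function names are hypothetical:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Assumed kernel choice: k(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    diff = a - b
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

def kernel_features(X, landmarks, kernel=gaussian_kernel):
    """Map each row x^(i) of X to f^(i) = [1, k(x^(i), l^(1)), ..., k(x^(i), l^(n))].

    The landmarks are the training inputs x^(1..n); the leading 1 is an
    assumed intercept feature f_0 paired with theta_0.
    """
    m, n = X.shape[0], landmarks.shape[0]
    F = np.ones((m, n + 1))
    for i in range(m):
        for j in range(n):
            F[i, j + 1] = kernel(X[i], landmarks[j])
    return F

def kernel_svm_cost(theta, F, y, C, M=None):
    """J(theta) with features F (rows f^(i)) and penalty (1/2) theta^T M theta."""
    if M is None:
        M = np.eye(theta.shape[0])  # assumed default: M = I gives theta^T theta
    z = F @ theta  # theta^T f^(i) for every training example
    # Assumed hinge forms: cost_1(z) = max(0, 1 - z), cost_0(z) = max(0, 1 + z)
    data_term = np.sum(y * np.maximum(0.0, 1.0 - z) + (1 - y) * np.maximum(0.0, 1.0 + z))
    return C * data_term + 0.5 * theta @ M @ theta
```

Because every training example is also a landmark, `kernel_features(X, X)` yields an 𝑛 × (𝑛+1) matrix whose diagonal kernel entries (𝑓^(𝑖)_𝑖 = 𝑘(𝒙^(𝑖), 𝒙^(𝑖))) equal 1 for the Gaussian kernel.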

goal: find the values of 𝜽 that minimize the cost function 𝐽(𝜽)
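One simple way to minimize 𝐽(𝜽) is plain subgradient descent, sketched below. This is a swapped-in illustration, not the method the notes prescribe: SVM packages typically use dedicated quadratic-programming solvers. It assumes the hinge forms of 𝑐𝑜𝑠𝑡₁/𝑐𝑜𝑠𝑡₀ and a feature matrix `F` whose rows are the 𝒇^(𝑖); `svm_subgradient`, `fit_svm`, `lr`, and `epochs` are hypothetical names:

```python
import numpy as np

def svm_subgradient(theta, F, y, C, M):
    """A subgradient of J(theta) = C*sum[y*cost1 + (1-y)*cost0] + 0.5*theta^T M theta,
    assuming cost1(z) = max(0, 1 - z) and cost0(z) = max(0, 1 + z)."""
    z = F @ theta
    # d cost1/dz = -1 where z < 1; d cost0/dz = +1 where z > -1; 0 elsewhere
    dz = y * np.where(z < 1.0, -1.0, 0.0) + (1 - y) * np.where(z > -1.0, 1.0, 0.0)
    # Gradient of 0.5 * theta^T M theta is 0.5 * (M + M^T) theta
    return C * (F.T @ dz) + 0.5 * (M + M.T) @ theta

def fit_svm(F, y, C=1.0, M=None, lr=0.01, epochs=1000):
    """Minimize J(theta) by fixed-step subgradient descent starting from theta = 0."""
    if M is None:
        M = np.eye(F.shape[1])  # assumed default: M = I
    theta = np.zeros(F.shape[1])
    for _ in range(epochs):
        theta -= lr * svm_subgradient(theta, F, y, C, M)
    return theta
```

A fixed step size keeps the sketch short; with hinge losses the iterates hover near the minimum rather than settling exactly on it, so a decaying step size is a common refinement.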

given 𝒙 and the learned parameters 𝜽, the assigned output value is defined as (i.e. hypothesis):