Non-Linear SVM (SVM With Kernels)

Non-Linear SVM - Representation

  • 𝒙 - input attribute values vector (i.e. 𝒙 = [𝑥1, …, 𝑥𝑘])
  • 𝑦 - binary output value (0 or 1)
  • 𝜽 - weight/parameter vector (i.e. 𝜽 = [𝜃0, …, 𝜃𝑛]) # 𝜽 ∊ ℝ𝑛+1while in Linear SVM 𝜽 ∊ ℝ𝑘

given 𝑛 training examples:

  • (𝒙(1),𝑦(1))
  • (𝒙(𝑛),𝑦(𝑛))

Non-Linear SVM - Cost Function With Regularization

here is the Linear SVM’s regularized cost function:

  • 𝐽(𝜽) = 𝐶·𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑐𝑜𝑠𝑡1(𝜽𝑇𝒙(𝑖)) + (1-𝑦(𝑖))·𝑐𝑜𝑠𝑡0(𝜽𝑇𝒙(𝑖))] + (1/2)·(𝜽𝑇𝜽) ]

instead of 𝒙(𝑖)we replace it with a feature function 𝑓(𝑖):

  • 𝐽(𝜽) = 𝐶·𝛴1≤𝑖≤𝑛[𝑦(𝑖)·𝑐𝑜𝑠𝑡1(𝜽𝑇𝒇(𝑖)) + (1-𝑦(𝑖))·𝑐𝑜𝑠𝑡0(𝜽𝑇𝒇(𝑖))] + (1/2)·(𝜽𝑇𝑀𝜽) ]

where:

  • 𝑛 - number of training examples
  • 𝜽 is now a vector in ℝ𝑛+1
  • 𝒇(𝑖)is now a vector in ℝ𝑛+1
    • 𝒇(𝑖)= [𝑓(𝑖)0, 𝑓(𝑖)1, …, 𝑓(𝑖)𝑛]
      • 𝑓(𝑖)0= 𝑘(𝒙(𝑖), 𝒙(0))
      • 𝑓(𝑖)𝑛= 𝑘(𝒙(𝑖), 𝒙(𝑛))
    • 𝑓(𝑖)𝑗is a scalar value computed by 𝑘(𝒙(𝑖),𝒙(𝑗)) which is a kernel function of your choice
  • hyperparameters:
    • 𝐶 - regularization parameter
      • large 𝐶: high variance & low bias
      • small 𝐶: low variance & high bias
    • 𝑀 - distance matrix of your choice

Non-Linear SVM - Learning 𝜽‘s

goal: optimize values of 𝜽 wrt cost function 𝐽(𝜽)

Non-Linear SVM - Hypothesis

given 𝒙 and the learned parameters 𝜽, the assigned output value is defined as (i.e. hypothesis):

  • 𝜽(𝒙) = 1, if 𝜽𝑇𝒇 ≥ 0
  • 𝜽(𝒙) = 0, otherwise

Non-Linear SVM - Processes

  • perform feature scaling on 𝒙
  • choose hyperparameter 𝐶
  • choose distance matrix 𝑀
  • choose kernel function type (see: kernel functions)
  • construct & minimize the cost function