k-Nearest Neighbors (k-NN)
  • is a non-probabilisticnon-parametric regression/instance-basedsupervised learning approach where the response of a data point is determined by the nature of its 𝑘 neighbors from the training set. It can be used in both classification and regression settings
  • is a type of Kernel Density Estimation with a uniform kernel with variable bandwidth to encompass 𝑘 nearest neighbors
  • no real training stage
  • in test time, given a test input 𝑥, we find the 𝑘 nearest neighbors to 𝑥 in the training set. then return the average of the corresponding 𝑦 values in the training set
  • has a very high capacity

k-NN - Definition of Nearest

Nearest is based on either: distance measure or similarity measure

k-NN - Bias Variance Tradeoff

  • higher the parameter 𝑘 → higher bias & lower variance (lower capacity)
  • lower the parameter 𝑘 → higher variance & lower bias (higher capacity)

k-NN - Weakness

one weakness is that it cannot learn that one feature is more discriminative than another (e.g. imagine we have a regression task with 𝒙 ∊ ℝ100 drawn from an isotropic Gaussian distribution, but only a single variable 𝑥1 is relevant to the output. Suppose further that this feature simply encodes the output directly, i.e. that 𝑦 = 𝑥1 in all cases. Nearest neighbor regression will not be able to detect this simple pattern. The nearest neighbor of most points 𝒙 will be determined by the large number of features 𝑥2 through 𝑥100, not by the lone feature 𝑥1. Thus the output on small training sets will essentially be random)

k-NN - Types

training (fast):

  • for each training example (𝑥, 𝑦) add to the list of training examples

prediction (slow):

  1. given query instance 𝑥𝑞
  2. 𝐾 = find 𝑘 instances from training examples that are “nearest” to 𝑥𝑞 (nearest based on distance measure or similarity measure)
  3. replace from below
  4. return class 𝑣

TYPE

replace step 3 of prediction:

Discrete

𝑣 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑦∊𝑌𝛴[𝛿(𝑦, 𝑓(𝑥𝑖)] # for each training example 𝑥𝑖 in 𝐾

Continuous

𝑣 = [𝛴𝑓(𝑥𝑖)] / 𝑘

Discrete Distance-Weighted

𝑣 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑦∊𝑌𝛴[𝑤𝑖 * 𝛿(𝑦, 𝑓(𝑥𝑖)]

Continuous Distance-Weighted

𝑣 = [𝛴𝑤𝑖 * 𝑓(𝑥𝑖)] / [𝛴𝑤𝑖]

where 𝑤𝑖is some distance such as:

Calculating Nearest Neighbor Efficiently

Resources