k-Nearest Neighbors (k-NN)
- is a non-probabilistic, non-parametric regression/instance-based, supervised learning approach where the response of a data point is determined by the nature of its 𝑘 neighbors from the training set. It can be used in both classification and regression settings
- is a type of Kernel Density Estimation with a uniform kernel with variable bandwidth to encompass 𝑘 nearest neighbors
- no real training stage
- in test time, given a test input 𝑥, we find the 𝑘 nearest neighbors to 𝑥 in the training set. then return the average of the corresponding 𝑦 values in the training set
- has a very high capacity
k-NN - Definition of Nearest
Nearest is based on either: distance measure or similarity measure
k-NN - Bias Variance Tradeoff
- higher the parameter 𝑘 → higher bias & lower variance (lower capacity)
- lower the parameter 𝑘 → higher variance & lower bias (higher capacity)
---cognitive-computing---machine-intelligence/ai---subfields/machine-learning-(ml)---pattern-recognition/ml---models/instance-based-learning/k-nearest-neighbors-(k-nn)-regression/k-nearest-neighbors.png)
k-NN - Weakness
one weakness is that it cannot learn that one feature is more discriminative than another (e.g. imagine we have a regression task with 𝒙 ∊ ℝ100 drawn from an isotropic Gaussian distribution, but only a single variable 𝑥1 is relevant to the output. Suppose further that this feature simply encodes the output directly, i.e. that 𝑦 = 𝑥1 in all cases. Nearest neighbor regression will not be able to detect this simple pattern. The nearest neighbor of most points 𝒙 will be determined by the large number of features 𝑥2 through 𝑥100, not by the lone feature 𝑥1. Thus the output on small training sets will essentially be random)
k-NN - Types
training (fast):
- for each training example (𝑥, 𝑦) add to the list of training examples
prediction (slow):
- given query instance 𝑥𝑞
- 𝐾 = find 𝑘 instances from training examples that are “nearest” to 𝑥𝑞 (nearest based on distance measure or similarity measure)
- replace from below
- return class 𝑣
|
TYPE |
replace step 3 of prediction: |
|---|---|
|
Discrete |
𝑣 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑦∊𝑌𝛴[𝛿(𝑦, 𝑓(𝑥𝑖)] # for each training example 𝑥𝑖 in 𝐾 |
|
Continuous |
𝑣 = [𝛴𝑓(𝑥𝑖)] / 𝑘 |
|
Discrete Distance-Weighted |
𝑣 = 𝑎𝑟𝑔𝑚𝑎𝑥𝑦∊𝑌𝛴[𝑤𝑖 * 𝛿(𝑦, 𝑓(𝑥𝑖)] |
|
Continuous Distance-Weighted |
𝑣 = [𝛴𝑤𝑖 * 𝑓(𝑥𝑖)] / [𝛴𝑤𝑖] |
where 𝑤𝑖is some distance such as:
- 𝑤𝑖= 1 / (some other distance measure)
- 𝑤𝑖= 1 / 𝑒𝑢𝑐𝑙𝑖𝑑𝑒𝑎𝑛-𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒(𝑥𝑞, 𝑥𝑖)2
- 𝑤𝑖= 1 / 𝑚𝑎𝑛ℎ𝑎𝑡𝑡𝑎𝑛-𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒(𝑥𝑞, 𝑥𝑖)2
- 𝑤𝑖= (some other similarity measure)
Calculating Nearest Neighbor Efficiently
- Random Project Trees
- Nearest-Neighbor Descent (NN-Descent)
Resources
- Interactive k-NN demo - http://vision.stanford.edu/teaching/cs231n-demos/knn/