A major drawback of kernel machines is that the cost of evaluating the decision function is linear in the number of training examples, because the i-th example contributes a term α_i k(x, x_i) to the decision function. SVMs mitigate this by learning an α vector that contains mostly zeros. Classifying a new example then requires evaluating the kernel function ONLY for the training examples with non-zero α_i; these training examples are known as support vectors.
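A minimal Python sketch of why sparsity in α saves work at prediction time. It assumes a decision function of the form f(x) = b + Σ α_i k(x, x_i) with an RBF kernel. The bias b, the gamma parameter, the helper names and the α values are all hypothetical and chosen for illustration; they are not produced by actually training an SVM.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2).
    # gamma=1.0 is an arbitrary illustrative choice.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_function(x, train_x, alpha, b, kernel=rbf_kernel):
    # Naive evaluation: one kernel call per training example, so cost is O(n).
    return b + sum(a_i * kernel(x, x_i) for a_i, x_i in zip(alpha, train_x))

def support_vectors(train_x, alpha):
    # Keep only the training examples whose alpha_i is non-zero.
    return [(a_i, x_i) for a_i, x_i in zip(alpha, train_x) if a_i != 0.0]

def sparse_decision_function(x, svs, b, kernel=rbf_kernel):
    # Same value as decision_function, but only len(svs) kernel calls.
    return b + sum(a_i * kernel(x, x_i) for a_i, x_i in svs)

train_x = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
# Hypothetical sparse alpha vector (made up, not learned).
alpha = [0.0, 0.7, 0.0, -0.7, 0.0]
b = 0.1  # hypothetical bias term

svs = support_vectors(train_x, alpha)
x_new = [1.5, 0.5]
dense = decision_function(x_new, train_x, alpha, b)
sparse = sparse_decision_function(x_new, svs, b)
print(len(svs), "support vectors out of", len(train_x))
print(dense == sparse)
```

The zero-α terms contribute exactly 0.0 to the sum, so dropping them changes nothing except the number of kernel evaluations. Here that number falls from 5 to 2.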
Even with a sparse α, kernel machines still suffer from a high computational cost of training when the dataset is large.
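A rough back-of-the-envelope sketch of how training cost scales with dataset size. It assumes a solver that materializes the full n × n Gram matrix K[i][j] = k(x_i, x_j) in 64-bit floats. This is a simplifying assumption: many practical solvers (e.g. SMO-based ones) compute kernel entries on demand and cache them instead. The helper names are hypothetical.

```python
def gram_matrix_bytes(n_examples, bytes_per_entry=8):
    # Full n x n Gram matrix, assuming 8-byte (float64) entries.
    return n_examples * n_examples * bytes_per_entry

def gram_matrix_kernel_evals(n_examples):
    # k(x_i, x_j) == k(x_j, x_i), so only the upper triangle plus the
    # diagonal must be computed: n * (n + 1) / 2 kernel evaluations.
    return n_examples * (n_examples + 1) // 2

n = 60_000  # number of training images in MNIST
print(gram_matrix_bytes(n) / 1e9, "GB")
print(gram_matrix_kernel_evals(n))
```

For n = 60,000 this gives 28.8 GB of storage and 1,800,030,000 kernel evaluations. Both grow quadratically, so doubling the dataset roughly quadruples them.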
Kernel machines with generic kernels (such as the RBF kernel) struggle to generalize well.
The modern incarnation of deep learning was designed to overcome these limitations of kernel machines. The current deep learning renaissance began when Hinton et al. (2006) demonstrated that a neural network could outperform the RBF kernel SVM on the MNIST benchmark.