ML - Parametric vs Non-Parametric

	Parametric	Non-Parametric
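summary	Learns a function described by a fixed, finite number of parameters, independent of the size of the training set	Learns a function whose number of parameters is not fixed in advance and can grow with the amount of training data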
description	Assumptions can greatly simplify the learning process, but they can also limit what can be learned. Algorithms that simplify the function to a known form are called parametric machine learning algorithms. These algorithms involve two steps: (1) select a form (model) for the function; (2) learn the coefficients of the function from the training data.	Algorithms that do not make strong assumptions about the form of the mapping function are called non-parametric machine learning algorithms. By not making such assumptions, they are free to learn any functional form from the training data. Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data. The term non-parametric does not imply that such models completely lack parameters, but that the number and nature of the parameters are flexible and not fixed in advance.
benefits	Simpler: These methods are easier to understand, and their results are easier to interpret. Speed: Parametric models are very fast to learn from data. Less data: They do not require as much training data and can work well even if the fit to the data is not perfect.	Flexibility: Capable of fitting a large number of functional forms. Power: No assumptions (or only weak assumptions) about the underlying function. Performance: Can result in higher-performance models for prediction.
limitations	Constrained: By choosing a functional form, these methods are highly constrained to the specified form. Limited complexity: The methods are better suited to simpler problems. Poor fit: In practice, the methods are unlikely to match the underlying mapping function exactly.	More data: Require a lot more training data to estimate the mapping function. Slower: Often much slower to train or to predict, since they typically have far more parameters to fit (or, like k-Nearest Neighbors, must search the stored training data at prediction time). Overfitting: More at risk of overfitting the training data, and it is harder to explain why specific predictions are made.
Bias-Variance Trade-Off	Generally higher bias, lower variance	Generally lower bias, higher variance
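Probability	Parametric probability distribution models (e.g. a normal distribution described entirely by its mean and variance)	Non-parametric probability distribution models (e.g. histograms and kernel density estimates)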
Example Algorithms	Artificial Neural Networks (ANN) with a fixed architecture; Perceptron; classification models such as Linear Discriminant Analysis (LDA) and Naive Bayes	Artificial Neural Networks (ANN) when the network size is allowed to grow with the data; Decision Trees (e.g. CART and C4.5); non-parametric and semi-parametric regression methods based on kernels, splines, and wavelets; Kernel Density Estimation (KDE); Histogram: a piecewise-constant density estimate that KDE smooths out; k-Nearest Neighbors: related to KDE with a uniform kernel whose width adapts to the local data density; Support Vector Machines (SVM) with a Gaussian (RBF) kernel; Data Envelopment Analysis, which provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption
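The summary and description rows can be illustrated with a minimal Python sketch (standard library only). It assumes a toy 1-D regression task with y = sin(x); the function names `fit_linear`, `predict_linear`, `fit_knn` and `predict_knn` are hypothetical helpers written for illustration, with ordinary least squares standing in for a typical parametric learner and k-Nearest Neighbors regression (k = 3, an arbitrary choice) for a non-parametric one.

```python
import math


def fit_linear(xs, ys):
    """Parametric: ordinary least squares for y = a + b*x.

    However many training points are supplied, the learned model
    is just the two numbers (a, b).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_linear(params, x):
    a, b = params
    return a + b * x


def fit_knn(xs, ys):
    """Non-parametric: k-NN regression just stores the training data,
    so the size of the 'model' grows with the number of samples."""
    return list(zip(xs, ys))


def predict_knn(memory, x, k=3):
    # Average the targets of the k stored points closest to x.
    nearest = sorted(memory, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


if __name__ == "__main__":
    # 61 noise-free samples of the non-linear function y = sin(x) on [0, 6].
    xs = [i * 0.1 for i in range(61)]
    ys = [math.sin(x) for x in xs]
    lin = fit_linear(xs, ys)
    knn = fit_knn(xs, ys)
    print("linear model size:", len(lin))  # 2, fixed
    print("k-NN model size:", len(knn))    # 61, grows with the data
    for x in (1.5, 4.5):
        print(f"x={x}: true={math.sin(x):.3f} "
              f"linear={predict_linear(lin, x):.3f} k-NN={predict_knn(knn, x):.3f}")
```

The linear model always reduces to two numbers and cannot follow the curve, while the k-NN model keeps every training point and tracks the curve closely, at the cost of memory and prediction time.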
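The Probability row can be sketched the same way, assuming a small made-up bimodal sample. The parametric model fits a single normal distribution (maximum-likelihood mean and standard deviation), while the non-parametric model is a Gaussian-kernel KDE. The bandwidth of 0.5 and the helper names `gaussian_pdf`, `fit_gaussian` and `kde` are illustrative assumptions, not recommended settings; `statistics.fmean` requires Python 3.8 or later.

```python
import math
import statistics


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gaussian(data):
    """Parametric: the whole distribution is summarized by two numbers,
    the maximum-likelihood mean and standard deviation."""
    return statistics.fmean(data), statistics.pstdev(data)


def kde(data, bandwidth):
    """Non-parametric: Gaussian kernel density estimate.

    Each sample contributes its own Gaussian bump of width `bandwidth`,
    so the model keeps all of the data.
    """
    samples = list(data)
    n = len(samples)

    def density(x):
        return sum(gaussian_pdf(x, xi, bandwidth) for xi in samples) / n

    return density


if __name__ == "__main__":
    # Made-up bimodal sample: two clusters around -2 and +2.
    data = [-2.2, -2.0, -1.9, -2.1, -1.8, 1.8, 2.0, 2.1, 1.9, 2.2]
    mu, sigma = fit_gaussian(data)
    p = kde(data, bandwidth=0.5)
    for x in (0.0, 2.0):
        print(f"x={x}: Gaussian={gaussian_pdf(x, mu, sigma):.3f} KDE={p(x):.3f}")
```

On this sample the single Gaussian puts its peak at 0, between the two clusters, whereas the KDE assigns almost no density there and much higher density near the clusters at -2 and +2.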
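The Bias-Variance Trade-Off row can be checked empirically with a small simulation. This sketch assumes a target sin(x) on [0, 6], Gaussian noise with standard deviation 0.3, training sets of 30 points and a single query point x0 = 1.5; all of these, and the helper names, are arbitrary illustrative choices. It repeatedly redraws the training set, refits a least-squares line (parametric) and a 1-nearest-neighbour predictor (non-parametric), and estimates the squared bias and the variance of their predictions at x0.

```python
import math
import random
import statistics


def true_f(x):
    return math.sin(x)


def sample_training_set(rng, n=30, noise=0.3):
    # Draw n inputs uniformly on [0, 6] and add Gaussian noise to the targets.
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def linear_predict(xs, ys, x0):
    # Parametric: least-squares line y = a + b*x (two parameters).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return my + (sxy / sxx) * (x0 - mx)


def nn_predict(xs, ys, x0):
    # Non-parametric: 1-nearest-neighbour copies the target of the closest point.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bias_variance(predict, x0=1.5, trials=200, seed=0):
    """Estimate squared bias and variance of predict(...) at x0 over
    many independently drawn training sets."""
    rng = random.Random(seed)
    preds = [predict(*sample_training_set(rng), x0) for _ in range(trials)]
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


if __name__ == "__main__":
    for name, fn in (("linear", linear_predict), ("1-NN", nn_predict)):
        b2, var = bias_variance(fn)
        print(f"{name}: bias^2={b2:.3f} variance={var:.3f}")
```

In this setup the line shows a large squared bias and a small variance, and 1-NN the reverse; other targets, sample sizes or values of k can shift the balance.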