Linear Model vs Non-Linear Models
comparing linear regression / logistic regression to neural networks
- non-linear hypothesis - (linear regression / logistic regression) problem
- non-linear hypothesis - neural networks (part 1)
- non-linear hypothesis - neural networks (part 2)
summary:
- linear regression and logistic regression cannot learn non-linear functions wrt inputs
- neural networks can learn non-linear functions wrt inputs
Linear Models
types of linear models:
linear models can be fit efficiently and reliably (either in closed form or with convex optimization)
downside of linear models is its capacity is limited to linear functions (i.e. cannot understand the interaction between any 2 input variables)
Non-Linear Models (Extending Linear Models)
to extend linear models to represent non-linear functions of 𝑥, we can apply the linear model not to 𝑥 itself but to a transformed input 𝜙(𝑥), where 𝜙 is a non-linear transformation
equivalently we can apply the kernel trick to obtain a non-linear learning algorithm based on implicitly applying the 𝜙 mapping
how to choose the feature mapping 𝜙:
- use a very generic 𝜙, such as the infinite-dimensional 𝜙 that is implicitly used by kernel machines based on the RBF kernel.
- if 𝜙(x) is high enough dimension:
- we can always have enough capacity to fit the training set
- generalization to the test set often remains poor
- very generic feature mappings are usually based only on the principle of local smoothness and do not encode enough prior information to solve advanced problems
- if 𝜙(x) is high enough dimension:
- manually engineer 𝜙:
- before the advent of deep learning this was the dominant approach
- requires decades of human effort for each separate task
- auto learn 𝜙 with deep learning
- we have a model: 𝑦 = 𝑓(𝑥;𝜃,𝑤) = 𝜙(𝑥;𝜃)????
- we now have parameters:
- 𝜃 that we use to learn 𝜙 from a broad class of functions
- 𝑤 that map 𝜙 to the desired output
- this is an example of deep feedforward network with 𝜙 defining a hidden layer
- this approach is the only one of the three that gives up on the convexity of the problem
- this approach can capture the benefit:
- first approach by being highly generic (by using a very broad family 𝜙(𝑥;𝜃)
- second approach by allowing human practitioners to encode their knowledge to help generalization by designing families 𝜙(𝑥;𝜃) that the human practitioner expect will perform well (the advantage is that the human designer only needs to find the right general function family rather than finding precisely the right function)
Deep Learning Modelss
other models will apply the third principle to:
- learning deterministic feature mappings from 𝑥 to 𝑦 (lacking feedback connections)
- learning stochastic mappings
- learning function with feedback
- learning probability distributions over a single vector
a common example of Deep Learning Model is the Feed-Forward Neural Network - Multi-Layer Perceptrons (MLP)