Ridge Regression
- is a type of linear regression model for estimating the coefficients of multiple-regression models when the independent variables are highly correlated (i.e. the multicollinearity problem)
- is fitted by penalized least squares; adjusted R-squared is a model-comparison metric and is not part of the estimator itself
Ridge Regression
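The ridge regression estimator of $\beta$ is defined as:
- $\hat{\beta}_R = (X^T X + \Lambda)^{-1} X^T y$
where:
- $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ with each $\lambda_j \ge 0$
If $X^T X = D$ with $D = \mathrm{diag}(d_1, \dots, d_p)$, then:
- $\mathrm{Var}(\hat{\beta}_{Rj}) = \dfrac{\sigma^2 d_j}{(d_j + \lambda_j)^2} \le \dfrac{\sigma^2}{d_j} = \mathrm{Var}(\hat{\beta}_j)$
The downside to reducing the variance is that the estimator is biased:
- $E(\hat{\beta}_{Rj}) = \dfrac{d_j}{d_j + \lambda_j} \beta_j$
The mean square error (MSE) is given by:
- $\mathrm{MSE}(\hat{\beta}_{Rj}) = \dfrac{\sigma^2 d_j + \lambda_j^2 \beta_j^2}{(d_j + \lambda_j)^2}$
The aim would be to set $\lambda_j$ so that this is minimized.
This cannot be done exactly, since the minimizing value $\lambda_j = \sigma^2 / \beta_j^2$ depends on the unknown $\beta_j$.
Note that:
- $\hat{\beta}_{Rj} = \dfrac{d_j}{d_j + \lambda_j} \hat{\beta}_j$
and so $\hat{\beta}_{Rj}$ is shrunk towards the origin. This is known as a shrinkage estimator.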
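The variance and bias trade-off in the diagonal case can be checked numerically. The sketch below assumes $X^T X$ is diagonal with entries $d_j$, in which case the ridge estimate is the least-squares estimate multiplied by $d_j / (d_j + \lambda_j)$ and the MSE is variance plus squared bias. The function names and the numeric values of `d`, `beta`, and `sigma` are illustrative assumptions, not values from these notes.

```python
import numpy as np

def shrinkage_factor(d, lam):
    """Factor d_j / (d_j + lambda_j) that multiplies the least-squares estimate."""
    d = np.asarray(d, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return d / (d + lam)

def ridge_mse(d, lam, beta, sigma=1.0):
    """Per-coefficient MSE (variance + squared bias) when X^T X = diag(d)."""
    d, lam, beta = (np.asarray(a, dtype=float) for a in (d, lam, beta))
    variance = sigma**2 * d / (d + lam) ** 2
    bias_sq = (lam * beta / (d + lam)) ** 2
    return variance + bias_sq

# Illustrative values (assumptions, not taken from the notes).
d = np.array([0.5, 10.0])
beta = np.array([2.0, -1.0])
sigma = 1.0

# The MSE-minimizing penalty needs the true beta, which is unknown in practice.
lam_opt = sigma**2 / beta**2
mse_ols = ridge_mse(d, np.zeros_like(d), beta, sigma)   # lambda = 0 is least squares
mse_opt = ridge_mse(d, lam_opt, beta, sigma)

print("shrinkage factors:", shrinkage_factor(d, lam_opt))
print("least-squares MSE:", mse_ols)
print("ridge MSE at optimal lambda:", mse_opt)
```

The gain is largest for the coefficient with the small $d_j$, which is the direction in which least squares has high variance.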
Another Way to Derive
The ridge estimator can be derived using “regularization”, or the inclusion of a penalty term in the objective function.
Suppose instead of minimizing:
- $I(\beta) = (y - X\beta)^T (y - X\beta)$
We minimize:
- $I(\beta) = (y - X\beta)^T (y - X\beta) + \lambda \|\beta\|^2$
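Setting the gradient with respect to $\beta$ to zero, we now get:
- $\hat{\beta}_R = (X^T X + \lambda I)^{-1} X^T y$
This generalizes to a different $\lambda_j$ for each coefficient: the penalty $\sum_j \lambda_j \beta_j^2$ gives $\hat{\beta}_R = (X^T X + \Lambda)^{-1} X^T y$ with $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$.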
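As a check on the penalized derivation, the sketch below computes the single-$\lambda$ closed form $(X^T X + \lambda I)^{-1} X^T y$ on simulated data and confirms that the gradient of the penalized objective vanishes there. It also shows the equivalent formulation as ordinary least squares on augmented data. The data, seed, and value of `lam` are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed (assumption)
n, p, lam = 50, 3, 2.0                # illustrative sizes and penalty
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

def objective(beta):
    """Penalized sum of squares (y - X beta)^T (y - X beta) + lam * ||beta||^2."""
    resid = y - X @ beta
    return resid @ resid + lam * (beta @ beta)

# Setting the gradient -2 X^T (y - X beta) + 2 lam beta to zero gives
# the linear system (X^T X + lam I) beta = X^T y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
gradient = -2 * X.T @ (y - X @ beta_ridge) + 2 * lam * beta_ridge

# Equivalent route: ordinary least squares on data augmented with
# sqrt(lam) * I as extra rows of X and zeros as extra responses.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
beta_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print("closed form:   ", beta_ridge)
print("augmented OLS: ", beta_aug)
print("max |gradient|:", np.abs(gradient).max())
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually the more stable choice.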
Regularization methods for estimating 𝛽 are now standard:
- $I(\beta) = (y - X\beta)^T (y - X\beta) + P(\beta)$
for some penalty term 𝑃.
The penalty term prevents the estimator $\hat{\beta}$ from becoming large, and some penalties (for example the $L_1$ penalty $\lambda \sum_j |\beta_j|$ used by the lasso) can set some components of the estimator exactly to 0.
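To illustrate how the choice of penalty changes the estimator, the sketch below compares two penalties in the simplest setting, assuming $X^T X = I$ so that each coefficient can be solved separately. The squared ($L_2$) penalty is the ridge case; the $L_1$ (lasso) penalty is an assumed example of a penalty that produces exact zeros, since the notes do not name one. The vector `b` stands for the least-squares estimates $X^T y$ and its values are made up.

```python
import numpy as np

def ridge_orthonormal(b, lam):
    """Minimizer of (y - X beta)^T (y - X beta) + lam * sum(beta_j^2) when X^T X = I."""
    return b / (1.0 + lam)

def lasso_orthonormal(b, lam):
    """Minimizer with penalty lam * sum(|beta_j|) when X^T X = I (soft thresholding)."""
    return np.sign(b) * np.maximum(np.abs(b) - lam / 2.0, 0.0)

# Illustrative least-squares estimates b = X^T y (assumed values).
b = np.array([3.0, -0.4, 0.2, -2.0])
lam = 1.0

ridge = ridge_orthonormal(b, lam)   # every component shrunk by the same factor
lasso = lasso_orthonormal(b, lam)   # small components set exactly to zero

print("ridge:", ridge)
print("lasso:", lasso)
```

Ridge rescales every coefficient but never makes one exactly zero, whereas the $L_1$ penalty removes any coefficient whose least-squares estimate is smaller than $\lambda / 2$ in absolute value.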
Ridge Regression - Example
Take:
- 𝑛 = 100
- 𝑝 = 5
The $x_{ij}$ are independent standard uniform for $j = 1, \dots, 4$, and for $j = 5$ we take $x_{i5} = x_{i1} + 0.01 z_i$, where the $z_i$ are independent standard normal.
The true parameter values are:
- $\sigma = 1$, which we assume to be known
- $\beta^T = [2, -1, 3, -2, 0]$
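The first and last columns of $X$ are highly collinear, so $X^T X$ is nearly singular.
The smallest eigenvalue of $X^T X$ is 0.005. This will cause a high variance for some of the $\hat{\beta}_j$.
The diagonal elements of $(X^T X)^{-1}$ are (95.92, 0.10, 0.10, 0.11, 96.33); with $\sigma = 1$ these are the variances of the least-squares estimators $\hat{\beta}_j$.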
The least-squares estimate of $\beta$ is:
- $\hat{\beta}^T = (9.78, -1.19, 3.00, -2.09, -7.77)$
The first and fifth estimators are unreliable, as anticipated.
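The setup of this example can be reproduced approximately with a short simulation. The sketch below follows the stated design ($n = 100$, $p = 5$, a fifth column that is a noisy copy of the first, no intercept), but the seed is an arbitrary assumption, so the eigenvalue, variances, and estimates will be of the same order as the figures quoted above without matching them.

```python
import numpy as np

rng = np.random.default_rng(1)             # arbitrary seed (assumption)
n, p, sigma = 100, 5, 1.0
beta_true = np.array([2.0, -1.0, 3.0, -2.0, 0.0])

# Design matrix: columns 1-4 standard uniform, column 5 a near copy of column 1.
X = np.empty((n, p))
X[:, :4] = rng.uniform(size=(n, 4))
X[:, 4] = X[:, 0] + 0.01 * rng.standard_normal(n)
y = X @ beta_true + sigma * rng.standard_normal(n)

XtX = X.T @ X
smallest_eig = np.linalg.eigvalsh(XtX).min()          # near zero due to collinearity
ols_var = sigma**2 * np.diag(np.linalg.inv(XtX))      # variances of least-squares estimates
beta_ols = np.linalg.solve(XtX, X.T @ y)              # least-squares estimate

print("smallest eigenvalue of X^T X:", smallest_eig)
print("diag of (X^T X)^{-1}:", np.round(ols_var, 2))
print("least-squares estimate:", np.round(beta_ols, 2))
```

The variances of the first and fifth coefficients come out far larger than the others, which is why those two estimates are unreliable.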
We can compute $\hat{\beta}_R$ for a range of $\lambda$ values.
In practice a value of $\lambda$ close to 0 may already be enough, with no need for a large value.
A plot of $\hat{\beta}_{R1}$ and $\hat{\beta}_{R5}$ as $\lambda$ ranges between 0 and 5 is shown below.
[Figure: ridge-regression-example.png]
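A ridge path like the one plotted can be computed with the single-$\lambda$ closed form $(X^T X + \lambda I)^{-1} X^T y$. The sketch below regenerates data under the same design as the example, with an arbitrary seed (an assumption), and tabulates the first and fifth ridge coefficients over a grid of $\lambda$ values in $[0, 5]$. The helper name `ridge` and the grid spacing are illustrative choices, and the printed values will not match the original figure.

```python
import numpy as np

rng = np.random.default_rng(1)             # arbitrary seed (assumption)
n, p, sigma = 100, 5, 1.0
beta_true = np.array([2.0, -1.0, 3.0, -2.0, 0.0])

# Same design as the example: column 5 is a noisy copy of column 1.
X = np.empty((n, p))
X[:, :4] = rng.uniform(size=(n, 4))
X[:, 4] = X[:, 0] + 0.01 * rng.standard_normal(n)
y = X @ beta_true + sigma * rng.standard_normal(n)

def ridge(X, y, lam):
    """Ridge estimate (X^T X + lam I)^{-1} X^T y, via a linear solve."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

lambdas = np.linspace(0.0, 5.0, 51)
path = np.array([ridge(X, y, lam) for lam in lambdas])   # shape (51, 5)

# Print every tenth grid point; lambda = 0 is the least-squares estimate.
for lam, b in zip(lambdas[::10], path[::10]):
    print(f"lambda={lam:.1f}  beta_R1={b[0]: .3f}  beta_R5={b[4]: .3f}")
```

Plotting `path[:, 0]` and `path[:, 4]` against `lambdas` with any plotting library gives a curve of the same kind as the figure.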