Ridge Regression

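The ridge regression estimator of $\beta$ is defined as:

$$\hat{\beta}_R = (X^T X + \Lambda)^{-1} X^T y$$

where:

$$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p), \qquad \lambda_j \ge 0$$

If $X^T X = D$, with $D = \mathrm{diag}(d_1, \ldots, d_p)$ diagonal, then:

$$\mathrm{Var}(\hat{\beta}_{Rj}) = \frac{\sigma^2 d_j}{(d_j + \lambda_j)^2} \le \frac{\sigma^2}{d_j} = \mathrm{Var}(\hat{\beta}_j)$$

where $\sigma^2$ is the error variance and $\hat{\beta}_j$ is the least squares estimator.

The downside to reducing the variance is that the estimator is biased:

$$E(\hat{\beta}_{Rj}) = \frac{d_j}{d_j + \lambda_j} \beta_j$$

The mean square error (MSE) is given by:

$$\mathrm{MSE}(\hat{\beta}_{Rj}) = \frac{\sigma^2 d_j + \lambda_j^2 \beta_j^2}{(d_j + \lambda_j)^2}$$

The aim would be to set $\lambda_j$ so that this is minimized.

This cannot be done exactly, since the minimizer $\lambda_j = \sigma^2 / \beta_j^2$ depends on the unknown $\beta_j$.

Note that:

$$\hat{\beta}_{Rj} = \frac{d_j}{d_j + \lambda_j} \hat{\beta}_j$$

and so $\hat{\beta}_{Rj}$ is shrunk towards the origin. It is therefore known as a shrinkage estimator.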
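The shrinkage behaviour can be checked numerically. The sketch below is illustrative only: it assumes the ridge estimator has the usual form (X^T X + Λ)^{-1} X^T y with Λ diagonal, builds a design whose columns are orthogonal so that X^T X is diagonal, and uses made-up values for the coefficients, the noise level and the λ values. The helper name `mse` is hypothetical, and it assumes the MSE is the variance plus the squared bias of each coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a design with orthogonal columns so that X^T X = D is diagonal.
n, p = 50, 3
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))  # Q has orthonormal columns
d = np.array([4.0, 1.0, 0.25])                # illustrative diagonal of D
X = Q * np.sqrt(d)                            # scales column j by sqrt(d_j)

beta = np.array([2.0, -1.0, 0.5])             # illustrative true coefficients
sigma = 1.0                                   # illustrative noise level
y = X @ beta + sigma * rng.normal(size=n)

lam = np.array([0.5, 0.5, 0.5])               # illustrative lambda_j values

# Least squares and ridge estimates from their closed forms.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)

# With diagonal X^T X, each ridge coefficient is the least squares
# coefficient multiplied by d_j / (d_j + lambda_j), a factor in (0, 1].
shrink = d / (d + lam)
print("least squares:", beta_ols)
print("ridge:        ", beta_ridge)
print("shrink factor:", shrink)


def mse(lam_j, d_j, beta_j, sigma2):
    """Variance plus squared bias of the j-th ridge coefficient."""
    variance = sigma2 * d_j / (d_j + lam_j) ** 2
    bias_sq = (lam_j * beta_j / (d_j + lam_j)) ** 2
    return variance + bias_sq


# Locate the MSE-minimizing lambda for the first coefficient on a grid
# and compare it with sigma^2 / beta_j^2.
grid = np.linspace(0.0, 5.0, 5001)
best = grid[np.argmin(mse(grid, d[0], beta[0], sigma**2))]
print("grid minimizer:", best, " sigma^2 / beta^2:", sigma**2 / beta[0] ** 2)
```

In practice the last comparison is not available, because it needs the true coefficient; here it works only because the data are simulated.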

Another Way to Derive

The ridge estimator can be derived using “regularization”, or the inclusion of a penalty term in the objective function.

Suppose instead of minimizing:

  • $I(\beta) = (y - X\beta)^T (y - X\beta)$

We minimize:

  • $I(\beta) = (y - X\beta)^T (y - X\beta) + \lambda \|\beta\|^2$

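Setting the gradient $-2X^T(y - X\beta) + 2\lambda\beta$ to zero, we get:

$$\hat{\beta}_R = (X^T X + \lambda I)^{-1} X^T y$$

This generalizes directly to a different $\lambda_j$ for each coefficient: the penalty $\sum_j \lambda_j \beta_j^2$ gives $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ in place of $\lambda I$.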
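As a rough numerical check of the penalized derivation, the sketch below assumes the minimizer has the closed form (X^T X + λI)^{-1} X^T y and confirms it in two ways: the gradient of the penalized objective vanishes there, and the same vector comes out of an ordinary least squares fit on augmented data. The data, the value of λ and the function name `objective` are illustrative assumptions, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data; any full-rank design would do.
n, p = 40, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam = 2.0  # illustrative penalty weight


def objective(b):
    """Residual sum of squares plus lam times the squared norm of b."""
    r = y - X @ b
    return r @ r + lam * (b @ b)


# Closed-form minimizer of the penalized objective.
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The gradient -2 X^T (y - X b) + 2 lam b should vanish at the minimizer.
grad = -2 * X.T @ (y - X @ beta_closed) + 2 * lam * beta_closed

# Equivalent view: ordinary least squares on augmented data, where
# sqrt(lam) * I is stacked under X and zeros are appended to y.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
beta_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print("closed form:  ", beta_closed)
print("augmented OLS:", beta_aug)
print("max |gradient|:", np.abs(grad).max())
```

The augmented form also suggests how different penalties per coefficient could be handled: replacing `np.sqrt(lam) * np.eye(p)` with a diagonal matrix of square roots of the individual weights.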

Regularization methods for estimating 𝛽 are now standard:

  • $I(\beta) = (y - X\beta)^T (y - X\beta) + P(\beta)$

for some penalty term $P$.

The penalty term prevents the estimate of $\beta$ from becoming large, and some penalties can set components of the estimate exactly to 0.
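To illustrate the difference between a penalty that only shrinks and one that can produce exact zeros, the sketch below compares the squared penalty with an absolute-value penalty (the one used by the lasso, which is brought in here purely as an example and is not named in these notes). It assumes the special case X^T X = I, where both minimizers can be written directly in terms of the least squares estimate. The function names and input values are hypothetical.

```python
import numpy as np


def ridge_orthonormal(b_ols, lam):
    """Minimizer of (b_ols - b)^2 + lam * b^2 for each component."""
    return b_ols / (1.0 + lam)


def lasso_orthonormal(b_ols, lam):
    """Minimizer of (b_ols - b)^2 + lam * |b| for each component.

    This is soft-thresholding at lam / 2: components with
    |b_ols| <= lam / 2 are set to zero.
    """
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam / 2.0, 0.0)


b_ols = np.array([3.0, -0.4, 0.1, -2.0])  # illustrative least squares estimates
lam = 1.0                                 # illustrative penalty weight

b_ridge = ridge_orthonormal(b_ols, lam)
b_lasso = lasso_orthonormal(b_ols, lam)

print("least squares:", b_ols)
print("squared penalty:", b_ridge)
print("absolute-value penalty:", b_lasso)
```

Under these assumptions the squared penalty scales every component by the same factor and leaves none at zero, while the absolute-value penalty removes the two small components entirely.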

Ridge Regression - Example
