Regularization - Parameter Weight Decay
Regularization expresses a preference for smaller weight (parameter) values by adding a penalty term to the cost function:
- 𝐽(𝜃) = 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔-𝑠𝑒𝑡) + 𝜆·𝛺(𝜃)
where:
- 𝐽(𝜃) is the cost function or objective function
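- 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔-𝑠𝑒𝑡) is the mean squared error on the training set (the data-fit term of the objective)
- 𝛺(𝜃) is the regularizer function (for weight decay, usually 𝛺(𝜃) = 𝜃ᵀ𝜃, the squared L2 norm of the weights)
- 𝜆 ≥ 0 controls the strength of the regularizer relative to 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔-𝑠𝑒𝑡)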
- a small 𝜆 imposes little preference (none at 𝜆 = 0) and may lead to overfitting
- a large 𝜆 forces the weights to become smaller, which may lead to underfitting
- a medium 𝜆 balances the two and tends to give a good fit
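
The effect of 𝜆 can be sketched numerically. The example below is only an illustration, under assumptions that are not part of the notes: a linear model on polynomial features, 𝛺(𝜃) = 𝜃ᵀ𝜃, and synthetic sine data. The names fit_weight_decay, cost, and results are hypothetical helpers introduced for this sketch. For this particular setup, setting the gradient of 𝐽(𝜃) to zero gives a linear system that can be solved directly.

```python
import numpy as np

def fit_weight_decay(X, y, lam):
    """Minimise J(theta) = MSE + lam * theta^T theta for a linear model X @ theta."""
    n, d = X.shape
    # Gradient of J: (2/n) X^T (X theta - y) + 2 lam theta = 0
    # => (X^T X / n + lam I) theta = X^T y / n
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    return np.linalg.solve(A, b)

def cost(X, y, theta, lam):
    """J(theta) = MSE(training set) + lam * Omega(theta), with Omega = theta^T theta."""
    mse = np.mean((X @ theta - y) ** 2)
    return mse + lam * (theta @ theta)

# Synthetic training set (an assumption made for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=20)
X = np.vander(x, 8, increasing=True)  # polynomial features 1, x, ..., x^7

results = []  # (lam, squared weight norm, training MSE)
for lam in (0.0, 1e-3, 10.0):
    theta = fit_weight_decay(X, y, lam)
    weight_norm = float(theta @ theta)
    train_mse = float(np.mean((X @ theta - y) ** 2))
    results.append((lam, weight_norm, train_mse))
    print(f"lambda={lam:g}  theta^T theta={weight_norm:.4g}  training MSE={train_mse:.4g}")
```

As 𝜆 grows, 𝜃ᵀ𝜃 shrinks and the training MSE rises, which is the trade-off described in the list above. Training MSE alone cannot tell which 𝜆 gives a good fit; that has to be judged on data not used for training.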