Regularization - Parameter Weight Decay

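Weight decay regularizes a model by expressing a preference for smaller weights (parameter values). It adds a penalty on the size of the parameters to the training objective: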

  • 𝐽(𝜃) = 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔-𝑠𝑒𝑡) + 𝜆·𝛺(𝜃)

where:

  • 𝐽(𝜃) is the cost function or objective function
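  • 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔-𝑠𝑒𝑡) is the mean squared error on the training set (the data-fitting term)
  • 𝛺(𝜃) is the regularizer (penalty) function, usually 𝛺(𝜃) = 𝜃ᵀ𝜃, the squared L2 norm ‖𝜃‖₂² of the parameters
  • 𝜆 ≥ 0 is a hyperparameter that controls the strength of the regularizer relative to 𝑀𝑆𝐸(𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔-𝑠𝑒𝑡)
    • 𝜆 = 0 imposes no preference, and a very small 𝜆 imposes only a weak one; the model may overfit
    • a large 𝜆 forces the weights toward zero, which may lead to underfitting
    • an intermediate 𝜆 balances the two terms and can give a good fit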
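The name "weight decay" comes from the gradient step. Since ∇(𝜆𝜃ᵀ𝜃) = 2𝜆𝜃, one gradient-descent step of size 𝜀 on 𝐽(𝜃) is 𝜃 ← (1 − 2𝜀𝜆)𝜃 − 𝜀∇𝑀𝑆𝐸, so the weights are shrunk by a constant factor before the usual update. Below is a minimal sketch of this, assuming a linear model ŷ = X𝜃 with no bias term and 𝛺(𝜃) = 𝜃ᵀ𝜃. The function names `objective`, `gradient`, `fit` and the learning-rate and step-count values are illustrative choices, not taken from these notes.

```python
import numpy as np


def objective(theta, X, y, lam):
    """J(theta) = MSE on the training set + lam * theta^T theta."""
    residuals = X @ theta - y
    mse = np.mean(residuals ** 2)
    return mse + lam * (theta @ theta)


def gradient(theta, X, y, lam):
    """Gradient of J: gradient of the MSE plus the weight-decay term 2 * lam * theta."""
    n = X.shape[0]
    grad_mse = (2.0 / n) * X.T @ (X @ theta - y)
    return grad_mse + 2.0 * lam * theta


def fit(X, y, lam, lr=0.05, steps=2000):
    """Plain gradient descent on J(theta) for a linear model (illustrative settings).

    Each update theta - lr * (grad_mse + 2 * lam * theta) equals
    (1 - 2 * lr * lam) * theta - lr * grad_mse: the weights "decay"
    by a constant factor before the ordinary MSE step.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta = theta - lr * gradient(theta, X, y, lam)
    return theta


if __name__ == "__main__":
    # Synthetic data (assumption): 100 samples, 5 features, small noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    true_theta = np.array([3.0, -2.0, 1.0, 0.5, 4.0])
    y = X @ true_theta + 0.1 * rng.normal(size=100)

    # Larger lam -> smaller squared weight norm theta^T theta.
    for lam in (0.0, 0.1, 1.0, 10.0):
        theta = fit(X, y, lam)
        print(f"lam={lam:5.1f}  theta^T theta={theta @ theta:8.3f}  "
              f"J={objective(theta, X, y, lam):.3f}")
```

With lam = 0 this reduces to ordinary least squares; as lam grows, the printed squared norm of the weights shrinks, matching the bullets above. For this quadratic objective the minimizer can also be checked in closed form as (XᵀX/n + 𝜆I)⁻¹Xᵀy/n.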