Max Norm Constraints/Regularization
  • is a regularization method that places an absolute upper bound on the magnitude of the weight vector of every neuron and uses projected gradient descent to enforce this bound
  • In practice, this corresponds to performing the parameter update as normal and then enforcing the constraint by rescaling the weight vector w of every neuron so that ‖w‖₂ ≤ c (vectors that already satisfy the bound are left unchanged).
    • typical values of c are on the order of 3 or 4
    • ‖·‖₂ denotes the L2 (Euclidean) norm
  • Some practitioners report improved results with this form of regularization
  • One of its appealing properties is that the network cannot “explode” even when the learning rate is set too high, because after every update each weight vector is projected back so that its norm stays at most c
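A minimal NumPy sketch of the projected update described above, assuming each neuron's incoming weights are stored as one row of a matrix `W` of shape `(num_neurons, fan_in)`. The function names `max_norm_project` and `sgd_step_max_norm` are hypothetical, and plain SGD stands in for whatever optimizer is actually used; the projection step itself does not depend on the update rule.

```python
import numpy as np


def max_norm_project(W, c=3.0, eps=1e-12):
    """Rescale each row of W so that its L2 norm is at most c.

    Assumes W has shape (num_neurons, fan_in): row i holds the
    incoming weight vector of neuron i.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)  # shape (num_neurons, 1)
    # Rows already inside the ball keep scale 1; longer rows shrink to norm c.
    # eps guards against division by zero for an all-zero row.
    scale = np.minimum(1.0, c / np.maximum(norms, eps))
    return W * scale


def sgd_step_max_norm(W, grad, lr, c=3.0):
    """One projected gradient descent step: plain SGD update, then projection."""
    W = W - lr * grad
    return max_norm_project(W, c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 10))
    # Deliberately huge learning rate with random stand-in "gradients":
    # without the projection the weights would keep growing.
    for _ in range(100):
        grad = rng.normal(size=W.shape)
        W = sgd_step_max_norm(W, grad, lr=50.0, c=3.0)
    print(np.linalg.norm(W, axis=1))  # each norm is at most 3.0 (up to rounding)
```

Because the projection leaves rows that already satisfy the bound untouched, it only acts on neurons whose weights have grown too large; unlike an L2 penalty, it adds no term to the loss.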