Softmax Activation Function

Function & Its Derivative

  • 𝑓(𝑧, 𝐳) = (𝑒^𝑧) / (𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ]) # 𝑧 is one element of the input vector 𝐳
  • 𝛿𝑓(𝑧, 𝐳)/𝛿𝑧 = 𝑓(𝑧, 𝐳) * [1 - 𝑓(𝑧, 𝐳)] # derivative w.r.t. the function's own input 𝑧; see derivation
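
The two formulas above can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the function names and the sample logits are hypothetical, and subtracting the maximum before exponentiating is an added assumption for numerical stability (it cancels in the ratio and does not change the result).

```python
import math

def softmax(zs):
    """Softmax of a list of inputs: e^z / sum_i e^(z_i) for each z."""
    m = max(zs)  # assumption: shift by the max to avoid overflow; cancels in the ratio
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_own_derivative(zs):
    """Derivative of each output w.r.t. its own input: f * (1 - f)."""
    return [p * (1.0 - p) for p in softmax(zs)]

def numeric_own_derivative(zs, i, h=1e-6):
    """Central finite difference of output i w.r.t. input i."""
    up = list(zs)
    up[i] += h
    down = list(zs)
    down[i] -= h
    return (softmax(up)[i] - softmax(down)[i]) / (2 * h)

logits = [1.0, 2.0, 3.0]  # hypothetical example inputs
probs = softmax(logits)
analytic = softmax_own_derivative(logits)
numeric = [numeric_own_derivative(logits, i) for i in range(len(logits))]

for p, a, n in zip(probs, analytic, numeric):
    print(f"f = {p:.6f}  analytic = {a:.6f}  numeric = {n:.6f}")
```

The analytic and finite-difference columns should agree to several decimal places, and the softmax outputs should sum to 1.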

Function With Cross-Entropy Loss Function & Its Derivative

Using the cross-entropy loss function -𝑙𝑛(·), applied to the softmax output for the target-class input 𝑧:

  • 𝐿(𝑧) = -𝑙𝑛(𝑓(𝑧, 𝐳))
  • 𝐿(𝑧) = -𝑙𝑛((𝑒^𝑧) / (𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ]))
  • 𝐿(𝑧) = 𝑙𝑛(𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ]) - 𝑙𝑛(𝑒^𝑧)
  • 𝐿(𝑧) = 𝑙𝑛(𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ]) - 𝑧
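
As a quick sanity check of the simplification above, the sketch below computes the loss both literally from the definition and via the final "log of the sum minus 𝑧" form. The function names, the index `target`, and the sample values are hypothetical; the max shift inside the simplified form is an added assumption that keeps the exponentials from overflowing.

```python
import math

def cross_entropy_direct(zs, k):
    """-ln(e^(z_k) / sum_i e^(z_i)), computed literally from the definition."""
    total = sum(math.exp(z) for z in zs)
    return -math.log(math.exp(zs[k]) / total)

def cross_entropy_simplified(zs, k):
    """ln(sum_i e^(z_i)) - z_k, the last line of the simplification."""
    m = max(zs)  # assumption: factor out e^m so every exponent is <= 0
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in zs))
    return log_sum_exp - zs[k]

logits = [1.0, 2.0, 3.0]  # hypothetical example inputs
target = 2                # hypothetical index of the target-class input

loss_direct = cross_entropy_direct(logits, target)
loss_simplified = cross_entropy_simplified(logits, target)

# Large inputs overflow math.exp in the direct form, but the
# simplified form still works because of the max shift.
loss_large = cross_entropy_simplified([1000.0, 1001.0, 1002.0], target)

print(loss_direct, loss_simplified, loss_large)
```

Adding the same constant to every input leaves the loss unchanged, which is why the large-input case should give the same value as the small one up to rounding.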

Derivative of the loss function 𝐿(𝑧) w.r.t. 𝑧:

  • 𝛿𝐿/𝛿𝑧 = 𝛿/𝛿𝑧·𝑙𝑛(𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ]) - 𝛿/𝛿𝑧·𝑧
  • 𝛿𝐿/𝛿𝑧 = 1/(𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ])*𝛿/𝛿𝑧·𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ] - 𝛿/𝛿𝑧·𝑧 # via derivative of logarithm
  • 𝛿𝐿/𝛿𝑧 = 1/(𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ])*(𝑒^𝑧) - 𝛿/𝛿𝑧·𝑧 # only the 𝑒^𝑧 term of the sum depends on 𝑧
  • 𝛿𝐿/𝛿𝑧 = (𝑒^𝑧)/(𝛴𝑧ᵢ∈𝐳[𝑒^𝑧ᵢ]) - 𝛿/𝛿𝑧·𝑧
  • 𝛿𝐿/𝛿𝑧 = 𝑓(𝑧, 𝐳) - 𝛿/𝛿𝑧·𝑧 # substitute softmax function

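Depending on which element of 𝐳 the derivative is taken with respect to, either:

  • 𝛿𝐿/𝛿𝑧 = 𝑓(𝑧, 𝐳) - 1 # if the 𝑧 in 𝛿/𝛿𝑧 and the 𝑧 it is applied to are the same variable, i.e. the target-class input from 𝐿(𝑧)
  • 𝛿𝐿/𝛿𝑧 = 𝑓(𝑧, 𝐳) - 0 # if they are NOT the same variable, i.e. the derivative is taken w.r.t. any other element of 𝐳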
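
The two cases can be combined into one gradient vector: the softmax output minus 1 at the target position and minus 0 everywhere else. The sketch below compares that against a finite-difference estimate of the loss. All names, the index `target`, and the sample values are hypothetical, and the max shift is an added assumption for numerical stability.

```python
import math

def softmax(zs):
    """Softmax of a list of inputs, shifted by the max for stability."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def loss(zs, k):
    """Cross-entropy loss in simplified form: ln(sum_i e^(z_i)) - z_k."""
    m = max(zs)
    return m + math.log(sum(math.exp(z - m) for z in zs)) - zs[k]

def analytic_gradient(zs, k):
    """f - 1 for the target-class input, f - 0 for every other input."""
    return [p - (1.0 if j == k else 0.0) for j, p in enumerate(softmax(zs))]

def numeric_gradient(zs, k, h=1e-6):
    """Central finite difference of the loss w.r.t. each input in turn."""
    grad = []
    for j in range(len(zs)):
        up = list(zs)
        up[j] += h
        down = list(zs)
        down[j] -= h
        grad.append((loss(up, k) - loss(down, k)) / (2 * h))
    return grad

logits = [1.0, 2.0, 3.0]  # hypothetical example inputs
target = 0                # hypothetical index of the target-class input

analytic = analytic_gradient(logits, target)
numeric = numeric_gradient(logits, target)

for j, (a, n) in enumerate(zip(analytic, numeric)):
    case = "f - 1" if j == target else "f - 0"
    print(f"input {j} ({case}): analytic = {a:.6f}  numeric = {n:.6f}")
```

The target-class entry should come out negative and all others positive, and the entries should sum to approximately zero because the softmax outputs sum to 1.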

Subpages