Softmax Activation Function
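- is an activation function typically used in the final (output) layer of a classifier
- is not itself a loss, but is usually paired with the cross-entropy loss (the pair is sometimes called a "softmax loss" layer)
- is also known as the soft-argmax (normalized exponential) function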
Function & Its Derivative
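- 𝑓(𝑧, 𝐳) = 𝑒^𝑧 / 𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}] # 𝑧 is one element of the input vector 𝐳
- 𝛿𝑓(𝑧, 𝐳)/𝛿𝑧 = 𝑓(𝑧, 𝐳) * [1 - 𝑓(𝑧, 𝐳)] # derivative w.r.t. its own input 𝑧; see derivation
- 𝛿𝑓(𝑧, 𝐳)/𝛿𝑧ⱼ = -𝑓(𝑧, 𝐳) * 𝑓(𝑧ⱼ, 𝐳) # derivative w.r.t. any other input 𝑧ⱼ ≠ 𝑧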
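The softmax and its derivative above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names and the sample logits are assumptions made for the example, and the max-subtraction is a common numerical-stability trick that does not change the result. The diagonal of the Jacobian is the 𝑓 * [1 - 𝑓] formula; the off-diagonal entries -p[i] * p[j] are the standard result for derivatives w.r.t. the other inputs. A finite-difference estimate is included only as a sanity check.

```python
import math

def softmax(z):
    """Return softmax probabilities for a list of logits z."""
    m = max(z)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """J[i][j] = d softmax(z)[i] / d z[j] = p[i] * ((i == j) - p[j])."""
    p = softmax(z)
    n = len(z)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(z, eps=1e-6):
    """Central finite differences, used only to sanity-check the formula."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        z_hi = list(z)
        z_lo = list(z)
        z_hi[j] += eps
        z_lo[j] -= eps
        p_hi, p_lo = softmax(z_hi), softmax(z_lo)
        for i in range(n):
            J[i][j] = (p_hi[i] - p_lo[i]) / (2 * eps)
    return J

logits = [2.0, 1.0, 0.1]  # illustrative values, not from the source
probs = softmax(logits)
J = softmax_jacobian(logits)
J_num = numeric_jacobian(logits)
```

Here `probs` sums to 1, `J[i][i]` equals `probs[i] * (1 - probs[i])`, and `J` should agree with `J_num` up to finite-difference error.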
Function With Cross-Entropy Loss Function & Its Derivative
Using the cross-entropy loss -𝑙𝑛(·) on the softmax output of the true class, whose input is 𝑧:
- 𝐿(𝑧) = -𝑙𝑛(𝑓(𝑧, 𝐳))
- 𝐿(𝑧) = -𝑙𝑛(𝑒^𝑧 / 𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}])
- 𝐿(𝑧) = 𝑙𝑛(𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}]) - 𝑙𝑛(𝑒^𝑧)
- 𝐿(𝑧) = 𝑙𝑛(𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}]) - 𝑧
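The simplification above says that the negative log of the softmax output equals a log-sum-exp of all inputs minus the true-class input. The sketch below checks that numerically in plain Python. The function names, the sample logits, and the choice of `true_class` are assumptions made for illustration only.

```python
import math

def softmax(z):
    """Return softmax probabilities for a list of logits z."""
    m = max(z)  # stability shift; does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss_direct(z, t):
    """First line of the derivation: L = -ln(softmax(z)[t])."""
    return -math.log(softmax(z)[t])

def loss_logsumexp(z, t):
    """Last line of the derivation: L = ln(sum_i exp(z_i)) - z_t."""
    m = max(z)
    # ln(sum(exp(z_i))) computed as m + ln(sum(exp(z_i - m))) to avoid overflow
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return lse - z[t]

logits = [2.0, 1.0, 0.1]  # illustrative values, not from the source
true_class = 0            # hypothetical index of the correct class
a = loss_direct(logits, true_class)
b = loss_logsumexp(logits, true_class)
```

The two values `a` and `b` should match up to floating-point rounding. The log-sum-exp form is often preferred in practice because it never takes the logarithm of a probability that has underflowed to zero.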
Derivative of the loss function 𝐿(𝑧) w.r.t. any input 𝑧ⱼ ∈ 𝐳 (𝑧 is still the true-class input):
- 𝛿𝐿/𝛿𝑧ⱼ = 𝛿/𝛿𝑧ⱼ·𝑙𝑛(𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}]) - 𝛿/𝛿𝑧ⱼ·𝑧
- 𝛿𝐿/𝛿𝑧ⱼ = 1/(𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}]) * 𝛿/𝛿𝑧ⱼ·𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}] - 𝛿/𝛿𝑧ⱼ·𝑧 # via derivative of logarithm
- 𝛿𝐿/𝛿𝑧ⱼ = 1/(𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}]) * 𝑒^{𝑧ⱼ} - 𝛿/𝛿𝑧ⱼ·𝑧 # only the 𝑧ᵢ = 𝑧ⱼ term of the sum depends on 𝑧ⱼ
- 𝛿𝐿/𝛿𝑧ⱼ = 𝑒^{𝑧ⱼ} / (𝛴_{𝑧ᵢ∈𝐳}[𝑒^{𝑧ᵢ}]) - 𝛿/𝛿𝑧ⱼ·𝑧
- 𝛿𝐿/𝛿𝑧ⱼ = 𝑓(𝑧ⱼ, 𝐳) - 𝛿/𝛿𝑧ⱼ·𝑧 # substitute softmax function
either:
- 𝛿𝐿/𝛿𝑧ⱼ = 𝑓(𝑧ⱼ, 𝐳) - 1 # if 𝑧ⱼ is the true-class input 𝑧, so 𝛿/𝛿𝑧ⱼ·𝑧 = 1
- 𝛿𝐿/𝛿𝑧ⱼ = 𝑓(𝑧ⱼ, 𝐳) - 0 # if 𝑧ⱼ is any other input, so 𝛿/𝛿𝑧ⱼ·𝑧 = 0
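The two cases above amount to "softmax output minus 1 for the true class, softmax output minus 0 for every other class". A rough way to check this is to compare that closed form with a finite-difference gradient of the loss. The sketch below does so in plain Python; the function names, the sample logits, and `true_class` are illustrative assumptions, not part of the original notes.

```python
import math

def softmax(z):
    """Return softmax probabilities for a list of logits z."""
    m = max(z)  # stability shift; does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, t):
    """Cross-entropy loss for true class index t: -ln(softmax(z)[t])."""
    return -math.log(softmax(z)[t])

def grad_loss(z, t):
    """Closed form: softmax(z)[j] - 1 if j == t, else softmax(z)[j] - 0."""
    p = softmax(z)
    return [p[j] - (1.0 if j == t else 0.0) for j in range(len(z))]

def numeric_grad(z, t, eps=1e-6):
    """Central finite differences, used only to sanity-check grad_loss."""
    g = []
    for j in range(len(z)):
        z_hi = list(z)
        z_lo = list(z)
        z_hi[j] += eps
        z_lo[j] -= eps
        g.append((loss(z_hi, t) - loss(z_lo, t)) / (2 * eps))
    return g

logits = [2.0, 1.0, 0.1]  # illustrative values, not from the source
true_class = 0            # hypothetical index of the correct class
g = grad_loss(logits, true_class)
g_num = numeric_grad(logits, true_class)
```

The entry of `g` at `true_class` is negative and the others are positive, so a gradient-descent step raises the true-class input and lowers the rest. The entries of `g` also sum to zero, because the softmax outputs sum to 1.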