A perceptron computes its output in two phases:

  1. weighted sum function (ideally linear) - calculates a “weighted sum” of its inputs plus a bias/constant term
  2. activation function (ideally non-linear) - then decides whether the perceptron should “fire” or not
    1. synonymous with a non-linear layer
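
The two phases can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the names `perceptron` and `step`, the threshold of 0, and the AND-gate weights are all assumptions chosen for the example.

```python
def step(z, threshold=0.0):
    """Fire (1) when z exceeds the threshold, otherwise stay silent (0)."""
    return 1 if z > threshold else 0


def perceptron(inputs, weights, bias, activation=step):
    # Phase 1: weighted sum of the inputs plus the bias term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Phase 2: the activation function decides whether the unit fires.
    return activation(z)


# Hypothetical weights that make the unit behave like a logical AND gate.
and_weights, and_bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], and_weights, and_bias))
```

Passing a different function as `activation` swaps phase 2 without touching phase 1, which is why the two phases are usually described separately.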

Weighted Sum Function

  • outputs a value 𝑧 in the range (-∞, +∞)
  • has no built-in mechanism for deciding whether the perceptron should fire; this is why we need activation functions

Example weighted sum function:

𝑧 = [𝛴_{1≤𝑖≤𝑛} (𝑤𝑒𝑖𝑔ℎ𝑡_𝑖 * 𝑖𝑛𝑝𝑢𝑡_𝑖)] + [𝑤𝑒𝑖𝑔ℎ𝑡_0 * 𝑏𝑖𝑎𝑠/𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡]
𝑧 = [𝛴_{1≤𝑖≤𝑛} (𝑤_𝑖 * 𝑥_𝑖)] + [𝑤_0 * 𝑥_0]
𝑧 = 𝛴_{0≤𝑖≤𝑛} (𝑤_𝑖 * 𝑥_𝑖)
𝑧 = 𝐰^𝑇 𝐱
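
The forms above differ only in where the bias term lives. The sketch below checks that keeping the bias as a separate term gives the same 𝑧 as folding it into the vectors and taking a single dot product. The function names and the numeric values are illustrative, and it assumes the conventional choice of a constant bias input of 1.

```python
def weighted_sum_explicit_bias(weights, inputs, bias_weight, bias_input=1.0):
    # Sum over i = 1..n, plus the separate bias term w_0 * x_0.
    return sum(w * x for w, x in zip(weights, inputs)) + bias_weight * bias_input


def weighted_sum_folded(w, x):
    # Sum over i = 0..n, i.e. the dot product of w and x.
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


# Illustrative values (not from the text).
weights = [0.5, -2.0, 1.0]
inputs = [4.0, 1.0, 3.0]
w0 = 0.25

z_explicit = weighted_sum_explicit_bias(weights, inputs, w0)
# Prepend the bias weight to w and the constant input 1.0 to x.
z_folded = weighted_sum_folded([w0] + weights, [1.0] + inputs)
print(z_explicit, z_folded)
```

Folding the bias into the weight vector is mostly a notational convenience: it lets the whole phase be written, and often computed, as one dot product.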

Independent Activation Functions

There are various types of activation functions, each with their own pros and cons.

Step Function

  Output function:
    𝑓(𝑧) = 1, if 𝑧 > threshold
    𝑓(𝑧) = 0, if 𝑧 ≤ threshold
  Output range: 0 or 1
  Cons:
    • hard to train for classifying 3 or more classes, because each classification node outputs only 0 or 1 rather than a range of values over which we could take a max or softmax

Linear

  Output function: 𝑓(𝑧) = 𝑐𝑧, for some scalar 𝑐
  Output range: (-∞, +∞)
  Cons:
    • is linear, which means the derivative with respect to 𝑧 is always the constant 𝑐 (i.e. the gradient has no relationship with 𝑧)
    • output is not bounded, which could blow up activations

Sigmoid

  Output function: 𝑓(𝑧) = 1 / (1 + 𝑒^(-𝑧))
  Output range: (0, 1)
  Pros:
    • non-linear
    • output is bounded and therefore won’t blow up activations
    • output can be read as a probability
    • can be used to classify classes that are NOT mutually exclusive

Tanh

  Output function: 𝑓(𝑧) = 𝑡𝑎𝑛ℎ(𝑧) = 2 * 𝑠𝑖𝑔𝑚𝑜𝑖𝑑(2𝑧) - 1
  Output range: (-1, 1)
  Pros:
    • non-linear
    • output is bounded and therefore won’t blow up activations
    • outputs are zero-centered

ReLU

  Output function: 𝑓(𝑧) = 𝑚𝑎𝑥(0, 𝑧)
  Output range: [0, +∞)
  Pros:
    • non-linear
    • sparsity of activation
    • less computationally expensive than sigmoid and tanh
    • does not saturate in the positive region
  Cons:
    • Dying ReLU Problem (i.e. because the function is flat for 𝑧 < 0, the gradient there is 0 and the unit can stop responding to variations in error/input)
    • outputs are not zero-centered

Softplus

  Output function: 𝑓(𝑧) = 𝑙𝑜𝑔(1 + 𝑒^𝑧)
  Output range: (0, +∞)
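
As a rough companion to the entries above, here is a minimal sketch of the six functions using only Python's standard `math` module. The default threshold of 0, the default scalar `c=1.0`, the overflow-avoiding rewrites of sigmoid and softplus, and the helper `relu_grad` are assumptions added for illustration.

```python
import math


def step(z, threshold=0.0):
    return 1.0 if z > threshold else 0.0


def linear(z, c=1.0):
    return c * z


def sigmoid(z):
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    return math.tanh(z)


def relu(z):
    return max(0.0, z)


def softplus(z):
    # log(1 + e^z) rewritten as max(z, 0) + log1p(e^-|z|) to avoid overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def relu_grad(z):
    # Zero for z < 0: the flat region behind the dying ReLU problem.
    return 1.0 if z > 0 else 0.0


for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(z, step(z), linear(z), round(sigmoid(z), 4),
          round(tanh(z), 4), relu(z), round(softplus(z), 4))
```

Each function takes a single 𝑧 and returns a single value, so the output for one unit does not depend on any other unit's 𝑧.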

Dependent Activation Functions

Softmax

  Output function: 𝑓(𝑧, 𝐳) = 𝑒^𝑧 / 𝛴_{𝑧_𝑖 ∈ 𝐳} 𝑒^(𝑧_𝑖)
    where: 𝑧 is some element in set 𝐳

    for example, 𝐳 = [2, -1, 3]:
    • 𝑓(𝑧=2, 𝐳) = 0.265
    • 𝑓(𝑧=-1, 𝐳) = 0.013
    • 𝑓(𝑧=3, 𝐳) = 0.721
  Output range: [0, 1]
  Description:
    • non-linear
    • output is bounded and therefore won’t blow up activations
    • outputs a probability
    • all outputs sum to 1
    • used for classifying mutually exclusive classes
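
The worked example for 𝐳 = [2, -1, 3] can be reproduced with a short sketch. Subtracting the maximum before exponentiating is a common numerical-stability trick and an addition here, not part of the formula above; it does not change the result. The function name `softmax` is likewise just an illustrative choice.

```python
import math


def softmax(zs):
    # Subtracting the maximum leaves the result unchanged but prevents overflow in exp.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, -1.0, 3.0])
print([round(p, 3) for p in probs])  # [0.265, 0.013, 0.721]
print(sum(probs))
```

Unlike the functions in the previous section, every output here depends on all elements of 𝐳 through the shared denominator.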

Activation Functions Comparisons

Resources