A perceptron computes its output in 2 phases:

weighted sum function (linear) - calculates a “weighted sum” of its inputs plus its bias/constant
activation function (usually non-linear) - then decides whether the perceptron should “fire” or not; this step is synonymous with a non-linear layer
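
The two phases can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the step activation with a threshold of 0, and the AND-gate weights are assumptions chosen for the example, not something the notes prescribe.

```python
def weighted_sum(weights, inputs, bias):
    """Phase 1: linear combination of the inputs plus a bias term."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step(z, threshold=0.0):
    """Phase 2: fire (1) only when z exceeds the threshold."""
    return 1 if z > threshold else 0


def perceptron(weights, inputs, bias):
    """Run phase 1, then feed its result z into phase 2."""
    return step(weighted_sum(weights, inputs, bias))


# Hypothetical weights that make the perceptron behave like a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
outputs = [perceptron(and_weights, [a, b], and_bias)
           for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

Only the input pair (1, 1) pushes 𝑧 above the threshold, so it is the only case in which the perceptron fires.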

Weighted Sum Function

outputs a value 𝑧 anywhere in the range (-∞, +∞)
has no built-in mechanism for deciding whether to fire the perceptron or not, which is why we need activation functions

example weighted sum function

𝑧 = [𝛴_{1≤𝑖≤𝑛}(𝑤𝑒𝑖𝑔ℎ𝑡_𝑖* 𝑖𝑛𝑝𝑢𝑡_𝑖)] + [𝑤𝑒𝑖𝑔ℎ𝑡₀ * 𝑏𝑖𝑎𝑠/𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡]
𝑧 = [𝛴_{1≤𝑖≤𝑛}(𝑤_𝑖* 𝑥_𝑖)] + [𝑤₀ * 𝑥₀]
𝑧 = [𝛴_{0≤𝑖≤𝑛}(𝑤_𝑖* 𝑥_𝑖)]
𝑧 = 𝑤^𝑇𝑥
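
The chain of equalities above can be checked numerically. The sketch below assumes the common convention that the constant input 𝑥₀ is 1, so that 𝑤₀ acts as the bias; the sample weights and inputs are made up for illustration.

```python
def dot(w, x):
    """Plain dot product: the sum of w_i * x_i."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


# Made-up example values for w_1..w_3 and x_1..x_3.
w = [0.5, -2.0, 1.0]
x = [2.0, 1.0, 3.0]
w0 = 0.25   # weight attached to the constant input
x0 = 1.0    # assumed constant input (the "bias/constant" term)

# Form 1: the sum over i = 1..n, plus the separate w0 * x0 term.
z_separate = dot(w, x) + w0 * x0

# Form 2: fold the constant into the vectors and sum over i = 0..n,
# which is the same thing as the dot product w^T x.
z_folded = dot([w0] + w, [x0] + x)

print(z_separate, z_folded)  # 2.25 2.25
```

Folding the bias into the weight vector is what lets the whole phase be written as a single dot product.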

Independent Activation Functions

there are various types of activation functions, each with their own pros and cons

Activation Function	Output Function	Output Range	Pros	Cons
Step Function	𝑓(𝑧) = 1 if 𝑧 > threshold; 𝑓(𝑧) = 0 if 𝑧 ≤ threshold	0 or 1		hard to train for classifying 3 or more classes (because each classification node outputs 0 or 1, not a range of values over which we could take a max or softmax)
Linear	𝑓(𝑧) = 𝑐𝑧 for some scalar 𝑐	(-inf, +inf)		is linear, so the derivative with respect to 𝑧 is always the constant 𝑐 (i.e. the gradient has no relationship with 𝑧); output is not bounded, which could blow up activations
Sigmoid	𝑓(𝑧) = 1/(1+𝑒^(-𝑧))	(0, 1)	non-linear; output is bounded, so it won’t blow up activations; outputs a probability; can be used to classify classes that are NOT mutually exclusive	saturation causes the Vanishing Gradient Problem; outputs are not zero centered; exp() is somewhat expensive to compute
Tanh	𝑓(𝑧) = 𝑡𝑎𝑛ℎ(𝑧) = 2 𝑠𝑖𝑔𝑚𝑜𝑖𝑑(2𝑧) - 1	(-1, 1)	non-linear; output is bounded, so it won’t blow up activations; outputs are zero centered	saturation causes the Vanishing Gradient Problem
ReLU	𝑓(𝑧) = 𝑚𝑎𝑥(0, 𝑧)	[0, +inf)	non-linear; sparsity of activation; less computationally expensive than sigmoid and tanh; does not saturate in the + region	Dying ReLU Problem (i.e. because of the horizontal line for 𝑧 < 0, the gradient there is 0, so the node stops responding to variations in error/input); outputs are not zero centered
Softplus	𝑓(𝑧) = 𝑙𝑜𝑔(1+𝑒^𝑧)	(0, +inf)
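
Several of the pros and cons in the table are easier to see with numbers. The following is a minimal sketch using only Python's `math` module; the helper names and the sample points are illustrative assumptions, and the derivative formulas are the standard ones rather than something stated in the table.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # peaks at 0.25 when z = 0


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # flat (zero gradient) for z <= 0


def softplus(z):
    return math.log1p(math.exp(z))  # log(1 + e^z)


for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"z={z:6.1f}  sigmoid'={sigmoid_grad(z):.6f}  "
          f"tanh'={tanh_grad(z):.6f}  relu'={relu_grad(z):.1f}")

# The identity from the Tanh row: tanh(z) = 2 * sigmoid(2z) - 1.
identity_holds = math.isclose(math.tanh(0.7), 2 * sigmoid(2 * 0.7) - 1)
```

At 𝑧 = ±10 the sigmoid and tanh gradients are close to zero (saturation), while the ReLU gradient stays at 1 for every positive 𝑧 and is exactly 0 for every negative 𝑧, which is the behaviour behind the Dying ReLU Problem.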

Activation Function	Output Function	Output Range	Description
Softmax	𝑓(𝑧, 𝐳) = 𝑒^𝑧 / 𝛴_{𝑧_𝑖∈𝐳}[𝑒^(𝑧_𝑖)], where 𝑧 is some element in set 𝐳; for example, with 𝐳=[2,-1,3]: 𝑓(𝑧=2, 𝐳) = 0.265, 𝑓(𝑧=-1, 𝐳) = 0.013, 𝑓(𝑧=3, 𝐳) = 0.721	(0, 1)	non-linear; output is bounded, so it won’t blow up activations; outputs a probability; the outputs sum to 1; used for classifying mutually exclusive classes
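
The worked example for 𝐳=[2,-1,3] can be reproduced with a short sketch. Subtracting the maximum before exponentiating is a common numerical-stability trick that is assumed here and is not part of the formula above; it leaves the result unchanged because the shift cancels between numerator and denominator.

```python
import math


def softmax(zs):
    """Return e^z / sum(e^z_i) for every z in zs."""
    m = max(zs)                            # stability shift (an added assumption)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2, -1, 3])
print([round(p, 3) for p in probs])  # [0.265, 0.013, 0.721]
```

Unlike the activation functions in the previous table, each output here depends on every element of 𝐳, which is why the outputs sum to 1.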
