Vanilla/Feed-Forward Neural Networks (FNN/FFNN/FFN) - Multi-Layer/Multilayer Perceptrons (MLP)

is the simplest type of artificial neural network architecture wherein connections between the perceptrons do not form a cycle

networks whose connections do form cycles (feedback) are recurrent neural networks

FNN - Prerequisite

read: Binary Logistic Regression (BLR)

FNN - Model Representation

given 𝑛 sample/training data:

(𝑦⁽¹⁾, 𝒙⁽¹⁾) = (𝑦⁽¹⁾, 𝑥₁⁽¹⁾, 𝑥₂⁽¹⁾, …, 𝑥_𝑘⁽¹⁾) # sample 1

(𝑦⁽²⁾, 𝒙⁽²⁾) = (𝑦⁽²⁾, 𝑥₁⁽²⁾, 𝑥₂⁽²⁾, …, 𝑥_𝑘⁽²⁾) # sample 2

…

(𝑦^(𝑛), 𝒙^(𝑛)) = (𝑦^(𝑛), 𝑥₁^(𝑛), 𝑥₂^(𝑛), …, 𝑥_𝑘^(𝑛)) # sample 𝑛

we define:

𝐿 - total number of layers in the network

𝑠_𝑙 - number of perceptrons (not counting the bias unit) in layer 𝑙

𝑠_𝐿- number of output units

Binomial Classification (2 classes)

𝑦 = 0 or 1

𝑠_𝐿 = 1 (i.e. 1 output unit)

ℎ_𝜃(𝒙) outputs a scalar value between 0 and 1, exclusive for a sigmoid output unit (e.g. 0.99 or 0.45 is a possible output)

Multinomial Classification (𝑐 classes)

𝑦 ∊ ℝ^𝑐 is a one-hot vector (e.g. when 𝑐=3 then 𝑦^(𝑖) ∊ {[1,0,0]^𝑇, [0,1,0]^𝑇, [0,0,1]^𝑇})

𝑠_𝐿 = 𝑐 (i.e. 𝑐 output units)

ℎ_𝜃(𝒙) outputs a 𝑐-dimensional vector where each entry is a scalar value between 0 and 1 (e.g. when 𝑐=3 then [0.99,0.02,0.45]^𝑇 is a possible output; the entries need not sum to 1)
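The one-hot target vectors described above can be built as in the following minimal sketch. The helper name `one_hot` and the 0-based class index are assumptions made for illustration; they do not come from the notes.

```python
def one_hot(label, c):
    """Return a length-c list with a 1 at position `label` (0-based) and 0 elsewhere."""
    if not 0 <= label < c:
        raise ValueError("label must be in range(c)")
    return [1 if j == label else 0 for j in range(c)]


# The three possible targets when c = 3.
targets = [one_hot(k, 3) for k in range(3)]
print(targets)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```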

FNN - Cost Function

Neural Network’s Cost Function (binomial classification)

𝐽(𝜃) = -(1/𝑛)·[𝛴_{1≤𝑖≤𝑛}[(𝑦^(𝑖))·𝑙𝑜𝑔(ℎ_𝜃(𝒙^(𝑖))) + (1-𝑦^(𝑖))·𝑙𝑜𝑔(1-ℎ_𝜃(𝒙^(𝑖)))]] # same as binomial logistic regression

Neural Network’s Cost Function (multinomial classification)

𝐽(𝜃) = -(1/𝑛)·[𝛴_{1≤𝑖≤𝑛}[𝛴_{1≤𝑗≤𝑐}[(𝑦^(𝑖)[𝑗])·𝑙𝑜𝑔(ℎ_𝜃(𝒙^(𝑖))[𝑗]) + (1-𝑦^(𝑖)[𝑗])·𝑙𝑜𝑔(1-ℎ_𝜃(𝒙^(𝑖))[𝑗])]]] # same as multinomial logistic regression

where:

𝑦[𝑗] - the 𝑗^𝑡ℎ entry of the label vector 𝑦

ℎ_𝜃(𝒙)[𝑗] - the 𝑗^𝑡ℎ entry of the output vector ℎ_𝜃(𝒙)
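As a rough illustration of the two cost functions, the sketch below evaluates them on already-computed predictions, averaging over the 𝑛 samples. The function names `binomial_cost` and `multinomial_cost` are hypothetical, and predictions are assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def binomial_cost(ys, hs):
    """Cross-entropy cost for scalar labels ys[i] in {0, 1} and predictions hs[i] in (0, 1)."""
    n = len(ys)
    total = sum(y * math.log(h) + (1 - y) * math.log(1 - h) for y, h in zip(ys, hs))
    return -total / n


def multinomial_cost(ys, hs):
    """Same cost, summed over the c entries of each label / prediction vector."""
    n = len(ys)
    total = 0.0
    for y, h in zip(ys, hs):
        total += sum(yj * math.log(hj) + (1 - yj) * math.log(1 - hj)
                     for yj, hj in zip(y, h))
    return -total / n


print(binomial_cost([1, 0], [0.9, 0.2]))
print(multinomial_cost([[1, 0, 0]], [[0.9, 0.1, 0.2]]))
```

With 𝑐 = 1 the multinomial version reduces to the binomial one, which is a quick sanity check on both.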

FNN - Cost Function With Regularization of 𝜃s

Neural Network’s Cost Function with regularization of 𝜃s (binomial classification)

𝐽(𝜃) = -(1/𝑛)·[𝛴_{1≤𝑖≤𝑛}[(𝑦^(𝑖))·𝑙𝑜𝑔(ℎ_𝜃(𝒙^(𝑖))) + (1-𝑦^(𝑖))·𝑙𝑜𝑔(1-ℎ_𝜃(𝒙^(𝑖)))]] + (𝜆/2𝑛)·[𝛴_{1≤𝑙≤𝐿-1}𝛴_{1≤𝑖≤𝑠_{𝑙+1}}𝛴_{1≤𝑗≤𝑠_𝑙}(𝜃_𝑙[𝑖,𝑗])²] # similar to binomial logistic regression

Neural Network’s Cost Function with regularization of 𝜃s (multinomial classification)

𝐽(𝜃) = -(1/𝑛)·[𝛴_{1≤𝑖≤𝑛}[𝛴_{1≤𝑗≤𝑐}[(𝑦^(𝑖)[𝑗])·𝑙𝑜𝑔(ℎ_𝜃(𝒙^(𝑖))[𝑗]) + (1-𝑦^(𝑖)[𝑗])·𝑙𝑜𝑔(1-ℎ_𝜃(𝒙^(𝑖))[𝑗])]]] + (𝜆/2𝑛)·[𝛴_{1≤𝑙≤𝐿-1}𝛴_{1≤𝑖≤𝑠_{𝑙+1}}𝛴_{1≤𝑗≤𝑠_𝑙}(𝜃_𝑙[𝑖,𝑗])²] # similar to multinomial logistic regression

where:

𝜃_𝑙[𝑖,𝑗] - the coefficient 𝜃 connecting (perceptron 𝑗 at layer 𝑙) to (perceptron 𝑖 at layer 𝑙+1), so that 𝑧_{𝑙+1} = 𝜃_𝑙𝑎_𝑙; there are 𝐿-1 such matrices, and the bias coefficients (𝑗 = 0) are not regularized
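The regularization term on its own can be sketched as below. The storage layout is an assumption made for illustration: each 𝜃_𝑙 is a list of rows, and column 0 of every row holds the bias coefficient, which is skipped because the sums start at index 1. The name `l2_penalty` is hypothetical.

```python
def l2_penalty(thetas, lam, n):
    """Return (lam / (2 n)) times the sum of squared non-bias coefficients.

    thetas is a list of weight matrices (lists of rows). Column 0 of each
    row is assumed to be the bias coefficient and is left out of the sum.
    """
    total = 0.0
    for theta in thetas:
        for row in theta:
            total += sum(w * w for w in row[1:])
    return lam / (2 * n) * total


# Two tiny matrices; the bias entries 5.0 and 7.0 do not contribute.
print(l2_penalty([[[5.0, 1.0, 2.0]], [[7.0, 3.0]]], lam=2.0, n=4))  # 3.5
```

This value is added to the unregularized cost; a larger 𝜆 penalizes large coefficients more strongly.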

FNN - Learning 𝜃s With Gradient Descent & Backpropagation

need to compute the partial derivative (∂/∂𝜃_𝑙[𝑖,𝑗]) 𝐽(𝜃) with respect to every 𝜃_𝑙[𝑖,𝑗]

Given 1 Training Data (𝑦, 𝑥₁, …, 𝑥_𝑘)

forward propagation:

𝑎₁= [𝑥₁, …, 𝑥_𝑘]^𝑇

𝑧₂ = 𝜃₁𝑎₁

𝑎₂ = 𝑔(𝑧₂) # 𝑔 is the activation function, applied element-wise (the sigmoid, as in BLR)

𝑧₃ = 𝜃₂𝑎₂

𝑎₃= 𝑔(𝑧₃)

…

𝑧_𝐿 = 𝜃_{𝐿-1}𝑎_{𝐿-1}

𝑎_𝐿= 𝑔(𝑧_𝐿)

ℎ_𝜃(𝑥₁, …, 𝑥_𝑘) = 𝑎_𝐿
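The forward pass above can be sketched in plain Python as follows. Two assumptions are made here that the formulas do not spell out: 𝑔 is the sigmoid, and a bias input of 1 is prepended to each 𝑎_𝑙 before multiplying by 𝜃_𝑙 (the "bias unit" mentioned in the definitions). The weights of the 2-2-1 network are hand-picked for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(thetas, x):
    """Return the activations [a_1, ..., a_L] for one input vector x.

    thetas[l] has one row per unit in the next layer; column 0 of each row
    multiplies an assumed bias input of 1.
    """
    activations = [list(x)]
    for theta in thetas:
        a_with_bias = [1.0] + activations[-1]
        z = [sum(w * a for w, a in zip(row, a_with_bias)) for row in theta]
        activations.append([sigmoid(v) for v in z])
    return activations


# Hypothetical 2-2-1 network.
thetas = [
    [[0.0, 1.0, -1.0], [0.5, -0.5, 0.5]],  # theta_1: 2 rows x (1 bias + 2 inputs)
    [[0.0, 1.0, 1.0]],                     # theta_2: 1 row x (1 bias + 2 hidden units)
]
h = forward(thetas, [1.0, 1.0])[-1]  # h_theta(x) is the last activation a_L
print(h)
```

Keeping every 𝑎_𝑙 rather than only the output is deliberate, since backpropagation reuses them.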

𝛿_𝑙[𝑗]= error of node 𝑗 at layer 𝑙

for each output unit 𝑗 at the last layer 𝐿:

𝛿_𝐿[𝑗] = ℎ_𝜃(𝑥₁, …, 𝑥_𝑘)[𝑗] - 𝑦[𝑗]

𝛿_𝐿[𝑗] = 𝑎_𝐿[𝑗] - 𝑦[𝑗]

in vector format

𝛿_𝐿= ℎ_𝜃(𝑥₁, …, 𝑥_𝑘) - 𝑦

𝛿_𝐿= 𝑎_𝐿 - 𝑦

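for previous layers (𝐿-1 down to 2), where ⊙ is the element-wise product and 𝑔 is the sigmoid, so 𝑔’(𝑧_𝑙) = 𝑎_𝑙 ⊙ (1 - 𝑎_𝑙):

𝛿_{𝐿-1} = ((𝜃_{𝐿-1})^𝑇𝛿_𝐿) ⊙ 𝑔’(𝑧_{𝐿-1}) = ((𝜃_{𝐿-1})^𝑇𝛿_𝐿) ⊙ 𝑎_{𝐿-1} ⊙ (1 - 𝑎_{𝐿-1})

𝛿_{𝐿-2} = ((𝜃_{𝐿-2})^𝑇𝛿_{𝐿-1}) ⊙ 𝑔’(𝑧_{𝐿-2}) = ((𝜃_{𝐿-2})^𝑇𝛿_{𝐿-1}) ⊙ 𝑎_{𝐿-2} ⊙ (1 - 𝑎_{𝐿-2})

…

𝛿₂ = ((𝜃₂)^𝑇𝛿₃) ⊙ 𝑔’(𝑧₂) = ((𝜃₂)^𝑇𝛿₃) ⊙ 𝑎₂ ⊙ (1 - 𝑎₂)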

no need for 𝛿₁
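The error terms for one sample can be sketched as below, under the same assumptions as the rest of these notes' examples: sigmoid units, and a bias coefficient stored in column 0 of each row of 𝜃_𝑙. The bias column is skipped when propagating backwards because a bias unit has no error of its own. The function name `deltas` and the example weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(thetas, x):
    """Return [a_1, ..., a_L]; a bias input of 1 is prepended at every layer."""
    activations = [list(x)]
    for theta in thetas:
        a_with_bias = [1.0] + activations[-1]
        z = [sum(w * a for w, a in zip(row, a_with_bias)) for row in theta]
        activations.append([sigmoid(v) for v in z])
    return activations


def deltas(thetas, activations, y):
    """Return [delta_2, ..., delta_L] for one sample, assuming sigmoid units."""
    delta = [a - t for a, t in zip(activations[-1], y)]  # delta_L = a_L - y
    out = [delta]
    # thetas[l] is theta_{l+1} in the notes' 1-based numbering; stop before
    # thetas[0], which would only produce the unneeded delta_1.
    for l in range(len(thetas) - 1, 0, -1):
        theta, a = thetas[l], activations[l]
        # (theta^T delta)[j], skipping the bias column 0.
        back = [sum(theta[i][j + 1] * delta[i] for i in range(len(delta)))
                for j in range(len(a))]
        delta = [b * aj * (1 - aj) for b, aj in zip(back, a)]  # element-wise g'(z)
        out.insert(0, delta)
    return out


thetas = [
    [[0.0, 1.0, -1.0], [0.5, -0.5, 0.5]],
    [[0.0, 1.0, 1.0]],
]
acts = forward(thetas, [1.0, 1.0])
ds = deltas(thetas, acts, [1.0])
print(ds)  # [delta_2, delta_3]
```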

Given Training Set {(𝑦⁽¹⁾, 𝑥₁⁽¹⁾, 𝑥₂⁽¹⁾, …, 𝑥_𝑘⁽¹⁾), …, (𝑦^(𝑛), 𝑥₁^(𝑛), 𝑥₂^(𝑛), …, 𝑥_𝑘^(𝑛))}

set 𝛥_𝑙[𝑖,𝑗] = 0 for all 𝑙, 𝑖, 𝑗

for 𝑡 = 1 to 𝑛 # 𝑡 indexes the training samples; 𝑖 and 𝑗 index perceptrons

set 𝑎₁ = [𝑥₁^(𝑡), …, 𝑥_𝑘^(𝑡)]^𝑇

perform forward propagation to compute 𝑎_𝑙 for 𝑙 = 2 to 𝐿

using 𝑦^(𝑡), compute 𝛿_𝐿 = 𝑎_𝐿 - 𝑦^(𝑡)

compute 𝛿_{𝐿-1}, …, 𝛿₂

𝛥_𝑙[𝑖,𝑗] ← 𝛥_𝑙[𝑖,𝑗] + 𝑎_𝑙[𝑗]·𝛿_{𝑙+1}[𝑖] # vectorized form 𝛥_𝑙 ← 𝛥_𝑙 + (𝛿_{𝑙+1})·(𝑎_𝑙)^𝑇

after the loop:

(∂/∂𝜃_𝑙[𝑖,𝑗])𝐽(𝜃) = (1/𝑛)·𝛥_𝑙[𝑖,𝑗] + (𝜆/𝑛)·𝜃_𝑙[𝑖,𝑗] # 𝑗 ≠ 0

(∂/∂𝜃_𝑙[𝑖,𝑗])𝐽(𝜃) = (1/𝑛)·𝛥_𝑙[𝑖,𝑗] # 𝑗 = 0 (bias coefficient, not regularized)
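The accumulation loop over the training set can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: sigmoid units, a bias coefficient in column 0 of each 𝜃_𝑙, averaging over the 𝑛 samples, and a regularization gradient of (𝜆/𝑛)·𝜃, which is the derivative of the (𝜆/2𝑛) penalty in the cost function. The names `cost` and `gradients`, the toy data, and the weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(thetas, x):
    """Return [a_1, ..., a_L]; a bias input of 1 is prepended at every layer."""
    activations = [list(x)]
    for theta in thetas:
        a_bias = [1.0] + activations[-1]
        activations.append([sigmoid(sum(w * a for w, a in zip(row, a_bias)))
                            for row in theta])
    return activations


def cost(thetas, xs, ys, lam):
    """Regularized cross-entropy cost J(theta); bias coefficients are not penalized."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = forward(thetas, x)[-1]
        total += sum(t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y, h))
    penalty = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return -total / n + lam / (2 * n) * penalty


def gradients(thetas, xs, ys, lam):
    """Return dJ/dtheta_l[i][j] for every coefficient via backpropagation."""
    n = len(xs)
    acc = [[[0.0] * len(row) for row in theta] for theta in thetas]  # Delta_l = 0
    for x, y in zip(xs, ys):
        acts = forward(thetas, x)
        delta = [a - t for a, t in zip(acts[-1], y)]  # delta_L = a_L - y
        for l in range(len(thetas) - 1, -1, -1):
            a_bias = [1.0] + acts[l]
            for i, d in enumerate(delta):             # Delta_l += delta_{l+1} a_l^T
                for j, a in enumerate(a_bias):
                    acc[l][i][j] += a * d
            if l > 0:                                 # propagate to the previous layer
                back = [sum(thetas[l][i][j + 1] * delta[i] for i in range(len(delta)))
                        for j in range(len(acts[l]))]
                delta = [b * a * (1 - a) for b, a in zip(back, acts[l])]
    return [[[(acc[l][i][j] + (lam * w if j != 0 else 0.0)) / n
              for j, w in enumerate(row)]
             for i, row in enumerate(theta)]
            for l, theta in enumerate(thetas)]


# Toy data and a hypothetical 2-2-1 network.
xs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [[1.0], [1.0], [0.0]]
thetas = [[[0.1, 0.4, -0.3], [0.2, -0.5, 0.6]], [[-0.1, 0.7, -0.2]]]
grads = gradients(thetas, xs, ys, lam=0.1)
print(grads)
```

A common way to validate such code is to compare each entry of `grads` with a finite-difference estimate of `cost`; a gradient descent step then subtracts a learning rate times `grads` from `thetas`.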

Resources

https://glassboxmedicine.com/2019/01/17/introduction-to-neural-networks/
