Long Short-Term Memory (LSTM)
- is a type of gated recurrent neural network designed to mitigate the vanishing gradient problem of traditional RNNs
- its relative insensitivity to gap length (the number of time steps between relevant events) is its advantage over traditional RNNs, hidden Markov models, and other sequence learning methods
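A rough sketch of why the gating helps, offered as an illustration under simplifying assumptions rather than a full derivation: take the cell-state update $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{C}_t$ (forget gate $f_t$, input gate $i_t$, candidate $\tilde{C}_t$) and treat the gates and the candidate as constants with respect to $c_{t-1}$, i.e. ignore their indirect dependence through $h_{t-1}$. The gradient along the cell state over $k$ steps is then a product of forget-gate activations:

$$
\frac{\partial c_t}{\partial c_{t-k}} \approx \prod_{j=t-k+1}^{t} \operatorname{diag}(f_j)
$$

When the forget gates stay close to 1, this product does not shrink geometrically with $k$, whereas in a plain RNN the analogous product contains repeated recurrent weight matrices and activation derivatives.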
LSTM - How it Works
see: LSTM - Understanding LSTM Networks
LSTM - Structure
| Equations | Range | Description |
|---|---|---|
| $x_t$ | $\in \mathbb{R}^{n}$ | input vector ($n$ = number of input features) |
| All $W$s | $\in \mathbb{R}^{h \times (n+h)}$ | weight matrices ($h$ = number of hidden units) |
| All $b$s | $\in \mathbb{R}^{h}$ | bias vectors |
| $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$ | $\in (0,1)^{h}$ | forget gate’s activation vector |
| $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$ | $\in (0,1)^{h}$ | input/update gate’s activation vector |
| $\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$ | $\in (-1,1)^{h}$ | cell input (candidate) activation vector |
| $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$ | $\in (0,1)^{h}$ | output gate’s activation vector |
| $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{C}_t$ | $\in \mathbb{R}^{h}$ | cell state vector |
| $h_t = o_t \odot \tanh(c_t)$ | $\in (-1,1)^{h}$ | output (hidden state) vector |
Figure: LSTM structure diagram (lstm-structure.png)
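The equations in the table can be sketched as a single time step in plain Python. This is a minimal illustration, not a reference implementation: the names `lstm_step`, `affine`, and `sigmoid`, the dict layout of `W` and `b` (keyed by `"f"`, `"i"`, `"C"`, `"o"`), and the sizes and random initialization in the demo are all assumptions made for the example. Vectors are plain lists so that the block needs only the standard library.

```python
import math
import random


def sigmoid(z):
    """Logistic function, maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, z, b):
    """Return W z + b for a matrix W (list of rows) and vectors z, b."""
    return [sum(w * v for w, v in zip(row, z)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W and b are dicts keyed by 'f', 'i', 'C', 'o'."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t], length h + n
    f_t = [sigmoid(a) for a in affine(W["f"], z, b["f"])]        # forget gate
    i_t = [sigmoid(a) for a in affine(W["i"], z, b["i"])]        # input gate
    C_tilde = [math.tanh(a) for a in affine(W["C"], z, b["C"])]  # candidate
    o_t = [sigmoid(a) for a in affine(W["o"], z, b["o"])]        # output gate
    # c_t = f_t * c_{t-1} + i_t * C~_t (element-wise)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, C_tilde)]
    # h_t = o_t * tanh(c_t) (element-wise)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Demo with hypothetical sizes: n = 3 input features, h = 4 hidden units.
n, h = 3, 4
random.seed(0)
W = {k: [[random.gauss(0.0, 0.1) for _ in range(n + h)] for _ in range(h)]
     for k in "fiCo"}
b = {k: [0.0] * h for k in "fiCo"}

h_t, c_t = [0.0] * h, [0.0] * h
for _ in range(5):  # run five steps on random inputs
    x_t = [random.gauss(0.0, 1.0) for _ in range(n)]
    h_t, c_t = lstm_step(x_t, h_t, c_t, W, b)
```

Each weight matrix has shape h x (n+h) because it multiplies the concatenated vector [h_{t-1}, x_t]. Practical implementations typically stack the four matrices into one and use an array library, which this sketch omits for readability.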