Transformer Neural Networks (TNN) - Transformers
- is a deep learning model introduced in 2017, used primarily in the field of natural language processing
- like Recurrent Neural Networks (RNNs) and gated RNNs (e.g. LSTM and GRU), Transformers are designed to handle sequential data, such as natural language, for tasks such as translation and text summarization. However, unlike RNNs, Transformers do not require that the sequential data be processed in order. For example, if the input is a natural language sentence, the Transformer does not need to process the beginning of the sentence before the end, because self-attention relates all positions to each other at once. As a result, the Transformer allows far more parallelization than RNNs during training and therefore reduces training times
- is the first transduction model relying entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs or convolution
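The contrast between serial RNN processing and order-independent attention can be illustrated with a toy NumPy sketch. The sizes, random weights, and variable names are assumptions made for illustration only, and the attention part deliberately omits the learned query/key/value projections that a real Transformer uses.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 4                       # toy sizes, chosen only for illustration
x = rng.normal(size=(seq_len, d))       # one vector per token

# RNN-style: step t needs the hidden state from step t-1,
# so the time steps have to be computed one after another.
W_h = rng.normal(size=(d, d)) * 0.1     # hypothetical recurrent weights
W_x = rng.normal(size=(d, d)) * 0.1     # hypothetical input weights
h = np.zeros(d)
rnn_states = []
for t in range(seq_len):
    h = np.tanh(h @ W_h + x[t] @ W_x)
    rnn_states.append(h)
rnn_states = np.stack(rnn_states)       # (seq_len, d)

# Attention-style: every pair of positions is scored in one matrix product,
# with no dependency between time steps, so the work parallelizes well.
scores = x @ x.T / np.sqrt(d)           # (seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
attn_out = weights @ x                  # all positions updated at once
```

The loop is the bottleneck in the first half: no amount of hardware lets step 4 start before step 3 finishes, whereas the second half is a handful of matrix operations over the whole sequence.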
How it Works - TL;DR
- 3Blue1Brown - Attention in Transformers
- 3Blue1Brown - Storing Facts in Multi-Layer Perceptron
- StatQuest - Transformer Neural Networks
- StatQuest - Decoder-Only Transformers
Transformer architecture outline:
input → word embeddings + positional encodings → [multi-head self-attention block → multi-layer perceptron] × 96 → word un-embedding (96 is the layer count of GPT-3; the number of repeated blocks varies by model)
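The data flow in this outline can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not a reference implementation: the sizes are toy values, the weights are random, `attention_stub` and `mlp_stub` are hypothetical placeholders for the real sub-layers, residual connections are included as in the original paper, and layer normalization is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, n_layers = 50, 16, 2   # toy sizes; real models use far larger values

W_embed = rng.normal(size=(vocab_size, d_model)) * 0.02     # token embedding table
W_unembed = rng.normal(size=(d_model, vocab_size)) * 0.02   # un-embedding matrix

def positional_encoding(seq_len, d_model):
    """Sinusoidal encodings as defined in "Attention Is All You Need"."""
    pos = np.arange(seq_len)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angles = pos / (10000 ** (two_i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

def attention_stub(x):
    # Placeholder only: mixes information across positions with uniform weights.
    return np.broadcast_to(x.mean(axis=0), x.shape)

def mlp_stub(x):
    # Placeholder only: acts on each position independently.
    return np.tanh(x)

def forward(token_ids):
    x = W_embed[token_ids] + positional_encoding(len(token_ids), d_model)
    for _ in range(n_layers):
        x = x + attention_stub(x)   # residual connection around attention
        x = x + mlp_stub(x)         # residual connection around the MLP
    return x @ W_unembed            # one row of vocabulary logits per position

logits = forward(np.array([3, 14, 15, 9, 2, 6]))
print(logits.shape)  # (6, 50)
```

In a decoder-only language model, a softmax over the last row of `logits` would give the next-token distribution.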
First self-attention block architecture:
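As a hedged illustration of what such a block computes, the sketch below implements scaled dot-product attention for one head and then combines several heads. The weight names (`W_q`, `W_k`, `W_v`, `W_o`), the toy dimensions, and the optional causal mask are assumptions for illustration; layer normalization and the residual connection are left out.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(x, W_q, W_k, W_v, causal=False):
    """Scaled dot-product attention for a single head."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len)
    if causal:
        # Decoder-style masking: a position may not attend to later positions.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores)                 # each row sums to 1
    return weights @ v, weights

def multi_head_attention(x, heads, W_o, causal=False):
    """Run every head, concatenate the results, and project back to d_model."""
    outs = [attention_head(x, W_q, W_k, W_v, causal)[0] for W_q, W_k, W_v in heads]
    return np.concatenate(outs, axis=-1) @ W_o

rng = np.random.default_rng(0)
seq_len, d_model, d_head, n_heads = 4, 8, 4, 2   # toy sizes
x = rng.normal(size=(seq_len, d_model))
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
W_o = rng.normal(size=(n_heads * d_head, d_model))

out = multi_head_attention(x, heads, W_o, causal=True)
print(out.shape)  # (4, 8)
```

Each row of `weights` says how much one token draws from every other token; with `causal=True` the entries above the diagonal are exactly zero.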
MLP architecture:
Each token that has gone through the previous attention block will go through the following steps:
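As a hedged illustration, the steps of a standard Transformer MLP (up-projection, nonlinearity, down-projection, residual add) can be sketched as follows. This follows the common formulation rather than anything confirmed by this page; the parameter names, the toy sizes, and the choice of ReLU (some models use GELU instead) are assumptions.

```python
import numpy as np

def mlp_block(x, W_up, b_up, W_down, b_down):
    """Position-wise feed-forward block applied to each token vector independently."""
    hidden = x @ W_up + b_up             # 1. project up to the wider hidden size
    hidden = np.maximum(hidden, 0.0)     # 2. ReLU nonlinearity
    update = hidden @ W_down + b_down    # 3. project back down to d_model
    return x + update                    # 4. add the result to the incoming vector (residual)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
d_ff = 4 * d_model                       # the original paper uses a 4x wider hidden layer
x = rng.normal(size=(seq_len, d_model))  # stand-in for the attention block's output
W_up = rng.normal(size=(d_model, d_ff)) * 0.1
b_up = np.zeros(d_ff)
W_down = rng.normal(size=(d_ff, d_model)) * 0.1
b_down = np.zeros(d_model)

out = mlp_block(x, W_up, b_up, W_down, b_down)
print(out.shape)  # (4, 8)
```

Unlike attention, no information moves between tokens here: running the block on a single row of `x` gives the same result as that row of the full output.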
Transformer - Timeline of Models
(Figure: transformers-timeline-of-models.png)
Resources
- Attention Is All You Need.pdf - 2017 research paper (Vaswani et al.) that introduced the Transformer
(Figure: transformer-encoder.png)
(Figure: transformer-mlp.png)