Tokenizer - Tokenization Algorithm is the process of converting raw text into smaller units called tokens, which can then be processed by a machine learning model Subpages Character-Based Tokenizers Subword-Based Tokenizers Word-Based Tokenizers Resources https://huggingface.co/docs/transformers/en/tokenizer_summary