/var/log - marcus chiu

Tokenizer - Tokenization Algorithm

Created on Oct 11, 2025

Tokenizer - Tokenization Algorithm
  • Tokenization is the process of converting raw text into smaller units called tokens (characters, subwords, or words), which are then mapped to integer IDs that a machine learning model can process
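
As a rough illustration of the definition above, here is a minimal Python sketch of the three granularities listed under Subpages. The function names, the regular expression, and the tiny subword vocabulary are all hypothetical choices made for this example. The subword function uses a simple greedy longest-match, which is only a stand-in for trained algorithms such as BPE or WordPiece.

```python
import re


def char_tokenize(text):
    """Character-based: every character becomes one token."""
    return list(text)


def word_tokenize(text):
    """Word-based: runs of word characters, plus single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def subword_tokenize(word, vocab):
    """Subword-based (toy): greedy longest-match against a given vocabulary.

    `vocab` is a hypothetical, hand-written set of pieces; real tokenizers
    learn their vocabulary from a corpus.
    """
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No piece matched: fall back to a single character.
            end = start + 1
        tokens.append(word[start:end])
        start = end
    return tokens


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


text = "Tokenizers split text."
words = word_tokenize(text)
word_ids = [build_vocab(words)[w] for w in words]
pieces = subword_tokenize("tokenizers", {"token", "izer", "s"})

print(char_tokenize("hi!"))
print(words, word_ids)
print(pieces)
```

The last step (`build_vocab`) shows why tokens matter to a model: the model never sees strings, only the integer IDs that the tokens map to.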

Subpages

  • Character-Based Tokenizers
  • Subword-Based Tokenizers
  • Word-Based Tokenizers

Resources

  • https://huggingface.co/docs/transformers/en/tokenizer_summary