Causal Language Modeling (CLM)

  • is an autoregressive method where the model is trained to predict the next token in a sequence given the previous tokens
  • is used in models like GPT-2 and GPT-3
  • is well-suited for tasks such as text generation and summarization
  • however, CLM models have unidirectional context: each prediction is conditioned only on the preceding tokens, never on the tokens that follow
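
A minimal sketch of how a token sequence becomes CLM training data, using plain Python lists of word strings in place of real subword IDs. The function names are hypothetical. It shows the same objective in two forms: as explicit (prefix, next token) pairs, and as the shifted inputs/labels plus lower-triangular causal mask that let a model train on every position in a single forward pass.

```python
from typing import List, Tuple

def next_token_examples(tokens: List[str]) -> List[Tuple[List[str], str]]:
    # Position t yields one example: the prefix tokens[:t] predicts tokens[t].
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]

def shift_for_training(tokens: List[str]) -> Tuple[List[str], List[str]]:
    # To train on all positions at once, the inputs are the sequence without
    # its last token and the labels are the sequence without its first token.
    return tokens[:-1], tokens[1:]

def causal_mask(n: int) -> List[List[bool]]:
    # mask[i][j] is True if position i may attend to position j (only j <= i),
    # which is what keeps the context unidirectional.
    return [[j <= i for j in range(n)] for i in range(n)]

tokens = ["the", "cat", "sat", "down"]
for context, target in next_token_examples(tokens):
    print(context, "->", target)
# ['the'] -> cat
# ['the', 'cat'] -> sat
# ['the', 'cat', 'sat'] -> down

print(shift_for_training(tokens))
# (['the', 'cat', 'sat'], ['cat', 'sat', 'down'])
print(causal_mask(3))
# [[True, False, False], [True, True, False], [True, True, True]]
```

Both forms define the same loss. The masked, shifted version is simply what makes training efficient.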

Masked Language Modeling (MLM)

  • is a training method used in models like BERT, where a fraction of the tokens in the input sequence (15% in BERT) are masked and the model learns to predict the original tokens from the surrounding context
  • has the advantage of bidirectional context, allowing the model to consider both past and future tokens when making predictions
  • is useful for pretraining models that are then fine-tuned for tasks like text classification, sentiment analysis, and named entity recognition
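
The masking step can be sketched as follows. This assumes the BERT recipe: select about 15% of positions, then replace 80% of the selected tokens with [MASK], 10% with a random token, and leave 10% unchanged. `mask_tokens` and the `"[MASK]"` string are hypothetical stand-ins, because real tokenizers work on subword IDs. Selecting each position with an independent coin flip only approximates an exact 15% per sequence.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"  # hypothetical mask symbol; real tokenizers use a special token ID

def mask_tokens(tokens: List[str], vocab: List[str], mask_prob: float = 0.15,
                seed: int = 0) -> Tuple[List[str], Dict[int, str]]:
    """Return (corrupted tokens, {position: original token}) for MLM training."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        # Each position is selected independently with probability mask_prob.
        if rng.random() >= mask_prob:
            continue  # not selected: no prediction or loss at this position
        labels[i] = tok  # the model must recover this original token
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return corrupted, labels

tokens = "the cat sat on the mat".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.3, seed=1)
print(corrupted, labels)
```

Because the unmasked tokens on both sides of a masked position remain visible to the model, the prediction for `labels[i]` can use left and right context. A causal model cannot do this.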

Sequence-to-Sequence (Seq2Seq)

  • uses an encoder-decoder architecture, where the encoder processes the input sequence and the decoder generates the output sequence
  • is used in models like T5 and BART, and commonly applied to tasks like machine translation, summarization, and question answering
  • because the decoder generates a new sequence conditioned on the encoded input, the output can differ from the input in length, language, or form; this makes seq2seq models versatile for any NLP task that can be framed as mapping one text to another
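
The encoder-decoder division of labor can be sketched with greedy decoding. Everything here is a hypothetical toy. `encode` just returns the source tokens where a real encoder would return contextual vectors. `reverse_step` stands in for a trained decoder and "translates" by reversing the source. The loop structure is the part that carries over: the decoder starts from a begin-of-sequence token and emits one token at a time until it produces end-of-sequence. At each step it can see the whole encoded input but only its own earlier outputs.

```python
from typing import Callable, List, Sequence

BOS, EOS = "<s>", "</s>"  # hypothetical begin/end-of-sequence symbols

# Hypothetical decoder step: (encoder memory, decoder prefix) -> next token.
StepFn = Callable[[List[str], List[str]], str]

def encode(source: Sequence[str]) -> List[str]:
    # Toy encoder: returns the source unchanged. A real encoder would return
    # one contextual vector per source token, computed over the whole input.
    return list(source)

def greedy_decode(memory: List[str], step: StepFn, max_len: int = 50) -> List[str]:
    # Start from BOS and append the decoder's chosen token at each step,
    # stopping at EOS or after max_len tokens.
    prefix = [BOS]
    for _ in range(max_len):
        token = step(memory, prefix)
        if token == EOS:
            break
        prefix.append(token)
    return prefix[1:]  # strip BOS

def reverse_step(memory: List[str], prefix: List[str]) -> str:
    # Toy stand-in for a trained decoder: emits the source tokens in reverse
    # order, then EOS.
    i = len(prefix) - 1  # number of tokens generated so far
    return memory[-1 - i] if i < len(memory) else EOS

print(greedy_decode(encode(["a", "b", "c"]), reverse_step))
# ['c', 'b', 'a']
```

During training, the decoder is usually fed the reference output shifted right by one position (teacher forcing) rather than its own predictions. Greedy decoding is only one inference strategy, and beam search is a common alternative.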

Resources