Natural Language Processing (NLP) - Computational Linguistics
  • other names include Speech & Language Processing, Human Language Technology, and Speech Recognition & Synthesis
  • is a subfield of linguistics, computer science, and artificial intelligence concerned with computers’ ability to read, understand, and derive meaning from natural language
  • is an interdisciplinary field concerned with the computational modeling of natural language

NLP - System Types

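  • rule-based vs statistical - rule-based systems apply hand-written linguistic rules, while statistical systems learn patterns from (usually annotated) data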
  • manual vs automatic -

NLP - Tasks

  • Audio Related
    • Speech Recognition - speech to text
      • Speech Segmentation - the task of separating continuous speech into smaller units (e.g. words, syllables, or phonemes)
    • Speech Synthesis - text to speech
  • Visual Related (see Computer Vision (CV))
    • Optical Character Recognition (OCR) - image to text
    • Text-to-Image - text to image
  • Grammar Induction - the task of learning, from example text, a formal grammar that describes a language’s syntax
  • Segmentation/Tokenization:
    • Sentence Segmentation (Sentence Boundary Disambiguation) - task of separating a body of text into sentences
    • Tokenization (Word Segmentation) - process of breaking a body of text into tokens (e.g. words and/or phrases)
    • Morphological Segmentation - the task of separating words into individual morphemes and identifying the classes of morphemes
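  • Normalization - the process of converting tokens to a single canonical form (e.g. U.S.A. → USA)
    • Lower/Upper Casing - converting all characters to one case (usually lowercase) so that e.g. “The” and “the” are treated as the same token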
    • Stemming - the task of crudely reducing inflected/derived words to a common stem by heuristically stripping affixes; the stem need not be a valid word (e.g. automates, automatic, automation → automat)
      • Porter’s algorithm - the most common English stemmer
    • Lemmatization - the task of removing inflectional endings to return the lemma (the base, dictionary form of a word), thereby grouping together different forms of the same word (e.g. am, are, is → be | car, cars, car’s, cars’ → car)
      • unlike stemming, it relies on a vocabulary and morphological analysis, and may use the word’s context (e.g. its part of speech) to resolve ambiguity (e.g. “saw” → “see” as a verb, but “saw” as a noun)
  • Part-of-Speech (PoS) Tagging - the task of determining the Part of Speech (PoS) for each word in a sentence
  • Syntactic Parsing - the task of analyzing the syntactic structure of a sentence (e.g. determining its parse tree)
    • Constituency Parsing - builds a parse tree of nested constituents (phrases such as noun phrases and verb phrases)
    • Dependency Parsing - identifies grammatical relationships between words as head-dependent links (e.g. which word is the subject or the object of a verb)
  • Word/Phrase Semantics:
    • Morphology - the word components (morphemes) that carry meaning beyond the word’s base definition (e.g. singular vs plural)
    • Lexical Semantics - meaning of individual words (in context)
    • Compositional Semantics - how the meanings of individual words combine into the meaning of phrases/groups of words (e.g. the distinction between Western Europe and Eastern Europe)
  • Distributional Semantics - theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data
  • Machine Translation - the task of automatically translating text from one natural language to another
  • Information Extraction - the task of extracting structured information (e.g. entities, relations, events, temporal expressions, etc) from a body of text
    • Named Entity Recognition - the task of locating named entities (e.g. people, organizations, locations) in a body of text and classifying them by type
    • Relationship Extraction - the task of identifying the relationships among entities in a body of text (e.g. who is married to whom)
    • etc
    • Textual Entailment Recognition - given two text fragments, determine whether the first one being true:
      • entails the second (entailment)
      • entails the second’s negation (contradiction)
      • allows the second to be either true or false (neutral)
  • Text Classification - the task of assigning one or more predefined categories (labels) to a body of text
    • Sentiment Analysis - the task of determining the sentiment of a body of text or a word
      • positive, neutral, or negative
      • emotion (happy, sad, angry, etc)
      • etc
    • Topic Segmentation - the task of separating a body of text into segments, each of which is devoted to a single topic
    • Topic Recognition/Labeling - the task of identifying the topic of text
    • Language Detection - determining the language of the text
    • Intent Detection - determining the underlying goal/intent of a given text
    • Sentence Type Identification - classifying a sentence by its type:
      • request/command - e.g. open the door
      • statement - e.g. the door is open
      • question - e.g. is the door open?
  • Disambiguation - the task of resolving the ambiguity inherent in human language (e.g. word sense disambiguation: “bank” as a riverbank vs a financial institution)
  • Automatic Summarization - the task of producing a summary of a body of text
  • Referring Expressions Detection - a more general form of coreference resolution that also identifies “bridging relationships” (e.g. in “He entered John’s house through the front door”, “the front door” is a referring expression, and the bridging relationship to be identified is that the door is the front door of John’s house)
    • Co-Reference Resolution - the task of determining which words (“mentions”) refer to the same objects (“entities”); it uses knowledge about how words like “that” or pronouns like “it” or “she” refer to earlier parts of the discourse
      • Anaphora Resolution - a specific type of coreference resolution concerned with matching pronouns to the nouns or named entities they refer to
  • Question-Answering - given a natural-language question, determine the meaning of its words, then determine the answer (see Search Engines - Types)
  • Conversational Agents or Dialogue Systems - a superset of question-answering: computer programs that can converse with humans in natural language
  • Discourse Analysis - a number of tasks:
    • identifying the discourse structure of connected text
    • recognizing and classifying speech-acts in text (e.g. yes-no question, content question, statement, assertion, etc)
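The Sentence Segmentation and Tokenization bullets above can be illustrated with a minimal rule-based sketch in Python (3.9+). The patterns `SENTENCE_BOUNDARY` and `TOKEN` and the helpers `split_sentences` and `tokenize` are hypothetical names chosen for illustration; real segmenters must also handle abbreviations such as “Dr.”, which this naive rule would wrongly treat as a sentence end.

```python
import re

# Naive rule (assumption for illustration): a sentence ends at '.', '!' or '?'
# followed by whitespace. Abbreviations like "Dr." will be split incorrectly.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

# A token is a run of word characters (optionally with one internal apostrophe,
# as in "John's") or a single punctuation character.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on terminal punctuation followed by whitespace."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Break a sentence into word and punctuation tokens."""
    return TOKEN.findall(sentence)


if __name__ == "__main__":
    text = "Is the door open? The door is open. Open the door!"
    for sentence in split_sentences(text):
        print(tokenize(sentence))
```

Run as a script, this prints `['Is', 'the', 'door', 'open', '?']`, `['The', 'door', 'is', 'open', '.']` and `['Open', 'the', 'door', '!']`, one list per sentence.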
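The Normalization and Lower/Upper Casing bullets above can be sketched as a single function that maps a token to a canonical form. This is a minimal sketch under stated assumptions: `normalize` and the `ACRONYM` pattern are hypothetical, and the only rules applied are lowercasing and dropping the periods from dotted acronyms such as “U.S.A.”.

```python
import re

# Hypothetical rule: a dotted acronym is one or more "letter." pairs,
# optionally followed by a final letter without a period ("u.s.a" or "u.s.a.").
ACRONYM = re.compile(r"(?:[a-z]\.)+[a-z]?")


def normalize(token: str) -> str:
    """Map a token to a canonical form (a sketch, not a standard algorithm)."""
    folded = token.lower()  # case folding
    if ACRONYM.fullmatch(folded):
        folded = folded.replace(".", "")  # "u.s.a." -> "usa"
    return folded
```

With these rules "U.S.A", "U.S.A." and "USA" all normalize to `"usa"`. Case folding is lossy, though: `normalize("US")` and `normalize("us")` both return `"us"`, conflating the country with the pronoun, which is one reason a pipeline may keep the original case for tasks such as named entity recognition.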
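To contrast the Stemming and Lemmatization bullets above, here is a toy sketch using the same example words. The suffix list `SUFFIXES` and the lemma table `LEMMAS` are hypothetical and deliberately tiny: the stemmer is a naive suffix stripper, not Porter’s algorithm, and the lemmatizer is a plain dictionary lookup without the morphological analysis or PoS information a real lemmatizer uses.

```python
# Hypothetical suffix rules, tried in list order; far simpler than Porter's algorithm.
SUFFIXES = ("ion", "ic", "es", "s")

# Hypothetical lemma dictionary covering only the examples from the notes.
LEMMAS = {
    "am": "be", "are": "be", "is": "be",
    "cars": "car", "car's": "car", "cars'": "car",
}


def stem(word: str) -> str:
    """Strip the first matching suffix, as long as at least 3 letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the word up in the lemma dictionary; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)
```

Here `stem` maps automates, automatic and automation to `"automat"`, which is not a dictionary word, while `lemmatize` maps am, are and is to `"be"`. Conversely, `stem("is")` leaves the word unchanged, because no affix-stripping rule can recover an irregular form like “be”.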
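A simple baseline for the Part-of-Speech (PoS) Tagging bullet above is to give every word the tag it received most often in training data. This sketch assumes a tiny hypothetical training corpus (`TRAINING`) labeled with Penn Treebank-style tags; `train`, `tag`, and the default tag `"NN"` for unseen words are illustrative choices, not a library API.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training sentences (Penn Treebank-style tags).
TRAINING = [
    [("the", "DT"), ("door", "NN"), ("is", "VBZ"), ("open", "JJ")],
    [("open", "VB"), ("the", "DT"), ("door", "NN")],
    [("is", "VBZ"), ("the", "DT"), ("door", "NN"), ("open", "JJ")],
]


def train(sentences):
    """Count each word's tags and keep the most frequent tag per word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, pos in sentence:
            counts[word][pos] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, model, default="NN"):
    """Tag each word with its most frequent training tag; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]
```

The baseline ignores context, which shows its limits: “open” was tagged JJ twice and VB once in training, so `tag(["Open", "the", "door"], train(TRAINING))` tags the imperative “Open” as JJ. Context-aware taggers (e.g. HMM- or neural-network-based) exist to fix exactly this kind of error.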
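The idea behind distributional semantics (words that occur in similar contexts tend to have similar meanings) can be sketched with raw co-occurrence counts and cosine similarity. Assumptions, all for illustration: the three-sentence `CORPUS` is hypothetical, a word’s context is every other word in the same sentence, and `context_vectors` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter

# Hypothetical toy corpus; each sentence serves as one context window.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]


def context_vectors(sentences):
    """Map each word to a count vector of the other words in its sentences."""
    vectors = {}
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            context = words[:i] + words[i + 1:]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

“cat” and “dog” share two of their three context words (“the”, “drinks”), so their similarity is 2/3, while “cat” and “car” share only “the” and score 1/3. Frequent function words like “the” inflate every score, so practical systems usually reweight raw counts (e.g. with PPMI) or learn dense embeddings instead.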
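The positive/neutral/negative labeling from the Sentiment Analysis bullet above can be sketched with a lexicon-based approach: sum word polarities and read off the sign. The lexicon `LEXICON`, the negator set `NEGATORS`, and the one-word negation rule are hypothetical simplifications; statistical classifiers trained on labeled text are the usual alternative.

```python
# Hypothetical miniature polarity lexicon; real lexicons hold thousands of entries.
LEXICON = {"good": 1, "great": 2, "happy": 1, "bad": -1, "terrible": -2, "sad": -1}
NEGATORS = {"not", "never", "no"}


def sentiment(text: str) -> str:
    """Label text positive/neutral/negative by summing word polarities.

    A negator flips the polarity of the word immediately after it.
    """
    score = 0
    words = text.lower().split()
    for i, word in enumerate(words):
        polarity = LEXICON.get(word.strip(".,!?"), 0)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, “The movie was great!” is labeled positive, “The food was not good.” negative (the negator flips “good”), and “The door is open.” neutral because no word is in the lexicon.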
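The three categories in the Sentence Type Identification bullet above can be approximated with a few surface rules, in the spirit of the rule-based systems mentioned under System Types. The word lists `AUXILIARIES`, `WH_WORDS`, and `IMPERATIVE_VERBS` and the function `sentence_type` are hypothetical; real systems typically use a PoS tagger or a trained classifier rather than fixed word lists.

```python
# Hypothetical word lists for a rule-based sketch.
AUXILIARIES = {"is", "are", "am", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should", "have", "has"}
WH_WORDS = {"who", "what", "where", "when", "why", "how", "which"}
IMPERATIVE_VERBS = {"open", "close", "shut", "stop", "go", "come", "take", "give", "please"}


def sentence_type(sentence: str) -> str:
    """Classify a sentence as 'question', 'request/command', or 'statement'."""
    text = sentence.strip()
    words = text.rstrip(".!?").lower().split()
    first = words[0] if words else ""
    # Questions: a question mark, or a leading wh-word or auxiliary verb.
    if text.endswith("?") or first in WH_WORDS or first in AUXILIARIES:
        return "question"
    # Commands: a leading verb from the (hypothetical) imperative list.
    if first in IMPERATIVE_VERBS:
        return "request/command"
    return "statement"
```

On the examples from the notes, “open the door” is classified as request/command, “the door is open” as statement, and “is the door open?” as question.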
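One classic way to approach the Automatic Summarization bullet above is extractive summarization: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences. This sketch assumes a hypothetical stop-word list (`STOPWORDS`), a naive sentence splitter, and average word frequency as the score; `summarize` is an illustrative name, and abstractive (generative) summarizers work very differently.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "on"}


def content_words(text: str) -> list[str]:
    """Lowercased words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, n: int = 1) -> list[str]:
    """Return the n sentences with the highest average word frequency, in original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

In “NLP studies language. Parsing analyzes language structure. Cats sleep.”, “language” is the only repeated content word, so the first sentence scores highest and is returned as the one-sentence summary.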