Entity Relation Extraction (RE) - Relation Extraction (RE)
  • is the task of extracting semantic relationships from unstructured text; these relationships:
    • occur between two or more entities of a certain type (e.g. Person, Organisation, Location)
    • fall into a number of semantic categories (e.g. married to, employed by, lives in)
  • is usually done after the entities have been extracted, i.e. after Named Entity Recognition (NER)
  • is a subfield of Information Extraction (IE), which is itself a subfield of Natural Language Processing (NLP)
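
As a rough illustration of the definition above, an extracted relation can be stored as a typed link between two typed entities. This is only a minimal sketch: the class names `Entity` and `Relation`, the label `employed_by` and the example values "Alice" and "Acme Corp" are made up for illustration and do not come from any particular RE toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str  # surface form found in the text
    type: str  # entity type, e.g. "PERSON", "ORG", "LOC"


@dataclass(frozen=True)
class Relation:
    label: str    # semantic category, e.g. "married_to", "employed_by", "lives_in"
    head: Entity  # first argument of the relation
    tail: Entity  # second argument of the relation


# Hypothetical example values, invented purely for illustration.
person = Entity("Alice", "PERSON")
company = Entity("Acme Corp", "ORG")
rel = Relation("employed_by", person, company)

print(f"{rel.head.text} --{rel.label}--> {rel.tail.text}")
```

This sketch only covers binary relations; relations over more than two entities would need a list of arguments instead of a head and a tail.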

RE - Progress

RE - 2 Steps/Tasks

Given two or more entities:

  1. determine whether a relation exists between them
  2. determine the type of relation between them
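
The two steps above can be sketched as a small pipeline: first filter candidate entity pairs, then label the pairs that survive. This is a sketch under assumptions, not a real system: `extract_relations`, `toy_has_relation` and `toy_classify` are hypothetical names, the two toy functions are trivial stand-ins for trained classifiers, the ORG type for AMR is assumed, and only pairs of entities are considered.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Entity = Tuple[str, str]            # (text, entity type)
Triple = Tuple[Entity, str, Entity]  # (entity 1, relation label, entity 2)


def extract_relations(
    entities: List[Entity],
    has_relation: Callable[[Entity, Entity], bool],
    classify_relation: Callable[[Entity, Entity], str],
) -> List[Triple]:
    """Run the two steps over every unordered pair of entities."""
    results = []
    for e1, e2 in combinations(entities, 2):
        # Step 1: determine whether a relation exists at all.
        if not has_relation(e1, e2):
            continue
        # Step 2: determine the type of that relation.
        results.append((e1, classify_relation(e1, e2), e2))
    return results


# Toy stand-ins (assumptions): a real system would use trained models here.
def toy_has_relation(e1: Entity, e2: Entity) -> bool:
    # Pretend that only ORG/PERSON pairs can be related.
    return {e1[1], e2[1]} == {"ORG", "PERSON"}


def toy_classify(e1: Entity, e2: Entity) -> str:
    # Always predicts the same label; a placeholder for a multi-class classifier.
    return "employed_by"


entities = [("American Airlines", "ORG"), ("AMR", "ORG"), ("Tim Wagner", "PERSON")]
relations = extract_relations(entities, toy_has_relation, toy_classify)
for e1, label, e2 in relations:
    print(e1[0], label, e2[0])
```

Splitting detection from classification lets the first step discard the many unrelated pairs cheaply before the second step has to choose among relation types. The toy filter is deliberately crude and will also accept pairs that are not actually related.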

RE - Methods

RE - Possible Features to Consider

Possible Feature Candidates

<e1> American Airlines </e1> , a unit of AMR, immediately matched the move, spokesman <e2> Tim Wagner </e2> said

  • entity-based features
    • entity types
    • entity head
    • concatenation of entity types
    • entity level (e.g. NAME, NOMINAL, PRONOUN, etc)
  • word-based features
    • between-entity bag of words
    • word(s) before entity
    • word(s) after entity
  • syntactic features
    • constituent path
    • base syntactic chunk path
    • typed dependency path
  • entity-based features
    • entity types: {e1=ORG, e2=PERSON}
    • entity head: {e1=airlines, e2=Wagner}
    • concatenation of entity types: ORG-PERSON
    • entity level: {e1=NAME, e2=NAME}
  • word-based features
    • between-entity bag of words: {a, unit, of, AMR, immediately, matched, the, move, spokesman}
    • word(s) before entity: {e1=NONE, e2=spokesman}
    • word(s) after entity: {e1=a, e2=said}
  • syntactic features
    • constituent path: NP↑NP↑S↑S↓NP
    • base syntactic chunk path: NP→NP→PP→NP→VP→NP→NP
    • typed dependency path: Airlines ←subj matched ←comp said →subj Wagner
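
The entity-based and word-based features listed above can be computed directly from a tokenised sentence and the two entity spans. The following is a minimal sketch under several assumptions: the tokenisation is done by hand, `Mention`, `is_word`, `nearest_word` and `extract_features` are hypothetical helper names, the entity head is approximated by the last token of the mention (so casing is kept as-is), and punctuation is skipped when looking for neighbouring words. Entity level and the syntactic features are left out, since they need mention-level annotation and a parser respectively.

```python
from typing import Dict, Iterable, List, NamedTuple


class Mention(NamedTuple):
    start: int  # index of the first token of the mention
    end: int    # index one past the last token of the mention
    type: str   # entity type assigned by NER


def is_word(token: str) -> bool:
    """True for tokens containing a letter or digit (i.e. not pure punctuation)."""
    return any(ch.isalnum() for ch in token)


def nearest_word(tokens: List[str], indices: Iterable[int]) -> str:
    """Return the first non-punctuation token along indices, or 'NONE'."""
    for i in indices:
        if is_word(tokens[i]):
            return tokens[i]
    return "NONE"


def extract_features(tokens: List[str], e1: Mention, e2: Mention) -> Dict[str, object]:
    return {
        # entity-based features
        "entity_types": {"e1": e1.type, "e2": e2.type},
        "entity_heads": {"e1": tokens[e1.end - 1], "e2": tokens[e2.end - 1]},
        "type_concat": f"{e1.type}-{e2.type}",
        # word-based features
        "between_bow": {t for t in tokens[e1.end:e2.start] if is_word(t)},
        "word_before": {
            "e1": nearest_word(tokens, range(e1.start - 1, -1, -1)),
            "e2": nearest_word(tokens, range(e2.start - 1, -1, -1)),
        },
        "word_after": {
            "e1": nearest_word(tokens, range(e1.end, len(tokens))),
            "e2": nearest_word(tokens, range(e2.end, len(tokens))),
        },
    }


# Hand-tokenised version of the example sentence.
tokens = ["American", "Airlines", ",", "a", "unit", "of", "AMR", ",",
          "immediately", "matched", "the", "move", ",", "spokesman",
          "Tim", "Wagner", "said"]
e1 = Mention(0, 2, "ORG")       # American Airlines
e2 = Mention(14, 16, "PERSON")  # Tim Wagner

features = extract_features(tokens, e1, e2)
for name, value in features.items():
    print(name, value)
```

In practice the resulting dictionary would be turned into a feature vector (for example by one-hot encoding) before being passed to a classifier.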