- Information Retrieval (IR) retrieves relevant facts from unstructured data that are NOT specified in advance
- Information Extraction (IE) extracts relevant facts from unstructured data that are specified in advance
Both IE and IR are subtasks of Natural Language Processing (NLP) - Computational Linguistics
IR/IE - Other
An implementation of Feature Extraction that extracts specific Features (e.g. entities, relations, events) from text:
- (Entity - Named Entity) Recognition/Identification/Chunking/Extraction/Resolution (NER)
- Coherence (Relation) Extraction
- Coreference/Co-Reference Extraction
- Entity Relation Extraction (RE)
- Event Extraction
- Feature Conversion - Text Embeddings/Embedding
- Numerical Expression Recognition
- Referring Expression Extraction
- Semantic Relation Extraction
- Temporal Expression Recognition - Temporal Analysis
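As a rough illustration of two of the subtasks above (Temporal Expression Recognition and Numerical Expression Recognition), here is a minimal rule-based sketch. The regular expressions, the `TIME`/`NUMBER` labels and the function `extract_expressions` are hypothetical and cover only a handful of surface forms. Practical recognizers typically rely on much larger rule sets or trained sequence models.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately small rule set for illustration only.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

TEMPORAL_PATTERNS = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),                   # ISO dates, e.g. 2021-03-15
    re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b"),  # e.g. March 15, 2021
    re.compile(r"\b(?:yesterday|today|tomorrow)\b", re.IGNORECASE),
]

# Integers or decimals, optionally followed by a percent sign or a unit.
NUMERIC_PATTERN = re.compile(r"\b\d+(?:\.\d+)?(?:\s?(?:%|percent|km|kg))?")

Span = Tuple[int, int, str, str]  # (start, end, label, matched text)


def extract_expressions(text: str) -> List[Span]:
    """Return temporal and numerical expressions found in `text`, ordered by position."""
    spans: List[Span] = []
    for pattern in TEMPORAL_PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), "TIME", m.group()))
    # A number inside an already recognized temporal expression (e.g. the year) is skipped.
    for m in NUMERIC_PATTERN.finditer(text):
        if not any(start <= m.start() < end for start, end, _, _ in spans):
            spans.append((m.start(), m.end(), "NUMBER", m.group()))
    return sorted(spans)


for span in extract_expressions("On March 15, 2021 sales rose 12.5% and the report is due tomorrow."):
    print(span)
```

Recognizing temporal expressions first and then ignoring numbers inside them is one simple way to keep "2021" from being reported twice; a real system would resolve overlapping matches more carefully.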
IR/IE - Model Types
To retrieve relevant documents effectively, the documents are typically transformed into a suitable representation. Each retrieval strategy incorporates a specific model for representing documents. The picture on the right illustrates the relationship between some common models. Models are categorized along two dimensions:
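A minimal sketch of what transforming documents into a suitable representation can look like, assuming a naive lowercase tokenizer: the same text becomes either a set of terms (the kind of representation set-theoretic models use) or a term-frequency vector over a fixed vocabulary (the kind algebraic models use). The helper names `tokenize`, `as_set` and `as_tf_vector` are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters/digits (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def as_set(text: str) -> Set[str]:
    """Set-theoretic representation: which terms occur, ignoring counts and order."""
    return set(tokenize(text))


def as_tf_vector(text: str, vocabulary: List[str]) -> List[int]:
    """Algebraic representation: term-frequency vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


docs = ["The cat sat on the mat.", "The dog chased the cat."]
vocabulary = sorted(set().union(*(as_set(d) for d in docs)))
print(vocabulary)                         # ['cat', 'chased', 'dog', 'mat', 'on', 'sat', 'the']
print(sorted(as_set(docs[1])))            # ['cat', 'chased', 'dog', 'the']
print(as_tf_vector(docs[0], vocabulary))  # [1, 0, 0, 1, 1, 1, 2]
```

The set form discards how often "the" occurs, while the vector form keeps the count of 2; which information a model keeps is exactly what its representation choice decides.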
Dimension #1 - Mathematical Basis

| Basis Type | Description | Example Models |
|---|---|---|
| Set-Theoretic Models | Represent documents as sets of words or phrases. Similarities are usually derived from set-theoretic operations on those sets | |
| Algebraic Models | Represent documents and queries, usually as vectors, matrices, or tuples. The similarity of the query vector and the document vector is represented as a scalar value | |
| Probabilistic Models | Treat document retrieval as a probabilistic inference process. Similarities are computed as probabilities that a document is relevant to a given query. Probabilistic theorems such as Bayes' Theorem are often used in these models | |
| Feature-Based Retrieval Models | View documents as vectors of feature-function values and seek the best way to combine these features into a single relevance score, typically with learning-to-rank methods. Feature functions are arbitrary functions of the document and the query, so they can easily incorporate almost any other retrieval model as just another feature | |

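
The sketch below shows one simple scoring function per row of the table, assuming the same naive tokenizer: Jaccard overlap (set-theoretic), cosine similarity of term-frequency vectors (algebraic), Okapi BM25 with a non-negative IDF variant (probabilistic), and a weighted sum of those three scores as a stand-in for a feature-based model. All function names are hypothetical, and the feature weights are hand-picked here, whereas a feature-based model would normally learn them with a learning-to-rank method.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase, keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(query: str, doc: str) -> float:
    """Set-theoretic: size of the intersection over size of the union of term sets."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def cosine(query: str, doc: str) -> float:
    """Algebraic: cosine of the angle between raw term-frequency vectors."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def bm25(query: str, doc: str, corpus: List[str], k1: float = 1.2, b: float = 0.75) -> float:
    """Probabilistic: Okapi BM25 score of `doc` for `query`, with statistics from `corpus`."""
    doc_terms = [set(tokenize(x)) for x in corpus]
    lengths = [len(tokenize(x)) for x in corpus]
    n, avgdl = len(corpus), sum(lengths) / len(corpus)
    tf = Counter(tokenize(doc))
    dl = sum(tf.values())
    score = 0.0
    for term in set(tokenize(query)):
        df = sum(1 for terms in doc_terms if term in terms)  # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)      # non-negative IDF variant
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
    return score


def feature_score(query: str, doc: str, corpus: List[str], weights: Dict[str, float]) -> float:
    """Feature-based: combine the scores above into one relevance score with given weights."""
    features = {
        "jaccard": jaccard(query, doc),
        "cosine": cosine(query, doc),
        "bm25": bm25(query, doc, corpus),
    }
    return sum(weights[name] * value for name, value in features.items())


corpus = ["the cat sat on the mat", "the dog chased the cat", "dogs and cats"]
weights = {"jaccard": 0.2, "cosine": 0.3, "bm25": 0.5}  # hand-picked, not learned
ranked = sorted(corpus, key=lambda d: feature_score("cat mat", d, corpus, weights), reverse=True)
print(ranked[0])  # the cat sat on the mat
```

Because `feature_score` treats the other three models as ordinary features, it illustrates the table's point that a feature-based model can absorb almost any other retrieval model.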
Dimension #2 - Properties of the Model

| Properties of the Model | Description |
|---|---|
| Models without Term Interdependencies | Treat different terms/words as independent. This is usually represented in vector space models by the orthogonality assumption of term vectors, or in probabilistic models by an independence assumption for term variables |
| Models with Immanent Term Interdependencies | Allow a representation of interdependencies between terms, where the degree of interdependency between two terms is defined by the model itself. It is usually derived, directly or indirectly (e.g. by dimensionality reduction), from the co-occurrence of those terms across the whole document collection |
| Models with Transcendent Term Interdependencies | Allow a representation of interdependencies between terms, but do not themselves define the degree of interdependency between two terms. Instead, they rely on an external source for it (e.g. human judgment or sophisticated algorithms) |

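
To make the three rows concrete, the sketch below contrasts them on a toy collection. Under term independence, distinct terms are orthogonal and get similarity 0. An immanent interdependency is derived from the collection itself, here as the cosine similarity of term rows in a term-document matrix (raw co-occurrence only, without any dimensionality reduction). A transcendent interdependency comes from an external source, represented here by a hypothetical hand-made thesaurus. All function names and data are illustrative assumptions.

```python
import math
from typing import Dict, FrozenSet, List


def term_document_matrix(docs: List[List[str]]) -> Dict[str, List[int]]:
    """Map each term to its row of counts across the documents of the collection."""
    vocabulary = sorted({term for doc in docs for term in doc})
    return {term: [doc.count(term) for doc in docs] for term in vocabulary}


def cosine(u: List[int], v: List[int]) -> float:
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def independent_similarity(t1: str, t2: str) -> float:
    """No interdependencies: each term is its own orthogonal axis."""
    return 1.0 if t1 == t2 else 0.0


def cooccurrence_similarity(t1: str, t2: str, matrix: Dict[str, List[int]]) -> float:
    """Immanent interdependency: derived from co-occurrence in the collection itself."""
    return cosine(matrix[t1], matrix[t2])


def external_similarity(t1: str, t2: str, thesaurus: Dict[FrozenSet[str], float]) -> float:
    """Transcendent interdependency: degree looked up in an external source."""
    if t1 == t2:
        return 1.0
    return thesaurus.get(frozenset((t1, t2)), 0.0)


docs = [["car", "engine", "wheel"], ["automobile", "engine", "wheel"], ["banana", "fruit"]]
matrix = term_document_matrix(docs)
thesaurus = {frozenset(("car", "automobile")): 0.9}  # hypothetical hand-made entry

print(independent_similarity("car", "engine"))               # 0.0
print(cooccurrence_similarity("car", "engine", matrix))      # about 0.707
print(cooccurrence_similarity("car", "automobile", matrix))  # 0.0: they never share a document
print(external_similarity("automobile", "car", thesaurus))   # 0.9
```

Note that `car` and `automobile` score 0 under raw co-occurrence because they never appear in the same document; dimensionality reduction, as mentioned in the table, is one way immanent models try to capture such indirect relations.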
[Figure: Information retrieval models]