Information Retrieval (IR) retrieves relevant facts from unstructured data, where the facts are NOT specified in advance
Information Extraction (IE) extracts relevant facts from unstructured data, where the facts are specified in advance

Both IE and IR are subtasks of Natural Language Processing (NLP) - Computational Linguistics

IR/IE - Other

An implementation of Feature Extraction that extracts specific Features (e.g. entities, relations, events) from text
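
As a rough illustration of extracting features that are specified in advance, the sketch below pulls two fixed fact types out of free text with regular expressions. The pattern names, the patterns themselves, and the `extract` helper are hypothetical and chosen only for this example; real extraction systems typically use far richer models.

```python
import re

# The fact types to extract are fixed up front (hypothetical patterns,
# deliberately simple and not production-grade).
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def extract(text):
    """Return every match of each predefined pattern found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


facts = extract("Contact ann@example.com before 2024-05-01.")
print(facts)
```

The point of the sketch is only that the schema (here, `date` and `email`) exists before any text is read.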

IR/IE - Model Types

To effectively retrieve relevant documents, the documents are typically transformed into a suitable representation. Each retrieval strategy incorporates a specific model for its document representation purposes. The picture on the right illustrates the relationship between some common models.

Models are categorized according to two dimensions:

Mathematical Basis
Properties of the Model

Dimension #1 - Mathematical Basis

Basis Type	Description	Example Models
Set-Theoretic Models	represent documents as sets of words or phrases. Similarities are usually derived from set-theoretic operations on those sets	Standard Boolean Model; Extended Boolean Model; Fuzzy Retrieval
Algebraic Models	represent documents and queries usually as vectors, matrices, or tuples. The similarity of the query vector and document vector is represented as a scalar value	Vector Space Model; Generalized Vector Space Model; (Enhanced) Topic-Based Vector Space Model; Extended Boolean Model; Latent Semantic Indexing (a.k.a. Latent Semantic Analysis)
Probabilistic Models	treat the process of document retrieval as a probabilistic inference. Similarities are computed as probabilities that a document is relevant for a given query. Probabilistic theorems like Bayes’ Theorem are often used in these models	Binary Independence Model; Probabilistic Relevance Model (the basis of the Okapi BM25 relevance function); Uncertain Inference; Language Models; Divergence-From-Randomness Model; Latent Dirichlet Allocation
Feature-Based Retrieval Models	view documents as vectors of values of feature functions and seek the best way to combine these features into a single relevance score, typically by learning-to-rank methods. Feature functions are arbitrary functions of document and query, and as such can easily incorporate almost any other retrieval model as just another feature
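
To make the first three rows of the table concrete, here is a minimal sketch that scores the same toy corpus three ways: a Boolean AND match (set-theoretic), cosine similarity over raw term counts (algebraic), and a BM25-style score (probabilistic family). The corpus, the function names, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions made for illustration, and the IDF term uses one common smoothed variant rather than any single canonical definition.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, for illustration only).
DOCS = {
    "d1": "information retrieval ranks documents for a query",
    "d2": "information extraction pulls facts from text",
    "d3": "boolean retrieval matches documents with set operations",
}


def tokenize(text):
    return text.lower().split()


TOKENS = {doc_id: tokenize(text) for doc_id, text in DOCS.items()}


def boolean_and(query):
    """Set-theoretic: a document matches if it contains every query term."""
    terms = set(tokenize(query))
    return sorted(d for d, toks in TOKENS.items() if terms <= set(toks))


def cosine(query, doc_id):
    """Algebraic: cosine of the raw term-count vectors of query and document."""
    q, d = Counter(tokenize(query)), Counter(TOKENS[doc_id])
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def bm25(query, doc_id, k1=1.5, b=0.75):
    """Probabilistic family: BM25-style score with a smoothed IDF (assumed variant)."""
    n_docs = len(TOKENS)
    avg_len = sum(len(toks) for toks in TOKENS.values()) / n_docs
    toks = TOKENS[doc_id]
    tf = Counter(toks)
    score = 0.0
    for term in set(tokenize(query)):
        df = sum(1 for other in TOKENS.values() if term in other)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
        score += idf * tf[term] * (k1 + 1) / denom
    return score


query = "information retrieval"
print(boolean_and(query))
print({d: round(cosine(query, d), 3) for d in DOCS})
print({d: round(bm25(query, d), 3) for d in DOCS})
```

The Boolean function returns an unranked set of matches, while the other two return a scalar per document that can be sorted. A feature-based model in the sense of the last row could take `cosine` and `bm25` as two input features and learn how to weight them.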

Dimension #2 - Properties of the Model

Properties of the Model	Description
Models without Term Interdependencies	treat different terms/words as independent. This is usually represented in vector space models by the orthogonality assumption of term vectors, or in probabilistic models by an independence assumption for term variables
Models with Immanent Term Interdependencies	allow a representation of interdependencies between terms. However, the degree of interdependency between two terms is defined by the model itself. It is usually derived, directly or indirectly (e.g. by dimensionality reduction), from the co-occurrence of those terms in the whole set of documents
Models with Transcendent Term Interdependencies	allow a representation of interdependencies between terms, but they do not specify how the interdependency between two terms is defined. They rely on an external source for the degree of interdependency between two terms (for example, a human or a sophisticated algorithm)
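
The three properties can be sketched with a single scoring function, `q^T S d`, where only the term-term similarity matrix `S` changes. The vocabulary, the toy corpus, the hand-written thesaurus values, and all function names below are assumptions made for illustration; they are not taken from any particular retrieval model.

```python
import math

VOCAB = ["car", "automobile", "banana"]  # hypothetical three-term vocabulary


def vec(text):
    """Term-count vector of a text over VOCAB."""
    toks = text.lower().split()
    return [toks.count(term) for term in VOCAB]


def bilinear(q, d, S):
    """Score q^T S d, where S[i][j] is the similarity assumed between terms i and j."""
    return sum(q[i] * S[i][j] * d[j] for i in range(len(q)) for j in range(len(d)))


# No interdependencies: term vectors are orthogonal, so S is the identity.
IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# Transcendent: similarities supplied by an external source
# (a hand-written thesaurus with made-up values).
THESAURUS = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def cooccurrence_similarity(corpus):
    """Immanent: derive S from the corpus as the cosine between term occurrence rows."""
    rows = [[doc.lower().split().count(term) for doc in corpus] for term in VOCAB]

    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    return [[cos(a, b) for b in rows] for a in rows]


CORPUS = ["car automobile", "car", "banana"]  # toy corpus, illustration only
IMMANENT = cooccurrence_similarity(CORPUS)

q, d = vec("car"), vec("automobile")
for name, S in [("independent", IDENTITY), ("immanent", IMMANENT), ("transcendent", THESAURUS)]:
    print(name, round(bilinear(q, d, S), 3))
```

With the identity matrix, a query for "car" cannot match a document that only says "automobile"; with either of the other matrices it can, and the difference between them is purely where the off-diagonal values come from.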

／var／log marcus chiu
