probabilistic inference recap

given a probabilistic model, Probabilistic Inference is answering a probabilistic query over that model. It is also used for the computation/estimation of: distributions, the distribution’s parameters (if it’s a parametric distribution), the distribution’s probabilities, and/or the distribution’s characteristics

syntax definition

Click here to expand...

A graphical model 𝒢 is a tuple 𝒢 = ⟨𝐗, 𝐃, 𝐒, 𝐅, 𝐂⟩ where:

𝐗 = {𝑋₁, …, 𝑋_𝑛} set of ordered variables

𝐃 = {𝐷₁, …, 𝐷_𝑛} set of corresponding domains of each variable 𝑋_𝑖 (e.g. if 𝑋₁is a boolean variable then 𝐷₁= {true, false}). The size of each 𝐷_𝑖corresponds to the cardinality of variable 𝑋_𝑖

𝐒 = {𝑆₁, …, 𝑆_𝑚} set of variable scopes, where each variable scope 𝑆_𝑖is a subset of 𝐗 (i.e. 𝑆_𝑖 ⊆ 𝐗)

𝐅 = {𝐹₁, …, 𝐹_𝑚} set of factors/functions, where each factor/function 𝐹_𝑖 is defined over its corresponding variable scope 𝑆_𝑖and maps any assignment over its scope to a real value

in context of Bayesian Networks, 𝐅 = set of conditional probability tables

in context of Markov Networks, 𝐅 = set of factors

global function - is a function whose scope includes all variables (i.e. 𝑆_𝑖 = 𝐗)

local functions - is a function whose scope is a proper subset of variables (i.e. 𝑆_𝑖⊂ 𝐗)

𝐂 is a set of combination operators which defines how functions are combined. common combination operators are:

summation operator (𝛴)

multiplication operator (𝛱)

AND operator (∧) - for Boolean functions

relational join operator (⨝) - when the functions are relation

marginalization operator - for reasoning queries

max operator - e.g. = argmax_𝑦[ 𝐹_𝑖(𝑥,𝑦) ] = 𝐹_𝑗(𝑥) where 𝐹_𝑗is a new function with scope over variable 𝑥

the set of local functions can be combined in a variety of ways (e.g. combination operators) to generate a new local function or even a global function

Link to original

and let:

𝐐 = 𝑄₁, …, 𝑄_𝑟be the set of query variables

𝐄 = 𝐸₁, …, 𝐸_𝑠be the set of evidence variables

𝐇 = 𝐻₁, …, 𝐻_𝑡be the remainder of the variables(called the hidden variables (non-evidence, non-query variables))

𝐪 = 𝑞₁, …, 𝑞_𝑟be a grounded instantiation of 𝐐

𝐞 = 𝑒₁, …, 𝑒_𝑠be a grounded instantiation of 𝐄

𝐡 = 𝘩₁, …, 𝘩_𝑡be a grounded instantiation of 𝐇

𝑋, 𝐸, and 𝑌 are disjoint exhaustive sets of all variables 𝐗

therefore:

the complete set of variables is 𝐗 = 𝐐 ∪ 𝐄 ∪ 𝐇 = {𝑋₁, …, 𝑋_𝑛} = {𝑄₁, …, 𝑄_𝑟, 𝐸₁, …, 𝐸_𝑠, 𝐻₁, …, 𝐻_𝑡}

the complete set of instantiations is 𝒙 = 𝐪 ∪ 𝐞 ∪ 𝐡 = {𝑥₁, …, 𝑥_𝑛} = {𝑞₁, …, 𝑞_𝑟, 𝑒₁, …, 𝑒_𝑠, ℎ₁, …, ℎ_𝑡}

Link to original

representations

Click here to expand...

Parametric Probability Distribution Models - function/model based

Non-Parametric Probability Distribution Models - instance-based

see also: ML - Parametric vs Non-Parametric

Univariate Probability Distribution Models

Continuous Models (Probability Density Functions)

Discrete Models (Probability Mass Functions)

Multivariate Probability Distribution Models

Probabilistic Graphical Models (PGM)

Multi-Model Probability Distribution

representations of global structures

Probabilistic Graphical Models (PGM)

representations of local structures

representations of normalized distributions:

Conditional Probability Distributions (CPD)

see: CPD Representations

representations of un-normalized distributions:

Potential Functions

Link to original

probability query types

Click here to expand...

product comes from a joint probability 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) being factorized into a product of conditional probabilities

when we take the log of products it becomes sum of factors

Query

Product
Sum

Marginalization𝛴

Maximization𝑎𝑟𝑔𝑚𝑎𝑥

Description

Posterior Conditional Query𝐏(𝐐=𝐪|𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)

Belief Updating Query
(a type of posterior conditional query)𝐏(𝑄_𝑖=𝑞_𝑖|𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝑄_𝑖=𝑞_𝑖, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)

✔

✔

✘

sum-product problem

sum-log-sums problem

Prior Marginal Query𝐏(𝐐=𝐪) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]

Probability of Evidence Query𝐏(𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝐇=𝐡, 𝐄=𝐞) ]

✔

✔

✘

sum-product problem

sum-log-sums problem

both queries essentially the same where 𝐐 = 𝐄

#P-Hard Problem

Maximum a Posterior (𝑴𝑨𝑷) Query𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪[ 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]
where: 𝐐∪𝐄⊂𝐗 and 𝐐∪𝐇∪𝐄=𝐗

✔

✔

✔

max-sum-product problem

max-sum-log-sums problem

most probable assignment because they include maximization

NP^PP-Hard Problem

Most Probable Explanation (𝑴𝑷𝑬) Query𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪[ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]
where: 𝐐∪𝐄=𝐗

✔

✘

✔

max-product problem

max-log-sums problem

most probable assignment because they include maximization

NP-Hard Problem

more details: Probabilistic Inference - Query／Task Types (Posterior Conditional - Prior Marginal ／ Probability of Evidence - MPE - MAP)

Link to original

resources:

Click here to expand...

Daphne Koller’s Video Lectures

Link to original

Query	Product Sum	Marginalization𝛴	Maximization𝑎𝑟𝑔𝑚𝑎𝑥	Description
Posterior Conditional Query𝐏(𝐐=𝐪\|𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞) Belief Updating Query (a type of posterior conditional query)𝐏(𝑄_𝑖=𝑞_𝑖\|𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝑄_𝑖=𝑞_𝑖, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)	✔	✔	✘	sum-product problem sum-log-sums problem
Prior Marginal Query𝐏(𝐐=𝐪) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ] Probability of Evidence Query𝐏(𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝐇=𝐡, 𝐄=𝐞) ]	✔	✔	✘	sum-product problem sum-log-sums problem both queries essentially the same where 𝐐 = 𝐄 #P-Hard Problem
Maximum a Posterior (𝑴𝑨𝑷) Query𝑀𝐴𝑃(𝐐=?\|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪[ 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ] where: 𝐐∪𝐄⊂𝐗 and 𝐐∪𝐇∪𝐄=𝐗	✔	✔	✔	max-sum-product problem max-sum-log-sums problem most probable assignment because they include maximization NP^PP-Hard Problem
Most Probable Explanation (𝑴𝑷𝑬) Query𝑀𝑃𝐸(𝐐=?\|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪[ 𝐏(𝐐=𝐪, 𝐄=𝐞) ] where: 𝐐∪𝐄=𝐗	✔	✘	✔	max-product problem max-log-sums problem most probable assignment because they include maximization NP-Hard Problem

is a type of exact inference algorithm that naively computes the probability of a probabilistic query
an extension of this is called the Variable Elimination Algorithm where it pushes the summation operator into the factor products

Inference by Enumeration Algorithm - Intuition

answering probabilistic queries often involves the summation/marginalization of full joint probabilities

for example, given a full joint probability 𝐏(𝑎, 𝑏, 𝑐) the computation of a probabilistic query 𝐏(𝑎|𝑏) is as follows

𝐏(𝑎|𝑏) = 𝐏(𝑎,𝑏) / 𝐏(𝑏) # because bayes rule
𝐏(𝑎|𝑏) = [ 𝛴_𝑐∊𝐶 𝐏(𝑎,𝑏,𝑐) ] / [ 𝛴_𝑎∊𝐴𝛴_𝑐∊_𝐶𝐏(𝑎,𝑏,𝑐) ] # because marginal probability rule

however, storing a full joint probability in table-formis not practical because it takes 𝑂(𝑘^𝑛) space, where:

𝑛 is the number of variables
𝑘 is the max cardinality of each variable

to solve this problem, we often encode the full joint probability in a probabilistic graphical model. They require much less space but still retain the ability to represent the full joint probability. The only downside of is that it requires an extra step to obtain the value of the full joint probability

acquiring these values depends on the probabilistic graphical model:

in the context of Bayesian Networks, a full joint probability is the product of conditional probabilities
- 𝑃(𝑋₁, …, 𝑋_𝑛) = 𝛱_{_{𝑋_𝑖∈𝐗}} [ 𝐏(𝑋_𝑖|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋_𝑖)) ]
in the context of Markov Networks, a full joint probability is the product of potential functions
- 𝑃(𝑋₁, …, 𝑋_𝑛) = (1/𝘡) * 𝛱_𝑐∊_𝐶[ 𝜙_𝑐(𝒙_𝑐) ]

order to solve this query 𝐏(𝑎|𝑏) we need to know the value of the full joint probability 𝐏(𝑎,𝑏,𝑐)

Once the probability value is acquired we plug it into the equation, then we continue to enumerate through each summation 𝛴 and each time we calculate the full joint probability value based on the probabilistic graphical model

𝐏(𝑎|𝑏) = [ 𝛴_𝑐∊𝐶 𝐏(𝑎,𝑏,𝑐) ] / [ 𝛴_𝑎∊𝐴𝛴_𝑐∊_𝐶𝐏(𝑎,𝑏,𝑐) ]

the Inference by Enumeration algorithm simply enumerates through each summation 𝛴

Inference by Enumeration in Specific Probabilistic Graphical Models

／var／log marcus chiu

Explorer

Probabilistic Inference - Inference by Enumeration Algorithm

Inference by Enumeration Algorithm - Intuition

Inference by Enumeration in Specific Probabilistic Graphical Models

／var／logmarcus chiu

Explorer

Probabilistic Inference - Inference by Enumeration Algorithm

Inference by Enumeration Algorithm - Intuition

Inference by Enumeration in Specific Probabilistic Graphical Models

／var／log marcus chiu