probabilistic inference recap

given a probabilistic model, Probabilistic Inference is answering a probabilistic query over that model. It is also used for the computation/estimation of: distributions, the distribution’s parameters (if it’s a parametric distribution), the distribution’s probabilities, and/or the distribution’s characteristics

syntax definition

Click here to expand...

A graphical model 𝒢 is a tuple 𝒢 = ⟨𝐗, 𝐃, 𝐒, 𝐅, 𝐂⟩ where:

𝐗 = {𝑋₁, …, 𝑋_𝑛} set of ordered variables

𝐃 = {𝐷₁, …, 𝐷_𝑛} set of corresponding domains of each variable 𝑋_𝑖 (e.g. if 𝑋₁is a boolean variable then 𝐷₁= {true, false}). The size of each 𝐷_𝑖corresponds to the cardinality of variable 𝑋_𝑖

𝐒 = {𝑆₁, …, 𝑆_𝑚} set of variable scopes, where each variable scope 𝑆_𝑖is a subset of 𝐗 (i.e. 𝑆_𝑖 ⊆ 𝐗)

𝐅 = {𝐹₁, …, 𝐹_𝑚} set of factors/functions, where each factor/function 𝐹_𝑖 is defined over its corresponding variable scope 𝑆_𝑖and maps any assignment over its scope to a real value

in context of Bayesian Networks, 𝐅 = set of conditional probability tables

in context of Markov Networks, 𝐅 = set of factors

global function - is a function whose scope includes all variables (i.e. 𝑆_𝑖 = 𝐗)

local functions - is a function whose scope is a proper subset of variables (i.e. 𝑆_𝑖⊂ 𝐗)

𝐂 is a set of combination operators which defines how functions are combined. common combination operators are:

summation operator (𝛴)

multiplication operator (𝛱)

AND operator (∧) - for Boolean functions

relational join operator (⨝) - when the functions are relation

marginalization operator - for reasoning queries

max operator - e.g. = argmax_𝑦[ 𝐹_𝑖(𝑥,𝑦) ] = 𝐹_𝑗(𝑥) where 𝐹_𝑗is a new function with scope over variable 𝑥

the set of local functions can be combined in a variety of ways (e.g. combination operators) to generate a new local function or even a global function

Link to original

and let:

𝐐 = 𝑄₁, …, 𝑄_𝑟be the set of query variables

𝐄 = 𝐸₁, …, 𝐸_𝑠be the set of evidence variables

𝐇 = 𝐻₁, …, 𝐻_𝑡be the remainder of the variables(called the hidden variables (non-evidence, non-query variables))

𝐪 = 𝑞₁, …, 𝑞_𝑟be a grounded instantiation of 𝐐

𝐞 = 𝑒₁, …, 𝑒_𝑠be a grounded instantiation of 𝐄

𝐡 = 𝘩₁, …, 𝘩_𝑡be a grounded instantiation of 𝐇

𝑋, 𝐸, and 𝑌 are disjoint exhaustive sets of all variables 𝐗

therefore:

the complete set of variables is 𝐗 = 𝐐 ∪ 𝐄 ∪ 𝐇 = {𝑋₁, …, 𝑋_𝑛} = {𝑄₁, …, 𝑄_𝑟, 𝐸₁, …, 𝐸_𝑠, 𝐻₁, …, 𝐻_𝑡}

the complete set of instantiations is 𝒙 = 𝐪 ∪ 𝐞 ∪ 𝐡 = {𝑥₁, …, 𝑥_𝑛} = {𝑞₁, …, 𝑞_𝑟, 𝑒₁, …, 𝑒_𝑠, ℎ₁, …, ℎ_𝑡}

Link to original

representations

Click here to expand...

Parametric Probability Distribution Models - function/model based

Non-Parametric Probability Distribution Models - instance-based

see also: ML - Parametric vs Non-Parametric

Univariate Probability Distribution Models

Continuous Models (Probability Density Functions)

Discrete Models (Probability Mass Functions)

Multivariate Probability Distribution Models

Probabilistic Graphical Models (PGM)

Multi-Model Probability Distribution

representations of global structures

Probabilistic Graphical Models (PGM)

representations of local structures

representations of normalized distributions:

Conditional Probability Distributions (CPD)

see: CPD Representations

representations of un-normalized distributions:

Potential Functions

Link to original

probability query types

Click here to expand...

product comes from a joint probability 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) being factorized into a product of conditional probabilities

when we take the log of products it becomes sum of factors

Query

Product
Sum

Marginalization𝛴

Maximization𝑎𝑟𝑔𝑚𝑎𝑥

Description

Posterior Conditional Query𝐏(𝐐=𝐪|𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)

Belief Updating Query
(a type of posterior conditional query)𝐏(𝑄_𝑖=𝑞_𝑖|𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝑄_𝑖=𝑞_𝑖, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)

✔

✔

✘

sum-product problem

sum-log-sums problem

Prior Marginal Query𝐏(𝐐=𝐪) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]

Probability of Evidence Query𝐏(𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝐇=𝐡, 𝐄=𝐞) ]

✔

✔

✘

sum-product problem

sum-log-sums problem

both queries essentially the same where 𝐐 = 𝐄

#P-Hard Problem

Maximum a Posterior (𝑴𝑨𝑷) Query𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪[ 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]
where: 𝐐∪𝐄⊂𝐗 and 𝐐∪𝐇∪𝐄=𝐗

✔

✔

✔

max-sum-product problem

max-sum-log-sums problem

most probable assignment because they include maximization

NP^PP-Hard Problem

Most Probable Explanation (𝑴𝑷𝑬) Query𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪[ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]
where: 𝐐∪𝐄=𝐗

✔

✘

✔

max-product problem

max-log-sums problem

most probable assignment because they include maximization

NP-Hard Problem

more details: Probabilistic Inference - Query／Task Types (Posterior Conditional - Prior Marginal ／ Probability of Evidence - MPE - MAP)

Link to original

resources:

Click here to expand...

Daphne Koller’s Video Lectures

Link to original

Query	Product Sum	Marginalization𝛴	Maximization𝑎𝑟𝑔𝑚𝑎𝑥	Description
Posterior Conditional Query𝐏(𝐐=𝐪\|𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞) Belief Updating Query (a type of posterior conditional query)𝐏(𝑄_𝑖=𝑞_𝑖\|𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝑄_𝑖=𝑞_𝑖, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)	✔	✔	✘	sum-product problem sum-log-sums problem
Prior Marginal Query𝐏(𝐐=𝐪) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ] Probability of Evidence Query𝐏(𝐄=𝐞) = 𝛴_𝐡∊𝐇[ 𝐏(𝐇=𝐡, 𝐄=𝐞) ]	✔	✔	✘	sum-product problem sum-log-sums problem both queries essentially the same where 𝐐 = 𝐄 #P-Hard Problem
Maximum a Posterior (𝑴𝑨𝑷) Query𝑀𝐴𝑃(𝐐=?\|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪[ 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ] where: 𝐐∪𝐄⊂𝐗 and 𝐐∪𝐇∪𝐄=𝐗	✔	✔	✔	max-sum-product problem max-sum-log-sums problem most probable assignment because they include maximization NP^PP-Hard Problem
Most Probable Explanation (𝑴𝑷𝑬) Query𝑀𝑃𝐸(𝐐=?\|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪[ 𝐏(𝐐=𝐪, 𝐄=𝐞) ] where: 𝐐∪𝐄=𝐗	✔	✘	✔	max-product problem max-log-sums problem most probable assignment because they include maximization NP-Hard Problem

Bucket Elimination (BE) Algorithm - Variable Elimination (VE) Algorithm

a type of exact probabilistic inference algorithm
is an extension of the Inference by Enumeration algorithm that pushes the summation operators into the factor product and eliminates variable summation operators by generating new factors (which makes it a type of dynamic programming)
induces a cluster tree on which VE Algorithm is performed, thus it is a type of specialized belief propagation algorithm over trees
each run of BE/VE only solves one probabilistic query such as 𝐏(𝑄|𝐄=𝐞) at a time. Extensions of BE/VE include:
- Bucket Tree Elimination (BTE) Algorithm - used to solve 𝐏(𝑄|𝐄=𝐞) for all 𝑄∊{all variables}

BE/VE Algorithm - Solving Probabilistic Queries

Click here to expand...

BE/VE Algorithm - Operator Types

Click here to expand...

𝛱 product-combination - merges 2 factors into 1 factor

Click here to expand...

𝐹(𝐴,𝐵)𝐹(𝐵,𝐶) = 𝐹(𝐴,𝐵,𝐶)

𝛴 sum-marginalization - “sum out” a variable over ALL values in that variable’s domain

Click here to expand...

𝛴_𝑐∊𝐶𝐹(𝐴,𝐵,𝐶=𝑐) = 𝐹(𝐴,𝐵)

𝑚𝑎𝑥 max-marginalization - “max-out” a variable over ALL values in that variable’s domain

Click here to expand...

𝑚𝑎𝑥_𝑐∊𝐶𝐹(𝐴,𝐵,𝐶=𝑐) = 𝐹(𝐴,𝐵)

𝛴 evidence-instantiation/reduction - over SINGLE value in variable domain (used for variable instantiation when given evidence)

Click here to expand...

𝐹(𝐴,𝐵,𝐶=𝑐1) = 𝐹(𝐴,𝐵)

Link to original

VE Solving Prior Query - Probability of Evidence Query: 𝐏(𝐐=𝐪) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]

Click here to expand... probabilistic queries often involves the summation/marginalization of full joint probabilities

for example, given a Gibbs Distribution of a full joint probability 𝐏(𝐴,𝐵,𝐶), the computation of a probabilistic query 𝐏(𝐴,𝐵) is as follows

𝐏(𝐴,𝐵) = 𝛴_𝑐∊𝐶 𝐏(𝐴,𝐵,𝐶) # because marginal probability rule

however, storing a full joint probability in table-formis not practical because it takes 𝑂(𝑘^𝑛) space, where:

𝑛 is the number of variable

𝑘 is the max cardinality of each variable

to solve this problem, we often encode the full joint probability into a probabilistic graphical model. They require much less space but they still retain the ability to represent the full joint probability.

When using a probabilistic graphical model the full joint probability is FACTORIZED

how it is FACTORIZED depends on the type of model used:

for Bayesian Networks, a full joint probability is FACTORIZED into a product of conditional probabilities

𝑃(𝑋₁, …, 𝑋_𝑛) = 𝛱_{_{𝑋_𝑖∈𝐗}} [ 𝐏(𝑋_𝑖|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋_𝑖)) ]

for Markov Networks, a full joint probability is FACTORIZED into a product of potential functions

𝑃(𝑋₁, …, 𝑋_𝑛) = 𝛱_𝑐∊_𝐶[ (1/𝘡) * 𝜙_𝑐(𝒙_𝑐) ]

Let’s say the full joint probability 𝐏(𝐴,𝐵,𝐶) is FACTORIZED as so

𝐏(𝐴,𝐵,𝐶) = 𝐹₁(𝐴,𝐵)𝐹₂(𝐴,𝐶)𝐹₃(𝐵,𝐶)

therefore the query becomes

𝐏(𝐴,𝐵) = 𝛴_𝑐∊𝐶 𝐹₁(𝐴,𝐵)𝐹₂(𝐴,𝐶)𝐹₃(𝐵,𝐶)

𝐏(𝐴,𝐵) = 𝐹₁(𝐴,𝐵) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐵,𝐶)

At this point, everything we have done so far is the same as inference by enumeration. Inference by Enumeration takes the resulting formula and enumerates through each summation one-by-one. In Variable Elimination we exploit the fact that many of these computations are repetitions and instead of computing them from scratch every time we systematically order it these computations (similar to Dynamic Programming). Variable Elimination have the following steps below:

move all irrelevant terms outside of the innermost summation 𝛴_𝑖∊𝐼 (this includes the new terms created in step 2/3)

compute the innermost summation 𝛴_𝑖∊𝐼and replace it with the resulting new factor 𝑚_𝑖(all terms within summation minus term 𝑖) the computation is a 2 step process:

combination step - compute product 𝛱 of all factors within innermost summation (see example in Probabilistic Inference - Bucket Elimination (BE) - Variable Elimination (VE))

marginalization step - compute summation 𝛴 of the resulting product (see example in Probabilistic Inference - Bucket Elimination (BE) - Variable Elimination (VE)) this is the new factor

repeat steps 1 to 2 until all summations 𝛴_𝑖∊𝐼are replaced

for example,

𝐏(𝐴,𝐵) = 𝐹₁(𝐴,𝐵) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐵,𝐶)

𝐏(𝐴,𝐵) = 𝐹₁(𝐴,𝐵) 𝑚_𝑐(𝐴,𝐵)

and that’s basically it!

NOTE: the elimination algorithm has no benefit if the innermost term includes all variables, that is, 𝑋_𝑖 is dependent on all the other variables (e.g. 𝐏(𝑋_𝑖|𝑎𝑙𝑙-𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒𝑠-𝑚𝑖𝑛𝑢𝑠-𝑋_𝑖)). However, in most problems, the number of variables in the innermost term is less than the total number of variables

answering

VE Solving Belief Updating Query: 𝐏(𝑄𝑖=𝑞𝑖|𝐄=𝐞) = [𝛴_𝐡∊𝐇 𝐏(𝑄𝑖=𝑞𝑖, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)

Click here to expand... 𝐏(𝐴|𝐵=𝑏), given a Gibbs Distribution of the full joint probability 𝐏(𝐴,𝐵,𝐶)?

steps are similar to computing Prior Probabilistic Query except there is an additional step that instantiates the given evidence variables as shown below in step 5

𝐏(𝐴|𝐵=𝑏) = 𝐏(𝐴,𝐵) / 𝐏(𝐵) # because Bayes Rule

𝐏(𝐴|𝐵=𝑏) = [ 𝛴_𝑐∊𝐶𝐏(𝐴,𝐵=𝑏,𝐶) ] / [ 𝛴_𝑎∊𝐴𝛴_𝑐∊𝐶𝐏(𝐴,𝐵=𝑏,𝐶) ] # because marginal probability rule

𝐏(𝐴|𝐵=𝑏) = [ 𝛴_𝑐∊𝐶𝐹₁(𝐴,𝐵=𝑏)𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ] / [ 𝛴_𝑎∊𝐴𝛴_𝑐∊𝐶𝐹₁(𝐴,𝐵=𝑏)𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ] # factorize full joint probability

𝐏(𝐴|𝐵=𝑏) = [ 𝐹₁(𝐴,𝐵=𝑏) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ] / [ 𝛴_𝑎∊𝐴𝐹₁(𝐴,𝐵=𝑏) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ]# move all irrelevant terms as far left

𝐏(𝐴|𝐵=𝑏) = [ 𝐹₁(𝐴) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐶) ] / [ 𝛴_𝑎∊𝐴𝐹₁(𝐴) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐶) ] # instantiate given evidences

𝐏(𝐴|𝐵=𝑏) = [ 𝐹₁(𝐴) 𝑚_𝑐(𝐴) ] / [ 𝛴_𝑎∊𝐴𝐹₁(𝐴) 𝑚_𝑐(𝐴) ] # compute new term 𝑚_𝑐 for the innermost summation

𝐏(𝐴|𝐵=𝑏) = [ 𝐹₁(𝐴) 𝑚_𝑐(𝐴) ] / [ 𝑚_𝑎() ] # compute new term 𝑚_𝑎 for the innermost summation

What if we want to compute a conditional probabilistic query

VE Solving Conditional Posterior Query: 𝐏(𝐐=𝐪|𝐄=𝐞) = [𝛴_𝐡∊𝐇 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)

Click here to expand... 𝐏(𝐴,𝐶|𝐵=𝑏), given a Gibbs Distribution of the full joint probability 𝐏(𝐴,𝐵,𝐶)?

𝐏(𝐴,𝐶|𝐵=𝑏) = 𝐏(𝐴,𝐵,𝐶) / 𝐏(𝐵) # because Bayes Rule

𝐏(𝐴,𝐶|𝐵=𝑏) = [𝐏(𝐴,𝐵=𝑏,𝐶) ] / [ 𝛴_𝑎∊𝐴𝛴_𝑐∊𝐶𝐏(𝐴,𝐵=𝑏,𝐶) ] # because marginal probability rule

𝐏(𝐴,𝐶|𝐵=𝑏) = [𝐹₁(𝐴,𝐵=𝑏)𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ] / [ 𝛴_𝑎∊𝐴𝛴_𝑐∊𝐶𝐹₁(𝐴,𝐵=𝑏)𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ] # factorize full joint probability

𝐏(𝐴,𝐶|𝐵=𝑏) = [ 𝐹₁(𝐴,𝐵=𝑏)𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ] / [ 𝛴_𝑎∊𝐴𝐹₁(𝐴,𝐵=𝑏) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ]# move all irrelevant terms as far left

𝐏(𝐴,𝐶|𝐵=𝑏) = [ 𝐹₁(𝐴)𝐹₂(𝐴,𝐶)𝐹₃(𝐶) ] / [ 𝛴_𝑎∊𝐴𝐹₁(𝐴) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐶) ] # instantiate given evidences

𝐏(𝐴,𝐶|𝐵=𝑏) = [ 𝐹₁(𝐴)𝐹₂(𝐴,𝐶)𝐹₃(𝐶)] / [ 𝛴_𝑎∊𝐴𝐹₁(𝐴) 𝑚_𝑐(𝐴) ] # compute new term 𝑚_𝑐 for the innermost summation

𝐏(𝐴,𝐶|𝐵=𝑏) = [ 𝐹₁(𝐴)𝐹₂(𝐴,𝐶)𝐹₃(𝐶) ] / [ 𝑚_𝑎() ] # compute new term 𝑚_𝑎 for the innermost summation

etc

What if we want to compute a conditional probabilistic query

VE Solving MPE Query: 𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪 [ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]

Click here to expand... MPE seeks an assignment to ALL variables that has the maximal probability given evidence

What if we want to compute 𝑀𝑃𝐸(𝑎𝑙𝑙-𝑛𝑜𝑛-𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒-𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒𝑠|𝐵=𝑏), given a Gibbs Distribution of the full joint probability𝐏(𝐴,𝐵,𝐶)?

steps are similar as computing Belief Updating Query, except instead of sum-marginalize operators (𝛴) we use sum-marginalize operator (𝑚𝑎𝑥)

if a function/factor is a constant, we have 2 options:

when computing the MAP value/probability: we place it directly in the first bucket

when computing just the MAP tuple: we can remove it

Example

𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴𝑚𝑎𝑥_𝐶𝐏(𝐴,𝐶|𝐵=𝑏)

𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴𝑚𝑎𝑥_𝐶𝐏(𝐴,𝐵=𝑏,𝐶) / 𝐏(𝐵=𝑏) # because Bayes Rule

𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴 𝑚𝑎𝑥_𝐶𝐏(𝐴,𝐵=𝑏,𝐶)# because 𝐏(𝐵=𝑏) is constant

𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴 𝑚𝑎𝑥_𝐶[ 𝐹₁(𝐴,𝐵=𝑏)𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ] # since 𝐏(𝐴,𝐵=𝑏,𝐶) is a Gibbs Distirbution, factorize full joint probability

𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴 𝑚𝑎𝑥_𝐶[ 𝐹₁(𝐴)𝐹₂(𝐴,𝐶)𝐹₃(𝐶) ] # instantiate given evidences

𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴[ 𝐹₁(𝐴) 𝑚𝑎𝑥_𝐶 [ 𝐹₂(𝐴,𝐶)𝐹₃(𝐶) ] ] # move all irrelevant terms as far left

𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴[ 𝐹₁(𝐴) 𝑚_𝑐(𝐴) ] # compute new term 𝑚_𝑐 for the innermost summation

etc

VE Solving MAP Query: 𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪 [ 𝛴_𝐡∊𝐇 [ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]

Click here to expand... MAP seeks an assignment to a SUBSET of variables that has the maximal probability given evidence

What if we want to compute 𝑀𝐴𝑃(𝐴|𝐵=𝑏), given a Gibbs Distribution of the full joint probability𝐏(𝐴,𝐵,𝐶)?

is a combination of VE Algorithm - Solving Conditional Posterior Query and VE Algorithm - Solving MPE Query:

sum-marginalize operators (𝛴) eliminates non-MAP variables

max-marginalize operators (𝑚𝑎𝑥) eliminates MAP variables

if a function/factor is a constant, we have 2 options:

when computing the MAP value/probability: we place it directly in the first bucket

when computing just the MAP tuple: we can remove it

Example

𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴𝐏(𝐴|𝐵=𝑏)

𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴𝐏(𝐴,𝐵=𝑏) / 𝐏(𝐵=𝑏) # because Bayes Rule

𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴𝐏(𝐴,𝐵=𝑏)# because 𝐏(𝐵=𝑏) is constant

𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴 [ 𝛴_𝑐∊𝐶𝐏(𝐴,𝐵=𝑏,𝐶) ] # becausemarginal probability rule

𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴 [ 𝛴_𝑐∊𝐶𝐹₁(𝐴,𝐵=𝑏)𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ] # since 𝐏(𝐴,𝐵=𝑏,𝐶) is a Gibbs Distirbution, factorize full joint probability

𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴 [ 𝐹₁(𝐴,𝐵=𝑏) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐵=𝑏,𝐶) ]# move all irrelevant terms as far left

𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴 [ 𝐹₁(𝐴) 𝛴_𝑐∊𝐶𝐹₂(𝐴,𝐶)𝐹₃(𝐶) ] # instantiate given evidences

𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥_𝐴 [ 𝐹₁(𝐴) 𝑚_𝑐(𝐴) ] # compute new term 𝑚_𝑐 for the innermost summation

BE/VE Algorithm - Choosing Elimination Orderings

Click here to expand...

The algorithm with variable elimination order 𝐈 has exponential time complexity based on:

treewidth of the clique tree = (size of the largest clique) - 1

A “good” variable elimination order will make the treewidth small. Its definition is one less than the smallest achievable value of the cardinality of the largest elimination clique, ranging over all possible elimination ordering. However, finding such k as well as the “best” elimination ordering is NP-hard. As such there are heuristics one may follow to better optimize performance by order:

see: Depth - Induced-Width - Tree-Width)

BE/VE Algorithm - Complexity

Click here to expand...

see: VE Algorithm - Complexity (Non-MAP vs MAP)

BE/VE Algorithm - In Specific Probabilistic Graphical Models

BE/VE Algorithm - Variants

Propagation - Collect & Distribute Algorithm
Bucket Elimination - computes an upper-bound estimation in a much faster way by bounding the size of buckets

Other Resources

https://ermongroup.github.io/cs228-notes/inference/ve/

／var／log marcus chiu

Explorer

Probabilistic Inference - Bucket Elimination (BE) - Variable Elimination (VE)

BE/VE Algorithm - Solving Probabilistic Queries

BE/VE Algorithm - Operator Types

VE Solving Prior Query - Probability of Evidence Query: 𝐏(𝐐=𝐪) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]

VE Solving Belief Updating Query: 𝐏(𝑄𝑖=𝑞𝑖|𝐄=𝐞) = [𝛴_𝐡∊𝐇 𝐏(𝑄𝑖=𝑞𝑖, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)

VE Solving Conditional Posterior Query: 𝐏(𝐐=𝐪|𝐄=𝐞) = [𝛴_𝐡∊𝐇 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)

VE Solving MPE Query: 𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪 [ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]

Example

VE Solving MAP Query: 𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪 [ 𝛴_𝐡∊𝐇 [ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]

Example

BE/VE Algorithm - Choosing Elimination Orderings

BE/VE Algorithm - Complexity

BE/VE Algorithm - In Specific Probabilistic Graphical Models

BE/VE Algorithm - Variants

Other Resources

／var／logmarcus chiu

Explorer

Probabilistic Inference - Bucket Elimination (BE) - Variable Elimination (VE)

BE/VE Algorithm - Solving Probabilistic Queries

BE/VE Algorithm - Operator Types

VE Solving Prior Query - Probability of Evidence Query: 𝐏(𝐐=𝐪) = 𝛴𝐡∊𝐇 [ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]

VE Solving Belief Updating Query: 𝐏(𝑄𝑖=𝑞𝑖|𝐄=𝐞) = [𝛴𝐡∊𝐇 𝐏(𝑄𝑖=𝑞𝑖, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)

VE Solving Conditional Posterior Query: 𝐏(𝐐=𝐪|𝐄=𝐞) = [𝛴𝐡∊𝐇 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)

VE Solving MPE Query: 𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥𝐪 [ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]

Example

VE Solving MAP Query: 𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥𝐪 [ 𝛴𝐡∊𝐇 [ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]

Example

BE/VE Algorithm - Choosing Elimination Orderings

BE/VE Algorithm - Complexity

BE/VE Algorithm - In Specific Probabilistic Graphical Models

BE/VE Algorithm - Variants

Other Resources

／var／log marcus chiu

VE Solving Prior Query - Probability of Evidence Query: 𝐏(𝐐=𝐪) = 𝛴_𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]

VE Solving Belief Updating Query: 𝐏(𝑄𝑖=𝑞𝑖|𝐄=𝐞) = [𝛴_𝐡∊𝐇 𝐏(𝑄𝑖=𝑞𝑖, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)

VE Solving Conditional Posterior Query: 𝐏(𝐐=𝐪|𝐄=𝐞) = [𝛴_𝐡∊𝐇 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)

VE Solving MPE Query: 𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪 [ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]

VE Solving MAP Query: 𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥_𝐪 [ 𝛴_𝐡∊𝐇 [ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]