probabilistic inference recap
Link to originalgiven a probabilistic model, Probabilistic Inference is answering a probabilistic query over that model. It is also used for the computation/estimation of: distributions, the distribution’s parameters (if it’s a parametric distribution), the distribution’s probabilities, and/or the distribution’s characteristics
Click here to expand...
Link to originalLink to originalA graphical model 𝒢 is a tuple 𝒢 = ⟨𝐗, 𝐃, 𝐒, 𝐅, 𝐂⟩ where:
- 𝐗 = {𝑋1, …, 𝑋𝑛} set of ordered variables
- 𝐃 = {𝐷1, …, 𝐷𝑛} set of corresponding domains of each variable 𝑋𝑖 (e.g. if 𝑋1is a boolean variable then 𝐷1= {true, false}). The size of each 𝐷𝑖corresponds to the cardinality of variable 𝑋𝑖
- 𝐒 = {𝑆1, …, 𝑆𝑚} set of variable scopes, where each variable scope 𝑆𝑖is a subset of 𝐗 (i.e. 𝑆𝑖 ⊆ 𝐗)
- 𝐅 = {𝐹1, …, 𝐹𝑚} set of factors/functions, where each factor/function 𝐹𝑖 is defined over its corresponding variable scope 𝑆𝑖and maps any assignment over its scope to a real value
- in context of Bayesian Networks, 𝐅 = set of conditional probability tables
- in context of Markov Networks, 𝐅 = set of factors
- global function - is a function whose scope includes all variables (i.e. 𝑆𝑖 = 𝐗)
- local functions - is a function whose scope is a proper subset of variables (i.e. 𝑆𝑖⊂ 𝐗)
- 𝐂 is a set of combination operators which defines how functions are combined. common combination operators are:
- summation operator (𝛴)
- multiplication operator (𝛱)
- AND operator (∧) - for Boolean functions
- relational join operator (⨝) - when the functions are relation
- marginalization operator - for reasoning queries
- max operator - e.g. = argmax𝑦[ 𝐹𝑖(𝑥,𝑦) ] = 𝐹𝑗(𝑥) where 𝐹𝑗is a new function with scope over variable 𝑥
the set of local functions can be combined in a variety of ways (e.g. combination operators) to generate a new local function or even a global function
and let:
- 𝐐 = 𝑄1, …, 𝑄𝑟be the set of query variables
- 𝐄 = 𝐸1, …, 𝐸𝑠be the set of evidence variables
- 𝐇 = 𝐻1, …, 𝐻𝑡be the remainder of the variables(called the hidden variables (non-evidence, non-query variables))
- 𝐪 = 𝑞1, …, 𝑞𝑟be a grounded instantiation of 𝐐
- 𝐞 = 𝑒1, …, 𝑒𝑠be a grounded instantiation of 𝐄
- 𝐡 = 𝘩1, …, 𝘩𝑡be a grounded instantiation of 𝐇
𝑋, 𝐸, and 𝑌 are disjoint exhaustive sets of all variables 𝐗
therefore:
- the complete set of variables is 𝐗 = 𝐐 ∪ 𝐄 ∪ 𝐇 = {𝑋1, …, 𝑋𝑛} = {𝑄1, …, 𝑄𝑟, 𝐸1, …, 𝐸𝑠, 𝐻1, …, 𝐻𝑡}
- the complete set of instantiations is 𝒙 = 𝐪 ∪ 𝐞 ∪ 𝐡 = {𝑥1, …, 𝑥𝑛} = {𝑞1, …, 𝑞𝑟, 𝑒1, …, 𝑒𝑠, ℎ1, …, ℎ𝑡}
Click here to expand...
Link to original
- Parametric Probability Distribution Models - function/model based
- Non-Parametric Probability Distribution Models - instance-based
see also: ML - Parametric vs Non-Parametric
representations of global structures
representations of local structures
- representations of normalized distributions:
- representations of un-normalized distributions:
Click here to expand...
Link to original
- product comes from a joint probability 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) being factorized into a product of conditional probabilities
- when we take the log of products it becomes sum of factors
Query
Product
SumMarginalization𝛴
Maximization𝑎𝑟𝑔𝑚𝑎𝑥
Description
Posterior Conditional Query𝐏(𝐐=𝐪|𝐄=𝐞) = 𝛴𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)
Belief Updating Query
(a type of posterior conditional query)𝐏(𝑄𝑖=𝑞𝑖|𝐄=𝐞) = 𝛴𝐡∊𝐇[ 𝐏(𝑄𝑖=𝑞𝑖, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)✔
✔
✘
- sum-product problem
- sum-log-sums problem
Prior Marginal Query𝐏(𝐐=𝐪) = 𝛴𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]
Probability of Evidence Query𝐏(𝐄=𝐞) = 𝛴𝐡∊𝐇[ 𝐏(𝐇=𝐡, 𝐄=𝐞) ]
✔
✔
✘
- sum-product problem
- sum-log-sums problem
- both queries essentially the same where 𝐐 = 𝐄
- #P-Hard Problem
Maximum a Posterior (𝑴𝑨𝑷) Query𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥𝐪[ 𝛴𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]
where: 𝐐∪𝐄⊂𝐗 and 𝐐∪𝐇∪𝐄=𝐗✔
✔
✔
- max-sum-product problem
- max-sum-log-sums problem
- most probable assignment because they include maximization
- NPPP-Hard Problem
Most Probable Explanation (𝑴𝑷𝑬) Query𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥𝐪[ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]
where: 𝐐∪𝐄=𝐗✔
✘
✔
- max-product problem
- max-log-sums problem
- most probable assignment because they include maximization
- NP-Hard Problem
more details: Probabilistic Inference - Query/Task Types (Posterior Conditional - Prior Marginal / Probability of Evidence - MPE - MAP)
resources:
Click here to expand...
Bucket Elimination (BE) Algorithm - Variable Elimination (VE) Algorithm
- a type of exact probabilistic inference algorithm
- is an extension of the Inference by Enumeration algorithm that pushes the summation operators into the factor product and eliminates variable summation operators by generating new factors (which makes it a type of dynamic programming)
- induces a cluster tree on which VE Algorithm is performed, thus it is a type of specialized belief propagation algorithm over trees
- each run of BE/VE only solves one probabilistic query such as 𝐏(𝑄|𝐄=𝐞) at a time. Extensions of BE/VE include:
- Bucket Tree Elimination (BTE) Algorithm - used to solve 𝐏(𝑄|𝐄=𝐞) for all 𝑄∊{all variables}
BE/VE Algorithm - Solving Probabilistic Queries
Click here to expand...
BE/VE Algorithm - Operator Types
Click here to expand...
Link to original𝛱 product-combination - merges 2 factors into 1 factor
Click here to expand...
𝐹(𝐴,𝐵)𝐹(𝐵,𝐶) = 𝐹(𝐴,𝐵,𝐶)
𝛴 sum-marginalization - “sum out” a variable over ALL values in that variable’s domain
Click here to expand...
𝛴𝑐∊𝐶𝐹(𝐴,𝐵,𝐶=𝑐) = 𝐹(𝐴,𝐵)
𝑚𝑎𝑥 max-marginalization - “max-out” a variable over ALL values in that variable’s domain
Click here to expand...
𝑚𝑎𝑥𝑐∊𝐶𝐹(𝐴,𝐵,𝐶=𝑐) = 𝐹(𝐴,𝐵)
𝛴 evidence-instantiation/reduction - over SINGLE value in variable domain (used for variable instantiation when given evidence)
Click here to expand...
𝐹(𝐴,𝐵,𝐶=𝑐1) = 𝐹(𝐴,𝐵)
VE Solving Prior Query - Probability of Evidence Query: 𝐏(𝐐=𝐪) = 𝛴𝐡∊𝐇 [ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]
Click here to expand... probabilistic queries often involves the summation/marginalization of full joint probabilities
for example, given a Gibbs Distribution of a full joint probability 𝐏(𝐴,𝐵,𝐶), the computation of a probabilistic query 𝐏(𝐴,𝐵) is as follows
- 𝐏(𝐴,𝐵) = 𝛴𝑐∊𝐶 𝐏(𝐴,𝐵,𝐶) # because marginal probability rule
however, storing a full joint probability in table-formis not practical because it takes 𝑂(𝑘𝑛) space, where:
- 𝑛 is the number of variable
- 𝑘 is the max cardinality of each variable
to solve this problem, we often encode the full joint probability into a probabilistic graphical model. They require much less space but they still retain the ability to represent the full joint probability.
When using a probabilistic graphical model the full joint probability is FACTORIZED
how it is FACTORIZED depends on the type of model used:
- for Bayesian Networks, a full joint probability is FACTORIZED into a product of conditional probabilities
- 𝑃(𝑋1, …, 𝑋𝑛) = 𝛱𝑋𝑖∈𝐗 [ 𝐏(𝑋𝑖|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋𝑖)) ]
- for Markov Networks, a full joint probability is FACTORIZED into a product of potential functions
- 𝑃(𝑋1, …, 𝑋𝑛) = 𝛱𝑐∊𝐶 [ (1/𝘡) * 𝜙𝑐(𝒙𝑐) ]
Let’s say the full joint probability 𝐏(𝐴,𝐵,𝐶) is FACTORIZED as so
- 𝐏(𝐴,𝐵,𝐶) = 𝐹1(𝐴,𝐵)𝐹2(𝐴,𝐶)𝐹3(𝐵,𝐶)
therefore the query becomes
- 𝐏(𝐴,𝐵) = 𝛴𝑐∊𝐶 𝐹1(𝐴,𝐵)𝐹2(𝐴,𝐶)𝐹3(𝐵,𝐶)
- 𝐏(𝐴,𝐵) = 𝐹1(𝐴,𝐵) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐵,𝐶)
At this point, everything we have done so far is the same as inference by enumeration. Inference by Enumeration takes the resulting formula and enumerates through each summation one-by-one. In Variable Elimination we exploit the fact that many of these computations are repetitions and instead of computing them from scratch every time we systematically order it these computations (similar to Dynamic Programming). Variable Elimination have the following steps below:
- move all irrelevant terms outside of the innermost summation 𝛴𝑖∊𝐼 (this includes the new terms created in step 2/3)
- compute the innermost summation 𝛴𝑖∊𝐼and replace it with the resulting new factor 𝑚𝑖(all terms within summation minus term 𝑖) the computation is a 2 step process:
- combination step - compute product 𝛱 of all factors within innermost summation (see example in Probabilistic Inference - Bucket Elimination (BE) - Variable Elimination (VE))
- marginalization step - compute summation 𝛴 of the resulting product (see example in Probabilistic Inference - Bucket Elimination (BE) - Variable Elimination (VE)) this is the new factor
- repeat steps 1 to 2 until all summations 𝛴𝑖∊𝐼are replaced
for example,
- 𝐏(𝐴,𝐵) = 𝐹1(𝐴,𝐵) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐵,𝐶)
- 𝐏(𝐴,𝐵) = 𝐹1(𝐴,𝐵) 𝑚𝑐(𝐴,𝐵)
and that’s basically it!
NOTE: the elimination algorithm has no benefit if the innermost term includes all variables, that is, 𝑋𝑖 is dependent on all the other variables (e.g. 𝐏(𝑋𝑖|𝑎𝑙𝑙-𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒𝑠-𝑚𝑖𝑛𝑢𝑠-𝑋𝑖)). However, in most problems, the number of variables in the innermost term is less than the total number of variables
answering
VE Solving Belief Updating Query: 𝐏(𝑄𝑖=𝑞𝑖|𝐄=𝐞) = [𝛴𝐡∊𝐇 𝐏(𝑄𝑖=𝑞𝑖, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)
Click here to expand... 𝐏(𝐴|𝐵=𝑏), given a Gibbs Distribution of the full joint probability 𝐏(𝐴,𝐵,𝐶)?
steps are similar to computing Prior Probabilistic Query except there is an additional step that instantiates the given evidence variables as shown below in step 5
- 𝐏(𝐴|𝐵=𝑏) = 𝐏(𝐴,𝐵) / 𝐏(𝐵) # because Bayes Rule
- 𝐏(𝐴|𝐵=𝑏) = [ 𝛴𝑐∊𝐶𝐏(𝐴,𝐵=𝑏,𝐶) ] / [ 𝛴𝑎∊𝐴𝛴𝑐∊𝐶𝐏(𝐴,𝐵=𝑏,𝐶) ] # because marginal probability rule
- 𝐏(𝐴|𝐵=𝑏) = [ 𝛴𝑐∊𝐶𝐹1(𝐴,𝐵=𝑏)𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ] / [ 𝛴𝑎∊𝐴𝛴𝑐∊𝐶𝐹1(𝐴,𝐵=𝑏)𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ] # factorize full joint probability
- 𝐏(𝐴|𝐵=𝑏) = [ 𝐹1(𝐴,𝐵=𝑏) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ] / [ 𝛴𝑎∊𝐴𝐹1(𝐴,𝐵=𝑏) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ]# move all irrelevant terms as far left
- 𝐏(𝐴|𝐵=𝑏) = [ 𝐹1(𝐴) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐶) ] / [ 𝛴𝑎∊𝐴𝐹1(𝐴) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐶) ] # instantiate given evidences
- 𝐏(𝐴|𝐵=𝑏) = [ 𝐹1(𝐴) 𝑚𝑐(𝐴) ] / [ 𝛴𝑎∊𝐴𝐹1(𝐴) 𝑚𝑐(𝐴) ] # compute new term 𝑚𝑐 for the innermost summation
- 𝐏(𝐴|𝐵=𝑏) = [ 𝐹1(𝐴) 𝑚𝑐(𝐴) ] / [ 𝑚𝑎() ] # compute new term 𝑚𝑎 for the innermost summation
What if we want to compute a conditional probabilistic query
VE Solving Conditional Posterior Query: 𝐏(𝐐=𝐪|𝐄=𝐞) = [𝛴𝐡∊𝐇 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞)] / 𝐏(𝐄=𝐞)
Click here to expand... 𝐏(𝐴,𝐶|𝐵=𝑏), given a Gibbs Distribution of the full joint probability 𝐏(𝐴,𝐵,𝐶)?
- 𝐏(𝐴,𝐶|𝐵=𝑏) = 𝐏(𝐴,𝐵,𝐶) / 𝐏(𝐵) # because Bayes Rule
- 𝐏(𝐴,𝐶|𝐵=𝑏) = [𝐏(𝐴,𝐵=𝑏,𝐶) ] / [ 𝛴𝑎∊𝐴𝛴𝑐∊𝐶𝐏(𝐴,𝐵=𝑏,𝐶) ] # because marginal probability rule
- 𝐏(𝐴,𝐶|𝐵=𝑏) = [𝐹1(𝐴,𝐵=𝑏)𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ] / [ 𝛴𝑎∊𝐴𝛴𝑐∊𝐶𝐹1(𝐴,𝐵=𝑏)𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ] # factorize full joint probability
- 𝐏(𝐴,𝐶|𝐵=𝑏) = [ 𝐹1(𝐴,𝐵=𝑏)𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ] / [ 𝛴𝑎∊𝐴𝐹1(𝐴,𝐵=𝑏) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ]# move all irrelevant terms as far left
- 𝐏(𝐴,𝐶|𝐵=𝑏) = [ 𝐹1(𝐴)𝐹2(𝐴,𝐶)𝐹3(𝐶) ] / [ 𝛴𝑎∊𝐴𝐹1(𝐴) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐶) ] # instantiate given evidences
- 𝐏(𝐴,𝐶|𝐵=𝑏) = [ 𝐹1(𝐴)𝐹2(𝐴,𝐶)𝐹3(𝐶)] / [ 𝛴𝑎∊𝐴𝐹1(𝐴) 𝑚𝑐(𝐴) ] # compute new term 𝑚𝑐 for the innermost summation
- 𝐏(𝐴,𝐶|𝐵=𝑏) = [ 𝐹1(𝐴)𝐹2(𝐴,𝐶)𝐹3(𝐶) ] / [ 𝑚𝑎() ] # compute new term 𝑚𝑎 for the innermost summation
- etc
What if we want to compute a conditional probabilistic query
VE Solving MPE Query: 𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥𝐪 [ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]
Click here to expand... MPE seeks an assignment to ALL variables that has the maximal probability given evidence
What if we want to compute 𝑀𝑃𝐸(𝑎𝑙𝑙-𝑛𝑜𝑛-𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒-𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒𝑠|𝐵=𝑏), given a Gibbs Distribution of the full joint probability𝐏(𝐴,𝐵,𝐶)?
steps are similar as computing Belief Updating Query, except instead of sum-marginalize operators (𝛴) we use sum-marginalize operator (𝑚𝑎𝑥)
if a function/factor is a constant, we have 2 options:
- when computing the MAP value/probability: we place it directly in the first bucket
- when computing just the MAP tuple: we can remove it
Example
- 𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥𝐴𝑚𝑎𝑥𝐶𝐏(𝐴,𝐶|𝐵=𝑏)
- 𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥𝐴𝑚𝑎𝑥𝐶𝐏(𝐴,𝐵=𝑏,𝐶) / 𝐏(𝐵=𝑏) # because Bayes Rule
- 𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥𝐴 𝑚𝑎𝑥𝐶𝐏(𝐴,𝐵=𝑏,𝐶)# because 𝐏(𝐵=𝑏) is constant
- 𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥𝐴 𝑚𝑎𝑥𝐶[ 𝐹1(𝐴,𝐵=𝑏)𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ] # since 𝐏(𝐴,𝐵=𝑏,𝐶) is a Gibbs Distirbution, factorize full joint probability
- 𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥𝐴 𝑚𝑎𝑥𝐶[ 𝐹1(𝐴)𝐹2(𝐴,𝐶)𝐹3(𝐶) ] # instantiate given evidences
- 𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥𝐴[ 𝐹1(𝐴) 𝑚𝑎𝑥𝐶 [ 𝐹2(𝐴,𝐶)𝐹3(𝐶) ] ] # move all irrelevant terms as far left
- 𝑀𝑃𝐸(𝐴,𝐶|𝐵=𝑏) = 𝑚𝑎𝑥𝐴[ 𝐹1(𝐴) 𝑚𝑐(𝐴) ] # compute new term 𝑚𝑐 for the innermost summation
- etc
VE Solving MAP Query: 𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥𝐪 [ 𝛴𝐡∊𝐇 [ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]
Click here to expand... MAP seeks an assignment to a SUBSET of variables that has the maximal probability given evidence
What if we want to compute 𝑀𝐴𝑃(𝐴|𝐵=𝑏), given a Gibbs Distribution of the full joint probability𝐏(𝐴,𝐵,𝐶)?
is a combination of VE Algorithm - Solving Conditional Posterior Query and VE Algorithm - Solving MPE Query:
- sum-marginalize operators (𝛴) eliminates non-MAP variables
- max-marginalize operators (𝑚𝑎𝑥) eliminates MAP variables
if a function/factor is a constant, we have 2 options:
- when computing the MAP value/probability: we place it directly in the first bucket
- when computing just the MAP tuple: we can remove it
Example
- 𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥𝐴𝐏(𝐴|𝐵=𝑏)
- 𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥𝐴𝐏(𝐴,𝐵=𝑏) / 𝐏(𝐵=𝑏) # because Bayes Rule
- 𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥𝐴𝐏(𝐴,𝐵=𝑏)# because 𝐏(𝐵=𝑏) is constant
- 𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥𝐴 [ 𝛴𝑐∊𝐶𝐏(𝐴,𝐵=𝑏,𝐶) ] # becausemarginal probability rule
- 𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥𝐴 [ 𝛴𝑐∊𝐶𝐹1(𝐴,𝐵=𝑏)𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ] # since 𝐏(𝐴,𝐵=𝑏,𝐶) is a Gibbs Distirbution, factorize full joint probability
- 𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥𝐴 [ 𝐹1(𝐴,𝐵=𝑏) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐵=𝑏,𝐶) ]# move all irrelevant terms as far left
- 𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥𝐴 [ 𝐹1(𝐴) 𝛴𝑐∊𝐶𝐹2(𝐴,𝐶)𝐹3(𝐶) ] # instantiate given evidences
- 𝑀𝐴𝑃(𝐴|𝐵=𝑏) = 𝑚𝑎𝑥𝐴 [ 𝐹1(𝐴) 𝑚𝑐(𝐴) ] # compute new term 𝑚𝑐 for the innermost summation
BE/VE Algorithm - Choosing Elimination Orderings
Click here to expand...
The algorithm with variable elimination order 𝐈 has exponential time complexity based on:
- treewidth of the clique tree = (size of the largest clique) - 1
A “good” variable elimination order will make the treewidth small. Its definition is one less than the smallest achievable value of the cardinality of the largest elimination clique, ranging over all possible elimination ordering. However, finding such k as well as the “best” elimination ordering is NP-hard. As such there are heuristics one may follow to better optimize performance by order:
BE/VE Algorithm - Complexity
Click here to expand...
BE/VE Algorithm - In Specific Probabilistic Graphical Models
BE/VE Algorithm - Variants
- Propagation - Collect & Distribute Algorithm
- Bucket Elimination - computes an upper-bound estimation in a much faster way by bounding the size of buckets
𝛴 sum-marginalization - “sum out” a variable over ALL values in that variable’s domain
𝑚𝑎𝑥 max-marginalization - “max-out” a variable over ALL values in that variable’s domain---variable-elimination-(ve)/be/ve-algorithm---operator-types/../../../../../../../../../mathematics/probability---statistics---information-theory---econometrics/probability/probabilistic-inference/probabilistic-inference---algorithms/probabilistic-inference---exact-inference/probabilistic-inference---bucket-elimination-(be)---variable-elimination-(ve)/be/ve-algorithm---operator-types/factor-maximization.png)
---variable-elimination-(ve)/be/ve-algorithm---operator-types/../../../../../../../../../mathematics/probability---statistics---information-theory---econometrics/probability/probabilistic-inference/probabilistic-inference---algorithms/probabilistic-inference---exact-inference/probabilistic-inference---bucket-elimination-(be)---variable-elimination-(ve)/be/ve-algorithm---operator-types/factor-reduction-instantiation-.png)