probabilistic inference recap
Link to originalgiven a probabilistic model, Probabilistic Inference is answering a probabilistic query over that model. It is also used for the computation/estimation of: distributions, the distribution’s parameters (if it’s a parametric distribution), the distribution’s probabilities, and/or the distribution’s characteristics
Click here to expand...
Link to originalLink to originalA graphical model 𝒢 is a tuple 𝒢 = ⟨𝐗, 𝐃, 𝐒, 𝐅, 𝐂⟩ where:
- 𝐗 = {𝑋1, …, 𝑋𝑛} set of ordered variables
- 𝐃 = {𝐷1, …, 𝐷𝑛} set of corresponding domains of each variable 𝑋𝑖 (e.g. if 𝑋1is a boolean variable then 𝐷1= {true, false}). The size of each 𝐷𝑖corresponds to the cardinality of variable 𝑋𝑖
- 𝐒 = {𝑆1, …, 𝑆𝑚} set of variable scopes, where each variable scope 𝑆𝑖is a subset of 𝐗 (i.e. 𝑆𝑖 ⊆ 𝐗)
- 𝐅 = {𝐹1, …, 𝐹𝑚} set of factors/functions, where each factor/function 𝐹𝑖 is defined over its corresponding variable scope 𝑆𝑖and maps any assignment over its scope to a real value
- in context of Bayesian Networks, 𝐅 = set of conditional probability tables
- in context of Markov Networks, 𝐅 = set of factors
- global function - is a function whose scope includes all variables (i.e. 𝑆𝑖 = 𝐗)
- local functions - is a function whose scope is a proper subset of variables (i.e. 𝑆𝑖⊂ 𝐗)
- 𝐂 is a set of combination operators which defines how functions are combined. common combination operators are:
- summation operator (𝛴)
- multiplication operator (𝛱)
- AND operator (∧) - for Boolean functions
- relational join operator (⨝) - when the functions are relation
- marginalization operator - for reasoning queries
- max operator - e.g. = argmax𝑦[ 𝐹𝑖(𝑥,𝑦) ] = 𝐹𝑗(𝑥) where 𝐹𝑗is a new function with scope over variable 𝑥
the set of local functions can be combined in a variety of ways (e.g. combination operators) to generate a new local function or even a global function
and let:
- 𝐐 = 𝑄1, …, 𝑄𝑟be the set of query variables
- 𝐄 = 𝐸1, …, 𝐸𝑠be the set of evidence variables
- 𝐇 = 𝐻1, …, 𝐻𝑡be the remainder of the variables(called the hidden variables (non-evidence, non-query variables))
- 𝐪 = 𝑞1, …, 𝑞𝑟be a grounded instantiation of 𝐐
- 𝐞 = 𝑒1, …, 𝑒𝑠be a grounded instantiation of 𝐄
- 𝐡 = 𝘩1, …, 𝘩𝑡be a grounded instantiation of 𝐇
𝑋, 𝐸, and 𝑌 are disjoint exhaustive sets of all variables 𝐗
therefore:
- the complete set of variables is 𝐗 = 𝐐 ∪ 𝐄 ∪ 𝐇 = {𝑋1, …, 𝑋𝑛} = {𝑄1, …, 𝑄𝑟, 𝐸1, …, 𝐸𝑠, 𝐻1, …, 𝐻𝑡}
- the complete set of instantiations is 𝒙 = 𝐪 ∪ 𝐞 ∪ 𝐡 = {𝑥1, …, 𝑥𝑛} = {𝑞1, …, 𝑞𝑟, 𝑒1, …, 𝑒𝑠, ℎ1, …, ℎ𝑡}
Click here to expand...
Link to original
- Parametric Probability Distribution Models - function/model based
- Non-Parametric Probability Distribution Models - instance-based
see also: ML - Parametric vs Non-Parametric
representations of global structures
representations of local structures
- representations of normalized distributions:
- representations of un-normalized distributions:
Click here to expand...
Link to original
- product comes from a joint probability 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) being factorized into a product of conditional probabilities
- when we take the log of products it becomes sum of factors
Query
Product
SumMarginalization𝛴
Maximization𝑎𝑟𝑔𝑚𝑎𝑥
Description
Posterior Conditional Query𝐏(𝐐=𝐪|𝐄=𝐞) = 𝛴𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)
Belief Updating Query
(a type of posterior conditional query)𝐏(𝑄𝑖=𝑞𝑖|𝐄=𝐞) = 𝛴𝐡∊𝐇[ 𝐏(𝑄𝑖=𝑞𝑖, 𝐇=𝐡, 𝐄=𝐞) ] / 𝐏(𝐄=𝐞)✔
✔
✘
- sum-product problem
- sum-log-sums problem
Prior Marginal Query𝐏(𝐐=𝐪) = 𝛴𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡) ]
Probability of Evidence Query𝐏(𝐄=𝐞) = 𝛴𝐡∊𝐇[ 𝐏(𝐇=𝐡, 𝐄=𝐞) ]
✔
✔
✘
- sum-product problem
- sum-log-sums problem
- both queries essentially the same where 𝐐 = 𝐄
- #P-Hard Problem
Maximum a Posterior (𝑴𝑨𝑷) Query𝑀𝐴𝑃(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥𝐪[ 𝛴𝐡∊𝐇[ 𝐏(𝐐=𝐪, 𝐇=𝐡, 𝐄=𝐞) ] ]
where: 𝐐∪𝐄⊂𝐗 and 𝐐∪𝐇∪𝐄=𝐗✔
✔
✔
- max-sum-product problem
- max-sum-log-sums problem
- most probable assignment because they include maximization
- NPPP-Hard Problem
Most Probable Explanation (𝑴𝑷𝑬) Query𝑀𝑃𝐸(𝐐=?|𝐄=𝐞) = 𝑎𝑟𝑔𝑚𝑎𝑥𝐪[ 𝐏(𝐐=𝐪, 𝐄=𝐞) ]
where: 𝐐∪𝐄=𝐗✔
✘
✔
- max-product problem
- max-log-sums problem
- most probable assignment because they include maximization
- NP-Hard Problem
more details: Probabilistic Inference - Query/Task Types (Posterior Conditional - Prior Marginal / Probability of Evidence - MPE - MAP)
resources:
Click here to expand...
- is a type of exact inference algorithm that naively computes the probability of a probabilistic query
- an extension of this is called the Variable Elimination Algorithm where it pushes the summation operator into the factor products
Inference by Enumeration Algorithm - Intuition
answering probabilistic queries often involves the summation/marginalization of full joint probabilities
for example, given a full joint probability 𝐏(𝑎, 𝑏, 𝑐) the computation of a probabilistic query 𝐏(𝑎|𝑏) is as follows
- 𝐏(𝑎|𝑏) = 𝐏(𝑎,𝑏) / 𝐏(𝑏) # because bayes rule
- 𝐏(𝑎|𝑏) = [ 𝛴𝑐∊𝐶 𝐏(𝑎,𝑏,𝑐) ] / [ 𝛴𝑎∊𝐴𝛴𝑐∊𝐶𝐏(𝑎,𝑏,𝑐) ] # because marginal probability rule
however, storing a full joint probability in table-formis not practical because it takes 𝑂(𝑘𝑛) space, where:
- 𝑛 is the number of variables
- 𝑘 is the max cardinality of each variable
to solve this problem, we often encode the full joint probability in a probabilistic graphical model. They require much less space but still retain the ability to represent the full joint probability. The only downside of is that it requires an extra step to obtain the value of the full joint probability
acquiring these values depends on the probabilistic graphical model:
- in the context of Bayesian Networks, a full joint probability is the product of conditional probabilities
- 𝑃(𝑋1, …, 𝑋𝑛) = 𝛱𝑋𝑖∈𝐗 [ 𝐏(𝑋𝑖|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋𝑖)) ]
- in the context of Markov Networks, a full joint probability is the product of potential functions
- 𝑃(𝑋1, …, 𝑋𝑛) = (1/𝘡) * 𝛱𝑐∊𝐶 [ 𝜙𝑐(𝒙𝑐) ]
order to solve this query 𝐏(𝑎|𝑏) we need to know the value of the full joint probability 𝐏(𝑎,𝑏,𝑐)
Once the probability value is acquired we plug it into the equation, then we continue to enumerate through each summation 𝛴 and each time we calculate the full joint probability value based on the probabilistic graphical model
- 𝐏(𝑎|𝑏) = [ 𝛴𝑐∊𝐶 𝐏(𝑎,𝑏,𝑐) ] / [ 𝛴𝑎∊𝐴𝛴𝑐∊𝐶𝐏(𝑎,𝑏,𝑐) ]
the Inference by Enumeration algorithm simply enumerates through each summation 𝛴