Likelihood Weighting Sampling
- is a specific form of importance sampling for Bayesian Networks where the variables are sampled in the order defined by the network, and evidence is used to update the weights
- a special kind of importance sampling in which the proposal distribution equals the belief network obtained by clamping evidence
- is a combination of both:
- forward sampling - a special case of likelihood weighting that solves only probability of evidence queries (i.e. 𝐏(𝐸=𝑒))
- importance sampling - generalizes likelihood weighting
- performs POORLY when evidence is at the leaves
- syntax
Click here to expand...
Link to originalLink to originalA graphical model 𝒢 is a tuple 𝒢 = ⟨𝐗, 𝐃, 𝐒, 𝐅, 𝐂⟩ where:
- 𝐗 = {𝑋1, …, 𝑋𝑛} set of ordered variables
- 𝐃 = {𝐷1, …, 𝐷𝑛} set of corresponding domains of each variable 𝑋𝑖 (e.g. if 𝑋1is a boolean variable then 𝐷1= {true, false}). The size of each 𝐷𝑖corresponds to the cardinality of variable 𝑋𝑖
- 𝐒 = {𝑆1, …, 𝑆𝑚} set of variable scopes, where each variable scope 𝑆𝑖is a subset of 𝐗 (i.e. 𝑆𝑖 ⊆ 𝐗)
- 𝐅 = {𝐹1, …, 𝐹𝑚} set of factors/functions, where each factor/function 𝐹𝑖 is defined over its corresponding variable scope 𝑆𝑖and maps any assignment over its scope to a real value
- in context of Bayesian Networks, 𝐅 = set of conditional probability tables
- in context of Markov Networks, 𝐅 = set of factors
- global function - is a function whose scope includes all variables (i.e. 𝑆𝑖 = 𝐗)
- local functions - is a function whose scope is a proper subset of variables (i.e. 𝑆𝑖⊂ 𝐗)
- 𝐂 is a set of combination operators which defines how functions are combined. common combination operators are:
- summation operator (𝛴)
- multiplication operator (𝛱)
- AND operator (∧) - for Boolean functions
- relational join operator (⨝) - when the functions are relation
- marginalization operator - for reasoning queries
- max operator - e.g. = argmax𝑦[ 𝐹𝑖(𝑥,𝑦) ] = 𝐹𝑗(𝑥) where 𝐹𝑗is a new function with scope over variable 𝑥
the set of local functions can be combined in a variety of ways (e.g. combination operators) to generate a new local function or even a global function
and let:
- 𝐐 = 𝑄1, …, 𝑄𝑟be the set of query variables
- 𝐄 = 𝐸1, …, 𝐸𝑠be the set of evidence variables
- 𝐇 = 𝐻1, …, 𝐻𝑡be the remainder of the variables(called the hidden variables (non-evidence, non-query variables))
- 𝐪 = 𝑞1, …, 𝑞𝑟be a grounded instantiation of 𝐐
- 𝐞 = 𝑒1, …, 𝑒𝑠be a grounded instantiation of 𝐄
- 𝐡 = 𝘩1, …, 𝘩𝑡be a grounded instantiation of 𝐇
𝑋, 𝐸, and 𝑌 are disjoint exhaustive sets of all variables 𝐗
therefore:
- the complete set of variables is 𝐗 = 𝐐 ∪ 𝐄 ∪ 𝐇 = {𝑋1, …, 𝑋𝑛} = {𝑄1, …, 𝑄𝑟, 𝐸1, …, 𝐸𝑠, 𝐻1, …, 𝐻𝑡}
- the complete set of instantiations is 𝒙 = 𝐪 ∪ 𝐞 ∪ 𝐡 = {𝑥1, …, 𝑥𝑛} = {𝑞1, …, 𝑞𝑟, 𝑒1, …, 𝑒𝑠, ℎ1, …, ℎ𝑡}
Solving Posterior Conditional Query
Click here to expand...
is a specific version of Normalized Importance Sampling where weights are only applied to the evidence
LW-Sample(𝓑, 𝑬=𝒆) { // 𝓑 - bayesian network // 𝑬=𝒆 - set of evidence variables w = 1 for each variable 𝑋ᵢ in topological order based on 𝓑 if 𝑋ᵢ is an evidence variable 𝑋ᵢ = 𝑒ᵢ w = w * 𝐏(𝑋ᵢ|𝑝𝑎??𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋ᵢ)) else 𝑋ᵢ = sample from 𝐏(𝑋ᵢ|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋ᵢ) return (w, sample={𝑋₁, ..., 𝑋n}) // weighted sample }this process generates a weighted particle. We can now estimate a conditional probability 𝐏(𝑸=𝒒|𝑬=𝒆) by using LW-Sample 𝑛 times to generate a set of weighted particles {⟨𝑤[1],𝑠𝑎𝑚𝑝𝑙𝑒[1]⟩, …, ⟨𝑤[𝑛],𝑠𝑎𝑚𝑝𝑙𝑒[𝑛]⟩}. We then estimate:
- 𝐏(𝑸=𝒒|𝑬=𝒆) ≈ 𝐏ˆ(𝑸=𝒒|𝑬=𝒆) = [𝛴1≤𝑖≤𝑛[𝑤[𝑖]·𝐼{𝑠𝑎𝑚𝑝𝑙𝑒𝑸[𝑖] = 𝒒}]] / [𝛴1≤𝑖≤𝑛[𝑤[𝑖]]]
where:
- 𝑠𝑎𝑚𝑝𝑙𝑒𝑸[𝑖] - is a subset of 𝑠𝑎𝑚𝑝𝑙𝑒[𝑖] that contains variables 𝑸
- 𝐼{𝑠𝑎𝑚𝑝𝑙𝑒𝑸[𝑖] = 𝒒}] - is an indicator function that equals 1 when 𝑠𝑎𝑚𝑝𝑙𝑒𝑸[𝑖] is assigned 𝒒
Solving Probability of Evidence Query
Click here to expand...
To solve 𝐏(𝑸=𝒒) we can use the same algorithm as Solving Posterior Conditional Query. However we can simplify it
LW-Sample(𝓑) { // 𝓑 - bayesian network for each variable 𝑋ᵢ in topological order based on 𝓑 𝑋ᵢ = sample from 𝐏(𝑋ᵢ|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋ᵢ) return {𝑋₁, ..., 𝑋n} // sample }this process generates a sample particle. We can now estimate a conditional probability 𝐏(𝑸=𝒒) by using LW-Sample 𝑛 times to generate a set of weighted particles {⟨𝑠𝑎𝑚𝑝𝑙𝑒[1], …, 𝑠𝑎𝑚𝑝𝑙𝑒[𝑛]}. We then estimate:
- 𝐏(𝑸=𝒒) ≈ 𝐏ˆ(𝑸=𝒒) = [𝛴1≤𝑖≤𝑛 𝐼{𝑠𝑎𝑚𝑝𝑙𝑒𝑸[𝑖] = 𝒒}] / [𝑛]
where:
- 𝑠𝑎𝑚𝑝𝑙𝑒𝑸[𝑖] - is a subset of 𝑠𝑎𝑚𝑝𝑙𝑒[𝑖] that contains variables 𝑸
- 𝐼{𝑠𝑎𝑚𝑝𝑙𝑒𝑸[𝑖] = 𝒒}] - is an indicator function that equals 1 when 𝑠𝑎𝑚𝑝𝑙𝑒𝑸[𝑖] is assigned 𝒒
NOTE: This is the same as Logic Sampling
Likelihood Weighting - Limitations
Click here to expand...
one of the limitations of likelihood weighting is that an evidence node affects the sample only for nodes that are its descendants. The effect on nodes that are non-descendants is accounted for only by weights. In cases where much of the evidence is at the leaves of the network, we are essentially sampling from the prior distribution, which is often very far from the desired posterior
Likelihood Weighting or Importance Sampling
Click here to expand...
Solving Probability of Evidence Query of a Bayesian Network
Click here to expand...
compute proposal distribution 𝐐(𝑯) over the clamped bayesian network
express 𝐏(𝑬=𝒆) as follows:
- 𝐏(𝑬=𝒆) = 𝛴𝘩1∊𝐻1 … 𝛴𝘩𝑡∊𝐻𝑡 [ 𝐏(𝑯=𝒉, 𝑬=𝒆) ]
- 𝐏(𝑬=𝒆) = 𝛴𝘩1∊𝐻1 … 𝛴𝘩𝑡∊𝐻𝑡 [ 𝐏(𝑯=𝒉, 𝑬=𝒆) 𝐐(𝑯=𝒉)/𝐐(𝑯=𝒉) ]
- 𝐏(𝑬=𝒆) = 𝛴𝘩1∊𝐻1 … 𝛴𝘩𝑡∊𝐻𝑡 [ [𝐏(𝑯=𝒉, 𝑬=𝒆)/𝐐(𝑯=𝒉)] · 𝐐(𝑯=𝒉) ]
- 𝐏(𝑬=𝒆) = 𝐄𝐐[ 𝐏(𝑯=𝒉, 𝑬=𝒆) / 𝐐(𝑯=𝒉) ]
- 𝐏(𝑬=𝒆) = 𝐄𝐐[ [𝐏(𝐻1=𝘩1|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻1))· … ·𝐏(𝐻𝑡=𝘩𝑡|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻𝑡))·𝐏(𝐸1=𝑒1|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸𝑠))· … ·𝐏(𝐸𝑠=𝑒𝑠|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸𝑠))] / [𝐐(𝐻1=𝘩1|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻1))· … ·𝐐(𝐻𝑡=𝘩𝑡|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻𝑡))] ]
- 𝐏(𝑬=𝒆) = 𝐄𝐐[ [𝐏(𝐻1=𝘩1|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻1))· … ·𝐏(𝐻𝑡=𝘩𝑡|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻𝑡))·𝐏(𝐸1=𝑒1|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸𝑠))· … ·𝐏(𝐸𝑠=𝑒𝑠|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸𝑠))] / [𝐏(𝐻1=𝘩1|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻1))· … ·𝐏(𝐻𝑡=𝘩𝑡|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻𝑡))] ] # 𝐐 is over the same distribution as 𝐏
- 𝐏(𝑬=𝒆) = 𝐄𝐐[ 𝐏(𝐸1=𝑒1|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸𝑠))· … ·𝐏(𝐸𝑠=𝑒𝑠|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸𝑠)) ]
- 𝐏(𝑬=𝒆) = 𝐄𝐐[ 𝐏(𝑬=𝒆|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑬)) ]
generate 𝑛 samples from proposal distribution 𝐐 and estimate 𝐏(𝑬=𝒆) using the following Monte Carlo estimate:
- 𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴1≤𝑖≤𝑛 [ 𝐏(𝑬=𝒆 | 𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑬) 𝑎𝑠𝑠𝑖𝑔𝑛𝑒𝑑 𝑤𝑖𝑡𝘩 𝑣𝑎𝑙𝑢𝑒𝑠 𝑑𝑟𝑎𝑤𝑛 𝑓𝑟𝑜𝑚 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑜𝑟 𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒) ]
given evidence 𝑬=𝒆, we clamp the bayesian network
Solving Probability of Evidence Query of a Bayesian Network - Example
Click here to expand...
- given a bayesian network below
- given evidence (𝑔=0, 𝑠=0), we clamp the conditional probability tables of the bayesian network
- compute proposal distribution 𝐐 defined over the clamped bayesian network
- 𝐏ˆ(𝑬=𝒆) = 𝑎𝑣𝑒𝑟𝑎𝑔𝑒-𝑜𝑓[ 𝐏(𝑠𝑎𝑚𝑝𝑙𝑒, 𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒) / 𝐐(𝑠𝑎𝑚𝑝𝑙𝑒) ]
- 𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴1≤𝑖≤𝑛 [ 𝐏(𝑠𝑎𝑚𝑝𝑙𝑒𝑖, 𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒) / 𝐐(𝑠𝑎𝑚𝑝𝑙𝑒𝑖) ]
- 𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴1≤𝑖≤𝑛 [ 𝐏(𝑯=𝒉𝑖, 𝑬=𝒆)/𝐐(𝑯=𝒉𝑖) ]
- 𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴1≤𝑖≤𝑛 [ 𝐏(𝐷=𝑑𝑖, 𝐼=𝑖𝑖, 𝐺=0, 𝐿=𝑙𝑖, 𝑆=0) / 𝐐(𝐷=𝑑𝑖, 𝐼=𝑖𝑖, 𝐿=𝑙𝑖) ]
- 𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴1≤𝑖≤𝑛[ [𝐏(𝐷=𝑑𝑖)·𝐏(𝐼=𝑖𝑖)·𝐏(𝐺=0|𝐷=𝑑𝑖,𝐼=𝑖𝑖)·𝐏(𝐿=𝑙𝑖|𝐺=0)·𝐏(𝑆=0|𝐼=𝑖𝑖)] / [𝐐(𝐷=𝑑𝑖)·𝐐(𝐼=𝑖𝑖)·𝐐(𝐿=𝑙𝑖|𝐺=0)] ]
- 𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴1≤𝑖≤𝑛[ [𝐏(𝐷=𝑑𝑖)·𝐏(𝐼=𝑖𝑖)·𝐏(𝐺=0|𝐷=𝑑𝑖,𝐼=𝑖𝑖)·𝐏(𝐿=𝑙𝑖|𝐺=0)·𝐏(𝑆=0|𝐼=𝑖𝑖)] / [𝐏(𝐷=𝑑𝑖)·𝐏(𝐼=𝑖𝑖)·𝐏(𝐿=𝑙𝑖|𝐺=0)] ]
- 𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴1≤𝑖≤𝑛[ 𝐏(𝐺=0|𝐷=𝑑𝑖,𝐼=𝑖𝑖)·𝐏(𝑆=0|𝐼=𝑖𝑖) ]
bayesian network
clamped bayesian network
𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴1≤𝑖≤𝑛 [ 𝐏(𝐺=0|𝐷=𝑑𝑖,𝐼=𝑖𝑖)·𝐏(𝑆=0|𝐼=𝑖𝑖) ]
- sample 𝑖=1 = (𝐷=0, 𝐼=0, 𝐿=0)
- 𝐏(𝐺=0|𝐷=0,𝐼=0)·𝐏(𝑆=0|𝐼=0) = 0.3 * 0.95 = 0.285
- sample 𝑖=2 = (𝐷=0, 𝐼=1, 𝐿=1)
- 𝐏(𝐺=0|𝐷=0,𝐼=1)·𝐏(𝑆=0|𝐼=1) = 0.9 * 0.2 = 0.18
- and so on…
𝐏ˆ(𝑬=𝒆) = (1/𝑛) (0.285 + 0.18 + … + 𝑛𝑡𝘩 𝑠𝑎𝑚𝑝𝑙𝑒)
NOTE: performs POORLY when evidence is at the leaves
/likelihood-weighting-sampling/importance-sampling-bayesian-network-example.png)
/likelihood-weighting-sampling/likelihood-weighting-bayesian-network.png)