Likelihood Weighting Sampling

is a specific form of importance sampling for Bayesian Networks where the variables are sampled in the order defined by the network, and evidence is used to update the weights
a special kind of importance sampling in which the proposal distribution equals the belief network obtained by clamping evidence
is a combination of both:
- forward sampling - a special case of likelihood weighting that solves only probability of evidence queries (i.e. 𝐏(𝐸=𝑒))
- importance sampling - generalizes likelihood weighting
performs POORLY when evidence is at the leaves
syntax
Click here to expand...
A graphical model 𝒢 is a tuple 𝒢 = ⟨𝐗, 𝐃, 𝐒, 𝐅, 𝐂⟩ where:

𝐗 = {𝑋₁, …, 𝑋_𝑛} set of ordered variables

𝐃 = {𝐷₁, …, 𝐷_𝑛} set of corresponding domains of each variable 𝑋_𝑖 (e.g. if 𝑋₁is a boolean variable then 𝐷₁= {true, false}). The size of each 𝐷_𝑖corresponds to the cardinality of variable 𝑋_𝑖

𝐒 = {𝑆₁, …, 𝑆_𝑚} set of variable scopes, where each variable scope 𝑆_𝑖is a subset of 𝐗 (i.e. 𝑆_𝑖 ⊆ 𝐗)

𝐅 = {𝐹₁, …, 𝐹_𝑚} set of factors/functions, where each factor/function 𝐹_𝑖 is defined over its corresponding variable scope 𝑆_𝑖and maps any assignment over its scope to a real value

in context of Bayesian Networks, 𝐅 = set of conditional probability tables

in context of Markov Networks, 𝐅 = set of factors

global function - is a function whose scope includes all variables (i.e. 𝑆_𝑖 = 𝐗)

local functions - is a function whose scope is a proper subset of variables (i.e. 𝑆_𝑖⊂ 𝐗)

𝐂 is a set of combination operators which defines how functions are combined. common combination operators are:

summation operator (𝛴)

multiplication operator (𝛱)

AND operator (∧) - for Boolean functions

relational join operator (⨝) - when the functions are relation

marginalization operator - for reasoning queries

max operator - e.g. = argmax_𝑦[ 𝐹_𝑖(𝑥,𝑦) ] = 𝐹_𝑗(𝑥) where 𝐹_𝑗is a new function with scope over variable 𝑥

the set of local functions can be combined in a variety of ways (e.g. combination operators) to generate a new local function or even a global function

Link to original

and let:

𝐐 = 𝑄₁, …, 𝑄_𝑟be the set of query variables

𝐄 = 𝐸₁, …, 𝐸_𝑠be the set of evidence variables

𝐇 = 𝐻₁, …, 𝐻_𝑡be the remainder of the variables(called the hidden variables (non-evidence, non-query variables))

𝐪 = 𝑞₁, …, 𝑞_𝑟be a grounded instantiation of 𝐐

𝐞 = 𝑒₁, …, 𝑒_𝑠be a grounded instantiation of 𝐄

𝐡 = 𝘩₁, …, 𝘩_𝑡be a grounded instantiation of 𝐇

𝑋, 𝐸, and 𝑌 are disjoint exhaustive sets of all variables 𝐗

therefore:

the complete set of variables is 𝐗 = 𝐐 ∪ 𝐄 ∪ 𝐇 = {𝑋₁, …, 𝑋_𝑛} = {𝑄₁, …, 𝑄_𝑟, 𝐸₁, …, 𝐸_𝑠, 𝐻₁, …, 𝐻_𝑡}

the complete set of instantiations is 𝒙 = 𝐪 ∪ 𝐞 ∪ 𝐡 = {𝑥₁, …, 𝑥_𝑛} = {𝑞₁, …, 𝑞_𝑟, 𝑒₁, …, 𝑒_𝑠, ℎ₁, …, ℎ_𝑡}
Link to original

Solving Posterior Conditional Query

Click here to expand...
is a specific version of Normalized Importance Sampling where weights are only applied to the evidence
LW-Sample(𝓑, 𝑬=𝒆) {
	// 𝓑 - bayesian network
	// 𝑬=𝒆 - set of evidence variables
	w = 1
	for each variable 𝑋ᵢ in topological order based on 𝓑
		if 𝑋ᵢ is an evidence variable
			𝑋ᵢ = 𝑒ᵢ
			w = w * 𝐏(𝑋ᵢ|𝑝𝑎??𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋ᵢ))
		else
			𝑋ᵢ = sample from 𝐏(𝑋ᵢ|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋ᵢ)
	return (w, sample={𝑋₁, ..., 𝑋n}) // weighted sample
}
this process generates a weighted particle. We can now estimate a conditional probability 𝐏(𝑸=𝒒|𝑬=𝒆) by using LW-Sample 𝑛 times to generate a set of weighted particles {⟨𝑤[1],𝑠𝑎𝑚𝑝𝑙𝑒[1]⟩, …, ⟨𝑤[𝑛],𝑠𝑎𝑚𝑝𝑙𝑒[𝑛]⟩}. We then estimate:

𝐏(𝑸=𝒒|𝑬=𝒆) ≈ 𝐏ˆ(𝑸=𝒒|𝑬=𝒆) = [𝛴_{1≤𝑖≤𝑛}[𝑤[𝑖]·𝐼{𝑠𝑎𝑚𝑝𝑙𝑒_𝑸[𝑖] = 𝒒}]] / [𝛴_{1≤𝑖≤𝑛}[𝑤[𝑖]]]

where:

𝑠𝑎𝑚𝑝𝑙𝑒_𝑸[𝑖] - is a subset of 𝑠𝑎𝑚𝑝𝑙𝑒[𝑖] that contains variables 𝑸

𝐼{𝑠𝑎𝑚𝑝𝑙𝑒_𝑸[𝑖] = 𝒒}] - is an indicator function that equals 1 when 𝑠𝑎𝑚𝑝𝑙𝑒_𝑸[𝑖] is assigned 𝒒

Solving Probability of Evidence Query

Click here to expand...
To solve 𝐏(𝑸=𝒒) we can use the same algorithm as Solving Posterior Conditional Query. However we can simplify it
LW-Sample(𝓑) {
	// 𝓑 - bayesian network
	for each variable 𝑋ᵢ in topological order based on 𝓑
		𝑋ᵢ = sample from 𝐏(𝑋ᵢ|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑋ᵢ)
	return {𝑋₁, ..., 𝑋n} // sample
}
this process generates a sample particle. We can now estimate a conditional probability 𝐏(𝑸=𝒒) by using LW-Sample 𝑛 times to generate a set of weighted particles {⟨𝑠𝑎𝑚𝑝𝑙𝑒[1], …, 𝑠𝑎𝑚𝑝𝑙𝑒[𝑛]}. We then estimate:

𝐏(𝑸=𝒒) ≈ 𝐏ˆ(𝑸=𝒒) = [𝛴_{1≤𝑖≤𝑛} 𝐼{𝑠𝑎𝑚𝑝𝑙𝑒_𝑸[𝑖] = 𝒒}] / [𝑛]

where:

𝑠𝑎𝑚𝑝𝑙𝑒_𝑸[𝑖] - is a subset of 𝑠𝑎𝑚𝑝𝑙𝑒[𝑖] that contains variables 𝑸

𝐼{𝑠𝑎𝑚𝑝𝑙𝑒_𝑸[𝑖] = 𝒒}] - is an indicator function that equals 1 when 𝑠𝑎𝑚𝑝𝑙𝑒_𝑸[𝑖] is assigned 𝒒

NOTE: This is the same as Logic Sampling

Likelihood Weighting - Limitations

Click here to expand...

one of the limitations of likelihood weighting is that an evidence node affects the sample only for nodes that are its descendants. The effect on nodes that are non-descendants is accounted for only by weights. In cases where much of the evidence is at the leaves of the network, we are essentially sampling from the prior distribution, which is often very far from the desired posterior

Likelihood Weighting or Importance Sampling

Click here to expand...

Solving Probability of Evidence Query of a Bayesian Network

Click here to expand...

compute proposal distribution 𝐐(𝑯) over the clamped bayesian network

express 𝐏(𝑬=𝒆) as follows:

𝐏(𝑬=𝒆) = 𝛴_{𝘩₁∊𝐻₁} … 𝛴_{𝘩_𝑡∊𝐻_𝑡} [ 𝐏(𝑯=𝒉, 𝑬=𝒆) ]

𝐏(𝑬=𝒆) = 𝛴_{𝘩₁∊𝐻₁} … 𝛴_{𝘩_𝑡∊𝐻_𝑡} [ 𝐏(𝑯=𝒉, 𝑬=𝒆) 𝐐(𝑯=𝒉)/𝐐(𝑯=𝒉) ]

𝐏(𝑬=𝒆) = 𝛴_{𝘩₁∊𝐻₁} … 𝛴_{𝘩_𝑡∊𝐻_𝑡} [ [𝐏(𝑯=𝒉, 𝑬=𝒆)/𝐐(𝑯=𝒉)] · 𝐐(𝑯=𝒉) ]

𝐏(𝑬=𝒆) = 𝐄_𝐐[ 𝐏(𝑯=𝒉, 𝑬=𝒆) / 𝐐(𝑯=𝒉) ]

𝐏(𝑬=𝒆) = 𝐄_𝐐[ [𝐏(𝐻₁=𝘩₁|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻₁))· … ·𝐏(𝐻_𝑡=𝘩_𝑡|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻_𝑡))·𝐏(𝐸₁=𝑒₁|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸_𝑠))· … ·𝐏(𝐸_𝑠=𝑒_𝑠|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸_𝑠))] / [𝐐(𝐻₁=𝘩₁|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻₁))· … ·𝐐(𝐻_𝑡=𝘩_𝑡|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻_𝑡))] ]

𝐏(𝑬=𝒆) = 𝐄_𝐐[ [𝐏(𝐻₁=𝘩₁|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻₁))· … ·𝐏(𝐻_𝑡=𝘩_𝑡|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻_𝑡))·𝐏(𝐸₁=𝑒₁|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸_𝑠))· … ·𝐏(𝐸_𝑠=𝑒_𝑠|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸_𝑠))] / [𝐏(𝐻₁=𝘩₁|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻₁))· … ·𝐏(𝐻_𝑡=𝘩_𝑡|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐻_𝑡))] ] # 𝐐 is over the same distribution as 𝐏

𝐏(𝑬=𝒆) = 𝐄_𝐐[ 𝐏(𝐸₁=𝑒₁|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸_𝑠))· … ·𝐏(𝐸_𝑠=𝑒_𝑠|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝐸_𝑠)) ]

𝐏(𝑬=𝒆) = 𝐄_𝐐[ 𝐏(𝑬=𝒆|𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑬)) ]

generate 𝑛 samples from proposal distribution 𝐐 and estimate 𝐏(𝑬=𝒆) using the following Monte Carlo estimate:

𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴_{1≤𝑖≤𝑛} [ 𝐏(𝑬=𝒆 | 𝑝𝑎𝑟𝑒𝑛𝑡𝑠-𝑜𝑓(𝑬) 𝑎𝑠𝑠𝑖𝑔𝑛𝑒𝑑 𝑤𝑖𝑡𝘩 𝑣𝑎𝑙𝑢𝑒𝑠 𝑑𝑟𝑎𝑤𝑛 𝑓𝑟𝑜𝑚 𝑝𝑟𝑜𝑝𝑜𝑠𝑎𝑙 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑜𝑟 𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒) ]

given evidence 𝑬=𝒆, we clamp the bayesian network

Solving Probability of Evidence Query of a Bayesian Network - Example

Click here to expand...

given a bayesian network below

given evidence (𝑔=0, 𝑠=0), we clamp the conditional probability tables of the bayesian network

compute proposal distribution 𝐐 defined over the clamped bayesian network

𝐏ˆ(𝑬=𝒆) = 𝑎𝑣𝑒𝑟𝑎𝑔𝑒-𝑜𝑓[ 𝐏(𝑠𝑎𝑚𝑝𝑙𝑒, 𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒) / 𝐐(𝑠𝑎𝑚𝑝𝑙𝑒) ]

𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴_{1≤𝑖≤𝑛} [ 𝐏(𝑠𝑎𝑚𝑝𝑙𝑒^𝑖, 𝑒𝑣𝑖𝑑𝑒𝑛𝑐𝑒) / 𝐐(𝑠𝑎𝑚𝑝𝑙𝑒^𝑖) ]

𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴_{1≤𝑖≤𝑛} [ 𝐏(𝑯=𝒉^𝑖, 𝑬=𝒆)/𝐐(𝑯=𝒉^𝑖) ]

𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴_{1≤𝑖≤𝑛} [ 𝐏(𝐷=𝑑^𝑖, 𝐼=𝑖^𝑖, 𝐺=0, 𝐿=𝑙^𝑖, 𝑆=0) / 𝐐(𝐷=𝑑^𝑖, 𝐼=𝑖^𝑖, 𝐿=𝑙^𝑖) ]

𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴_{1≤𝑖≤𝑛}[ [𝐏(𝐷=𝑑^𝑖)·𝐏(𝐼=𝑖^𝑖)·𝐏(𝐺=0|𝐷=𝑑^𝑖,𝐼=𝑖^𝑖)·𝐏(𝐿=𝑙^𝑖|𝐺=0)·𝐏(𝑆=0|𝐼=𝑖^𝑖)] / [𝐐(𝐷=𝑑^𝑖)·𝐐(𝐼=𝑖^𝑖)·𝐐(𝐿=𝑙^𝑖|𝐺=0)] ]

𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴_{1≤𝑖≤𝑛}[ [𝐏(𝐷=𝑑^𝑖)·𝐏(𝐼=𝑖^𝑖)·𝐏(𝐺=0|𝐷=𝑑^𝑖,𝐼=𝑖^𝑖)·𝐏(𝐿=𝑙^𝑖|𝐺=0)·𝐏(𝑆=0|𝐼=𝑖^𝑖)] / [𝐏(𝐷=𝑑^𝑖)·𝐏(𝐼=𝑖^𝑖)·𝐏(𝐿=𝑙^𝑖|𝐺=0)] ]

𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴_{1≤𝑖≤𝑛}[ 𝐏(𝐺=0|𝐷=𝑑^𝑖,𝐼=𝑖^𝑖)·𝐏(𝑆=0|𝐼=𝑖^𝑖) ]

bayesian network

clamped bayesian network

𝐏ˆ(𝑬=𝒆) = (1/𝑛) 𝛴_{1≤𝑖≤𝑛} [ 𝐏(𝐺=0|𝐷=𝑑^𝑖,𝐼=𝑖^𝑖)·𝐏(𝑆=0|𝐼=𝑖^𝑖) ]

sample 𝑖=1 = (𝐷=0, 𝐼=0, 𝐿=0)

𝐏(𝐺=0|𝐷=0,𝐼=0)·𝐏(𝑆=0|𝐼=0) = 0.3 * 0.95 = 0.285

sample 𝑖=2 = (𝐷=0, 𝐼=1, 𝐿=1)

𝐏(𝐺=0|𝐷=0,𝐼=1)·𝐏(𝑆=0|𝐼=1) = 0.9 * 0.2 = 0.18

and so on…

𝐏ˆ(𝑬=𝒆) = (1/𝑛) (0.285 + 0.18 + … + 𝑛^𝑡𝘩 𝑠𝑎𝑚𝑝𝑙𝑒)

NOTE: performs POORLY when evidence is at the leaves

／var／log marcus chiu

Explorer

Likelihood Weighting Sampling

Likelihood Weighting Sampling

Solving Posterior Conditional Query

Solving Probability of Evidence Query

Likelihood Weighting - Limitations

Likelihood Weighting or Importance Sampling

Solving Probability of Evidence Query of a Bayesian Network

Solving Probability of Evidence Query of a Bayesian Network - Example

bayesian network

clamped bayesian network

／var／logmarcus chiu

Explorer

Likelihood Weighting Sampling

Likelihood Weighting Sampling

Solving Posterior Conditional Query

Solving Probability of Evidence Query

Likelihood Weighting - Limitations

Likelihood Weighting or Importance Sampling

Solving Probability of Evidence Query of a Bayesian Network

Solving Probability of Evidence Query of a Bayesian Network - Example

bayesian network

clamped bayesian network

／var／log marcus chiu