Dependent Sampling (Markov Chain Monte Carlo (MCMC) Methods)
  • is where the next value sampled from a distribution is DEPENDENT on the previous sample (as oppose to Independent Sampling)

Dependent Sampling - Types

TODO

assume we want to sample from a probability distribution:

where:

  • 𝜃 - is a single random variable (later it can be extended to multiple variables)
  • 𝑋 - is a set of observed variables

by bayesian rule we have:

  • 𝐏(𝜃|𝑋) = 𝐏(𝑋|𝜃)𝐏(𝜃) / 𝐏(𝑋)

where:

  • 𝐏(𝜃|𝑋) - the probability distribution in question
  • 𝐏(𝑋- denominator is the normalizing constant which scales the distribution 𝐏(𝜃|𝑋)
  • 𝐏(𝑋|𝜃)𝐏(𝜃) - numerator determines the SHAPE of the distribution 𝐏(𝜃|𝑋)

computing 𝐏(𝑋) takes a long time and we typically want to avoid that

we could use dependent sampling to sample from a distribution without knowing the value of the normalizing constant 𝐏(𝑋) of the probability distribution 𝐏(𝜃|𝑋)

  • 𝐏(𝜃|𝑋)𝐏(𝑋|𝜃)𝐏(𝜃) / 𝐏(𝑋)
  • 𝐏(𝜃|𝑋) ∝ 𝐏(𝑋|𝜃)𝐏(𝜃)

the graph below shows 3 distributions of 𝐏(𝜃|𝑋) where each are divided by a different normalizing constant

diagram.draw.io

notice that:

  • the shape between the 3 distributions are similar. The numerator 𝐏(𝑋|𝜃)𝐏(𝜃) determines the shape.
  • the vertical scale between the 3 distributions are different. This scale is determined by the denominator 𝐏(𝑋)

all we need is to determine the shape, which is the distribution of 𝐏(𝑋|𝜃)𝐏(𝜃). We don’t care about the scale part, in other words no need to determine the normalizing constant 𝐏(𝑋)

Comparing Probability Value Between 𝜃1 and 𝜃2

we want to determine the probability ratio:

  • 𝐏(𝜃1|𝑋) / 𝐏(𝜃2|𝑋)

using Baye’s Rule:

  • 𝐏(𝜃1|𝑋) / 𝐏(𝜃2|𝑋) = [𝐏(𝑋|𝜃1)𝐏(𝜃1)/𝐏(𝑋)][𝐏(𝑋|𝜃2)𝐏(𝜃2)/𝐏(𝑋)]
  • 𝐏(𝜃1|𝑋) / 𝐏(𝜃2|𝑋) = [𝐏(𝑋|𝜃1)𝐏(𝜃1)] / [𝐏(𝑋|𝜃2)𝐏(𝜃2)]

notice we don’t need to compute the normalizing constant 𝐏(𝑋)

notice that the ratio of un-normalized posteriors [𝐏(𝑋|𝜃1)𝐏(𝜃1)] / [𝐏(𝑋|𝜃2)𝐏(𝜃2)] reflects the ratio of normalized posteriors 𝐏(𝜃1|𝑋) / 𝐏(𝜃2|𝑋)

therefore, we could design a sampler that uses the ratio of un-normalized posteriors at 2 different points in parameter space [𝐏(𝑋|𝜃1)𝐏(𝜃1)] / [𝐏(𝑋|𝜃2)𝐏(𝜃2)] and we use that ratio to govern how we step around parameter space. This would give us an exact view of whats happening in the ratio of normalized posterior space 𝐏(𝜃1|𝑋) / 𝐏(𝜃2|𝑋). In other words we could sample from 𝐏(𝜃1|𝑋) / 𝐏(𝜃2|𝑋) solely from [𝐏(𝑋|𝜃1)𝐏(𝜃1)] / [𝐏(𝑋|𝜃2)𝐏(𝜃2)]

see video at time 7:23

Dependent Sampling - General Algorithm

do a series of local steps around parameter space

see video at time 7:23