Histogram vs KDE - Estimating Probability Density Functions

Histogram

Classification

To estimate a univariate probability density $\mathbf{P}(X=x)$

Given a set of $n$ samples $D=\{x_1, x_2, \dots, x_n\}$ drawn i.i.d. from a random variable $X$

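  • Histogram: $\hat{\mathbf{P}}(X=x) = \frac{1}{h\,n} \sum_{1 \le i \le n} \sum_{\mathit{bin} \in \mathit{BINS}} I(x_i \in \mathit{bin}) \cdot I(x \in \mathit{bin})$
  • equivalently, $\hat{\mathbf{P}}(X=x) = \frac{1}{h\,n}\, \mathrm{count}(D=x)$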

where:

  • $h > 0$ - bin width
  • $n$ - total number of observed samples
  • $\mathit{BINS}$ - set of all bins (adjacent intervals of width $h$)
  • $I(\cdot)$ - indicator function, evaluates to 1 when its argument is true, 0 otherwise
  • $\mathrm{count}(D=x)$ - number of observed samples that fall in the same bin as $x$
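A minimal Python sketch of the univariate histogram estimator, assuming equal-width bins anchored at a hypothetical origin x0 (bin j is [x0 + j·h, x0 + (j+1)·h)). The function names bin_index and histogram_density, the parameter x0 and the sample values are illustrative, not part of the original text. Only bins that actually contain samples are stored, so evaluating the estimate at x is a single dictionary lookup.

```python
import math
from collections import Counter


def bin_index(x, h, x0=0.0):
    """Index j of the bin [x0 + j*h, x0 + (j+1)*h) that contains x."""
    return math.floor((x - x0) / h)


def histogram_density(samples, h, x0=0.0):
    """Return a function x -> count(D=x) / (h * n), where count(D=x) is the
    number of samples that fall in the same bin as x."""
    if h <= 0:
        raise ValueError("bin width h must be > 0")
    n = len(samples)
    counts = Counter(bin_index(xi, h, x0) for xi in samples)

    def p_hat(x):
        # Counter returns 0 for empty bins
        return counts[bin_index(x, h, x0)] / (h * n)

    return p_hat


# Hypothetical samples, for illustration only
D = [1.0, 1.5, 2.2, 3.7, 3.9, 4.1]
p = histogram_density(D, h=1.0)
print(p(1.2))  # bin [1, 2) holds 2 of the 6 samples, so 2 / (1.0 * 6)
```

Shifting x0 moves every bin edge and can change the estimate noticeably, even for the same data and the same h.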
  • KDE: $\hat{\mathbf{P}}(X=x) = \frac{1}{h\,n} \sum_{1 \le i \le n} k(x_i, x)$

where:

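  • $h > 0$ - bandwidth
  • $k(x_i, x)$ - a univariate kernel function, usually of the form $k(x_i, x) = K\left(\frac{x - x_i}{h}\right)$ for a kernel $K$ with $\int K(u)\,du = 1$
  • hence $\int k(x_i, x)\,dx = h$ for every sample, so that $\int \hat{\mathbf{P}}(X=x)\,dx = \frac{1}{h\,n} \cdot n\,h = 1$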
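The KDE estimator can be sketched the same way. This assumes the standard Gaussian kernel K(u) = exp(-u²/2)/√(2π) with k(x_i, x) = K((x - x_i)/h); that choice of K, the helper names gaussian_K, kde_density and integrate, and the data are illustrative assumptions. The trapezoidal integration at the end is a numeric check that the 1/(h·n) factor makes the estimate integrate to 1.

```python
import math


def gaussian_K(u):
    """Standard normal density; integrates to 1 over the real line."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_density(samples, h):
    """Return x -> 1/(h*n) * sum_i k(x_i, x), with k(x_i, x) = K((x - x_i)/h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be > 0")
    n = len(samples)

    def p_hat(x):
        return sum(gaussian_K((x - xi) / h) for xi in samples) / (h * n)

    return p_hat


def integrate(f, a, b, steps=20000):
    """Trapezoidal rule for the integral of f over [a, b]."""
    dx = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for j in range(1, steps):
        total += f(a + j * dx)
    return total * dx


# Hypothetical samples, for illustration only
D = [1.0, 1.5, 2.2, 3.7, 3.9, 4.1]
p = kde_density(D, h=0.5)
print(round(integrate(p, -5.0, 11.0), 6))  # should be very close to 1.0
```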

To estimate a joint probability density $\mathbf{P}(X=x, Z=z)$

Given a set of $n$ samples $D=\{(x_1,z_1), (x_2,z_2), \dots, (x_n,z_n)\}$ drawn i.i.d. from the joint distribution of $X$ and $Z$

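  • Histogram: $\hat{\mathbf{P}}_{h_x h_z}(X=x, Z=z) = \frac{1}{h_x h_z\, n} \sum_{1 \le i \le n} \sum_{\mathit{bin} \in \mathit{BINS}} I((x_i, z_i) \in \mathit{bin}) \cdot I((x, z) \in \mathit{bin})$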

where:

  • $h_x > 0$ - bin width on the $x$ axis
  • $h_z > 0$ - bin width on the $z$ axis (each bin is an $h_x \times h_z$ rectangle)
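For the joint case, a sketch of the 2D histogram with rectangular bins of size h_x × h_z, assuming the bin grid is aligned at the origin. That alignment, the function name histogram_density_2d and the data are illustrative assumptions.

```python
import math
from collections import Counter


def histogram_density_2d(samples, hx, hz):
    """Return (x, z) -> (number of samples in the same hx-by-hz cell as (x, z))
    divided by (hx * hz * n). Cells are aligned at the origin."""
    if hx <= 0 or hz <= 0:
        raise ValueError("bin widths must be > 0")
    n = len(samples)

    def cell(x, z):
        return (math.floor(x / hx), math.floor(z / hz))

    counts = Counter(cell(xi, zi) for xi, zi in samples)

    def p_hat(x, z):
        return counts[cell(x, z)] / (hx * hz * n)

    return p_hat


# Hypothetical (x, z) pairs, for illustration only
D = [(0.2, 0.3), (0.4, 0.1), (1.5, 0.7), (1.6, 2.4)]
p = histogram_density_2d(D, hx=1.0, hz=0.5)
print(p(0.5, 0.25))  # 2 of the 4 samples share the cell [0, 1) x [0, 0.5)
```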
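  • KDE: $\hat{\mathbf{P}}(X=x, Z=z) = \frac{1}{h^2\, n} \sum_{1 \le i \le n} k((x_i, z_i), (x, z))$

where:

  • $h > 0$ - bandwidth, shared by both axes
  • $k((x_i, z_i), (x, z))$ - a bivariate kernel function with $\iint k((x_i, z_i), (x, z))\,dx\,dz = h^2$, e.g. the product $k(x_i, x) \cdot k(z_i, z)$ of two univariate kernels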
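A sketch of the bivariate KDE, assuming the product kernel k((x_i, z_i), (x, z)) = K((x - x_i)/h)·K((z - z_i)/h) built from the standard Gaussian K, which integrates to h² over the plane as the 1/(h²·n) factor requires. The kernel choice, the helper names and the data are assumptions for illustration; the midpoint-rule sum checks that the total mass is close to 1.

```python
import math


def gaussian_K(u):
    """Standard normal density; integrates to 1 over the real line."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_density_2d(samples, h):
    """Return (x, z) -> 1/(h^2 n) * sum_i K((x - x_i)/h) * K((z - z_i)/h).
    The product kernel integrates to h^2 over the plane."""
    if h <= 0:
        raise ValueError("bandwidth h must be > 0")
    n = len(samples)

    def p_hat(x, z):
        return sum(gaussian_K((x - xi) / h) * gaussian_K((z - zi) / h)
                   for xi, zi in samples) / (h * h * n)

    return p_hat


# Hypothetical (x, z) pairs, for illustration only
D = [(0.2, 0.3), (0.4, 0.1), (1.5, 0.7), (1.6, 2.4)]
p = kde_density_2d(D, h=0.5)

# Midpoint-rule sum over the square [-3, 6] x [-3, 6]
steps = 180
step = 9.0 / steps
grid = [-3.0 + (j + 0.5) * step for j in range(steps)]
total = sum(p(x, z) for x in grid for z in grid) * step * step
print(round(total, 6))  # should be very close to 1.0
```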

To estimate the conditional probability density $\mathbf{P}(X=x \mid Y=y)$, where $Y$ is a discrete class label

  • Histogram: $\hat{\mathbf{P}}(X=x \mid Y=y) = \frac{1}{h \cdot \mathrm{count}(Y=y)}\, \mathrm{count}(X=x, Y=y)$

where:

  • $h > 0$ - bin width
  • $\mathrm{count}(Y=y)$ - total number of observed samples with label $y$
  • $\mathrm{count}(X=x, Y=y)$ - number of observed samples with label $y$ whose $x_i$ falls in the same bin as $x$
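For the conditional case with a discrete label, a sketch that counts samples per (bin, label) pair, assuming bins of width h aligned at the origin. The function name conditional_histogram_density and the labelled data are hypothetical.

```python
import math
from collections import Counter


def conditional_histogram_density(samples, h):
    """samples: list of (x_i, y_i) pairs with discrete labels y_i.
    Return (x, y) -> count(X=x, Y=y) / (h * count(Y=y)), where count(X=x, Y=y)
    is the number of class-y samples in the same bin [j*h, (j+1)*h) as x."""
    if h <= 0:
        raise ValueError("bin width h must be > 0")
    label_counts = Counter(y for _, y in samples)
    joint_counts = Counter((math.floor(x / h), y) for x, y in samples)

    def p_hat(x, y):
        if label_counts[y] == 0:
            raise ValueError(f"no samples with label {y!r}")
        return joint_counts[(math.floor(x / h), y)] / (h * label_counts[y])

    return p_hat


# Hypothetical labelled samples, for illustration only
D = [(1.2, "a"), (1.7, "a"), (2.5, "a"), (1.1, "b"), (3.3, "b")]
p = conditional_histogram_density(D, h=1.0)
print(p(1.5, "a"))  # 2 of the 3 "a" samples lie in [1, 2)
print(p(1.5, "b"))  # 1 of the 2 "b" samples lies in [1, 2)
```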
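  • KDE: $\hat{\mathbf{P}}(X=x \mid Y=y) = \frac{1}{h \cdot \mathrm{count}(Y=y)} \sum_{1 \le i \le n} I(y_i = y) \cdot k(x_i, x)$

where:

  • $h > 0$ - bandwidth
  • $k(x_i, x)$ - a univariate kernel function
  • $I(y_i = y)$ - keeps only the $\mathrm{count}(Y=y)$ samples with label $y$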
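The KDE version replaces the bin count with a kernel sum over the samples of class y only. This sketch assumes the standard Gaussian kernel K(u) = exp(-u²/2)/√(2π) with k(x_i, x) = K((x - x_i)/h); the names and data are illustrative.

```python
import math


def gaussian_K(u):
    """Standard normal density; integrates to 1 over the real line."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def conditional_kde_density(samples, h):
    """Return (x, y) -> 1/(h * count(Y=y)) * sum_i I(y_i = y) * K((x - x_i)/h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be > 0")

    def p_hat(x, y):
        class_xs = [xi for xi, yi in samples if yi == y]  # applies I(y_i = y)
        if not class_xs:
            raise ValueError(f"no samples with label {y!r}")
        return sum(gaussian_K((x - xi) / h) for xi in class_xs) / (h * len(class_xs))

    return p_hat


# Hypothetical labelled samples, for illustration only
D = [(1.2, "a"), (1.7, "a"), (2.5, "a"), (1.1, "b"), (3.3, "b")]
p = conditional_kde_density(D, h=0.5)
print(p(1.5, "a"), p(1.5, "b"))
```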

Example: estimating the probability density $\mathbf{P}(X=x)$

(Figure not included: the red dotted lines show the Gaussian kernel function centered on each observation.)

  • Histogram: $\hat{\mathbf{P}}(X=x) = \frac{1}{h\,n}\, \mathrm{count}(X=x)$

where:

  • $n = 6$
  • $h = 2$ # chosen here for illustration; any value greater than 0 is valid

evaluation:

  • $\hat{\mathbf{P}}(X=x) = \frac{1}{12}\, \mathrm{count}(X=x)$, since $h\,n = 2 \cdot 6 = 12$
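A quick numeric check of this evaluation. The six sample values from the original figure are not available, so the data below is hypothetical, and the width-2 bins are assumed to start at 0.

```python
import math
from collections import Counter

D = [0.5, 1.2, 2.8, 3.1, 3.6, 6.4]  # hypothetical samples, n = 6
h = 2.0
n = len(D)

# count(X=x): number of samples in the same width-2 bin [2j, 2j + 2) as x
counts = Counter(math.floor(xi / h) for xi in D)


def p_hat(x):
    return counts[math.floor(x / h)] / (h * n)  # = count(X=x) / 12


for x in (1.0, 3.0, 5.0, 7.0):
    print(x, p_hat(x))
```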
  • KDE: $\hat{\mathbf{P}}(X=x) = \frac{1}{h\,n} \sum_{x_i \in D} k(x_i, x)$

where:

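  • $n = 6$
  • $h = 0.5$ # chosen here for illustration; any value greater than 0 is valid
  • $k(x_i, x) = \exp(-\gamma \lVert x_i - x \rVert^2)$ # a Gaussian kernel here, but any other kernel function could be used; for $\hat{\mathbf{P}}$ to integrate to 1, $\gamma$ must satisfy $\int k(x_i, x)\,dx = \sqrt{\pi/\gamma} = h$, i.e. $\gamma = \pi / h^2$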

evaluation:

  • $\hat{\mathbf{P}}(X=x) = \frac{1}{3} \sum_{x_i \in D} \exp(-\gamma \lVert x_i - x \rVert^2)$, since $h\,n = 0.5 \cdot 6 = 3$
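A numeric check of the KDE evaluation, using a hypothetical six-point data set (the figure's actual points are not available). The text does not fix γ; this sketch assumes γ = π/h², the value for which the integral of exp(-γ(x_i - x)²) over x equals √(π/γ) = h, so the 1/3 factor makes the estimate integrate to 1.

```python
import math

D = [0.5, 1.2, 2.8, 3.1, 3.6, 6.4]  # hypothetical samples, n = 6
h = 0.5
n = len(D)
gamma = math.pi / h ** 2  # assumption: makes each kernel integrate to h


def k(xi, x):
    """Unnormalized Gaussian kernel exp(-gamma * (x_i - x)^2) (1-D, so the norm is |x_i - x|)."""
    return math.exp(-gamma * (xi - x) ** 2)


def p_hat(x):
    return sum(k(xi, x) for xi in D) / (h * n)  # = (1/3) * sum_i k(x_i, x)


# Midpoint-rule integral of the estimate over [-3, 10]
steps = 20000
a, b = -3.0, 10.0
dx = (b - a) / steps
area = sum(p_hat(a + (j + 0.5) * dx) for j in range(steps)) * dx
print(round(area, 6))  # should be very close to 1.0
```

With a different γ the same 1/3 factor would no longer yield a density that integrates to 1, which is why γ and h cannot be chosen independently in this formulation.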

Choice of Bandwidth h & Bias-Variance Tradeoff

TODO

Resources