In statistics, KDE/KDC is a non-parametric method for estimating the probability density function of one or more random variables.

KDE/KDC - How it Works & Compared to Histogram

Must See: Histogram vs KDE

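  • $\hat{P}(X = x) = \frac{1}{h \cdot n} \sum_{x_i \in \text{all samples}} k(x_i, x)$
  • $\hat{P}(X = x \mid Y = y) = \frac{1}{h \cdot \mathrm{count}(Y = y)} \sum_{(x_i, y_i) \in \text{all samples}} k[(x_i, y_i), (x, y)]$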

where:

Bandwidth (ℎ)

Kernel Function (𝑘)

Kernel Density Estimate (KDE) with different bandwidths of a random sample of 100 points from a standard normal distribution:

  • grey: true density (standard normal)
  • red: KDE with ℎ=0.05
  • black: KDE with ℎ=0.337
  • green: KDE with ℎ=2
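The effect of the bandwidth can be reproduced with a minimal sketch of the estimator. It assumes a Gaussian kernel written in the common difference form k((𝑥 − 𝑥𝑖)/ℎ), which is one possible choice for 𝑘(𝑥𝑖,𝑥). The names `gaussian_kernel` and `kde`, the seed, and the generated sample are illustrative only and are not the data behind the figure.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel k (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Density estimate at x: 1/(h*n) * sum over samples of k((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (h * n)


random.seed(0)  # fixed seed so the sketch is reproducible
samples = [random.gauss(0.0, 1.0) for _ in range(100)]

# Evaluate the estimate at x = 0 for the three bandwidths listed above.
for h in (0.05, 0.337, 2.0):
    print(f"h={h}: estimated density at 0 = {kde(0.0, samples, h):.3f}")
```

A very small ℎ tends to give a spiky estimate that follows individual samples, while a very large ℎ flattens the estimate well below the true peak of the standard normal density (about 0.399 at 0).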

see Kernel Functions (Similarity Functions)

KDE/KDC - Types

Histogram

k-Nearest Neighbors

  • KDE with a uniform kernel whose bandwidth varies with the query point so that it always encompasses the 𝑘 nearest neighbors
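As a rough one-dimensional sketch of this idea: the bandwidth at a query point is taken to be the distance to its 𝑘-th nearest sample, and a uniform kernel of that half-width then gives the estimate 𝑘 / (𝑛 · 2 · 𝑟), where 𝑟 is that distance. The function name `knn_density` and the generated data are hypothetical.

```python
import random


def knn_density(x, samples, k):
    """1-D k-NN density estimate.

    Equivalent to a uniform-kernel KDE whose half-width at x is the
    distance from x to its k-th nearest sample.
    """
    distances = sorted(abs(x - xi) for xi in samples)
    r_k = distances[k - 1]  # variable bandwidth at this query point
    if r_k == 0.0:
        raise ValueError("k-th neighbor coincides with x; estimate is unbounded")
    # k samples fall inside an interval of length 2 * r_k
    return k / (len(samples) * 2.0 * r_k)


random.seed(1)  # fixed seed so the sketch is reproducible
samples = [random.random() for _ in range(2000)]  # uniform on [0, 1]
print(knn_density(0.5, samples, k=100))  # true density at 0.5 is 1
```

Unlike a fixed-bandwidth KDE, the window widens automatically where samples are sparse and narrows where they are dense.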

Gaussian Process Regression (GPR) - Kriging

Kernel Regression

  • a non-parametric regression technique in statistics to estimate the conditional expectation of a random variable
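One common form of kernel regression is the Nadaraya-Watson estimator, which estimates E[𝑌 | 𝑋 = 𝑥] as a kernel-weighted average of the observed responses. The sketch below assumes that form with a Gaussian kernel; the function names and the sine data are illustrative only.

```python
import math


def gaussian_weight(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def kernel_regression(x, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x] with bandwidth h."""
    weights = [gaussian_weight((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger h")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


# Noise-free samples of sin(x) on a regular grid, used as toy data.
xs = [i / 10 for i in range(63)]
ys = [math.sin(xi) for xi in xs]
print(kernel_regression(math.pi / 2, xs, ys, h=0.3))
```

Because the estimate is a local average, it is slightly flattened at peaks and troughs; a smaller ℎ reduces that bias at the cost of a noisier fit on real data.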

KDE/KDC - Resources