|
Histogram
|
Kernel Density Estimation (KDE)
|
|---|
|
To estimate a univariate probability density 𝐏(𝑋=𝑥)
|
|---|
|
Given a set of $n$ samples $D=\{x_1, x_2, \dots, x_n\}$ drawn i.i.d. from a random variable $X$
|
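- $\hat{P}_h(X=x) = \frac{1}{h \cdot n} \sum_{1 \le i \le n} \sum_{bin \in \mathrm{BINS}} I(x_i \in bin) \cdot I(x \in bin)$
- $\hat{P}_h(X=x) = \frac{1}{h \cdot n} \cdot \mathrm{count}(D=x)$
where:
- $h > 0$ - bin width
- $n$ - total number of observed samples
- $\mathrm{BINS}$ - set of all bins
- $I()$ - indicator function, evaluates to 1 when its argument is true, 0 otherwise
- $\mathrm{count}(D=x)$ - number of observed samples that fall in the same bin as $x$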
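A minimal Python sketch of the histogram estimate may help make the formula concrete. The function name `histogram_density`, the `origin` parameter (where the bin grid is anchored), and the sample values are assumptions made for illustration only; bins are taken to be half-open intervals of width `h`.

```python
import math

def histogram_density(samples, x, h, origin=0.0):
    """Histogram estimate: count of samples in the bin of x, divided by h * n."""
    if h <= 0:
        raise ValueError("bin width h must be positive")

    def bin_index(value):
        # Bins are [origin + k*h, origin + (k+1)*h); origin is an assumed anchor.
        return math.floor((value - origin) / h)

    n = len(samples)
    count = sum(1 for xi in samples if bin_index(xi) == bin_index(x))
    return count / (h * n)

# Made-up data: two samples fall in [0, 1), two in [1, 2), one in [3, 4).
samples = [0.2, 0.7, 1.1, 1.9, 3.4]
print(histogram_density(samples, 0.5, h=1.0))  # 2 / (1.0 * 5) = 0.4
print(histogram_density(samples, 2.5, h=1.0))  # empty bin, so 0.0
```

Every query point inside the same bin gets the same value, which is why the estimate is piecewise constant.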
|
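- $\hat{P}_h(X=x) = \frac{1}{h \cdot n} \sum_{1 \le i \le n} k_h(x_i, x)$
where:
- $h > 0$ - bandwidth
- $k_h(x_i, x)$ - a univariate kernel function
- $\int \hat{P}_h(X=x)\,dx = 1$, which requires $\int k_h(x_i, x)\,dx = h$ for every $x_i$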
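The kernel estimate can be sketched in a few lines of Python. This sketch assumes $k_h(x_i, x) = K((x - x_i)/h)$ with $K$ the standard normal density; that choice, the function names, and the sample values are illustrative assumptions rather than something fixed by the notes above.

```python
import math

def gaussian_kernel(u):
    """Standard normal density; integrates to 1 over the real line."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(samples, x, h):
    """Kernel density estimate: (1 / (h * n)) * sum_i K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (h * n)

samples = [0.2, 0.7, 1.1, 1.9, 3.4]  # made-up data
print(kde(samples, 1.0, h=0.5))

# Rough numerical check that the estimate integrates to about 1.
step = 0.01
total = sum(kde(samples, -10 + step * k, 0.5) for k in range(2500)) * step
print(round(total, 3))
```

Unlike the histogram, the result varies smoothly with `x` because each sample contributes a smooth bump rather than a box.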
|
|
To estimate a joint probability density distribution 𝐏(𝑋=𝑥,𝑍=𝑧)
|
|---|
|
Given a set of $n$ samples $D=\{(x_1,z_1), (x_2,z_2), \dots, (x_n,z_n)\}$ drawn i.i.d. from the joint distribution of $X$ and $Z$
|
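- $\hat{P}_{h_x \cdot h_z}(X=x, Z=z) = \frac{1}{h_x \cdot h_z \cdot n} \sum_{1 \le i \le n} \sum_{bin \in \mathrm{BINS}} I((x_i, z_i) \in bin) \cdot I((x, z) \in bin)$
where:
- $h_x > 0$ - bin width on the $x$ axis
- $h_z > 0$ - bin width on the $z$ axis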
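The two-dimensional case works the same way, with rectangular bins of area $h_x \cdot h_z$. The following Python sketch assumes a bin grid anchored at the origin; the function name and data are hypothetical.

```python
import math

def histogram_density_2d(samples, x, z, hx, hz):
    """Joint histogram estimate: count in the cell of (x, z) over hx * hz * n."""
    if hx <= 0 or hz <= 0:
        raise ValueError("bin widths must be positive")

    def cell_index(a, b):
        # Rectangular cells of size hx by hz, anchored at (0, 0) by assumption.
        return (math.floor(a / hx), math.floor(b / hz))

    n = len(samples)
    count = sum(1 for (xi, zi) in samples if cell_index(xi, zi) == cell_index(x, z))
    return count / (hx * hz * n)

# Made-up data: the first two points share the cell [0, 1) x [0, 0.5).
samples_2d = [(0.2, 0.1), (0.8, 0.4), (1.5, 0.3), (0.4, 1.2)]
print(histogram_density_2d(samples_2d, 0.5, 0.25, hx=1.0, hz=0.5))  # 2 / (1.0 * 0.5 * 4) = 1.0
```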
|
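- $\hat{P}_h(X=x, Z=z) = \frac{1}{h^2 \cdot n} \sum_{1 \le i \le n} k_h((x_i, z_i), (x, z))$
where:
- $h > 0$ - bandwidth, shared by both axes
- $k_h((x_i, z_i), (x, z))$ - a bivariate kernel function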
|
|
To estimate the conditional probability density 𝐏(𝑋=𝑥|𝑌=𝑦)
|
|---|
- $\hat{P}(X=x \mid Y=y) = \frac{1}{h \cdot \mathrm{count}(Y=y)} \cdot \mathrm{count}(X=x, Y=y)$
where:
- $h > 0$ - bin width
- $\mathrm{count}(Y=y)$ - number of observed samples with $Y=y$
- $\mathrm{count}(X=x, Y=y)$ - number of observed samples with $Y=y$ whose $x_i$ falls in the same bin as $x$
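A short Python sketch of the conditional histogram estimate follows. It assumes $Y$ is discrete (for example a class label), so that counting samples with $Y=y$ is meaningful, and that $X$ is binned with width `h`; the function name and data are hypothetical.

```python
import math

def conditional_histogram_density(samples, x, y, h):
    """Estimate P(X=x | Y=y) from (x_i, y_i) pairs, with y assumed discrete."""
    if h <= 0:
        raise ValueError("bin width h must be positive")
    # Keep only the x-values of samples whose label matches y.
    group = [xi for (xi, yi) in samples if yi == y]
    if not group:
        raise ValueError("no samples observed with this value of y")

    def bin_index(value):
        return math.floor(value / h)

    joint_count = sum(1 for xi in group if bin_index(xi) == bin_index(x))
    return joint_count / (h * len(group))

# Made-up labelled data: three samples with label "a", one with label "b".
labelled = [(0.2, "a"), (0.7, "a"), (1.4, "a"), (0.3, "b")]
print(conditional_histogram_density(labelled, 0.5, "a", h=1.0))  # 2 / (1.0 * 3)
```

This is simply the univariate histogram estimate applied to the subset of samples that share the conditioning value.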
|
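- $\hat{P}(X=x \mid Y=y) = \frac{1}{h \cdot \mathrm{count}(Y=y)} \sum_{(x_i, y_i) \in \text{all-samples}} k[(x_i, y_i), (x, y)]$
where:
- $h > 0$ - bandwidth
- $\mathrm{count}(Y=y)$ - number of observed samples with $Y=y$
- $k[(x_i, y_i), (x, y)]$ - a kernel function comparing the sample $(x_i, y_i)$ with the query point $(x, y)$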
|
|
Example estimate of probability distribution 𝐏(𝑋=𝑥)
|
|---|
|
[Figure: comparison of a 1D histogram and a KDE (comparison_of_1d_histogram_and_kde.png)]
The red dotted lines show the Gaussian kernel function centered on each observation.
|
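- $\hat{P}(X=x) = \frac{1}{h \cdot n} \cdot \mathrm{count}(X=x)$
where:
- $n = 6$
- $h = 2$ # here we chose 2, but any value greater than 0 would work
- $\mathrm{count}(X=x)$ - number of observed samples in the same bin as $x$
evaluation:
- $\hat{P}(X=x) = \frac{1}{12} \cdot \mathrm{count}(X=x)$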
- 𝐏ˆ(𝑋=-4) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=-3.5) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=-3) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=-2.5) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=-2) = 1/12・2 = 1/6
- 𝐏ˆ(𝑋=-1.5) = 1/12・2 = 1/6
- 𝐏ˆ(𝑋=-1) = 1/12・2 = 1/6
- 𝐏ˆ(𝑋=-0.5) = 1/12・2 = 1/6
- 𝐏ˆ(𝑋=0) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=0.5) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=1) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=1.5) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=2) = 1/12・0 = 0
- 𝐏ˆ(𝑋=3) = 1/12・0 = 0
- 𝐏ˆ(𝑋=4) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=4.5) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=5) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=5.5) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=6) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=6.5) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=7) = 1/12・1 = 1/12
- 𝐏ˆ(𝑋=7.5) = 1/12・1 = 1/12
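The evaluation above can be reproduced with a short Python sketch. The six sample values below are hypothetical: the notes do not list the observations, so these were chosen only to match the bin counts implied by the list (bins of width 2 starting at -4, holding 1, 2, 1, 0, 1 and 1 samples).

```python
import math
from fractions import Fraction

# Hypothetical observations consistent with the bin counts in the evaluation.
samples = [-3, -1.5, -0.5, 1, 5, 7]
h = 2
n = len(samples)

def bin_index(value):
    # Bins are [-4, -2), [-2, 0), [0, 2), [2, 4), [4, 6), [6, 8).
    return math.floor(value / h)

def p_hat(x):
    """Histogram estimate 1/(h*n) * (number of samples in the bin of x)."""
    count = sum(1 for xi in samples if bin_index(xi) == bin_index(x))
    return Fraction(count, h * n)

for x in [-4, -2, 0, 2, 4, 6]:
    print(x, p_hat(x))
# -4 1/12
# -2 1/6
# 0 1/12
# 2 0
# 4 1/12
# 6 1/12
```

Summing the estimate over a grid of spacing 0.5 from -4 to 7.5 and multiplying by 0.5 gives exactly 1, as expected of a density.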
|
- $\hat{P}(X=x) = \frac{1}{h \cdot n} \sum_{x_i \in \text{all-samples}} k(x_i, x)$
where:
- $n = 6$
- $h = 0.5$ # here we chose 0.5, but any value greater than 0 would work
- $k(x_i, x) = \exp(-\gamma \cdot \lVert x_i - x \rVert^2)$ # here we use a Gaussian kernel, but any other kernel function could be used
evaluation:
- $\hat{P}(X=x) = \frac{1}{3} \sum_{x_i \in \text{all-samples}} \exp(-\gamma \cdot \lVert x_i - x \rVert^2)$
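The kernel evaluation can be sketched numerically as well. Two things here are assumptions, not part of the notes: the six sample values are hypothetical, and $\gamma$ is set to $1/(2h^2) = 2$, the usual link between a Gaussian kernel's width and the bandwidth.

```python
import math

# Hypothetical observations; the notes do not list the actual sample values.
samples = [-3.0, -1.5, -0.5, 1.0, 5.0, 7.0]
h = 0.5
n = len(samples)
gamma = 1.0 / (2.0 * h * h)  # assumed choice, equals 2.0 for h = 0.5

def gaussian_kernel(xi, x):
    """Unnormalized Gaussian kernel exp(-gamma * (xi - x)^2), as in the notes."""
    return math.exp(-gamma * (xi - x) ** 2)

def p_hat(x):
    """Kernel estimate 1/(h*n) * sum_i k(x_i, x); here 1/(h*n) = 1/3."""
    return sum(gaussian_kernel(xi, x) for xi in samples) / (h * n)

print(p_hat(-1.0))  # between two nearby samples, so relatively large
print(p_hat(3.0))   # in the gap between samples, so close to zero
```

With this unnormalized kernel and this choice of $\gamma$, the estimate integrates to $\sqrt{2\pi}$ rather than 1; dividing the kernel by $\sqrt{2\pi}$ would turn it into a proper density.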
|
|
Choice of Bandwidth h & Bias-Variance Tradeoff
|
|---|
|
TODO
|
[Figure: KDE choice of bandwidth and the bias-variance tradeoff (kde-choice-of-bandwidth-bias-variance-tradeoff.png)]
|