- prior predictive distribution - is the probability distribution of a data sample BEFORE any data sample(s) have been observed (i.e. using only the prior over 𝜃)
- posterior predictive distribution - is the probability distribution of a new data sample GIVEN observed data sample(s) (i.e. using the posterior over 𝜃)
Prior Predictive Distribution
a prior predictive distribution is denoted as 𝐏(𝑋) where:
- 𝑋 - data sample
continuous prior predictive distribution (𝜃 continuous, so we integrate over 𝜣; for discrete 𝜃 the integral becomes a sum)
- 𝐏(𝑋) = ∫_{𝜃∊𝜣} 𝐏(𝑋,𝜃) 𝑑𝜃 # marginalize 𝜃 out of the joint distribution
- 𝐏(𝑋) = ∫_{𝜃∊𝜣} 𝐏(𝑋|𝜃) 𝐏(𝜃) 𝑑𝜃 # product rule: 𝐏(𝑋,𝜃) = 𝐏(𝑋|𝜃)𝐏(𝜃)
where:
- 𝐏(𝑋|𝜃) - is the likelihood function
- 𝐏(𝜃) - is the prior distribution
therefore, when we multiply the likelihood function by the prior distribution and integrate over all 𝜃 ∊ 𝜣, we get the prior predictive distribution, i.e. the likelihood averaged over the prior
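a minimal numerical sketch of the formula above, assuming a Beta(a, b) prior and a Bernoulli likelihood (neither is specified in these notes; they are picked because the prior predictive then has the known closed form 𝐏(𝑋=1) = a/(a+b) to check against). the helpers beta_pdf, bernoulli_likelihood and prior_predictive are hypothetical names, and the integral over 𝜣 = (0, 1) is approximated with a midpoint rule rather than computed exactly:

```python
import math


def beta_pdf(theta: float, a: float, b: float) -> float:
    """Density of the (assumed) Beta(a, b) prior P(theta) at theta in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log B(a, b)
    return math.exp((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm)


def bernoulli_likelihood(x: int, theta: float) -> float:
    """Likelihood P(X = x | theta) of one Bernoulli sample x in {0, 1}."""
    return theta if x == 1 else 1.0 - theta


def prior_predictive(x: int, a: float, b: float, n_grid: int = 10_000) -> float:
    """P(X = x) = integral over theta in (0, 1) of P(x | theta) P(theta) dtheta,
    approximated with the midpoint rule on n_grid equal sub-intervals."""
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        theta = (i + 0.5) * h  # midpoint, so theta never hits 0 or 1 (log stays finite)
        total += bernoulli_likelihood(x, theta) * beta_pdf(theta, a, b) * h
    return total


if __name__ == "__main__":
    p1 = prior_predictive(1, a=2.0, b=3.0)
    p0 = prior_predictive(0, a=2.0, b=3.0)
    print(f"P(X=1) ~ {p1:.6f}, P(X=0) ~ {p0:.6f}")  # closed form: 0.4 and 0.6
```

e.g. prior_predictive(1, 2.0, 3.0) should come out very close to 2/(2+3) = 0.4. the midpoint rule works well here because the integrand is a smooth polynomial when a, b ≥ 1; for a < 1 or b < 1 the prior density blows up at the boundary and a finer grid (or the closed form) would be needed.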
Posterior Predictive Distribution
a posterior predictive distribution is denoted as 𝐏(𝑋ˆ|𝑋) where:
- 𝑋ˆ - new data sample
- 𝑋 - observed/given data sample
continuous posterior predictive distribution (𝜃 continuous, so we integrate over 𝜣; for discrete 𝜃 the integral becomes a sum)
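- 𝐏(𝑋ˆ|𝑋) = ∫_{𝜃∊𝜣} 𝐏(𝑋ˆ,𝜃|𝑋) 𝑑𝜃 # marginalize 𝜃 out of the joint distribution
- 𝐏(𝑋ˆ|𝑋) = ∫_{𝜃∊𝜣} 𝐏(𝑋ˆ|𝜃,𝑋) 𝐏(𝜃|𝑋) 𝑑𝜃 # product rule, everything conditioned on 𝑋
- 𝐏(𝑋ˆ|𝑋) = ∫_{𝜃∊𝜣} 𝐏(𝑋ˆ|𝜃) 𝐏(𝜃|𝑋) 𝑑𝜃 # usually 𝑋ˆ is assumed conditionally independent of 𝑋 given 𝜃 (e.g. i.i.d. samples), so 𝐏(𝑋ˆ|𝜃,𝑋) = 𝐏(𝑋ˆ|𝜃)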
where:
- 𝐏(𝜃|𝑋) - is the posterior distribution over 𝜃 (not to be confused with the posterior predictive distribution 𝐏(𝑋ˆ|𝑋), which is a distribution over new data)
- 𝐏(𝑋ˆ|𝜃) - is the likelihood function
therefore, when we multiply the likelihood function by the posterior distribution and integrate over all 𝜃 ∊ 𝜣, we get the posterior predictive distribution, i.e. the likelihood averaged over the posterior
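a matching sketch for the posterior predictive, under the same assumptions (Beta(a, b) prior, Bernoulli likelihood, hypothetical helper names such as posterior_predictive). the posterior 𝐏(𝜃|𝑋) is computed on a midpoint grid with Bayes' rule, using the conditional-independence assumption above to write 𝐏(𝑋|𝜃) as a product over the observed samples; the normalizing constant is the prior predictive of the observed data:

```python
import math


def beta_pdf(theta: float, a: float, b: float) -> float:
    """Density of the (assumed) Beta(a, b) prior P(theta) at theta in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log B(a, b)
    return math.exp((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm)


def bernoulli_likelihood(x: int, theta: float) -> float:
    """Likelihood P(x | theta) of one Bernoulli sample x in {0, 1}."""
    return theta if x == 1 else 1.0 - theta


def posterior_predictive(x_new: int, observed: list[int], a: float, b: float,
                         n_grid: int = 10_000) -> float:
    """P(x_new | observed) = integral of P(x_new | theta) P(theta | observed) dtheta,
    with the posterior computed on a midpoint grid over theta in (0, 1)."""
    h = 1.0 / n_grid
    thetas = [(i + 0.5) * h for i in range(n_grid)]

    # unnormalized posterior P(observed | theta) P(theta); the samples are assumed
    # conditionally independent given theta, so the likelihood is a product
    unnorm = []
    for theta in thetas:
        lik = 1.0
        for x in observed:
            lik *= bernoulli_likelihood(x, theta)
        unnorm.append(lik * beta_pdf(theta, a, b))

    evidence = sum(u * h for u in unnorm)  # P(observed): the prior predictive of the data
    posterior = [u / evidence for u in unnorm]  # P(theta | observed) on the grid

    # average the likelihood of the new sample over the posterior
    return sum(bernoulli_likelihood(x_new, t) * p * h for t, p in zip(thetas, posterior))


if __name__ == "__main__":
    data = [1, 1, 0, 1]
    p = posterior_predictive(1, data, a=2.0, b=3.0)
    print(f"P(X^=1 | X) ~ {p:.6f}")  # closed form (a + k) / (a + b + n) = 5/9
```

e.g. with a = 2, b = 3 and observed = [1, 1, 0, 1] (k = 3 ones out of n = 4), the result should be close to the known Beta-Bernoulli closed form (a + k)/(a + b + n) = 5/9 ≈ 0.556; with no observed data it falls back to the prior predictive. the raw product of likelihoods underflows for large n, so a real implementation would work with log-likelihoods instead.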