  • posterior distribution 𝐏(𝜃|𝑋): the distribution of the parameter(s) 𝜃 given the observed data 𝑋
  • predictive posterior distribution 𝐏(𝑋̂|𝑋), more commonly called the posterior predictive distribution: the distribution of new data 𝑋̂ given the observed data 𝑋
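
The two distributions are linked by averaging over the parameters. As a sketch, and assuming the new data is independent of the observed data once 𝜃 is known (the usual modelling assumption, not stated above), the relationship can be written as:

```latex
P(\hat{X} \mid X) = \int P(\hat{X} \mid \theta) \, P(\theta \mid X) \, d\theta
```

In words: each possible 𝜃 makes its own prediction for the new data, and those predictions are weighted by how plausible that 𝜃 is under the posterior.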

For example, in Bayesian linear regression you learn a posterior distribution over the weight 𝑤 of the model 𝑦=𝑤𝑋 from the observed inputs 𝑋 and their targets 𝑦. When a new, unseen data point 𝑥* comes in, you want the distribution over possible predictions 𝑦*, which you get by averaging the model's predictions over the posterior for 𝑤 that you just learned. This distribution over possible 𝑦* values is the predictive posterior distribution.
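
The regression example can be sketched in a few lines of Python. This is only an illustration under assumptions the notes do not state: a single scalar weight 𝑤, a Gaussian prior 𝑤 ~ N(0, prior_var), and Gaussian noise with known variance noise_var. The function names posterior_w and posterior_predictive are made up for this sketch.

```python
import random


def posterior_w(xs, ys, noise_var=1.0, prior_var=10.0):
    """Gaussian posterior (mean, variance) over the scalar weight w.

    Assumes y = w * x + noise, noise ~ N(0, noise_var), prior w ~ N(0, prior_var).
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    var = 1.0 / precision
    mean = var * sum(x * y for x, y in zip(xs, ys)) / noise_var
    return mean, var


def posterior_predictive(x_star, w_mean, w_var, noise_var=1.0):
    """Mean and variance of y* at a new input x_star, with w integrated out."""
    pred_mean = w_mean * x_star
    # Two sources of spread: observation noise plus uncertainty about w.
    pred_var = noise_var + (x_star ** 2) * w_var
    return pred_mean, pred_var


# Toy data generated with a true weight of 1.5 (an arbitrary choice).
random.seed(0)
xs = [random.uniform(-2.0, 2.0) for _ in range(50)]
ys = [1.5 * x + random.gauss(0.0, 1.0) for x in xs]

w_mean, w_var = posterior_w(xs, ys)
pred_mean, pred_var = posterior_predictive(3.0, w_mean, w_var)
print("posterior over w:  mean=%.3f var=%.4f" % (w_mean, w_var))
print("predictive at x*=3: mean=%.3f var=%.4f" % (pred_mean, pred_var))
```

The predictive variance is always at least noise_var, and it grows with the size of 𝑥*, because any remaining uncertainty about 𝑤 is scaled by the new input.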

  • predictive posterior distribution
    • is used to predict the labels of new data points
    • is the distribution of future data, given the data you have already seen