Bayesian Inference uses the posterior distribution to infer the values of parameters. Like Inferential Statistics Forms, Bayesian Inference has 3 forms:

related: Bayes’ Rule

Bayesian Priors

priors can be expressed in several forms:

  • explicitly, as probability distributions over the parameters of the model
  • informally, as beliefs that directly influence the function itself and act on the parameters only indirectly, via their effect on the function
  • implicitly, by choosing algorithms that are biased toward one class of functions over another (e.g. a smoothness prior or local constancy prior)
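To make the first (explicit) form concrete, here is a minimal sketch under assumed conditions that are not from the note: a one-weight model y = w·x + noise, Gaussian noise with variance sigma2, and an explicit Gaussian prior w ~ N(0, tau2). The function names `map_weight` and `least_squares_weight` are hypothetical. Maximizing the log posterior gives a closed form that matches ridge regression with penalty sigma2 / tau2, so a tighter prior pulls the estimate toward 0.

```python
# Illustrative sketch only. Assumed model (hypothetical, not from the note):
#   y = w * x + noise,  noise ~ N(0, sigma2),  explicit prior w ~ N(0, tau2)


def least_squares_weight(xs, ys):
    """Maximum-likelihood estimate of w (no prior): sum(xy) / sum(x^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def map_weight(xs, ys, sigma2=1.0, tau2=1.0):
    """MAP estimate of w: maximizes log p(y | x, w) + log p(w).

    Setting the derivative of the log posterior to zero gives
    w = sum(xy) / (sum(x^2) + sigma2 / tau2), i.e. ridge regression.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + sigma2 / tau2)


xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]  # made-up data for illustration
print(least_squares_weight(xs, ys))  # ML estimate
print(map_weight(xs, ys, tau2=0.1))  # narrow prior: estimate shrinks toward 0
print(map_weight(xs, ys, tau2=1e6))  # very broad prior: close to the ML estimate
```

The prior's variance tau2 acts as the knob: as it grows, the prior carries less information and the MAP estimate approaches the maximum-likelihood one.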

Bayesian Estimation/Inference Steps

  • before observing the dataset, we represent our knowledge of 𝜃 with a prior probability distribution 𝐏(𝜃)
  • the selected prior usually has high entropy to reflect a high degree of uncertainty about the value of 𝜃 (e.g. a uniform distribution or a broad Gaussian with large variance)
  • observing data x then updates the prior via Bayes’ rule, 𝐏(𝜃 | x) = 𝐏(x | 𝜃) 𝐏(𝜃) / 𝐏(x), which usually causes the posterior to lose entropy and concentrate around a few highly likely parameter values
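The steps above can be sketched with a grid approximation. This is a toy under assumptions of my own, not taken from the note: 𝜃 is a coin's probability of heads, the likelihood is Bernoulli, 𝜃 is discretized onto a grid, the prior is uniform on that grid (maximum entropy there), and the flip data is made up. The sketch prints the entropy before and after the update, and also shows that more data concentrates the posterior further.

```python
import math

# Toy sketch: grid approximation of the posterior over a coin's bias theta.
# Assumptions (illustrative only): Bernoulli likelihood, uniform prior on a
# discrete grid, hypothetical flip data.

GRID = [i / 100 for i in range(1, 100)]  # theta values 0.01 .. 0.99


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def update(prior, flips):
    """Bayes' rule on the grid: posterior is proportional to likelihood * prior."""
    heads = sum(flips)
    tails = len(flips) - heads
    unnorm = [p * t**heads * (1 - t) ** tails for p, t in zip(prior, GRID)]
    return normalize(unnorm)  # dividing by the sum plays the role of P(x)


prior = normalize([1.0] * len(GRID))     # uniform prior: highest entropy on the grid
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # 1 = heads (hypothetical data)
posterior = update(prior, flips)
posterior_more = update(prior, flips * 10)  # same proportion, ten times the data

print(f"prior entropy:           {entropy(prior):.3f} nats")
print(f"posterior entropy (10):  {entropy(posterior):.3f} nats")
print(f"posterior entropy (100): {entropy(posterior_more):.3f} nats")
print("most likely theta:", GRID[posterior.index(max(posterior))])
```

A real model would rarely be this low-dimensional; the grid is used only because it makes the entropy of the prior and posterior easy to compute directly.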

Bayesian methods typically generalize much better when limited training data is available, but they typically suffer from high computational cost when the number of training examples is large.
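A toy illustration of the small-data half of this claim, under assumed conditions: a coin with a Bernoulli likelihood and a uniform Beta(1, 1) prior on its bias. This swaps in the Beta-Bernoulli conjugate pair, whose closed-form posterior sidesteps the integrals that usually make Bayesian methods expensive, so it says nothing about computational cost. The function names are hypothetical. After three flips that all land heads, the maximum-likelihood estimate declares tails impossible, while the Bayesian posterior predictive (Laplace's rule of succession) still assigns it some probability.

```python
from fractions import Fraction

# Toy sketch (assumptions: Bernoulli likelihood, Beta(a, b) prior on theta).


def mle_prob_heads(heads, n):
    """Maximum-likelihood estimate: the raw frequency of heads."""
    return Fraction(heads, n)


def bayes_prob_heads(heads, n, a=1, b=1):
    """Posterior predictive P(next flip = heads) under a Beta(a, b) prior.

    The posterior is Beta(a + heads, b + n - heads); its mean is the
    predictive probability. With a = b = 1 this is Laplace's rule of succession.
    """
    return Fraction(a + heads, a + b + n)


# Three flips, all heads.
print(mle_prob_heads(3, 3))    # prints 1: MLE treats tails as impossible
print(bayes_prob_heads(3, 3))  # prints 4/5: the prior keeps tails plausible
```

As n grows, the prior's pseudo-counts a and b matter less and the two estimates converge, which matches the note's point that the advantage is largest when data is limited.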

Other

TODO