original article: https://www.probabilisticworld.com/what-is-bayes-theorem/
Events & Probabilities
- events - one way to think about an event is an outcome, or a set of outcomes, of some general process
- probabilities - the probability of an event is a number that, intuitively speaking, represents the uncertainty associated with the event’s occurrence
[Figure: the probability of rain, P(“Rain”)]
Conditional Probabilities
- conditional probabilities - a conditional probability expresses the probability that Event-1 will occur when you assume or know that Event-2 has already occurred
[Figure: the conditional probability of rain, P(“Rain” | “Windy & Cloudy”)]
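As a rough illustration of the definition above, a conditional probability can be estimated from counts: of all the days on which Event-2 occurred, what fraction also had Event-1? The counts and the function name `conditional_probability` below are made up for this sketch and do not come from the original article.

```python
# Hypothetical counts of past days, invented purely for illustration.
days_windy_cloudy = 50            # days that started windy and cloudy
days_rain_and_windy_cloudy = 45   # of those days, how many also had rain


def conditional_probability(joint_count, condition_count):
    """Estimate P(A | B) as count(A and B) / count(B)."""
    if condition_count == 0:
        raise ValueError("cannot condition on an event that never occurred")
    return joint_count / condition_count


# Estimated P("Rain" | "Windy & Cloudy") from the hypothetical counts.
p_rain_given_windy_cloudy = conditional_probability(
    days_rain_and_windy_cloudy, days_windy_cloudy
)
print(p_rain_given_windy_cloudy)
```

Only the days on which the condition held enter the denominator, which is what separates a conditional probability from an ordinary one.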
Bayes’ Theorem Connects Probabilities & Conditional Probabilities
The British statistician Thomas Bayes first discovered Bayes’ theorem and that’s why it’s named after him. So, let’s finally look at the mathematical statement it makes:
P(Event-1 | Event-2) = P(Event-2 | Event-1) × P(Event-1) / P(Event-2)
The equation consists of four parts and the traditional terminology used for referring to them is:
- P(Event-1) - Prior probability
- P(Event-2) - Evidence
- P(Event-2 | Event-1) - Likelihood
- P(Event-1 | Event-2) - Posterior probability
In words, Bayes’ Theorem asserts that:
the posterior probability of Event-1, given Event-2, can be calculated by multiplying the likelihood and the prior probability terms and dividing their product by the evidence term.
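The verbal statement above translates almost directly into code. The sketch below is a minimal illustration, not a reference implementation: the function name `bayes_posterior` is invented here, and only the prior of 0.6 comes from the weather example. The likelihood and evidence values are assumptions chosen so the arithmetic is easy to follow.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Return P(Event-1 | Event-2) = likelihood * prior / evidence."""
    if not 0 < evidence <= 1:
        raise ValueError("evidence must be a probability greater than 0")
    return likelihood * prior / evidence


p_rain = 0.6                   # prior, from the weather example
p_windy_cloudy_if_rain = 0.75  # likelihood (assumed value)
p_windy_cloudy = 0.5           # evidence (assumed value)

# Posterior P("Rain" | "Windy & Cloudy"); roughly 0.9 with these numbers.
p_rain_if_windy_cloudy = bayes_posterior(
    p_rain, p_windy_cloudy_if_rain, p_windy_cloudy
)
print(round(p_rain_if_windy_cloudy, 3))
```

With these assumed inputs, observing a windy and cloudy morning raises the probability of rain from 0.6 to about 0.9.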
Prior Probability
The prior probability of an event (often simply called the prior) is its probability calculated from some prior information about the event.
The word prior can be somewhat misleading. It’s not immediately clear what the probability is supposed to be prior to. A simple way to describe it would be as the probability of the event calculated from all the information related to the event that is already known. In the weather example, the prior probability of rain was given as P(“Rain”) = 0.6. This could come (for instance) from the prior knowledge that it has rained on this calendar date in 60% of the past 100 years.
Here is another way to look at it. A prior probability is always prior with respect to some piece of information that you left out from the calculations. In this example, the information left out when calculating P(“Rain”) = 0.6 is basically everything, except for the past rain frequency for the current date.
Evidence
You started with the prior P(“Rain”) = 0.6 but now you have new information you can use for more accurately (re-)estimating the same probability. The evidence term in Bayes’ theorem refers to the overall probability of this new piece of information.
In the current example, the information used for updating P(“Rain”) was the current weather conditions, so the evidence would be P(“Windy & Cloudy”). That is, the probability of having windy and cloudy weather, regardless of whether the day turns out to be rainy. You can think about it as the probability of one event averaged across all possibilities for the other event (here, a rainy day and a dry day), with each possibility weighted by its own probability.
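The "average across all possibilities" reading of the evidence term is the law of total probability. The sketch below shows it for the two-outcome weather case. The function name `evidence_probability` and both conditional probabilities are assumptions made for illustration. Only the prior of 0.6 comes from the example.

```python
def evidence_probability(prior, likelihood_if_true, likelihood_if_false):
    """P(E) = P(E | H) * P(H) + P(E | not H) * P(not H)."""
    return likelihood_if_true * prior + likelihood_if_false * (1 - prior)


p_rain = 0.6                    # prior, from the weather example
p_windy_cloudy_if_rain = 0.75   # assumed value
p_windy_cloudy_if_dry = 0.125   # assumed value

# Overall P("Windy & Cloudy"), whether or not the day turns out rainy.
p_windy_cloudy = evidence_probability(
    p_rain, p_windy_cloudy_if_rain, p_windy_cloudy_if_dry
)
print(round(p_windy_cloudy, 3))
```

Each branch contributes its conditional probability weighted by how probable that branch is, so with these numbers the rainy branch contributes about 0.45 and the dry branch about 0.05.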
Notice that, outside Bayesian tradition, the word “evidence” is most commonly used to refer to the piece of information itself, and not to its probability. This is a good reminder to not be too literal about these terms.
Likelihood
Unlike the previous two terms of the equation, the likelihood represents a conditional probability. In the weather example, this is the probability of having a windy and cloudy morning, given that it ends up raining at least once throughout that day:
- P(“Windy & Cloudy” | “Rain”).
An intuitive way to think about it is as the degree to which the first event is consistent with the second event. That is, the likelihood represents how strongly you expect that the morning will be windy and cloudy, assuming that the day is going to be rainy.
Posterior Probability
The posterior probability (often simply called the posterior) is the conditional probability you calculate when using Bayes’ theorem. It represents the prior probability updated to take into account some new piece of information. Just as a prior probability is always relative to the information left out, so is the posterior probability of an event. This means that the posterior probability can serve as the new prior probability, which you can then update using some other piece of information. And the cycle goes on. As Dennis Lindley put it:
Today’s posterior is tomorrow’s prior.
In the weather example, the posterior probability was P(“Rain” | “Windy & Cloudy”): the conditional probability that can convince you to take an umbrella on your way out.
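The idea that today’s posterior is tomorrow’s prior can be sketched as a loop in which each update feeds the next. Everything below other than the starting prior of 0.6 is hypothetical: the function name `update`, the second observation (a falling barometer), and all likelihood values. The sketch also assumes the observations are conditionally independent given rain or no rain, which is what allows them to be applied one at a time.

```python
def update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayesian update for a yes/no hypothesis."""
    # Evidence term via the law of total probability.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


belief = 0.6  # prior P("Rain") from the weather example

# Each pair is (P(observation | rain), P(observation | no rain)); assumed values.
observations = [
    (0.75, 0.125),  # windy and cloudy morning
    (0.8, 0.4),     # falling barometer (hypothetical second observation)
]

history = [belief]
for likelihood_if_rain, likelihood_if_dry in observations:
    # The posterior from the previous step becomes the prior for this one.
    belief = update(belief, likelihood_if_rain, likelihood_if_dry)
    history.append(belief)

print([round(b, 3) for b in history])
```

The list `history` records the belief after each observation, so the first entry is the original prior and the last is the posterior after all observations have been taken into account.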