Hypergeometric Distribution Derivation
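If there are $N$ balls in total, of which $K$ are successes and $N-K$ are failures, and $n$ balls are randomly selected WITHOUT REPLACEMENT, then the probability that exactly $k$ of the $n$ selected balls are successes is given by:

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

where $X$ denotes the number of successes among the selected balls.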
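As a quick numerical check, the probability described above, C(K, k) · C(N-K, n-k) / C(N, n), can be evaluated with exact rational arithmetic. This is a minimal sketch: the function name `hypergeom_pmf` and the example values N = 10, K = 4, n = 3, k = 2 are illustrative assumptions, not part of the derivation.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """Probability of exactly k successes among n balls drawn without replacement.

    N = total balls, K = success balls, n = balls drawn (hypothetical helper).
    """
    # math.comb raises ValueError on negative arguments, so handle
    # impossible counts explicitly; comb(a, b) is already 0 when b > a.
    if k < 0 or n - k < 0:
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


# Illustrative values: 10 balls, 4 successes, draw 3, ask for exactly 2 successes.
print(hypergeom_pmf(2, N=10, K=4, n=3))  # 3/10
```

Using `Fraction` keeps the result exact, which makes it easy to confirm that the probabilities over all values of k sum to 1.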
To see where this formula comes from, label the balls:
- $B = \{s_1, s_2, \ldots, s_K, f_{K+1}, f_{K+2}, \ldots, f_N\}$
Now our sample space $\mathcal{S}$ consists of all subsets of $B$ of size $n$:
- $\mathcal{S} = \{A \subseteq B \mid |A| = n\}$
The size of $\mathcal{S}$ is equal to the number of ways to choose $n$ distinct balls from a set of size $N$:

$$|\mathcal{S}| = \binom{N}{n}$$
This is where the denominator comes from.
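The count of the sample space can be checked by listing every subset explicitly. The sketch below assumes small illustrative values (N = 5, n = 2) and labels the balls 1 through N; these choices are only for demonstration.

```python
from itertools import combinations
from math import comb

# Illustrative values: 5 labeled balls, subsets of size 2.
N, n = 5, 2
balls = range(1, N + 1)

# Every unordered selection of n distinct balls is one outcome.
sample_space = list(combinations(balls, n))

print(len(sample_space), comb(N, n))  # 10 10
```

Because `combinations` yields each unordered selection exactly once, the length of the list matches the binomial coefficient.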
Next we need to count those $n$-subsets that contain exactly $k$ successes. To do this, we note that any such set has not only exactly $k$ successes, but also exactly $n-k$ failures. How many ways are there to choose $k$ successes from the $K$ success balls and $n-k$ failures from the $N-K$ failure balls? There are:
- $\binom{K}{k}$ ways to choose $k$ successes
- $\binom{N-K}{n-k}$ ways to choose $n-k$ failures
The choices above are independent of each other: any selection of successes can be paired with any selection of failures. The total number of desired outcomes is therefore the product of the two counts, so the number of ways to have exactly $k$ successes among the $n$ sampled balls is:
- $\binom{K}{k}\binom{N-K}{n-k}$
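This is where the numerator comes from. Dividing it by the size of the sample space, $\binom{N}{n}$, gives the probability of exactly $k$ successes:

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$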
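The counting argument can be verified by brute force: label the balls as successes or failures, enumerate every subset of the chosen size, and count those with the required number of successes. This is a sketch under assumed values (N = 10, K = 4, n = 3, k = 2); the helper name `count_favorable` is hypothetical.

```python
from itertools import combinations
from math import comb


def count_favorable(N: int, K: int, n: int, k: int) -> int:
    """Count n-subsets of N labeled balls containing exactly k successes."""
    # Balls 1..K are successes ("s"); balls K+1..N are failures ("f").
    balls = [("s", i) for i in range(1, K + 1)]
    balls += [("f", i) for i in range(K + 1, N + 1)]
    return sum(
        1
        for subset in combinations(balls, n)
        if sum(kind == "s" for kind, _ in subset) == k
    )


# Illustrative values only.
N, K, n, k = 10, 4, 3, 2
favorable = count_favorable(N, K, n, k)

print(favorable, comb(K, k) * comb(N - K, n - k))  # 36 36
print(favorable / comb(N, n))  # 0.3
```

The enumeration agrees with the product of the two binomial coefficients, and dividing by the number of subsets gives the same probability as the closed-form expression.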