Hypergeometric Distribution Derivation

If there are ๐‘ balls total, of which ๐พ are successes and ๐‘-๐พ are failures, and ๐‘› balls are randomly selected WITHOUT REPLACEMENT, then the probability that exactly ๐‘˜ of ๐‘› balls are success is given by:

To see where this formula comes from, label the balls:

  • ๐‘† = {๐‘…1,ย ๐‘…2, โ€ฆ,ย ๐‘…๐พ, ๐‘Š๐พ+1,ย ๐‘Š๐พ+2, โ€ฆ,ย ๐‘Š๐‘}

Now our sample space ๐’ฎ consists of all subsets of ๐‘† of size ๐‘›:

  • ๐’ฎ = {๐‘ โІ๐‘† | |๐‘ |=๐‘›}

The size of ๐’ฎ is equal to the number of ways to choose ๐‘› distinct balls from a size of ๐‘:

This is where the denominator comes from.

Next we need to enumerate those ๐‘›-subsets in which exactly ๐‘˜ successes. To do this, we note that any such set would not only have exactly ๐‘˜ successes, but also exactly (๐‘›-๐‘˜) failures. How many ways are there to choose ๐‘˜ successes from ๐‘† and ๐‘›-๐‘˜ failures from ๐‘†? There are:

  • (๐พ๐ถ๐‘˜) ways to choose ๐‘˜ successes
  • (๐‘-๐พ๐ถ๐‘›-๐‘˜) ways to choose ๐‘›-๐‘˜ failures

The choices above are independent from each other; thus the toal number of desired outcomes is simply the product of each individual event, so the total number of ways to have exactly ๐‘˜ successes from ๐‘› samples is:

  • (๐พ๐ถ๐‘˜)ย (๐‘-๐พ๐ถ๐‘›-๐‘˜)

The numerator probability