Continuity Correction is needed when we approximate a discrete distribution by a continuous distribution (e.g. approximating Binomial with Normal). When a continuous distribution is approximated by another continuous distribution (e.g approximating Gamma with Normal), the continuity correction is not needed. In fact, it would be an error to use it in this case because it would no longer preserve the probability
Recall that the probability P{X=x} may be positive if X is discrete, whereas it is always 0 for continuous X. Thus, direct use of
Indent
will always approximate this probability by 0. It is obviously a poor approximation.
This is resolved by introducing a continuity correction. Expand the interval by 0.5 units in each direction, then use the Normal approximation. Notice that
Indent
is true for a Binomial variable X; therefore, the continuity correction does not change the event and preserves its probability. It makes a difference for the Normal distribution, so every time when we approximate some discrete distribution with some continuous distribution, we should be using a continuity correction. Now it is the probability of an interval instead of one number, and it is not zero
Example
A new computer virus attacks a folder consisting of 200 files. Each file gets damaged with a probability 0.2 independently of other files.
What is the probability that fewer than 50 files get damaged?
solution
The number X of damaged files has Binomial distribution with:
- n = 200
- p = 0.2
- μ = np = 40
- σ = √[np(1 − p)] = 5.657
compute actual probability using Binomial probability mass function
- 𝑃 {𝑋<50} = ∑0≤𝑖≤49[200 choose 𝑖] * (0.2)(𝑖) * (0.8)(200 - 𝑖)
- 𝑃 {𝑋<50} = 0.95064666..
estimate probability using Central Limit Theorem (CLT) WITHOUT continuity correction
- 𝑃 {𝑋<50} = 𝑃 {(𝑋-40)/5.657 < (50-40)/5.657}
- 𝑃 {𝑋<50} = ɸ(1.767721407)
- 𝑃 {𝑋<50} = 0.9614 ← OFF!!
estimate probability using Central Limit Theorem (CLT) WITH continuity correction
- 𝑃 {𝑋<50} = 𝑃 {𝑋<49.5}
- 𝑃 {𝑋<50} = 𝑃 {(𝑋-40)/5.657 < (49.5-40)/5.657}
- 𝑃 {𝑋<50} = ɸ(1.679335337)
- 𝑃 {𝑋<50} = 0.9535
Notice that the properly applied continuity correction replaces 50 with 49.5, not 50.5. Indeed, we are interested in the event that X is strictly less than 50. This includes all values up to 49 and corresponds to the interval [0, 49] that we expand to [0, 49.5]. In other words, events {X < 50} and {X < 49.5} are the same; they include the same possible values of X. Events {X < 50} and {X < 50.5} are different because the former includes X = 50, and the latter does not. Replacing {X < 50} with {X < 50.5} would have changed its probability and would have given a wrong answer
What is the probability that exactly 50 files get damaged?
Click here to expand...
The number X of damaged files has Binomial distribution with:
- n = 200
- p = 0.2
- μ = np = 40
- σ = √[np(1 − p)] = 5.657
compute actual probability using Binomial probability mass function
- 𝑃 {𝑋=50} = [200 choose 50] * (0.2)50 * (0.8)150
- 𝑃 {𝑋=50} = 0.014856..
estimate probability using Central Limit Theorem (CLT) WITHOUT continuity correction
- 𝑃 {𝑋=50} = 𝑃 {50<𝑋<50}
- 𝑃 {𝑋=50} = 𝑃 {𝑋<50} - 𝑃 {𝑋<50}
- 𝑃 {𝑋=50} = 0 ← WRONG!!
estimate probability using Central Limit Theorem (CLT) WITH continuity correction
- 𝑃 {𝑋=50} = 𝑃 {49.5<𝑋<50.5}
- 𝑃 {𝑋=50} = 𝑃 {𝑋<50.5} - 𝑃 {𝑋<49.5}
- 𝑃 {𝑋=50} = ɸ(1.856107477) - ɸ(1.679335337)
- 𝑃 {𝑋=50} = 0.9683 - 0.9535
- 𝑃 {𝑋=50} = 0.0148

