Central Limit Theorem
- states that, given a sufficiently large sample of i.i.d. observations, the sampling distribution of the sample mean (and likewise of the sample sum) approximates a normal distribution regardless of the variable’s distribution in the population, provided that distribution has a finite variance (this excludes, for example, the Cauchy distribution)
CLT - Sample Size
CLT states that when you have a sufficiently large sample size, the sampling distribution starts to approximate a normal distribution. How large does the sample size have to be for that approximation to occur?
It depends on the shape of the variable’s distribution in the underlying population. The more the population distribution differs from being normal, the larger the sample size must be. Typically, statisticians say that a sample size of 30 is sufficient for most distributions. However, strongly skewed distributions can require larger sample sizes. We’ll see the sample size aspect in action during the empirical demonstration below.
As the sample size increases, the sampling distribution more closely approximates the normal distribution, and the spread of that distribution tightens
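The effect of sample size on the shape of the sampling distribution can be sketched with a small simulation. This is only an illustration under assumptions that are not part of the text above: the population is taken to be Exponential(1), which is strongly right-skewed, and the helper names `skewness` and `sample_mean_skewness` are made up for this example. A skewness near 0 indicates a symmetric, more normal-looking distribution of sample means.

```python
import numpy as np

def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    centered = values - values.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

def sample_mean_skewness(n, num_samples=20_000, seed=0):
    """Draw num_samples samples of size n from Exponential(1) (an assumed
    population) and return the skewness of the resulting sample means."""
    rng = np.random.default_rng(seed)
    draws = rng.exponential(scale=1.0, size=(num_samples, n))
    return skewness(draws.mean(axis=1))

# The skewness of the sample means should shrink toward 0 as n grows.
for n in (2, 30, 200):
    print(f"n = {n:3d}: skewness of sample means = {sample_mean_skewness(n):.3f}")
```

For this population the skewness of the sample mean is 2/√𝑛 in theory, so even at 𝑛 = 30 some asymmetry remains, which is consistent with the remark that strongly skewed distributions can need larger samples.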
CLT - Formal Definition (Form 1)
Let {𝑋1, …, 𝑋𝑛} be i.i.d. random samples taken from a probability distribution with:
- 𝐄(𝑋𝑖) = 𝜇
- 𝑉𝑎𝑟(𝑋𝑖) = 𝜎²
Let
𝑆𝑛 = 𝑋1 + … + 𝑋𝑛
Then 𝑆𝑛/𝑛 (that is, 𝑋̅) is approximately Normal(𝜇, 𝜎²/𝑛) for a sufficiently large sample size 𝑛
CLT - Intuition and Computation
Say we have a random variable 𝑆𝑛 that is the sum of a sequence of i.i.d. random variables:
𝑆𝑛 = 𝑋1 + … + 𝑋𝑛
For every random variable 𝑋𝑖, let:
- 𝐄(𝑋𝑖) = 𝜇 = population mean
- 𝑉𝑎𝑟(𝑋𝑖) = 𝜎² = population variance
Distributions of (𝑆𝑛, 𝑆𝑛/𝑛, 𝑆𝑛/√𝑛) for Large 𝑛?
- (𝑆𝑛 = 𝑛𝑋̅) is approximately Normal(𝑛𝜇, 𝑛𝜎²)
𝐄(𝑆𝑛) = 𝐄(𝑛𝑋̅)
- 𝐄(𝑆𝑛) = 𝐄(𝑋1 + … + 𝑋𝑛)
- 𝐄(𝑆𝑛) = 𝐄(𝑋1) + … + 𝐄(𝑋𝑛)
- 𝐄(𝑆𝑛) = 𝜇 + … + 𝜇
- 𝐄(𝑆𝑛) = 𝑛𝜇
𝑉𝑎𝑟(𝑆𝑛) = 𝑉𝑎𝑟(𝑛𝑋̅)
- 𝑉𝑎𝑟(𝑆𝑛) = 𝑉𝑎𝑟(𝑋1 + … + 𝑋𝑛)
- 𝑉𝑎𝑟(𝑆𝑛) = 𝑉𝑎𝑟(𝑋1) + … + 𝑉𝑎𝑟(𝑋𝑛) (because the 𝑋𝑖 are independent)
- 𝑉𝑎𝑟(𝑆𝑛) = 𝜎² + … + 𝜎²
- 𝑉𝑎𝑟(𝑆𝑛) = 𝑛𝜎²
- (𝑆𝑛/𝑛 = 𝑋̅) is approximately Normal(𝜇, 𝜎²/𝑛)
𝐄(𝑆𝑛/𝑛) = 𝐄(𝑋̅)
- 𝐄(𝑆𝑛/𝑛) = 𝐄(𝑆𝑛) / 𝑛
- 𝐄(𝑆𝑛/𝑛) = [𝐄(𝑋1) + … + 𝐄(𝑋𝑛)] / 𝑛
- 𝐄(𝑆𝑛/𝑛) = [𝜇 + … + 𝜇] / 𝑛
- 𝐄(𝑆𝑛/𝑛) = 𝑛𝜇 / 𝑛
- 𝐄(𝑆𝑛/𝑛) = 𝜇
𝑉𝑎𝑟(𝑆𝑛/𝑛) = 𝑉𝑎𝑟(𝑋̅)
- 𝑉𝑎𝑟(𝑆𝑛/𝑛) = 𝑉𝑎𝑟(𝑆𝑛) / 𝑛² (a constant factor 1/𝑛 comes out of the variance squared)
- 𝑉𝑎𝑟(𝑆𝑛/𝑛) = [𝑉𝑎𝑟(𝑋1) + … + 𝑉𝑎𝑟(𝑋𝑛)] / 𝑛²
- 𝑉𝑎𝑟(𝑆𝑛/𝑛) = [𝜎² + … + 𝜎²] / 𝑛²
- 𝑉𝑎𝑟(𝑆𝑛/𝑛) = 𝑛𝜎² / 𝑛²
- 𝑉𝑎𝑟(𝑆𝑛/𝑛) = 𝜎² / 𝑛
- (𝑆𝑛/√𝑛 = √𝑛·𝑋̅) is approximately Normal(√𝑛·𝜇, 𝜎²)
𝐄(𝑆𝑛/√𝑛) = 𝐄(√𝑛·𝑋̅)
- 𝐄(𝑆𝑛/√𝑛) = 𝐄(𝑆𝑛) / √𝑛
- 𝐄(𝑆𝑛/√𝑛) = [𝐄(𝑋1) + … + 𝐄(𝑋𝑛)] / √𝑛
- 𝐄(𝑆𝑛/√𝑛) = [𝜇 + … + 𝜇] / √𝑛
- 𝐄(𝑆𝑛/√𝑛) = 𝑛𝜇 / √𝑛
- 𝐄(𝑆𝑛/√𝑛) = √𝑛·𝜇
𝑉𝑎𝑟(𝑆𝑛/√𝑛) = 𝑉𝑎𝑟(√𝑛·𝑋̅)
- 𝑉𝑎𝑟(𝑆𝑛/√𝑛) = 𝑉𝑎𝑟(𝑆𝑛) / 𝑛 (a constant factor 1/√𝑛 comes out of the variance squared)
- 𝑉𝑎𝑟(𝑆𝑛/√𝑛) = [𝑉𝑎𝑟(𝑋1) + … + 𝑉𝑎𝑟(𝑋𝑛)] / 𝑛
- 𝑉𝑎𝑟(𝑆𝑛/√𝑛) = [𝜎² + … + 𝜎²] / 𝑛
- 𝑉𝑎𝑟(𝑆𝑛/√𝑛) = 𝑛𝜎² / 𝑛
- 𝑉𝑎𝑟(𝑆𝑛/√𝑛) = 𝜎²
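The means and variances derived above for 𝑆𝑛, 𝑆𝑛/𝑛 and 𝑆𝑛/√𝑛 can be checked numerically. The following is a minimal sketch, not a proof: it assumes an Exponential population with scale 2 (so 𝜇 = 2 and 𝜎² = 4), and the values of `n`, `reps` and the seed are arbitrary choices made for this illustration.

```python
import numpy as np

# Assumed population for illustration: Exponential with scale 2,
# which has mean mu = 2 and variance sigma^2 = 4.
mu, sigma2 = 2.0, 4.0
n, reps = 50, 40_000

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=(reps, n))  # each row is one sample of size n
s_n = x.sum(axis=1)                             # one realization of S_n per row

# name -> (simulated values, theoretical mean, theoretical variance)
cases = {
    "S_n": (s_n, n * mu, n * sigma2),
    "S_n / n": (s_n / n, mu, sigma2 / n),
    "S_n / sqrt(n)": (s_n / np.sqrt(n), np.sqrt(n) * mu, sigma2),
}

for name, (values, mean_theory, var_theory) in cases.items():
    print(f"{name}: mean {values.mean():.3f} (theory {mean_theory:.3f}), "
          f"variance {values.var():.3f} (theory {var_theory:.3f})")
```

The simulated moments should land close to 𝑛𝜇 and 𝑛𝜎², to 𝜇 and 𝜎²/𝑛, and to √𝑛·𝜇 and 𝜎², up to Monte Carlo noise.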
CLT - Importance
CLT is vital in statistics for three main reasons:
- z-scores
The standardized value, or z-score (𝑍𝑛), of the sample mean (𝑆𝑛/𝑛) is as follows:
𝑍𝑛 = (𝑆𝑛/𝑛 − 𝜇) / (𝜎/√𝑛) = (𝑆𝑛 − 𝑛𝜇) / (𝜎√𝑛)
𝑍𝑛 measures how many standard errors (𝜎/√𝑛) the sample mean (𝑆𝑛/𝑛) is below or above the population mean (𝜇)
As 𝑛→∞, 𝑍𝑛 converges in distribution to a Standard Normal random variable (for all 𝑧):
𝐏(𝑍𝑛 ≤ 𝑧) → 𝜱(𝑧)
where:
- 𝜱(𝑧) - is the integral of the Standard Normal density from -∞ to 𝑧
This theorem can be applied to random variables {𝑋1, 𝑋2, … } of ANY distribution with finite expectation and variance. As long as 𝑛 is sufficiently large, one can use the Normal distribution to compute probabilities about the random variable 𝑆𝑛 or 𝑋̅.
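The convergence of the z-score to the Standard Normal can be illustrated by comparing a simulated 𝐏(𝑍𝑛 ≤ 𝑧) with 𝜱(𝑧). This is a sketch under assumptions that are not in the text: the population is Uniform(0, 1), with 𝜇 = 0.5 and 𝜎² = 1/12, and the function names `phi` and `empirical_cdf_of_z` are hypothetical helpers.

```python
import math
import numpy as np

def phi(z):
    """Standard Normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_cdf_of_z(n, z, reps=50_000, seed=2):
    """Estimate P(Z_n <= z) where Z_n standardizes the mean of n
    Uniform(0, 1) draws (an assumed population)."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    rng = np.random.default_rng(seed)
    means = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    z_n = (means - mu) / (sigma / math.sqrt(n))  # z-score of each sample mean
    return float((z_n <= z).mean())

for z in (-1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}: simulated {empirical_cdf_of_z(30, z):.4f}, Phi(z) {phi(z):.4f}")
```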
- normality assumption
The fact that sampling distributions can approximate a normal distribution has critical implications. In statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the t-test. Consequently, you might think that these tests are not valid when the data are non-normally distributed. However, if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are non-normally distributed, as long as your sample size is large enough.
You might have heard that parametric tests of the mean are robust to departures from the normality assumption when your sample size is sufficiently large. That’s thanks to the central limit theorem!
- precision of the estimates
Sampling distributions of the mean cluster more tightly around the population mean as the sample size increases. This property becomes relevant when using a sample to estimate the mean of an entire population. With a larger sample size, your sample mean is more likely to be close to the real population mean. In other words, your estimate is more precise.
Conversely, the sampling distributions of the mean for smaller sample sizes are much broader. For small sample sizes, it’s not unusual for sample means to be further away from the actual population mean. You obtain less precise estimates.
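The link between sample size and precision can be made concrete with a short simulation. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1); the tolerance of 0.1 and the helper names `sample_means` and `precision_summary` are arbitrary choices for this illustration. It reports the spread of the sample means and the share of them that fall within the tolerance of the population mean.

```python
import numpy as np

def sample_means(n, reps=10_000, seed=3):
    """Means of reps samples of size n from an assumed Exponential(1) population."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

def precision_summary(n, tolerance=0.1):
    """Return (standard deviation of the sample means,
    share of sample means within `tolerance` of the population mean 1)."""
    means = sample_means(n)
    spread = float(means.std())
    share_close = float((np.abs(means - 1.0) <= tolerance).mean())
    return spread, share_close

for n in (10, 100, 400):
    spread, share_close = precision_summary(n)
    print(f"n = {n:3d}: spread {spread:.3f} (sigma/sqrt(n) = {1 / np.sqrt(n):.3f}), "
          f"share within 0.1 of the mean = {share_close:.2f}")
```

The spread should track 𝜎/√𝑛, so quadrupling the sample size roughly halves it.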
Some Probability Distributions of Form 𝑆𝑛
- Binomial variable = sum of independent Bernoulli variables (with the same success probability 𝑝)
- Negative Binomial variable = sum of independent Geometric variables (with the same 𝑝)
- Gamma variable (with integer 𝛼) = sum of independent Exponential variables (with the same rate)
Hence, the Central Limit Theorem applies to all these distributions with sufficiently large:
- 𝑛 for Binomial variables
- 𝑘 for Negative Binomial variables
- 𝛼 for Gamma variables
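As an illustration of the Binomial case, the exact Binomial CDF can be compared with the Normal approximation implied by the CLT. This sketch makes assumptions beyond the text: the values 𝑛 = 100, 𝑝 = 0.3 and 𝑘 = 35 are arbitrary, the function names are hypothetical, and a continuity correction (adding 0.5 to 𝑘) is applied, which is a common refinement rather than part of the theorem itself.

```python
import math

def phi(z):
    """Standard Normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """CLT-based approximation of P(X <= k), using Normal(np, np(1-p))
    with a continuity correction of 0.5 (an added refinement)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return phi((k + 0.5 - mean) / sd)

n, p, k = 100, 0.3, 35  # illustrative values only
print(f"exact:  {binomial_cdf(k, n, p):.4f}")
print(f"normal: {normal_approx_cdf(k, n, p):.4f}")
```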
Examples
Example 1 (Allocation of disk space)
A disk has a free space of 330 megabytes. Is it likely to be sufficient for 300 independent images, if each image has an expected size of 1 megabyte with a standard deviation of 0.5 megabytes?
We have:
- 𝑛 = 300
- 𝜇 = 1
- 𝜎 = 0.5
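The number of images 𝑛 is large, so the Central Limit Theorem applies to their total size 𝑆𝑛. Then
𝐏(sufficient space) = 𝐏(𝑆𝑛 ≤ 330) = 𝐏( (𝑆𝑛 − 𝑛𝜇) / (𝜎√𝑛) ≤ (330 − (300)(1)) / (0.5√300) ) ≈ 𝜱(3.46) ≈ 0.9997
Again, 𝜱(𝑧) is the integral of the Standard Normal density from -∞ to 𝑧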
This probability is very high, hence, the available disk space is very likely to be sufficient
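The arithmetic of Example 1 can be reproduced in a few lines. This is a small sketch that uses `statistics.NormalDist` from the Python standard library for 𝜱; the variable names are chosen for this illustration only.

```python
import math
from statistics import NormalDist

n, mu, sigma = 300, 1.0, 0.5   # number of images, mean size, standard deviation (MB)
capacity = 330.0               # free disk space (MB)

# Standardize the capacity using the CLT approximation S_n ~ Normal(n*mu, n*sigma^2).
z = (capacity - n * mu) / (sigma * math.sqrt(n))
p_sufficient = NormalDist().cdf(z)  # Phi(z)

print(f"z = {z:.2f}, P(S_n <= {capacity:.0f}) is approximately {p_sufficient:.4f}")
```

The standardized value comes out at roughly 3.46, giving a probability of about 0.9997 that the images fit.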
Example 2 (Elevator)
You wait for an elevator, whose capacity is 2000 pounds. The elevator comes with 10 adult passengers. Suppose your own weight is 150 lbs, and you heard that human weights are normally distributed with a mean of 165 lbs and a standard deviation of 20 lbs. Would you board this elevator or wait for the next one?
We have:
- 𝑛 = 10
- 𝜇 = 165
- 𝜎 = 20
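The probability of an overload equals
𝐏(𝑆10 + 150 > 2000) = 𝐏( (𝑆10 − (10)(165)) / (20√10) > (2000 − 150 − (10)(165)) / (20√10) ) = 1 − 𝜱(3.16) ≈ 0.0008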
So, with a probability of 0.9992, it is safe to take this elevator
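Example 2 can be checked the same way. Because the individual weights are stated to be normally distributed, the total weight of the 10 passengers is itself normally distributed, so no large-𝑛 approximation is needed here. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are chosen for this illustration only.

```python
import math
from statistics import NormalDist

n, mu, sigma = 10, 165.0, 20.0     # passengers already aboard, mean and sd of weight (lbs)
capacity, own_weight = 2000.0, 150.0

# Total weight of the n passengers: Normal(n*mu, n*sigma^2).
total = NormalDist(mu=n * mu, sigma=sigma * math.sqrt(n))

# Overload happens when the passengers' total exceeds the remaining capacity.
p_overload = 1.0 - total.cdf(capacity - own_weight)
p_safe = 1.0 - p_overload

print(f"P(overload) is approximately {p_overload:.4f}, P(safe) is approximately {p_safe:.4f}")
```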