Central Limit Theorem

The Central Limit Theorem (CLT) states that, given a sufficiently large sample of i.i.d. observations, the sampling distribution of the mean (or, equivalently, of the sum) of a random variable approximates a normal distribution regardless of that variable’s distribution in the population. The exception is distributions without a finite variance, such as the Cauchy distribution.

CLT - Sample Size

CLT states that when you have a sufficiently large sample size, the sampling distribution starts to approximate a normal distribution. How large does the sample size have to be for that approximation to occur?

It depends on the shape of the variable’s distribution in the underlying population. The more the population distribution differs from being normal, the larger the sample size must be. Typically, statisticians say that a sample size of 30 is sufficient for most distributions. However, strongly skewed distributions can require larger sample sizes. We’ll see the sample size aspect in action during the empirical demonstration below.

As the sample size increases, the sampling distribution more closely approximates the normal distribution, and its spread tightens: the standard deviation of the sample mean is 𝜎/√𝑛, where 𝜎 is the population standard deviation.
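The effect of sample size can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the Exponential(1) population, the sample sizes, the repetition count, and the helper names `sample_means`, `skewness`, and `summarize` are choices made here, not part of the original text.

```python
import random
import statistics


def sample_means(n, reps=2000, rate=1.0, seed=0):
    """Return `reps` sample means, each computed from n Exponential(rate) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(reps)
    ]


def skewness(values):
    """Sample skewness: mean of the cubed standardized values (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)


def summarize(sizes=(2, 30, 200)):
    """Map each sample size to (spread, skewness) of the simulated sample means."""
    result = {}
    for n in sizes:
        means = sample_means(n)
        result[n] = (statistics.pstdev(means), skewness(means))
    return result


if __name__ == "__main__":
    for n, (spread, skew) in summarize().items():
        print(f"n={n:4d}  spread={spread:.3f}  skewness={skew:.3f}")
```

For this strongly skewed population, both the spread and the skewness of the simulated means should shrink as 𝑛 grows, with the spread tracking 𝜎/√𝑛 (here 𝜎 = 1).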

CLT - Formal Definition (Form 1)

Let {𝑋₁, …, 𝑋_𝑛} be i.i.d. random variables (a random sample) drawn from a probability distribution with:

𝐄(𝑋_𝑖) = 𝜇
𝑉𝑎𝑟(𝑋_𝑖) = 𝜎²

Let

𝑆_𝑛 = 𝑋₁ + … + 𝑋_𝑛

Then 𝑆_𝑛/𝑛 (the sample mean 𝑋̅) is approximately Normal(𝜇, 𝜎²/𝑛) for sufficiently large sample size 𝑛.

intuition and computation

Say we have a random variable 𝑆_𝑛 that is the sum of a sequence of i.i.d. random variables:

𝑆_𝑛 = 𝑋₁ + … + 𝑋_𝑛

For every random variable 𝑋_𝑖, let:

𝐄(𝑋_𝑖) = 𝜇 = population mean

𝑉𝑎𝑟(𝑋_𝑖) = 𝜎² = population variance

Distributions of (𝑆_𝑛, 𝑆_𝑛/𝑛, 𝑆_𝑛/√𝑛) for Large 𝑛?

(𝑆_𝑛 = 𝑛𝑋̅) is approximately Normal(𝑛𝜇, 𝑛𝜎²)

𝐄(𝑆_𝑛) = 𝐄(𝑛𝑋̅)

𝐄(𝑆_𝑛) = 𝐄(𝑋₁ + … + 𝑋_𝑛)

𝐄(𝑆_𝑛) = 𝐄(𝑋₁) + … + 𝐄(𝑋_𝑛)

𝐄(𝑆_𝑛) = 𝜇 + … + 𝜇

𝐄(𝑆_𝑛) = 𝑛𝜇

𝑉𝑎𝑟(𝑆_𝑛) = 𝑉𝑎𝑟(𝑛𝑋̅)

𝑉𝑎𝑟(𝑆_𝑛) = 𝑉𝑎𝑟(𝑋₁ + … + 𝑋_𝑛)

𝑉𝑎𝑟(𝑆_𝑛) = 𝑉𝑎𝑟(𝑋₁) + … + 𝑉𝑎𝑟(𝑋_𝑛) (by independence)

𝑉𝑎𝑟(𝑆_𝑛) = 𝜎² + … + 𝜎²

𝑉𝑎𝑟(𝑆_𝑛) = 𝑛𝜎²

(𝑆_𝑛/𝑛 = 𝑋̅) is approximately Normal(𝜇, 𝜎²/𝑛)

𝐄(𝑆_𝑛/𝑛) = 𝐄(𝑋̅)

𝐄(𝑆_𝑛/𝑛) = 𝐄(𝑆_𝑛) / 𝑛

𝐄(𝑆_𝑛/𝑛) = [𝐄(𝑋₁) + … + 𝐄(𝑋_𝑛)] / 𝑛

𝐄(𝑆_𝑛/𝑛) = [𝜇 + … + 𝜇] / 𝑛

𝐄(𝑆_𝑛/𝑛) = 𝜇

𝑉𝑎𝑟(𝑆_𝑛/𝑛) = 𝑉𝑎𝑟(𝑋̅)

𝑉𝑎𝑟(𝑆_𝑛/𝑛) = [𝑉𝑎𝑟(𝑆_𝑛)] / 𝑛²

𝑉𝑎𝑟(𝑆_𝑛/𝑛) = [𝑉𝑎𝑟(𝑋₁) + … + 𝑉𝑎𝑟(𝑋_𝑛)] / 𝑛² (by independence)

𝑉𝑎𝑟(𝑆_𝑛/𝑛) = [𝜎² + … + 𝜎²] / 𝑛²

𝑉𝑎𝑟(𝑆_𝑛/𝑛) = 𝑛𝜎² / 𝑛²

𝑉𝑎𝑟(𝑆_𝑛/𝑛) = 𝜎² / 𝑛

(𝑆_𝑛/√𝑛 = √𝑛 𝑋̅) is approximately Normal(√𝑛 𝜇, 𝜎²)

𝐄(𝑆_𝑛/√𝑛) = 𝐄(√𝑛 𝑋̅)

𝐄(𝑆_𝑛/√𝑛) = 𝐄(𝑆_𝑛) / √𝑛

𝐄(𝑆_𝑛/√𝑛) = [𝐄(𝑋₁) + … + 𝐄(𝑋_𝑛)] / √𝑛

𝐄(𝑆_𝑛/√𝑛) = [𝜇 + … + 𝜇] / √𝑛

𝐄(𝑆_𝑛/√𝑛) = 𝑛𝜇 / √𝑛

𝐄(𝑆_𝑛/√𝑛) = √𝑛 𝜇

𝑉𝑎𝑟(𝑆_𝑛/√𝑛) = 𝑉𝑎𝑟(√𝑛 𝑋̅)

𝑉𝑎𝑟(𝑆_𝑛/√𝑛) = [𝑉𝑎𝑟(𝑆_𝑛)] / 𝑛

𝑉𝑎𝑟(𝑆_𝑛/√𝑛) = [𝑉𝑎𝑟(𝑋₁) + … + 𝑉𝑎𝑟(𝑋_𝑛)] / 𝑛 (by independence)

𝑉𝑎𝑟(𝑆_𝑛/√𝑛) = [𝜎² + … + 𝜎²] / 𝑛

𝑉𝑎𝑟(𝑆_𝑛/√𝑛) = 𝑛𝜎² / 𝑛

𝑉𝑎𝑟(𝑆_𝑛/√𝑛) = 𝜎²
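The means and variances derived above for 𝑆_𝑛, 𝑆_𝑛/𝑛 and 𝑆_𝑛/√𝑛 can be checked numerically. The following is a minimal sketch, assuming a Uniform(0, 1) population (so 𝜇 = 0.5 and 𝜎² = 1/12); the population, the values of `n` and `reps`, and the function names are illustrative choices, not taken from the original text.

```python
import math
import random
import statistics


def simulate_scalings(n=50, reps=5000, seed=1):
    """Simulate S_n, S_n/n and S_n/sqrt(n) for X_i ~ Uniform(0, 1).

    Returns a dict mapping each label to (sample mean, sample variance).
    """
    rng = random.Random(seed)
    sums = [sum(rng.random() for _ in range(n)) for _ in range(reps)]
    scaled = {
        "S_n": sums,
        "S_n/n": [s / n for s in sums],
        "S_n/sqrt(n)": [s / math.sqrt(n) for s in sums],
    }
    return {
        label: (statistics.fmean(values), statistics.pvariance(values))
        for label, values in scaled.items()
    }


def theoretical(n=50, mu=0.5, var=1 / 12):
    """Means and variances predicted by the derivations for each scaling."""
    return {
        "S_n": (n * mu, n * var),
        "S_n/n": (mu, var / n),
        "S_n/sqrt(n)": (math.sqrt(n) * mu, var),
    }


if __name__ == "__main__":
    simulated, predicted = simulate_scalings(), theoretical()
    for label in predicted:
        print(label, "simulated:", simulated[label], "predicted:", predicted[label])
```

Note how only 𝑆_𝑛/√𝑛 keeps a variance that does not depend on 𝑛, which is why the standardized form of the theorem divides by √𝑛.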

CLT - Importance

CLT is vital in statistics for the following reasons:

z-scores
The standardized value, or z-score (𝑍_𝑛), of the sample mean (𝑆_𝑛/𝑛) is as follows:

𝑍_𝑛 = (𝑆_𝑛/𝑛 − 𝜇) / (𝜎/√𝑛) = (𝑆_𝑛 − 𝑛𝜇) / (𝜎√𝑛)

𝑍_𝑛 measures how many standard errors (𝜎/√𝑛) the sample mean (𝑆_𝑛/𝑛) is below or above the population mean (𝜇).

As 𝑛→∞, 𝑍_𝑛 converges in distribution to a Standard Normal random variable (for all 𝑧):

𝐏(𝑍_𝑛 ≤ 𝑧) → 𝜱(𝑧)

where:
- 𝜱(𝑧) is the Standard Normal cumulative distribution function, i.e. the integral of the Standard Normal density from -∞ to 𝑧

This theorem can be applied to i.i.d. random variables {𝑋₁, 𝑋₂, … } of ANY distribution with finite expectation and variance. As long as 𝑛 is sufficiently large, one can use the Normal Distribution to compute probabilities about the random variable 𝑆_𝑛 or 𝑋̅.
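The convergence of the z-score to the Standard Normal can be sketched empirically. The code below assumes an Exponential(1) population (𝜇 = 𝜎 = 1) purely for illustration; `z_score` and `empirical_cdf_of_z` are hypothetical helper names, and `statistics.NormalDist` from the Python standard library stands in for 𝜱.

```python
import math
import random
from statistics import NormalDist


def z_score(sample_mean, mu, sigma, n):
    """Standardize a sample mean: (mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


def empirical_cdf_of_z(z, n=100, reps=4000, seed=2):
    """Estimate P(Z_n <= z) for means of n Exponential(1) draws (mu = sigma = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if z_score(mean, mu=1.0, sigma=1.0, n=n) <= z:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0):
        print(f"z={z:+.1f}  empirical={empirical_cdf_of_z(z):.3f}  Phi(z)={NormalDist().cdf(z):.3f}")
```

With 𝑛 = 100 the empirical values should already sit close to 𝜱(𝑧), with small discrepancies from both the simulation noise and the remaining skewness of the population.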

normality assumption

The fact that sampling distributions can approximate a normal distribution has critical implications. In statistics, the normality assumption is vital for parametric hypothesis tests of the mean, such as the t-test. Consequently, you might think that these tests are not valid when the data are non-normally distributed. However, if your sample size is large enough, the central limit theorem kicks in and produces sampling distributions that approximate a normal distribution. This fact allows you to use these hypothesis tests even when your data are non-normally distributed, as long as your sample size is large enough.

You might have heard that parametric tests of the mean are robust to departures from the normality assumption when your sample size is sufficiently large. That’s thanks to the central limit theorem!
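One way to see this robustness is to check how often a normal-theory interval for the mean covers the true mean when the data are skewed. The sketch below is an illustration only: it uses a z-based 95% interval as a stand-in for the t-test (the Python standard library has no t distribution), an assumed Exponential(1) population, and a hypothetical helper named `coverage`.

```python
import math
import random
from statistics import NormalDist


def coverage(n, reps=3000, level=0.95, seed=3):
    """Fraction of normal-theory intervals that cover the true mean of Exponential(1) data."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    true_mean = 1.0
    covered = 0
    for _ in range(reps):
        data = [rng.expovariate(1.0) for _ in range(n)]
        mean = sum(data) / n
        # Sample variance with the usual n - 1 denominator.
        variance = sum((x - mean) ** 2 for x in data) / (n - 1)
        half_width = z * math.sqrt(variance / n)
        if abs(mean - true_mean) <= half_width:
            covered += 1
    return covered / reps


if __name__ == "__main__":
    for n in (5, 30, 200):
        print(f"n={n:4d}  coverage={coverage(n):.3f}")
```

For small 𝑛 the coverage should fall noticeably short of the nominal 95%, and it should approach 95% as 𝑛 grows, which is the practical meaning of the robustness claim.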

precision of the estimates

Sampling distributions of the mean cluster more tightly around the population mean as the sample size increases. This property of the central limit theorem becomes relevant when using a sample to estimate the mean of an entire population. With a larger sample size, your sample mean is more likely to be close to the real population mean. In other words, your estimate is more precise.

Conversely, the sampling distributions of the mean for smaller sample sizes are much broader. For small sample sizes, it’s not unusual for sample means to be farther away from the actual population mean. You obtain less precise estimates.
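The precision gain can be quantified through the standard deviation of the sample mean, 𝜎/√𝑛 (from 𝑉𝑎𝑟(𝑆_𝑛/𝑛) = 𝜎²/𝑛). A minimal sketch follows; the function names and the example numbers are illustrative assumptions.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean of n i.i.d. draws: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def sample_size_for(sigma, target_se):
    """Smallest n whose standard error is at most target_se."""
    return math.ceil((sigma / target_se) ** 2)


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, standard_error(20, n))
    print(sample_size_for(20, 1.0))
```

Because the standard error scales with 1/√𝑛, quadrupling the sample size only halves it.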

Some Probability Distributions of Form 𝑆_𝑛

Binomial(𝑛, 𝑝) variable = sum of 𝑛 independent Bernoulli(𝑝) variables
Negative Binomial(𝑘, 𝑝) variable = sum of 𝑘 independent Geometric(𝑝) variables
Gamma(𝛼, 𝜆) variable = sum of 𝛼 independent Exponential(𝜆) variables (for integer 𝛼)

Hence, the Central Limit Theorem applies to all these distributions with sufficiently large:

𝑛 for Binomial variables
𝑘 for Negative Binomial variables
𝛼 for Gamma variables
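As an illustration for the Binomial case, the exact Binomial CDF can be compared with its Normal approximation, Normal(𝑛𝑝, 𝑛𝑝(1 − 𝑝)). This is a sketch under assumptions: the function names are hypothetical, and the `continuity` parameter implements the usual continuity correction, a common refinement that the original text does not discuss.

```python
import math
from statistics import NormalDist


def binomial_cdf_exact(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binomial_cdf_normal(k, n, p, continuity=0.5):
    """CLT approximation of P(X <= k): X is roughly Normal(n*p, n*p*(1-p))."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + continuity)


if __name__ == "__main__":
    for n, p, k in ((10, 0.1, 1), (100, 0.5, 55)):
        exact = binomial_cdf_exact(k, n, p)
        approx = binomial_cdf_normal(k, n, p)
        print(f"n={n} p={p} k={k}  exact={exact:.4f}  normal={approx:.4f}")
```

The approximation should be noticeably better for the larger, more symmetric case (𝑛 = 100, 𝑝 = 0.5) than for the small, skewed one.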

Examples

Example 1 (Allocation of disk space)

A disk has a free space of 330 megabytes. Is it likely to be sufficient for 300 independent images, if each image has an expected size of 1 megabyte with a standard deviation of 0.5 megabytes?

We have:

𝑛 = 300

𝜇 = 1

𝜎 = 0.5

The number of images 𝑛 is large, so the Central Limit Theorem applies to their total size 𝑆_𝑛. Then

𝐏(sufficient space) = 𝐏(𝑆_𝑛 ≤ 330)
= 𝐏( (𝑆_𝑛 − 𝑛𝜇) / (𝜎√𝑛) ≤ (330 − (300)(1)) / (0.5√300) )
≈ 𝜱(3.46) = 0.9997

Again, 𝜱(𝑧) is the integral of the Standard Normal density from -∞ to 𝑧.

This probability is very high; hence, the available disk space is very likely to be sufficient.
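The probability in Example 1 can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for 𝜱; the function name `prob_total_at_most` is a hypothetical helper introduced here.

```python
import math
from statistics import NormalDist


def prob_total_at_most(limit, n, mu, sigma):
    """CLT approximation of P(S_n <= limit), with S_n roughly Normal(n*mu, n*sigma^2)."""
    z = (limit - n * mu) / (sigma * math.sqrt(n))
    return NormalDist().cdf(z)


# Example 1: 300 images, mean 1 MB, standard deviation 0.5 MB, 330 MB free.
p_fits = prob_total_at_most(limit=330, n=300, mu=1.0, sigma=0.5)
print(f"P(S_n <= 330) is approximately {p_fits:.4f}")
```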

Example 2 (Elevator)

You wait for an elevator, whose capacity is 2000 pounds. The elevator comes with 10 adult passengers. Suppose your own weight is 150 lbs, and you heard that human weights are normally distributed with a mean of 165 lbs and a standard deviation of 20 lbs. Would you board this elevator or wait for the next one?

We have:

𝑛 = 10

𝜇 = 165

𝜎 = 20

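Because the individual weights are normally distributed, the total weight 𝑆_𝑛 of the 𝑛 = 10 passengers is itself normally distributed, even though 𝑛 is small. Including your own 150 lbs, the probability of an overload equals

𝐏(𝑆_𝑛 + 150 > 2000) = 𝐏(𝑆_𝑛 > 1850)
= 𝐏( (𝑆_𝑛 − (10)(165)) / (20√10) > (1850 − 1650) / (20√10) )
= 1 − 𝜱(3.16) = 0.0008

So, with a probability of 0.9992, it is safe to take this elevator.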
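The elevator calculation can be reproduced in the same way. This is a sketch with a hypothetical helper, `prob_overload`, again using `statistics.NormalDist` for 𝜱.

```python
import math
from statistics import NormalDist


def prob_overload(capacity, own_weight, n, mu, sigma):
    """P(S_n + own_weight > capacity) when S_n is Normal(n*mu, n*sigma^2)."""
    z = (capacity - own_weight - n * mu) / (sigma * math.sqrt(n))
    return 1.0 - NormalDist().cdf(z)


# Example 2: 10 passengers, mean 165 lbs, standard deviation 20 lbs, plus 150 lbs.
p_over = prob_overload(capacity=2000, own_weight=150, n=10, mu=165, sigma=20)
print(f"P(overload) is approximately {p_over:.4f}, P(safe) is approximately {1 - p_over:.4f}")
```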

Subpages

Central Limit Theorem (CLT) Proofs - via Double Factorials (!!) and the Moment Method

／var／log marcus chiu
