Why Dividing by n Always Underestimates Variance
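The "always" can be made precise for every individual sample. Squared deviations about the sample mean 𝑋̅ never sum to more than squared deviations about the true mean 𝜇, because 𝛴(𝑋_𝑖 - 𝜇)² = 𝛴(𝑋_𝑖 - 𝑋̅)² + 𝑛(𝑋̅ - 𝜇)². As a result, the divide-by-𝑛 estimator built on 𝑋̅ is never larger, sample by sample, than 𝛴(𝑋_𝑖 - 𝜇)²/𝑛, whose expectation is exactly 𝜎². On average it falls short by 𝐄[(𝑋̅ - 𝜇)²] = 𝜎²/𝑛. The Python sketch below checks the decomposition exhaustively for a small population. The values in `POPULATION` are an arbitrary assumption chosen for illustration, and exact `Fraction` arithmetic avoids rounding noise.

```python
from fractions import Fraction
from itertools import product

# Illustrative population (an assumption, not from the text); each value is equally likely.
POPULATION = [Fraction(v) for v in (0, 1, 2, 5)]
MU = sum(POPULATION) / len(POPULATION)  # true mean


def ssd(sample, center):
    """Sum of squared deviations of `sample` about `center`."""
    return sum((x - center) ** 2 for x in sample)


def check_all_samples(n):
    """Check the decomposition for every ordered sample of size n drawn with replacement."""
    for sample in product(POPULATION, repeat=n):
        xbar = sum(sample) / n
        about_xbar = ssd(sample, xbar)
        about_mu = ssd(sample, MU)
        # SSD about mu = SSD about xbar + n * (xbar - mu)^2
        assert about_mu == about_xbar + n * (xbar - MU) ** 2
        # Hence centering on xbar never gives a larger sum than centering on mu
        assert about_xbar <= about_mu
    return True


print(check_all_samples(3))  # True
```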

Why Divide by n-1 Instead of n-2?
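Every candidate divisor is applied to the same quantity, and 𝐄[𝛴(𝑋_𝑖 - 𝑋̅)²] = (𝑛 - 1)𝜎² (which follows from the derivations below). Dividing by 𝑛 - 2 therefore overshoots by the factor (𝑛 - 1)/(𝑛 - 2), just as dividing by 𝑛 undershoots by (𝑛 - 1)/𝑛. The sketch below computes these expectations exactly by enumerating every ordered sample of size 𝑛 drawn with replacement from a small made-up population; the `POPULATION` values and `n = 4` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Illustrative population (an assumption); each value is equally likely.
POPULATION = [Fraction(v) for v in (0, 1, 2, 5)]
MU = sum(POPULATION) / len(POPULATION)
SIGMA2 = sum((x - MU) ** 2 for x in POPULATION) / len(POPULATION)  # true variance


def expected_ssd(n):
    """Exact E[sum (X_i - Xbar)^2] over all ordered samples of size n (with replacement)."""
    samples = list(product(POPULATION, repeat=n))
    total = Fraction(0)
    for sample in samples:
        xbar = sum(sample) / n
        total += sum((x - xbar) ** 2 for x in sample)
    return total / len(samples)


n = 4
e_ssd = expected_ssd(n)
for divisor in (n, n - 1, n - 2):
    # Ratio E[SSD / divisor] / sigma^2; only divisor n - 1 gives exactly 1
    print(divisor, (e_ssd / divisor) / SIGMA2)
```

For this population 𝜎² = 7/2, and the loop prints the ratios 3/4, 1 and 3/2 for the divisors 4, 3 and 2.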

Analytical (v1)

(𝑠²) = [ 𝛴_{1≤𝑖≤𝑛} (𝑋_𝑖- 𝑋̅)²] / (𝑛 - 1)

𝐄[𝑠²] = 𝐄[( 𝛴_{1≤𝑖≤𝑛} (𝑋_𝑖- 𝑋̅)²) / (𝑛 - 1)]

𝐄[𝑠²] = [1/(𝑛 - 1)] 𝐄[𝛴_{1≤𝑖≤𝑛} (𝑋_𝑖- 𝑋̅)²]

𝐄[𝑠²] = [1/(𝑛 - 1)] 𝐄[𝛴_{1≤𝑖≤𝑛}(𝑋_𝑖² - 2𝑋_𝑖𝑋̅ + 𝑋̅²)]

𝐄[𝑠²] = [1/(𝑛 - 1)] ( 𝐄[𝛴_{1≤𝑖≤𝑛}(𝑋_𝑖²)] - 𝐄[𝛴_{1≤𝑖≤𝑛}(2𝑋_𝑖𝑋̅)] + 𝐄[𝛴_{1≤𝑖≤𝑛}(𝑋̅²)] )

𝐄[𝑠²] = [1/(𝑛 - 1)] ( 𝐄[𝛴_{1≤𝑖≤𝑛}(𝑋_𝑖²)] - 𝐄[2𝑋̅𝛴_{1≤𝑖≤𝑛}(𝑋_𝑖)] + 𝐄[𝛴_{1≤𝑖≤𝑛}(𝑋̅²)] ) # 2𝑋̅ does not depend on 𝑖, so it can be pulled out of the summation

𝐄[𝑠²] = [1/(𝑛 - 1)] ( 𝐄[𝛴_{1≤𝑖≤𝑛}(𝑋_𝑖²)] - 𝐄[2𝑋̅𝑛𝑋̅] + 𝐄[𝛴_{1≤𝑖≤𝑛}(𝑋̅²)] ) # 𝛴_{1≤𝑖≤𝑛}(𝑋_𝑖) = 𝑛𝑋̅
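Before any expectation is taken, the three sums inside the brackets collapse to 𝛴𝑋_𝑖² - 2𝑛𝑋̅² + 𝑛𝑋̅² = 𝛴𝑋_𝑖² - 𝑛𝑋̅². This identity holds for every sample, not just on average. A quick exact check in Python, using an arbitrary integer sample chosen for illustration:

```python
from fractions import Fraction


def ssd_direct(xs):
    """Sum of squared deviations about the sample mean (integer or Fraction inputs)."""
    xbar = Fraction(sum(xs), len(xs))
    return sum((x - xbar) ** 2 for x in xs)


def ssd_expanded(xs):
    """Same quantity via the expansion sum(x_i^2) - 2*n*xbar^2 + n*xbar^2."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum(x * x for x in xs) - 2 * n * xbar ** 2 + n * xbar ** 2


sample = [3, 7, 8, 12, 20]  # arbitrary integer sample
print(ssd_direct(sample), ssd_expanded(sample))  # 166 166
```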

𝐄[𝑠²] = [1/(𝑛 - 1)] ( 𝑛𝐄[𝑋_𝑖²] - 2𝑛𝐄[𝑋̅²] + 𝑛𝐄[𝑋̅²] ) # the 𝑋_𝑖 are identically distributed, so 𝐄[𝛴_{1≤𝑖≤𝑛}(𝑋_𝑖²)] = 𝑛𝐄[𝑋_𝑖²]; and 𝛴_{1≤𝑖≤𝑛}(𝑋̅²) = 𝑛𝑋̅²

𝐄[𝑠²] = [𝑛/(𝑛 - 1)] ( 𝐄[𝑋_𝑖²] - 2𝐄[𝑋̅²] + 𝐄[𝑋̅²] )

𝐄[𝑠²] = [𝑛/(𝑛 - 1)] ( 𝐄[𝑋_𝑖²] - 𝐄[𝑋̅²] )

𝐄[𝑠²] = [𝑛/(𝑛 - 1)] ( [𝜎² + 𝜇²] - 𝐄[𝑋̅²] ) # 𝐄[𝑋_𝑖²] = 𝜎² + 𝜇² because 𝜎² = 𝑉𝑎𝑟(𝑋_𝑖) = 𝐄[𝑋_𝑖²] - 𝜇²

𝐄[𝑠²] = [𝑛/(𝑛 - 1)] ( [𝜎² + 𝜇²] - [𝜎²/𝑛 + 𝜇²] ) # 𝐄[𝑋̅²] = 𝑉𝑎𝑟(𝑋̅) + (𝐄[𝑋̅])² = 𝜎²/𝑛 + 𝜇², since 𝐄[𝑋̅] = 𝜇 and, for independent 𝑋_𝑖, 𝑉𝑎𝑟(𝑋̅) = 𝜎²/𝑛
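Both moment identities used in the last two steps can be confirmed exactly on a small discrete population. In this sketch the population values are an illustrative assumption, each value is equally likely, and the sample of size 𝑛 is drawn with replacement so the 𝑋_𝑖 are independent, which is the assumption behind 𝑉𝑎𝑟(𝑋̅) = 𝜎²/𝑛.

```python
from fractions import Fraction
from itertools import product

# Illustrative population (an assumption); each value is equally likely.
POPULATION = [Fraction(v) for v in (0, 1, 2, 5)]
N_POP = len(POPULATION)
MU = sum(POPULATION) / N_POP
SIGMA2 = sum((x - MU) ** 2 for x in POPULATION) / N_POP


def second_moment_of_x():
    """E[X_i^2] for a single draw."""
    return sum(x ** 2 for x in POPULATION) / N_POP


def second_moment_of_mean(n):
    """Exact E[Xbar^2] over all ordered samples of size n drawn with replacement."""
    samples = list(product(POPULATION, repeat=n))
    return sum((sum(s) / n) ** 2 for s in samples) / len(samples)


print(second_moment_of_x() == SIGMA2 + MU ** 2)          # True
print(second_moment_of_mean(3) == SIGMA2 / 3 + MU ** 2)  # True
```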

𝐄[𝑠²] = [𝑛/(𝑛 - 1)] ( 𝜎² + 𝜇² - 𝜎²/𝑛 - 𝜇² )

𝐄[𝑠²] = [𝑛/(𝑛 - 1)] ( 𝜎² - 𝜎²/𝑛 )

𝐄[𝑠²] = [𝑛/(𝑛 - 1)]𝜎² - [𝑛/(𝑛 - 1)]𝜎²/𝑛

𝐄[𝑠²] = [𝑛/(𝑛 - 1)]𝜎² - [1/(𝑛 - 1)]𝜎²

𝐄[𝑠²] = 𝜎²( [𝑛/(𝑛 - 1)] - [1/(𝑛 - 1)] )

𝐄[𝑠²] = 𝜎²[1/(𝑛 - 1)] [𝑛 - 1]

𝐄[𝑠²] = 𝜎²
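The result can also be checked by simulation. This sketch assumes normally distributed data with 𝜎 = 2 (so 𝜎² = 4), 𝑛 = 5 and a fixed seed, all arbitrary choices. It relies on the standard library's `statistics.variance`, which divides by 𝑛 - 1, and `statistics.pvariance`, which divides by 𝑛.

```python
import random
import statistics


def average_estimates(n, sigma, trials, seed=0):
    """Average the (n - 1)- and n-divisor estimators over simulated normal samples."""
    rng = random.Random(seed)
    sum_unbiased = 0.0
    sum_biased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        sum_unbiased += statistics.variance(xs)  # divides by n - 1
        sum_biased += statistics.pvariance(xs)   # divides by n
    return sum_unbiased / trials, sum_biased / trials


mean_s2, mean_biased = average_estimates(n=5, sigma=2.0, trials=10_000)
print(round(mean_s2, 2), round(mean_biased, 2))
```

With these settings the first average should land near 𝜎² = 4 and the second near (𝑛 - 1)/𝑛 · 𝜎² = 3.2; the exact digits depend on the seed and the number of trials.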

Analytical (v2)

(𝑠²) = [ 𝛴_{1≤𝑖≤𝑛} (𝑋_𝑖- 𝑋̅)²] / (𝑛 - 1)

𝐄[𝑠²] = 𝐄[( 𝛴_{1≤𝑖≤𝑛} (𝑋_𝑖- 𝑋̅)²) / (𝑛 - 1)]

𝐄[𝑠²] = [1/(𝑛 - 1)] 𝐄[𝛴_{1≤𝑖≤𝑛} (𝑋_𝑖- 𝑋̅)²]

𝐄[𝑠²] = [1/(𝑛 - 1)] 𝛴_{1≤𝑖≤𝑛} 𝐄[(𝑋_𝑖- 𝑋̅)²]

𝐄[𝑠²] = [1/(𝑛 - 1)] 𝛴_{1≤𝑖≤𝑛} 𝐄[𝑋_𝑖² - 2𝑋_𝑖𝑋̅ + 𝑋̅²]

𝐄[𝑠²] = [1/(𝑛 - 1)] 𝛴_{1≤𝑖≤𝑛} ( 𝐄[𝑋_𝑖²] - 𝐄[2𝑋_𝑖𝑋̅] + 𝐄[𝑋̅²] )

𝐄[𝑠²] = [1/(𝑛 - 1)] 𝑛 ( 𝐄[𝑋_𝑖²] - 𝐄[2𝑋_𝑖𝑋̅] + 𝐄[𝑋̅²] ) # the 𝑋_𝑖 are i.i.d., so the summand's expectation does not depend on 𝑖

𝐄[𝑠²] = [𝑛/(𝑛 - 1)] ( 𝐄[𝑋_𝑖²] - 𝐄[2𝑋_𝑖𝑋̅] + 𝐄[𝑋̅²] )

𝐄[𝑠²] = [𝑛/(𝑛 - 1)]·𝐄[𝑋_𝑖²] - [𝑛/(𝑛 - 1)]·𝐄[2𝑋_𝑖𝑋̅] + [𝑛/(𝑛 - 1)]·𝐄[𝑋̅²]

𝐄[𝑠²] = [𝑛/(𝑛 - 1)]·[𝜎² + 𝜇²] - [𝑛/(𝑛 - 1)]·𝐄[2𝑋_𝑖𝑋̅] + [𝑛/(𝑛 - 1)]·𝐄[𝑋̅²] # 𝐄[𝑋_𝑖²] = 𝜎² + 𝜇²

𝐄[𝑠²] = [𝑛/(𝑛 - 1)]·[𝜎² + 𝜇²] - 2·[𝑛/(𝑛 - 1)]·𝐄[𝑋_𝑖𝑋̅] + [𝑛/(𝑛 - 1)]·𝐄[𝑋̅²]

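The value of 𝐄[𝑋_𝑗𝑋_𝑘] depends on whether the two factors are different (independent) draws, 𝑗≠𝑘, or the same draw (definitely dependent in this case!), 𝑗=𝑘. Expanding 𝑋̅ = (1/𝑛)𝛴_{1≤𝑘≤𝑛} 𝑋_𝑘 gives 𝐄[𝑋_𝑖𝑋̅] = (1/𝑛)𝛴_{1≤𝑘≤𝑛} 𝐄[𝑋_𝑖𝑋_𝑘], where exactly 1 of the 𝑛 terms has 𝑘=𝑖 and the other 𝑛-1 have 𝑘≠𝑖. Likewise, 𝐄[𝑋̅²] = (1/𝑛²)𝛴_{1≤𝑗≤𝑛}𝛴_{1≤𝑘≤𝑛} 𝐄[𝑋_𝑗𝑋_𝑘] has 𝑛 terms with 𝑗=𝑘 and 𝑛²-𝑛 terms with 𝑗≠𝑘.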

𝐄[𝑋_𝑗𝑋_𝑘] = 𝐄[𝑋_𝑗]𝐄[𝑋_𝑘] = 𝜇² if 𝑗≠𝑘

𝐄[𝑋_𝑗𝑋_𝑘] = 𝐄[𝑋_𝑗²] = 𝜎² + 𝜇² if 𝑗=𝑘

𝐄[𝑋_𝑖𝑋̅] = [(𝑛-1)/𝑛]𝜇² + [1/𝑛][𝜎² + 𝜇²]

𝐄[𝑋̅²] = [(𝑛²-𝑛)/𝑛²]𝜇² + [𝑛/𝑛²][𝜎² + 𝜇²]

𝐄[𝑋̅²] = [(𝑛-1)/𝑛]𝜇² + [1/𝑛][𝜎² + 𝜇²]
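The case analysis can be verified by brute force. The sketch below enumerates every ordered sample of size 𝑛 = 3 from a small made-up population (an assumption for illustration) and computes each expectation exactly; indices 0 and 1 stand in for some 𝑗 ≠ 𝑘.

```python
from fractions import Fraction
from itertools import product

# Illustrative population (an assumption); each value is equally likely.
POPULATION = [Fraction(v) for v in (0, 1, 2, 5)]
N_POP = len(POPULATION)
MU = sum(POPULATION) / N_POP
SIGMA2 = sum((x - MU) ** 2 for x in POPULATION) / N_POP


def exact_expectation(f, n):
    """Exact E[f(sample)] over all ordered samples of size n drawn with replacement."""
    samples = list(product(POPULATION, repeat=n))
    return sum(f(s) for s in samples) / len(samples)


n = 3
cross = exact_expectation(lambda s: s[0] * s[1], n)          # E[X_j X_k], j != k
square = exact_expectation(lambda s: s[0] * s[0], n)         # E[X_j X_k], j == k
xi_xbar = exact_expectation(lambda s: s[0] * sum(s) / n, n)  # E[X_i Xbar]
xbar_sq = exact_expectation(lambda s: (sum(s) / n) ** 2, n)  # E[Xbar^2]

print(cross == MU ** 2, square == SIGMA2 + MU ** 2)  # True True
print(xi_xbar == Fraction(n - 1, n) * MU ** 2 + Fraction(1, n) * (SIGMA2 + MU ** 2))  # True
print(xi_xbar == xbar_sq)  # True
```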

𝐄[𝑠²] = [𝑛/(𝑛 - 1)]·[𝜎² + 𝜇²] - 2·[𝑛/(𝑛 - 1)]·( [(𝑛-1)/𝑛]𝜇² + [1/𝑛][𝜎² + 𝜇²] ) + [𝑛/(𝑛 - 1)]·( [(𝑛-1)/𝑛]𝜇² + [1/𝑛][𝜎² + 𝜇²] )

𝐄[𝑠²] = [𝑛/(𝑛 - 1)]·[𝜎² + 𝜇²] - 2𝜇² - [2/(𝑛 - 1)][𝜎² + 𝜇²] + 𝜇² + [1/(𝑛 - 1)][𝜎² + 𝜇²]

𝐄[𝑠²] = [𝑛/(𝑛 - 1)]·[𝜎² + 𝜇²] - 𝜇² - [1/(𝑛 - 1)][𝜎² + 𝜇²]

𝐄[𝑠²] = [(𝑛 - 1)/(𝑛 - 1)]·[𝜎² + 𝜇²] - 𝜇²

𝐄[𝑠²] = 𝜎² + 𝜇² - 𝜇²

𝐄[𝑠²] = 𝜎²

Resources

https://towardsdatascience.com/why-sample-variance-is-divided-by-n-1-89821b83ef6d

Geometric

TODO

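A sample of 𝑛 values is a vector in ℝⁿ. Its deviation vector (𝑋_1 - 𝑋̅, …, 𝑋_𝑛 - 𝑋̅) always sums to zero, so it lies in the (𝑛 - 1)-dimensional subspace orthogonal to (1, 1, …, 1): the sum of squared deviations (SSD) has only 𝑛 - 1 degrees of freedom. The remaining direction, (1, 1, …, 1), has nothing to do with the sample variance; it carries only the sample mean.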
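A numeric illustration of this picture, assuming the Helmert vectors as the orthonormal basis of the (𝑛 - 1)-dimensional subspace orthogonal to (1, 1, …, 1); any orthonormal basis of that subspace would do. The squared coordinates along those 𝑛 - 1 directions add up to the sum of squared deviations, and the coordinate along (1, 1, …, 1)/√𝑛 equals √𝑛·𝑋̅, which carries only the mean. The sample values are arbitrary.

```python
import math


def helmert_basis(n):
    """Orthonormal basis (Helmert vectors) of the subspace of R^n orthogonal to (1, ..., 1)."""
    basis = []
    for k in range(1, n):
        norm = math.sqrt(k * (k + 1))
        vec = [1.0 / norm] * k + [-k / norm] + [0.0] * (n - k - 1)
        basis.append(vec)
    return basis


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


xs = [3.0, 7.0, 8.0, 12.0, 20.0]  # arbitrary sample
n = len(xs)
xbar = sum(xs) / n

ssd = sum((x - xbar) ** 2 for x in xs)
coords = [dot(h, xs) for h in helmert_basis(n)]  # n - 1 coordinates
mean_coord = dot([1.0 / math.sqrt(n)] * n, xs)   # coordinate along (1, ..., 1)/sqrt(n)

print(len(coords), round(sum(c * c for c in coords), 6), round(ssd, 6))  # 4 166.0 166.0
print(math.isclose(mean_coord, math.sqrt(n) * xbar))  # True
```

If the 𝑋_𝑖 are independent with common variance 𝜎², each of the 𝑛 - 1 coordinates has mean 0 (its coefficients sum to 0) and variance 𝜎² (its coefficient vector has unit length), so 𝐄[𝛴(𝑋_𝑖 - 𝑋̅)²] = (𝑛 - 1)𝜎².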

TODO

Resources

https://medium.com/fun-with-data-science/sample-variance-how-does-n-1-come-f60c71be09cb

／var／log marcus chiu
