Why Dividing by n Always Underestimates Variance
Why Divide by n-1 Instead of n-2?
Analytical (v1)
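$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
- $\mathbb{E}[s^2] = \mathbb{E}\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\,\mathbb{E}\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\,\mathbb{E}\left[\sum_{i=1}^{n}(X_i^2 - 2X_i\bar{X} + \bar{X}^2)\right]$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\left(\mathbb{E}\left[\sum_{i=1}^{n}X_i^2\right] - \mathbb{E}\left[\sum_{i=1}^{n}2X_i\bar{X}\right] + \mathbb{E}\left[\sum_{i=1}^{n}\bar{X}^2\right]\right)$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\left(\mathbb{E}\left[\sum_{i=1}^{n}X_i^2\right] - \mathbb{E}\left[2\bar{X}\sum_{i=1}^{n}X_i\right] + \mathbb{E}\left[\sum_{i=1}^{n}\bar{X}^2\right]\right)$ # $2\bar{X}$ does not depend on $i$, so it can be pulled out of the summation
- $\mathbb{E}[s^2] = \frac{1}{n-1}\left(\mathbb{E}\left[\sum_{i=1}^{n}X_i^2\right] - \mathbb{E}[2\bar{X}\cdot n\bar{X}] + \mathbb{E}\left[\sum_{i=1}^{n}\bar{X}^2\right]\right)$ # $\sum_{i=1}^{n}X_i = n\bar{X}$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\left(n\,\mathbb{E}[X_i^2] - 2n\,\mathbb{E}[\bar{X}^2] + n\,\mathbb{E}[\bar{X}^2]\right)$ # the $X_i$ are identically distributed, so each of the $n$ terms has the same $\mathbb{E}[X_i^2]$, and $\bar{X}^2$ is summed $n$ times
- $\mathbb{E}[s^2] = \frac{n}{n-1}\left(\mathbb{E}[X_i^2] - 2\,\mathbb{E}[\bar{X}^2] + \mathbb{E}[\bar{X}^2]\right)$
- $\mathbb{E}[s^2] = \frac{n}{n-1}\left(\mathbb{E}[X_i^2] - \mathbb{E}[\bar{X}^2]\right)$
- $\mathbb{E}[s^2] = \frac{n}{n-1}\left([\sigma^2 + \mu^2] - \mathbb{E}[\bar{X}^2]\right)$ # $\mathbb{E}[X_i^2] = \sigma^2 + \mu^2$ because of the second moment: $\sigma^2 = \mathbb{E}[X_i^2] - \mu^2$
- $\mathbb{E}[s^2] = \frac{n}{n-1}\left([\sigma^2 + \mu^2] - [\sigma^2/n + \mu^2]\right)$ # $\mathbb{E}[\bar{X}^2] = \mathrm{Var}(\bar{X}) + (\mathbb{E}[\bar{X}])^2 = \sigma^2/n + \mu^2$, using the variance of the sample mean, $\mathrm{Var}(\bar{X}) = \sigma^2/n$, and $\mathbb{E}[\bar{X}] = \mu$
- $\mathbb{E}[s^2] = \frac{n}{n-1}\left(\sigma^2 + \mu^2 - \sigma^2/n - \mu^2\right)$
- $\mathbb{E}[s^2] = \frac{n}{n-1}\left(\sigma^2 - \sigma^2/n\right)$
- $\mathbb{E}[s^2] = \frac{n}{n-1}\sigma^2 - \frac{n}{n-1}\cdot\frac{\sigma^2}{n}$
- $\mathbb{E}[s^2] = \frac{n}{n-1}\sigma^2 - \frac{1}{n-1}\sigma^2$
- $\mathbb{E}[s^2] = \sigma^2\left(\frac{n}{n-1} - \frac{1}{n-1}\right)$
- $\mathbb{E}[s^2] = \sigma^2\cdot\frac{1}{n-1}\cdot(n-1)$
- $\mathbb{E}[s^2] = \sigma^2$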
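The same algebra shows $\mathbb{E}\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = (n-1)\sigma^2$, so dividing by $n$ instead gives $\frac{n-1}{n}\sigma^2$ on average, which is the underestimate in the title. A minimal Monte Carlo sketch can check both estimators, assuming i.i.d. normal data. The helper name `mean_of_estimators` and the values $n = 5$, $\mu = 10$, $\sigma = 2$, the trial count and the seed are illustrative choices, not part of the derivation.

```python
import random


def mean_of_estimators(n, mu, sigma, trials, seed=0):
    """Average the divide-by-(n-1) and divide-by-n variance estimators
    over many independent normal samples of size n (illustrative helper)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ssd = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
        total_unbiased += ssd / (n - 1)         # s^2
        total_biased += ssd / n                 # divide-by-n estimator
    return total_unbiased / trials, total_biased / trials


if __name__ == "__main__":
    n, mu, sigma = 5, 10.0, 2.0  # arbitrary illustration values
    unbiased, biased = mean_of_estimators(n, mu, sigma, trials=200_000)
    print(f"true variance        : {sigma ** 2:.3f}")
    print(f"mean of SSD / (n - 1): {unbiased:.3f}")
    print(f"mean of SSD / n      : {biased:.3f}")
    print(f"(n - 1)/n * sigma^2  : {(n - 1) / n * sigma ** 2:.3f}")
```

With these settings the first average should land close to $\sigma^2 = 4$ and the second close to $\frac{4}{5}\cdot 4 = 3.2$; the exact digits depend on the seed and trial count.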
Analytical (v2)
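$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
- $\mathbb{E}[s^2] = \mathbb{E}\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\,\mathbb{E}\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\sum_{i=1}^{n}\mathbb{E}\left[(X_i - \bar{X})^2\right]$ # linearity of expectation
- $\mathbb{E}[s^2] = \frac{1}{n-1}\sum_{i=1}^{n}\mathbb{E}\left[X_i^2 - 2X_i\bar{X} + \bar{X}^2\right]$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathbb{E}[X_i^2] - \mathbb{E}[2X_i\bar{X}] + \mathbb{E}[\bar{X}^2]\right)$
- $\mathbb{E}[s^2] = \frac{1}{n-1}\cdot n\left(\mathbb{E}[X_i^2] - \mathbb{E}[2X_i\bar{X}] + \mathbb{E}[\bar{X}^2]\right)$ # the $X_i$ are identically distributed, so all $n$ summands are equal
- $\mathbb{E}[s^2] = \frac{n}{n-1}\left(\mathbb{E}[X_i^2] - \mathbb{E}[2X_i\bar{X}] + \mathbb{E}[\bar{X}^2]\right)$
- $\mathbb{E}[s^2] = \frac{n}{n-1}\,\mathbb{E}[X_i^2] - \frac{n}{n-1}\,\mathbb{E}[2X_i\bar{X}] + \frac{n}{n-1}\,\mathbb{E}[\bar{X}^2]$
- $\mathbb{E}[s^2] = \frac{n}{n-1}[\sigma^2 + \mu^2] - \frac{n}{n-1}\,\mathbb{E}[2X_i\bar{X}] + \frac{n}{n-1}\,\mathbb{E}[\bar{X}^2]$ # $\mathbb{E}[X_i^2] = \sigma^2 + \mu^2$
- $\mathbb{E}[s^2] = \frac{n}{n-1}[\sigma^2 + \mu^2] - 2\cdot\frac{n}{n-1}\,\mathbb{E}[X_i\bar{X}] + \frac{n}{n-1}\,\mathbb{E}[\bar{X}^2]$
- To evaluate $\mathbb{E}[X_i\bar{X}]$ and $\mathbb{E}[\bar{X}^2]$, expand them into products $X_jX_k$. The value of $\mathbb{E}[X_jX_k]$ depends on whether $j \neq k$ (two different, independent samples) or $j = k$ (the same sample multiplied by itself, which is certainly not independent of itself). Since $X_i\bar{X} = \frac{1}{n}\sum_{k=1}^{n}X_iX_k$, exactly 1 of its $n$ products has $k = i$; likewise $\bar{X}^2 = \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k=1}^{n}X_jX_k$ has $n$ of its $n^2$ products with $j = k$.
- $\mathbb{E}[X_jX_k] = \mathbb{E}[X_j]\,\mathbb{E}[X_k] = \mu^2$ if $j \neq k$
- $\mathbb{E}[X_jX_k] = \mathbb{E}[X_j^2] = \sigma^2 + \mu^2$ if $j = k$
- $\mathbb{E}[X_i\bar{X}] = \frac{n-1}{n}\mu^2 + \frac{1}{n}[\sigma^2 + \mu^2]$
- $\mathbb{E}[\bar{X}^2] = \frac{n^2-n}{n^2}\mu^2 + \frac{n}{n^2}[\sigma^2 + \mu^2]$
- $\mathbb{E}[\bar{X}^2] = \frac{n-1}{n}\mu^2 + \frac{1}{n}[\sigma^2 + \mu^2]$
- $\mathbb{E}[s^2] = \frac{n}{n-1}[\sigma^2 + \mu^2] - 2\cdot\frac{n}{n-1}\left(\frac{n-1}{n}\mu^2 + \frac{1}{n}[\sigma^2 + \mu^2]\right) + \frac{n}{n-1}\left(\frac{n-1}{n}\mu^2 + \frac{1}{n}[\sigma^2 + \mu^2]\right)$
- $\mathbb{E}[s^2] = \frac{n}{n-1}[\sigma^2 + \mu^2] - 2\mu^2 - \frac{2}{n-1}[\sigma^2 + \mu^2] + \mu^2 + \frac{1}{n-1}[\sigma^2 + \mu^2]$
- $\mathbb{E}[s^2] = \frac{n}{n-1}[\sigma^2 + \mu^2] - \mu^2 - \frac{1}{n-1}[\sigma^2 + \mu^2]$
- $\mathbb{E}[s^2] = \frac{n-1}{n-1}[\sigma^2 + \mu^2] - \mu^2$
- $\mathbb{E}[s^2] = \sigma^2 + \mu^2 - \mu^2$
- $\mathbb{E}[s^2] = \sigma^2$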
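The counting argument behind $\mathbb{E}[X_i\bar{X}]$ and $\mathbb{E}[\bar{X}^2]$ can be checked exactly, with no simulation noise, by enumerating every possible i.i.d. sample from a small discrete distribution. This sketch assumes a fair six-sided die and $n = 3$ purely for illustration; the helper name `check_identities` is hypothetical, and `fractions.Fraction` keeps every comparison exact.

```python
from fractions import Fraction
from itertools import product


def check_identities(values, n):
    """Enumerate all len(values)**n equally likely ordered samples drawn
    i.i.d. and uniformly from `values`; return exact expectations."""
    p = Fraction(1, len(values) ** n)  # probability of each ordered sample
    mu = Fraction(sum(values), len(values))
    sigma2 = Fraction(sum(v * v for v in values), len(values)) - mu ** 2

    e_x1_xbar = Fraction(0)  # E[X_1 * Xbar]; any i gives the same value by symmetry
    e_xbar_sq = Fraction(0)  # E[Xbar^2]
    e_s2 = Fraction(0)       # E[s^2] with the n - 1 divisor
    for sample in product(values, repeat=n):
        xbar = Fraction(sum(sample), n)
        ssd = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        e_x1_xbar += p * sample[0] * xbar
        e_xbar_sq += p * xbar ** 2
        e_s2 += p * ssd / (n - 1)
    return mu, sigma2, e_x1_xbar, e_xbar_sq, e_s2


if __name__ == "__main__":
    n = 3
    mu, sigma2, e_x1_xbar, e_xbar_sq, e_s2 = check_identities([1, 2, 3, 4, 5, 6], n)
    # (n-1)/n * mu^2 + 1/n * (sigma^2 + mu^2), the expression derived above
    formula = Fraction(n - 1, n) * mu ** 2 + Fraction(1, n) * (sigma2 + mu ** 2)
    print("E[X_i Xbar] matches formula:", e_x1_xbar == formula)
    print("E[Xbar^2]   matches formula:", e_xbar_sq == formula)
    print("E[s^2] == sigma^2:", e_s2 == sigma2, e_s2)
```

Because every ordered sample is weighted by its exact probability, all three comparisons should print `True`, with $\mathbb{E}[s^2] = 35/12$, the variance of a fair die.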
Resources
Geometric
TODO
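A sample of $n$ values is a point in $n$-dimensional space. The deviations $X_i - \bar{X}$ always sum to zero, so the deviation vector is orthogonal to $(1, 1, \dots, 1)$ and lies in an $(n-1)$-dimensional subspace: the sum of squared deviations (SSD) has only $n-1$ degrees of freedom. The remaining dimension, along $(1, 1, \dots, 1)$, carries the sample mean and contributes nothing to the sample variance.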
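Numerically, the vector $(X_1 - \mu, \dots, X_n - \mu)$ splits into the deviation vector $(X_i - \bar{X})$ plus the vector $(\bar{X} - \mu)(1, \dots, 1)$, and these two pieces are orthogonal, so Pythagoras gives $\sum_{i}(X_i - \mu)^2 = \sum_{i}(X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$. Taking expectations, $n\sigma^2 = \mathbb{E}[\mathrm{SSD}] + n\cdot\frac{\sigma^2}{n}$, i.e. $\mathbb{E}[\mathrm{SSD}] = (n-1)\sigma^2$, the same result as the analytical derivations. A minimal sketch of the decomposition for one sample, assuming a known $\mu$; the helper name `decompose` and the sample values are hypothetical.

```python
import math


def decompose(xs, mu):
    """Split (x_i - mu) into the deviation vector (x_i - xbar), which is
    orthogonal to (1, ..., 1), and the component along (1, ..., 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    deviations = [x - xbar for x in xs]         # lies in the (n-1)-dim subspace
    along_ones = [xbar - mu] * n                # lies on the line spanned by (1, ..., 1)
    dot_with_ones = sum(deviations)             # inner product with (1, ..., 1)
    ssd = sum(d * d for d in deviations)        # sum of squared deviations
    mean_part = sum(a * a for a in along_ones)  # equals n * (xbar - mu)**2
    total = sum((x - mu) ** 2 for x in xs)      # squared distance to (mu, ..., mu)
    return dot_with_ones, ssd, mean_part, total


if __name__ == "__main__":
    xs = [2.0, 7.0, 4.0, 1.0, 6.0]  # arbitrary example sample
    mu = 3.0                        # assumed known population mean
    dot, ssd, mean_part, total = decompose(xs, mu)
    print("deviations . (1, ..., 1) =", dot)
    print("SSD + n(xbar - mu)^2     =", ssd + mean_part)
    print("sum (x_i - mu)^2         =", total)
    print("Pythagoras holds:", math.isclose(ssd + mean_part, total))
```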
TODO
Resources
[Image: sample-variance-why-n-1.png]