2-sample problems - comparing 2 samples and making inferences about the corresponding populations
- Population 1: 𝑋 ∼ 𝑓𝑋(𝑥), 𝐄(𝑋) = 𝜇𝑋
- Population 2: 𝑌 ∼ 𝑓𝑌(𝑦), 𝐄(𝑌) = 𝜇𝑌
CI with 2 Independent Samples - the 𝑋 and 𝑌 samples come from 2 different sets of subjects (i.e. independent observations)
the sample sizes of 𝑋 and 𝑌 may be different
| 𝑋 | 𝑌 |
|---|---|
| 𝑋1 | 𝑌1 |
| 𝑋2 | 𝑌2 |
| … | … |
| 𝑋𝑛 | 𝑌𝑚 |
CI General Formula
CI Definition
An interval [𝐴, 𝐵] is a (1 − 𝛼)100% confidence interval for the parameter 𝜃 if it contains the parameter with probability (1 − 𝛼):
- 𝐏{𝐴 ≤ 𝜃 ≤ 𝐵} = 1 − 𝛼
where:
- 𝛼 - significance level
- (1 − 𝛼) - confidence level or coverage probability
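The coverage statement above can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: it assumes a normal population with known 𝜎, builds a z-interval for the mean in each trial, and counts how often the interval contains the true mean. The function name `coverage` and all default parameter values are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=5.0, sigma=2.0, n=25, alpha=0.05, trials=4000, seed=0):
    """Fraction of simulated z-intervals [A, B] that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    se = sigma / n ** 0.5                    # known standard error of the sample mean
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        a, b = xbar - z * se, xbar + z * se  # interval endpoints A and B
        hits += a <= mu <= b
    return hits / trials

print(coverage())  # should land close to 1 - alpha = 0.95
```

The printed fraction varies slightly with the seed and number of trials, but it should stay near the confidence level (1 − 𝛼).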
CI Formula Intuition
Given a sample of data and a desired confidence level (1 − 𝛼), how can we construct a confidence interval [𝐴, 𝐵] that satisfies the coverage condition below?
- 𝐏{𝐴 ≤ 𝜃 ≤ 𝐵} = 1 − 𝛼
first we need to estimate 𝜃:
choose an unbiased estimator with a normal distribution (e.g. the sample mean; an MLE is typically only approximately unbiased and normal, in large samples)
apply the estimator to the sample data to obtain the point estimate 𝜃ˆ
next we standardize 𝜃ˆ to get a standard normal variable 𝑧:
- 𝑧 = [𝜃ˆ - 𝐄(𝜃ˆ)] / 𝜎(𝜃ˆ)
since 𝜃ˆ was estimated with an unbiased estimator: 𝐄(𝜃ˆ) = 𝜃
- 𝑧 = (𝜃ˆ - 𝜃) / 𝜎(𝜃ˆ)
with probability (1 − 𝛼), this variable 𝑧 falls between the standard normal quantiles of order 𝛼/2 and 1 − 𝛼/2, written 𝑞𝛼/2 and 𝑞1−𝛼/2 and denoted by
- −𝑧𝛼/2 = 𝑞𝛼/2
- 𝑧𝛼/2 = 𝑞1−𝛼/2
therefore:
- 𝐏{-𝑧𝛼/2 ≤ (𝜃ˆ - 𝜃) / 𝜎(𝜃ˆ) ≤ 𝑧𝛼/2} = 1 - 𝛼
now rearrange for 𝜃:
- 𝐏{𝜃ˆ - 𝑧𝛼/2·𝜎(𝜃ˆ) ≤ 𝜃 ≤ 𝜃ˆ + 𝑧𝛼/2·𝜎(𝜃ˆ)} = 1 - 𝛼
we have obtained two numbers:
- 𝐴 = 𝜃ˆ - 𝑧𝛼/2·𝜎(𝜃ˆ)
- 𝐵 = 𝜃ˆ + 𝑧𝛼/2·𝜎(𝜃ˆ)
such that
- 𝐏{𝐴 ≤ 𝜃 ≤ 𝐵} = 1 − 𝛼
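As a small illustration of the endpoints 𝐴 and 𝐵 derived above, here is a sketch that uses only the Python standard library. The function name `z_interval` and the example numbers are assumptions made for illustration; the standard error is taken as known.

```python
from statistics import NormalDist

def z_interval(theta_hat, se, alpha=0.05):
    """Return (A, B) = theta_hat -/+ z_{alpha/2} * se for a known standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} = q_{1 - alpha/2}
    return theta_hat - z * se, theta_hat + z * se

a, b = z_interval(theta_hat=10.0, se=2.0)
print(round(a, 2), round(b, 2))  # 6.08 13.92
```

With 𝛼 = 0.05 the multiplier is about 1.96, so the interval extends roughly two standard errors on each side of the point estimate.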
CI Formulas
| Large Sample Size (𝑛) | Normal Population | 𝑆𝐸(𝜃ˆ) Known | Confidence Interval |
|---|---|---|---|
| FALSE | FALSE | EITHER | |
| FALSE | TRUE | FALSE | 𝜃ˆ ± 𝑡𝛼/2·𝑆𝐸ˆ(𝜃ˆ) |
| FALSE | TRUE | TRUE | 𝜃ˆ ± 𝑧𝛼/2·𝑆𝐸(𝜃ˆ) |
| TRUE | EITHER | FALSE | 𝜃ˆ ± 𝑧𝛼/2·𝑆𝐸ˆ(𝜃ˆ) |
| TRUE | EITHER | TRUE | 𝜃ˆ ± 𝑧𝛼/2·𝑆𝐸(𝜃ˆ) |
where:
- 𝜃ˆ - point estimate/statistic or center of the interval
- 𝑧 - z-score, a type of confidence multiplier (standard normal critical value)
- 𝑡 - t-score, a type of confidence multiplier (Student's t critical value)
- 𝑆𝐸(𝜃ˆ) or 𝜎(𝜃ˆ) or 𝑆𝑡𝑑(𝜃ˆ) - standard error of the point estimator/statistic
- 𝑆𝐸ˆ(𝜃ˆ) or 𝑠(𝜃ˆ) or 𝑆𝑡𝑑ˆ(𝜃ˆ) - estimated standard error of the point estimator/statistic
CI Annotated
CI Diagram
CI Formula For 2 Independent Samples of Sample Mean
the general formula states the confidence interval is:
- 𝜃ˆ ± 𝑧*·𝑆𝐸(𝜃ˆ)
to compute a CI for the difference of population means (𝜇𝑋 − 𝜇𝑌), we substitute:
- 𝜃ˆ = 𝑋̅ − 𝑌̅
- 𝑆𝐸(𝜃ˆ) = 𝑆𝐸(𝑋̅ − 𝑌̅) = 𝑟𝑜𝑜𝑡[(𝜎𝑋²/𝑛𝑋) + (𝜎𝑌²/𝑛𝑌)]
computation of 𝑆𝐸(𝑋̅-𝑌̅):
- 𝑉𝑎𝑟(𝑋̅ − 𝑌̅) = 𝑉𝑎𝑟(𝑋̅) + 𝑉𝑎𝑟(𝑌̅) = (𝜎𝑋²/𝑛𝑋) + (𝜎𝑌²/𝑛𝑌) (the variances add because the two samples are independent)
- 𝑆𝑡𝑑(𝑋̅ − 𝑌̅) = 𝑆𝐸(𝑋̅ − 𝑌̅) = 𝑟𝑜𝑜𝑡[(𝜎𝑋²/𝑛𝑋) + (𝜎𝑌²/𝑛𝑌)]
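The standard error above plugs directly into the general z-interval when 𝜎𝑋 and 𝜎𝑌 are known. A minimal sketch follows; the function names `se_diff_means` and `z_interval_diff` and the example numbers are illustrative assumptions, not taken from the notes.

```python
from math import sqrt
from statistics import NormalDist

def se_diff_means(sigma_x, n_x, sigma_y, n_y):
    """SE(Xbar - Ybar) for independent samples with known population SDs."""
    return sqrt(sigma_x ** 2 / n_x + sigma_y ** 2 / n_y)

def z_interval_diff(xbar, ybar, sigma_x, n_x, sigma_y, n_y, alpha=0.05):
    """(1 - alpha) CI for mu_X - mu_Y when sigma_X and sigma_Y are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * se_diff_means(sigma_x, n_x, sigma_y, n_y)
    diff = xbar - ybar
    return diff - margin, diff + margin

# Example: sigma_X = 3 with n_X = 9 and sigma_Y = 4 with n_Y = 16,
# so SE = sqrt(9/9 + 16/16) = sqrt(2)
print(se_diff_means(3.0, 9, 4.0, 16))
print(z_interval_diff(12.0, 10.0, 3.0, 9, 4.0, 16))
```

The sample sizes enter separately, which is why 𝑛𝑋 and 𝑛𝑌 are allowed to differ.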
if the population standard deviations 𝜎𝑋 and 𝜎𝑌 are UNKNOWN and assumed to be:
- EQUAL (use pooled standard deviation)
how do we estimate the common 𝜎? we could use a pooled standard deviation (𝑠𝑝), the square root of the pooled variance:
- 𝑠𝑝² = [(𝑛𝑋 − 1)·𝑠𝑋² + (𝑛𝑌 − 1)·𝑠𝑌²] / [𝑛𝑋 + 𝑛𝑌 − 2]
- 𝑠𝑝 = 𝑟𝑜𝑜𝑡(𝑠𝑝²)
therefore:
- 𝑆𝐸ˆ(𝑋̅-𝑌̅) = 𝑠𝑝·𝑟𝑜𝑜𝑡(1/𝑛𝑋 + 1/𝑛𝑌)
the 100(1−𝛼)% CI for (𝜇𝑋 − 𝜇𝑌) when the sample sizes are small and 𝑋 and 𝑌 are sampled from normally distributed populations (the t critical value has 𝑛𝑋 + 𝑛𝑌 − 2 degrees of freedom):
- (𝑋̅ − 𝑌̅) ± 𝑡𝛼/2,𝑛𝑋+𝑛𝑌−2·𝑆𝐸ˆ(𝑋̅ − 𝑌̅)
the 100(1−𝛼)% CI for (𝜇𝑋 − 𝜇𝑌) when the sample sizes 𝑛𝑋 and 𝑛𝑌 are large:
- (𝑋̅-𝑌̅) ± 𝑧𝛼/2·𝑆𝐸ˆ(𝑋̅-𝑌̅)
- NOT EQUAL (use Satterthwaite’s Approximation)
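the approximate degrees of freedom 𝑣 are:
- 𝑣 = [𝑠𝑋²/𝑛𝑋 + 𝑠𝑌²/𝑛𝑌]² / [𝑠𝑋⁴/(𝑛𝑋²·(𝑛𝑋 − 1)) + 𝑠𝑌⁴/(𝑛𝑌²·(𝑛𝑌 − 1))]
the 100(1−𝛼)% CI for (𝜇𝑋 − 𝜇𝑌) when the sample sizes are small and 𝑋 and 𝑌 are sampled from normally distributed populations (the t critical value has 𝑣 degrees of freedom):
- (𝑋̅ − 𝑌̅) ± 𝑡𝛼/2,𝑣·𝑟𝑜𝑜𝑡(𝑠𝑋²/𝑛𝑋 + 𝑠𝑌²/𝑛𝑌)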
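The two unknown-variance cases can be compared side by side. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and because the Python standard library has no t quantile function, the t critical value is passed in by the caller (for example, from a t table).

```python
from math import sqrt
from statistics import fmean, variance

def pooled_se(x, y):
    """Estimated SE and degrees of freedom assuming sigma_X = sigma_Y."""
    n_x, n_y = len(x), len(y)
    sp2 = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    return sqrt(sp2) * sqrt(1 / n_x + 1 / n_y), n_x + n_y - 2

def satterthwaite_se(x, y):
    """Estimated SE and Satterthwaite degrees of freedom (unequal variances)."""
    n_x, n_y = len(x), len(y)
    a, b = variance(x) / n_x, variance(y) / n_y  # s_X^2/n_X and s_Y^2/n_Y
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))
    return sqrt(a + b), df

def t_interval_diff(x, y, se, t_crit):
    """CI for mu_X - mu_Y given an estimated SE and a caller-supplied t value."""
    diff = fmean(x) - fmean(y)
    return diff - t_crit * se, diff + t_crit * se

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
se_p, df_p = pooled_se(x, y)          # df_p = 8
se_w, df_w = satterthwaite_se(x, y)   # df_w is about 5.88
# t critical value for alpha = 0.05 with 8 degrees of freedom, from a t table
print(t_interval_diff(x, y, se_p, t_crit=2.306))
```

With equal sample sizes the two standard errors coincide, so the methods differ only in their degrees of freedom; here the Satterthwaite value is smaller than 8, which gives a larger t multiplier and a wider interval.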
CI Formulas For 2 Independent Samples of Sample Mean
| Large Sample Sizes | Normal Population | 𝜎𝑋 and 𝜎𝑌 Known | 𝜎𝑋 = 𝜎𝑌 Assumed | Confidence Interval |
|---|---|---|---|---|
| FALSE | FALSE | EITHER | EITHER | Bootstrap Method |
| FALSE | TRUE | FALSE | FALSE | (𝑋̅ − 𝑌̅) ± 𝑡𝛼/2,𝑣·𝑟𝑜𝑜𝑡(𝑠𝑋²/𝑛𝑋 + 𝑠𝑌²/𝑛𝑌) |
| FALSE | TRUE | FALSE | TRUE | (𝑋̅ − 𝑌̅) ± 𝑡𝛼/2,𝑛𝑋+𝑛𝑌−2·𝑠𝑝·𝑟𝑜𝑜𝑡(1/𝑛𝑋 + 1/𝑛𝑌) |
| FALSE | TRUE | TRUE | EITHER | (𝑋̅ − 𝑌̅) ± 𝑧𝛼/2·𝑟𝑜𝑜𝑡(𝜎𝑋²/𝑛𝑋 + 𝜎𝑌²/𝑛𝑌) |
| TRUE | EITHER | FALSE | FALSE | (𝑋̅ − 𝑌̅) ± 𝑧𝛼/2·𝑟𝑜𝑜𝑡(𝑠𝑋²/𝑛𝑋 + 𝑠𝑌²/𝑛𝑌) |
| TRUE | EITHER | FALSE | TRUE | (𝑋̅ − 𝑌̅) ± 𝑧𝛼/2·𝑠𝑝·𝑟𝑜𝑜𝑡(1/𝑛𝑋 + 1/𝑛𝑌) |
| TRUE | EITHER | TRUE | EITHER | (𝑋̅ − 𝑌̅) ± 𝑧𝛼/2·𝑟𝑜𝑜𝑡(𝜎𝑋²/𝑛𝑋 + 𝜎𝑌²/𝑛𝑌) |
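The table names the bootstrap for small samples from non-normal populations without spelling it out. One common variant is the percentile bootstrap, sketched below as an assumption about what is meant; the notes do not say which bootstrap variant is intended. The function name, the number of resamples and the example data are all illustrative.

```python
import random
from statistics import fmean

def bootstrap_ci_diff_means(x, y, alpha=0.05, n_boot=5000, seed=0):
    """Percentile bootstrap CI for mu_X - mu_Y from two independent samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        xs = rng.choices(x, k=len(x))  # resample X with replacement
        ys = rng.choices(y, k=len(y))  # resample Y independently of X
        diffs.append(fmean(xs) - fmean(ys))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]          # approx. alpha/2 quantile
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]  # approx. 1 - alpha/2 quantile
    return lo, hi

x = [12.1, 13.4, 11.8, 14.0, 12.9, 13.3]  # made-up example data
y = [9.7, 10.2, 8.9, 10.8, 9.4]
print(bootstrap_ci_diff_means(x, y))
```

Each sample is resampled separately, which mirrors the independence of the two samples. The endpoints depend on the seed and on `n_boot`, so they are not reproduced here.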