2-sample problems - comparing 2 samples and making inferences about the corresponding populations

Population 1: 𝑋 ∼ 𝑓_𝑋(𝑥), 𝐄(𝑋) = 𝜇_𝑋
Population 2: 𝑌 ∼ 𝑓_𝑌(𝑦), 𝐄(𝑌) = 𝜇_𝑌

CI with 2 Independent Samples - the 𝑋 and 𝑌 samples come from 2 different groups of subjects (i.e. the observations are independent)

the sample sizes of 𝑋 and 𝑌 may differ (𝑛 and 𝑚 below)

𝑋	𝑌
𝑋₁	𝑌₁
𝑋₂	𝑌₂
…	…
𝑋_𝑛	𝑌_𝑚

CI General Formula


CI Definition

An interval [𝐴, 𝐵] is a (1 − 𝛼)100% confidence interval for the parameter 𝜃 if it contains the parameter with probability (1 − 𝛼):

𝐏{𝐴 ≤ 𝜃 ≤ 𝐵} = 1 − 𝛼

where:

𝛼 - significance level

(1 − 𝛼) - confidence level or coverage probability
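The coverage condition can be checked empirically. The sketch below is only an illustration under assumed settings (a normal population with known 𝜎, and the sample mean as the estimator); the function name `coverage_rate` and all default values are hypothetical and not part of these notes.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=10.0, sigma=2.0, n=30, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated z-intervals for the mean that contain the true mu."""
    rng = random.Random(seed)                # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    se = sigma / n ** 0.5                    # known standard error of the sample mean
    hits = 0
    for _ in range(trials):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"empirical coverage: {rate:.3f}")
```

With 𝛼 = 0.05 the printed rate should land close to 0.95, up to simulation noise.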

CI Formula Intuition


Given a sample of data and a desired confidence level (1 − 𝛼), how can we construct a confidence interval [𝐴, 𝐵] that will satisfy the coverage condition

𝐏{𝐴 ≤ 𝜃 ≤ 𝐵} = 1 − 𝛼

first we need to estimate 𝜃

choose an unbiased estimator whose distribution is (at least approximately) normal (e.g. an MLE, which is asymptotically normal under regularity conditions)

apply the estimator to the sample data to obtain the point estimate 𝜃ˆ

next we standardize 𝜃ˆ to get a standard normal variable 𝑧:

𝑧 = [𝜃ˆ - 𝐄(𝜃ˆ)] / 𝜎(𝜃ˆ)

since 𝜃ˆ was estimated with an unbiased estimator: 𝐄(𝜃ˆ) = 𝜃

𝑧 = (𝜃ˆ - 𝜃) / 𝜎(𝜃ˆ)

with probability (1 - 𝛼), this variable 𝑧 falls between the standard normal quantiles 𝑞_{𝛼/2} and 𝑞_{1−𝛼/2}, denoted by

-𝑧_{𝛼/2} = 𝑞_{𝛼/2}

𝑧_{𝛼/2} = 𝑞_{1−𝛼/2}

so that:

𝐏{-𝑧_{𝛼/2} ≤ (𝜃ˆ - 𝜃) / 𝜎(𝜃ˆ) ≤ 𝑧_{𝛼/2}} = 1 - 𝛼

now rearrange for 𝜃:

𝐏{𝜃ˆ - 𝑧_{𝛼/2}·𝜎(𝜃ˆ) ≤ 𝜃 ≤ 𝜃ˆ + 𝑧_{𝛼/2}·𝜎(𝜃ˆ)} = 1 - 𝛼

we have obtained two numbers:

𝐴 = 𝜃ˆ - 𝑧_{𝛼/2}·𝜎(𝜃ˆ)

𝐵 = 𝜃ˆ + 𝑧_{𝛼/2}·𝜎(𝜃ˆ)

such that

𝐏{𝐴 ≤ 𝜃 ≤ 𝐵} = 1 − 𝛼
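The two endpoints 𝐴 and 𝐵 derived above translate directly into a few lines of code. This is a minimal sketch, assuming the standard error of the estimator is known and the estimator is (approximately) normal; the name `z_interval` is hypothetical.

```python
from statistics import NormalDist

def z_interval(theta_hat, se, alpha=0.05):
    """Return (A, B) = theta_hat -/+ z_{alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 standard normal quantile
    return theta_hat - z * se, theta_hat + z * se

a, b = z_interval(theta_hat=5.0, se=1.0, alpha=0.05)
print(f"[{a:.3f}, {b:.3f}]")
```

The interval is symmetric around the point estimate, and its half-width grows as 𝛼 shrinks or as the standard error grows.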

CI Formulas

Large Sample Size (𝑛)	Normal Population	𝑆𝐸(𝜃ˆ) Known	Confidence Interval
FALSE	FALSE	EITHER	Bootstrap Method
FALSE	TRUE	FALSE	𝜃ˆ ± 𝑡_{𝛼/2}·𝑆𝐸ˆ(𝜃ˆ)
FALSE	TRUE	TRUE	𝜃ˆ ± 𝑧_{𝛼/2}·𝑆𝐸(𝜃ˆ)
TRUE	EITHER	FALSE	𝜃ˆ ± 𝑧_{𝛼/2}·𝑆𝐸ˆ(𝜃ˆ)
TRUE	EITHER	TRUE	𝜃ˆ ± 𝑧_{𝛼/2}·𝑆𝐸(𝜃ˆ)

where:

𝜃ˆ - point estimate/statistic or center of the interval

𝑧 - z-score, a confidence multiplier taken from the standard normal distribution

𝑡 - t-score, a confidence multiplier taken from Student's t distribution

𝑆𝐸(𝜃ˆ) or 𝜎(𝜃ˆ) or 𝑆𝑡𝑑(𝜃ˆ) - standard error of the point estimator/statistic

𝑆𝐸ˆ(𝜃ˆ) or 𝑠(𝜃ˆ) or 𝑆𝑡𝑑ˆ(𝜃ˆ) - estimated standard error of the point estimator/statistic



CI Formula For 2 Independent Samples of Sample Mean

the general formula states the confidence interval is:

𝜃ˆ ± 𝑧*·𝑆𝐸(𝜃ˆ)

to compute the CI for the difference of population means (𝜇_𝑋-𝜇_𝑌), we substitute:

𝜃ˆ = 𝑋̅-𝑌̅
𝑆𝐸(𝜃ˆ) = 𝑆𝐸(𝑋̅-𝑌̅) = 𝑟𝑜𝑜𝑡[(𝜎_𝑋²/𝑛_𝑋) + (𝜎_𝑌²/𝑛_𝑌)]

computation of 𝑆𝐸(𝑋̅-𝑌̅):


since the two samples are independent, the variances add:

𝑉𝑎𝑟(𝑋̅-𝑌̅) = 𝑉𝑎𝑟(𝑋̅) + 𝑉𝑎𝑟(𝑌̅) = (𝜎_𝑋²/𝑛_𝑋) + (𝜎_𝑌²/𝑛_𝑌)

𝑆𝑡𝑑(𝑋̅-𝑌̅) = 𝑆𝐸(𝑋̅-𝑌̅) = 𝑟𝑜𝑜𝑡((𝜎_𝑋²/𝑛_𝑋) + (𝜎_𝑌²/𝑛_𝑌))
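As a quick numeric check of the standard error formula, here is a small sketch for the case where both population standard deviations are known. The function names and the example numbers are hypothetical and only serve to illustrate the arithmetic.

```python
from math import sqrt
from statistics import NormalDist

def se_diff_known(sigma_x, n_x, sigma_y, n_y):
    """SE(Xbar - Ybar) = root(sigma_x^2 / n_x + sigma_y^2 / n_y)."""
    return sqrt(sigma_x ** 2 / n_x + sigma_y ** 2 / n_y)

def z_interval_diff(xbar, ybar, sigma_x, n_x, sigma_y, n_y, alpha=0.05):
    """z-interval for mu_X - mu_Y when sigma_X and sigma_Y are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = se_diff_known(sigma_x, n_x, sigma_y, n_y)
    diff = xbar - ybar
    return diff - z * se, diff + z * se

# Made-up inputs: sigma_X = 3 with n_X = 9, sigma_Y = 4 with n_Y = 16
print(se_diff_known(3, 9, 4, 16))          # root(9/9 + 16/16) = root(2)
print(z_interval_diff(10, 8, 3, 9, 4, 16))
```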

if the population standard deviations 𝜎_𝑋 and 𝜎_𝑌 are UNKNOWN and assumed to be:

EQUAL (use pooled standard deviation)
how do we estimate the common 𝜎? we could use a pooled standard deviation (𝑠_𝑝), the square root of the pooled variance:
- 𝑠_𝑝² = [ (𝑛_𝑋 - 1) 𝑠_𝑋² + (𝑛_𝑌 - 1) 𝑠_𝑌² ] / [ 𝑛_𝑋 + 𝑛_𝑌 - 2 ]
- 𝑠_𝑝 = 𝑟𝑜𝑜𝑡(𝑠_𝑝²)
therefore:
- 𝑆𝐸ˆ(𝑋̅-𝑌̅) = 𝑠_𝑝·𝑟𝑜𝑜𝑡(1/𝑛_𝑋 + 1/𝑛_𝑌)
the 100(1-𝛼)% CI for (𝜇_𝑋-𝜇_𝑌) when the sample sizes 𝑛_𝑋 and 𝑛_𝑌 are small and 𝑋 and 𝑌 come from normal populations:
- (𝑋̅-𝑌̅) ± 𝑡_{𝛼/2,𝑛_𝑋+𝑛_𝑌-2}·𝑆𝐸ˆ(𝑋̅-𝑌̅)
the 100(1-𝛼)% CI for (𝜇_𝑋-𝜇_𝑌) when the sample sizes 𝑛_𝑋 and 𝑛_𝑌 are large:
- (𝑋̅-𝑌̅) ± 𝑧_{𝛼/2}·𝑆𝐸ˆ(𝑋̅-𝑌̅)
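The pooled computation can be sketched as follows. To stay within the Python standard library, the t critical value is passed in as an argument (looked up in a t table, for example); the function name `pooled_interval`, the sample data, and the value 2.365 (the usual table value for 7 degrees of freedom at 𝛼 = 0.05) are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance

def pooled_interval(x, y, t_crit):
    """CI for mu_X - mu_Y assuming equal, unknown population variances.

    t_crit is the critical value t_{alpha/2, n_X + n_Y - 2}, supplied by the caller.
    """
    n_x, n_y = len(x), len(y)
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    se = sqrt(sp2) * sqrt(1 / n_x + 1 / n_y)  # estimated SE of Xbar - Ybar
    diff = mean(x) - mean(y)
    return diff - t_crit * se, diff + t_crit * se

x = [1, 2, 3, 4, 5]   # made-up sample, n_X = 5
y = [2, 4, 6, 8]      # made-up sample, n_Y = 4
print(pooled_interval(x, y, t_crit=2.365))  # df = 5 + 4 - 2 = 7
```

Passing the large-sample multiplier 𝑧_{𝛼/2} as `t_crit` instead gives the large-sample version of the same interval.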

NOT EQUAL (use Satterthwaite’s Approximation)
Satterthwaite’s Approximation gives the degrees of freedom 𝑣:
- 𝑣 = [𝑠_𝑋²/𝑛_𝑋 + 𝑠_𝑌²/𝑛_𝑌]² / [ 𝑠_𝑋⁴/(𝑛_𝑋²(𝑛_𝑋-1)) + 𝑠_𝑌⁴/(𝑛_𝑌²(𝑛_𝑌-1)) ]
the 100(1-𝛼)% CI for (𝜇_𝑋-𝜇_𝑌) when the sample sizes 𝑛_𝑋 and 𝑛_𝑌 are small and 𝑋 and 𝑌 come from normal populations:
- (𝑋̅-𝑌̅) ± 𝑡_{𝛼/2,𝑣}·𝑟𝑜𝑜𝑡(𝑠_𝑋²/𝑛_𝑋 + 𝑠_𝑌²/𝑛_𝑌)
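The degrees-of-freedom formula is easy to mistype, so a small sketch may help. The function names below are hypothetical, and, as an assumption to avoid third-party dependencies, the t critical value for 𝑣 degrees of freedom is again supplied by the caller rather than computed.

```python
from math import sqrt

def satterthwaite_df(s2_x, n_x, s2_y, n_y):
    """Approximate degrees of freedom v from sample variances s2_x and s2_y."""
    num = (s2_x / n_x + s2_y / n_y) ** 2
    den = s2_x ** 2 / (n_x ** 2 * (n_x - 1)) + s2_y ** 2 / (n_y ** 2 * (n_y - 1))
    return num / den

def unequal_var_interval(xbar, ybar, s2_x, n_x, s2_y, n_y, t_crit):
    """CI for mu_X - mu_Y without assuming equal variances; t_crit = t_{alpha/2, v}."""
    se = sqrt(s2_x / n_x + s2_y / n_y)
    diff = xbar - ybar
    return diff - t_crit * se, diff + t_crit * se

# With equal sample variances and equal sample sizes, v reduces to 2(n - 1)
print(satterthwaite_df(4.0, 10, 4.0, 10))
```

In general 𝑣 is not an integer; a common conservative choice is to round it down before looking up the t critical value.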

CI Formulas For 2 Independent Samples of Sample Mean

Large Sample Sizes (𝑛_𝑋 & 𝑛_𝑌)	Normal Populations (𝑋 and 𝑌)	𝜎_𝑋 and 𝜎_𝑌 Known	𝜎_𝑋 = 𝜎_𝑌 Assumed	Confidence Interval
FALSE	FALSE	EITHER	EITHER	Bootstrap Method
FALSE	TRUE	FALSE	FALSE	(𝑋̅-𝑌̅) ± 𝑡_{𝛼/2,𝑣}·𝑟𝑜𝑜𝑡(𝑠_𝑋²/𝑛_𝑋 + 𝑠_𝑌²/𝑛_𝑌)
FALSE	TRUE	FALSE	TRUE	(𝑋̅-𝑌̅) ± 𝑡_{𝛼/2,𝑛_𝑋+𝑛_𝑌-2}·𝑟𝑜𝑜𝑡(𝑠_𝑝²/𝑛_𝑋 + 𝑠_𝑝²/𝑛_𝑌)
FALSE	TRUE	TRUE	EITHER	(𝑋̅-𝑌̅) ± 𝑧_{𝛼/2}·𝑟𝑜𝑜𝑡(𝜎_𝑋²/𝑛_𝑋 + 𝜎_𝑌²/𝑛_𝑌)
TRUE	EITHER	FALSE	FALSE	(𝑋̅-𝑌̅) ± 𝑧_{𝛼/2}·𝑟𝑜𝑜𝑡(𝑠_𝑋²/𝑛_𝑋 + 𝑠_𝑌²/𝑛_𝑌)
TRUE	EITHER	FALSE	TRUE	(𝑋̅-𝑌̅) ± 𝑧_{𝛼/2}·𝑟𝑜𝑜𝑡(𝑠_𝑝²/𝑛_𝑋 + 𝑠_𝑝²/𝑛_𝑌)
TRUE	EITHER	TRUE	EITHER	(𝑋̅-𝑌̅) ± 𝑧_{𝛼/2}·𝑟𝑜𝑜𝑡(𝜎_𝑋²/𝑛_𝑋 + 𝜎_𝑌²/𝑛_𝑌)
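The table above can also be read as a small decision procedure. The sketch below encodes its rows as a function that returns a short description of the applicable method; the function name `choose_method` and the returned strings are hypothetical labels, not standard terminology.

```python
def choose_method(large_samples, normal_populations, sigmas_known, equal_sigmas_assumed):
    """Map the four yes/no conditions of the table to a CI recipe."""
    if not large_samples and not normal_populations:
        return "bootstrap"
    if sigmas_known:
        # Known sigmas: z multiplier with the exact standard error
        return "z with sigma_X, sigma_Y"
    if large_samples:
        # Unknown sigmas, large samples: z multiplier with an estimated SE
        return "z with s_p" if equal_sigmas_assumed else "z with s_X, s_Y"
    # Small samples from normal populations, unknown sigmas: t multiplier
    if equal_sigmas_assumed:
        return "t with s_p, df = n_X + n_Y - 2"
    return "t with s_X, s_Y, df = Satterthwaite v"

print(choose_method(False, True, False, True))
print(choose_method(True, False, False, False))
```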

／var／log marcus chiu
