The univariate linear regression model assumes that the conditional expectation of 𝑌 given 𝑋=𝑥 is a linear function of a single variable 𝑥:

𝑦̂ = 𝑓(𝑥) = 𝐄[𝑌|𝑋=𝑥] = 𝜃₀+ 𝜃₁𝑥

where:

𝜃₀ = 𝑓(0) # 𝑦 intercept
𝜃₁ = 𝛿𝑦/𝛿𝑥 # slope along 𝑥 axis

Estimating 𝜃₀ and 𝜃₁(Ordinary Least Squares Method)


Given training/sample data 𝐷 = {(𝑥₁,𝑦₁), …, (𝑥_𝑛,𝑦_𝑛)}, let us estimate the intercept 𝜃₀ and the slope 𝜃₁ by the method of least squares.

The Sum of Squared Errors (𝑆𝑆_𝐸𝑅𝑅) of 𝑓(𝑥) given 𝐷 is defined below:

𝑆𝑆_𝐸𝑅𝑅 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑦̂_𝑖]²

𝑆𝑆_𝐸𝑅𝑅 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑓(𝑥_𝑖)]²

𝑆𝑆_𝐸𝑅𝑅 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝜃₀- 𝜃₁𝑥_𝑖]²

We want to minimize 𝑆𝑆_𝐸𝑅𝑅 with respect to 𝜃₀ and 𝜃₁. We can do this by taking the partial derivatives of 𝑆𝑆_𝐸𝑅𝑅, setting them to 0, and solving for 𝜃₀ and 𝜃₁.

The resulting estimates of 𝜃₀ and 𝜃₁ are:

estimate of 𝜃₀= 𝜃ˆ₀ = 𝑦̅ - 𝜃ˆ₁𝑥̅

estimate of 𝜃₁= 𝜃ˆ₁= 𝑆_𝑥𝑦/ 𝑆_𝑥𝑥

where:

𝑦̅ - mean of all 𝑦‘s

𝑥̅ - mean of all 𝑥‘s

𝑆_𝑥𝑦 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)(𝑥_𝑖 - 𝑥̅)]

𝑆_𝑥𝑥 = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)(𝑥_𝑖 - 𝑥̅)]
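
The closed-form estimates above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper name `ols_fit` and the five-point toy sample are assumptions made for this example and are not part of the original notes.

```python
from statistics import mean

def ols_fit(xs, ys):
    """Least-squares estimates (theta0_hat, theta1_hat) for y ≈ theta0 + theta1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))  # S_xy
    s_xx = sum((x - x_bar) ** 2 for x in xs)                       # S_xx
    theta1_hat = s_xy / s_xx                  # slope estimate
    theta0_hat = y_bar - theta1_hat * x_bar   # intercept estimate
    return theta0_hat, theta1_hat

# Hypothetical toy sample, chosen only so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

theta0_hat, theta1_hat = ols_fit(xs, ys)
print(f"intercept = {theta0_hat:.3f}, slope = {theta1_hat:.3f}")
```

For this sample 𝑥̅ = 3, 𝑦̅ = 4, 𝑆_𝑥𝑦 = 6 and 𝑆_𝑥𝑥 = 10, so the slope is 0.6 and the intercept is 4 - 0.6·3 = 2.2.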

computation of the estimates

the partial derivatives are:

𝛿𝑆𝑆_𝐸𝑅𝑅/𝛿𝜃₀ = -2 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝜃₀- 𝜃₁𝑥_𝑖]

𝛿𝑆𝑆_𝐸𝑅𝑅/𝛿𝜃₁ = -2 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝜃₀- 𝜃₁𝑥_𝑖]𝑥_𝑖

equating them to 0, we obtain so-called normal equations:

0 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝜃₀- 𝜃₁𝑥_𝑖]

0 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝜃₀- 𝜃₁𝑥_𝑖]𝑥_𝑖

for the first normal equation:

0= 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖] - 𝜃₀𝛴_{1≤𝑖≤𝑛}[1] - 𝜃₁𝛴_{1≤𝑖≤𝑛}[𝑥_𝑖]

0= 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖] - 𝜃₀𝑛 - 𝜃₁𝛴_{1≤𝑖≤𝑛}[𝑥_𝑖]

𝜃₀𝑛= 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖] - 𝜃₁𝛴_{1≤𝑖≤𝑛}[𝑥_𝑖]

𝜃₀= 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖]/𝑛 - 𝜃₁𝛴_{1≤𝑖≤𝑛}[𝑥_𝑖]/𝑛

𝜃₀= 𝑦̅ - 𝜃₁𝑥̅ # 𝑦̅ and 𝑥̅ definition of the sample mean

substitute this into the second equation:

0 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝜃₀- 𝜃₁𝑥_𝑖]𝑥_𝑖

0 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - (𝑦̅ - 𝜃₁𝑥̅)- 𝜃₁𝑥_𝑖]𝑥_𝑖

0 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑦̅ + 𝜃₁𝑥̅- 𝜃₁𝑥_𝑖]𝑥_𝑖

0 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅) + 𝜃₁(𝑥̅- 𝑥_𝑖)]𝑥_𝑖

0 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅) - (-𝜃₁(𝑥̅- 𝑥_𝑖))]𝑥_𝑖

0 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅) - 𝜃₁(𝑥_𝑖- 𝑥̅)]𝑥_𝑖

0 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)𝑥_𝑖] - 𝜃₁𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)𝑥_𝑖]

𝑆_𝑥𝑦 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)𝑥_𝑖]

𝑆_𝑥𝑦= 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)𝑥_𝑖] - 𝑥̅·0

𝑆_𝑥𝑦= 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)𝑥_𝑖] - 𝑥̅·𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)] # because 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)] = 0

𝑆_𝑥𝑦= 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)𝑥_𝑖] - 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)𝑥̅]

𝑆_𝑥𝑦 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)(𝑥_𝑖 - 𝑥̅)]

and

𝑆_𝑥𝑥 = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)𝑥_𝑖]

𝑆_𝑥𝑥 = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)𝑥_𝑖] - 𝑥̅·0

𝑆_𝑥𝑥 = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)𝑥_𝑖] - 𝑥̅·𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)] # because 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)] = 0

𝑆_𝑥𝑥 = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)𝑥_𝑖] - 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)𝑥̅]

𝑆_𝑥𝑥 = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)(𝑥_𝑖 - 𝑥̅)]

0 = 𝑆_𝑥𝑦 - 𝜃₁𝑆_𝑥𝑥

𝜃₁= 𝑆_𝑥𝑦/ 𝑆_𝑥𝑥
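
As a quick numerical check of the derivation, the sketch below plugs the least-squares estimates back into the two normal equations and confirms that both sums vanish (up to floating-point rounding). The toy data and variable names are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar

# Residuals y_i - theta0 - theta1 * x_i evaluated at the estimates.
residuals = [y - theta0_hat - theta1_hat * x for x, y in zip(xs, ys)]

normal_eq_1 = sum(residuals)                              # first normal equation
normal_eq_2 = sum(r * x for r, x in zip(residuals, xs))   # second normal equation
print(normal_eq_1, normal_eq_2)  # both should be zero up to rounding error
```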


(Regression Slope 𝜃₁) in relation to (Correlation Coefficient 𝑟_𝑥𝑦)


𝜃₁= 𝑟_𝑥𝑦(𝑠_𝑦/𝑠_𝑥)

𝜃₁= 𝑠_𝑥𝑦²/𝑠_𝑥²

sample correlation is defined below:

𝑟_𝑥𝑦 = 𝑠_𝑥𝑦²/ (𝑠_𝑥𝑠_𝑦)

where:

𝑠_𝑥𝑦² = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)(𝑦_𝑖 - 𝑦̅)] / [𝑛 - 1] # sample covariance

𝑠_𝑥= 𝑟𝑜𝑜𝑡[𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)²] / [𝑛 - 1]] # sample standard deviation

𝑠_𝑦= 𝑟𝑜𝑜𝑡[𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)²] / [𝑛 - 1]] # sample standard deviation

proof:

recall that 𝜃₁= 𝑆_𝑥𝑦/ 𝑆_𝑥𝑥

where:

𝑆_𝑥𝑦 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)(𝑥_𝑖 - 𝑥̅)]

𝑆_𝑥𝑥 = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖- 𝑥̅)(𝑥_𝑖 - 𝑥̅)]

therefore:

𝜃₁= 𝑆_𝑥𝑦/𝑆_𝑥𝑥

𝜃₁= [𝑆_𝑥𝑦/(𝑛-1)]/ [𝑆_𝑥𝑥/(𝑛-1)]

𝜃₁= [𝑠_𝑥𝑦²/ 𝑠_𝑥²] # 𝑠_𝑥𝑦² = sample covariance & 𝑠_𝑥² = sample variance & 𝑠_𝑥= sample standard deviation

𝜃₁= [𝑠_𝑥𝑦²/ 𝑠_𝑥²] [𝑠_𝑦/𝑠_𝑦] # 𝑠_𝑦= sample standard deviation

𝜃₁= [𝑠_𝑥𝑦²/ (𝑠_𝑥𝑠_𝑦)] [𝑠_𝑦/𝑠_𝑥]

𝜃₁= 𝑟_𝑥𝑦 [𝑠_𝑦/𝑠_𝑥] # 𝑟_𝑥𝑦 = sample correlation coefficient

Both the correlation coefficient and the regression slope are:

positive for positively correlated 𝑋 and 𝑌

negative for negatively correlated 𝑋 and 𝑌

The difference is:

the correlation coefficient is dimensionless and lies in [-1,1]

the slope is measured in units of 𝑌 per unit of 𝑋
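
The identity 𝜃₁ = 𝑟_𝑥𝑦(𝑠_𝑦/𝑠_𝑥) can be checked numerically. The sketch below computes the slope once from 𝑆_𝑥𝑦/𝑆_𝑥𝑥 and once from the sample correlation and standard deviations; the toy data and variable names are assumptions for illustration.

```python
from statistics import mean, stdev

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_xy_sum = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # S_xy
s_xx_sum = sum((x - x_bar) ** 2 for x in xs)                       # S_xx

sample_cov = s_xy_sum / (n - 1)        # sample covariance (written s_xy² in the notes)
s_x, s_y = stdev(xs), stdev(ys)        # sample standard deviations (n - 1 denominator)
r_xy = sample_cov / (s_x * s_y)        # sample correlation coefficient

slope_from_sums = s_xy_sum / s_xx_sum  # theta1 = S_xy / S_xx
slope_from_r = r_xy * s_y / s_x        # theta1 = r_xy * (s_y / s_x)
print(slope_from_sums, slope_from_r, r_xy)
```

Both slope values agree, while 𝑟_𝑥𝑦 stays within [-1,1] regardless of the units of 𝑋 and 𝑌.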


(Regression Slope 𝜃₁) in relation to (Covariance 𝑠_𝑥𝑦²)


𝜃₁= 𝑠_𝑥𝑦²/𝑠_𝑥²

where:

𝑠_𝑥𝑦² = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)(𝑦_𝑖 - 𝑦̅)] / [𝑛 - 1] # sample covariance

𝑠_𝑥= 𝑟𝑜𝑜𝑡[𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)²] / [𝑛 - 1]] # sample standard deviation

Intuition

Given:

𝑋 is standard normal, thus:

𝐄[𝑋] = 0

𝑉𝑎𝑟(𝑋) = 1

𝑦 = 𝜃₀+ 𝜃₁𝑥 + 𝜎𝜀

𝐄[𝜀] = 0

Then:

𝐄[𝑌|𝑋] = 𝐄[𝜃₀+ 𝜃₁𝑋 + 𝜎𝜀|𝑋]

𝐄[𝑌|𝑋] = 𝐄[𝜃₀|𝑋] + 𝐄[𝜃₁𝑋|𝑋] + 𝐄[𝜎𝜀|𝑋]

𝐄[𝑌|𝑋] = 𝜃₀ + 𝜃₁𝑋 + 0

𝐄[𝑌|𝑋] = 𝜃₀+ 𝜃₁𝑋

so

𝐄[𝑌𝑋] = 𝐄[(𝜃₀+ 𝜃₁𝑋 + 𝜎𝜀)𝑋]

𝐄[𝑌𝑋] = 𝐄[𝜃₀𝑋+ 𝜃₁𝑋² + 𝜎𝜀𝑋]

𝐄[𝑌𝑋] = 𝐄[𝜃₀𝑋]+ 𝐄[𝜃₁𝑋²] + 𝐄[𝜎𝜀𝑋]

𝐄[𝑌𝑋] = 𝜃₀𝐄[𝑋]+ 𝜃₁𝐄[𝑋²] + 𝜎𝐄[𝜀𝑋]

𝐄[𝑌𝑋] = 𝜃₀0+ 𝜃₁𝐄[𝑋²] + 𝜎𝐄[𝜀𝑋]

𝐄[𝑌𝑋] = 𝜃₁𝐄[𝑋²] + 𝜎𝐄[𝜀𝑋]

𝑉𝑎𝑟(𝑋) = 𝐄[𝑋²] - 𝐄[𝑋]²

𝑉𝑎𝑟(𝑋) = 𝐄[𝑋²] - 0²

𝑉𝑎𝑟(𝑋) = 𝐄[𝑋²]

1 = 𝐄[𝑋²] # 𝑉𝑎𝑟(𝑋) = 1 as given above

𝐄[𝑌𝑋] = 𝜃₁1 + 𝜎𝐄[𝜀𝑋]

𝐄[𝑌𝑋] = 𝜃₁ + 𝜎𝐄[𝜀𝑋]

𝐶𝑜𝑣(𝜀,𝑋) = 𝐄[𝜀𝑋] - 𝐄[𝜀]𝐄[𝑋]

𝐶𝑜𝑣(𝜀,𝑋) = 𝐄[𝜀𝑋] - 0·0

𝐶𝑜𝑣(𝜀,𝑋) = 𝐄[𝜀𝑋]

0 = 𝐄[𝜀𝑋] # 𝐶𝑜𝑣(𝜀,𝑋) = 0 based on assumption of the linear model

𝐄[𝑌𝑋] = 𝜃₁ + 𝜎·0

𝐄[𝑌𝑋] = 𝜃₁

Hence, since 𝐄[𝑋] = 0:

𝐶𝑜𝑣(𝑌,𝑋) = 𝐄[𝑌𝑋] - 𝐄[𝑌]𝐄[𝑋] = 𝐄[𝑌𝑋] = 𝜃₁

The slope 𝜃₁ captures the linear dependence of 𝑌 on 𝑋, and covariance formalizes it. In general:

𝐶𝑜𝑣(𝑌,𝑋) = 𝐄[𝑌𝑋] - 𝐄[𝑌]𝐄[𝑋]

Even if:

𝑋 is NOT standard normal, like:

𝐄[𝑋] = 𝜇

𝑉𝑎𝑟(𝑋) = 𝜎²

𝑦 = 𝜃₀+ 𝜃₁𝑥 + 𝜎_𝜀𝜀 # noise scale written 𝜎_𝜀 here to distinguish it from 𝜎² = 𝑉𝑎𝑟(𝑋)

Then:

𝐄[𝑌|𝑋] = 𝜃₀+ 𝜃₁𝑋

so

𝐄[𝑌𝑋] = 𝐄[𝜃₀𝑋+ 𝜃₁𝑋²] # the noise term drops out because 𝐄[𝜀𝑋] = 0

𝐄[𝑌𝑋] = 𝐄[𝜃₀𝑋]+ 𝐄[𝜃₁𝑋²]

𝐄[𝑌𝑋] = 𝜃₀𝜇+ 𝜃₁𝐄[𝑋²]

𝑉𝑎𝑟(𝑋) = 𝜎² = 𝐄[𝑋²] - 𝐄[𝑋]²

𝑉𝑎𝑟(𝑋) = 𝜎² = 𝐄[𝑋²] - 𝜇²

𝜎² + 𝜇²= 𝐄[𝑋²]

𝐄[𝑌𝑋] = 𝜃₀𝜇+ 𝜃₁(𝜎² + 𝜇²)

𝐶𝑜𝑣(𝑌,𝑋) = 𝐄[𝑌𝑋] - 𝐄[𝑌]𝐄[𝑋]

𝐶𝑜𝑣(𝑌,𝑋) = 𝜃₀𝜇+ 𝜃₁(𝜎² + 𝜇²) - 𝐄[𝑌]𝐄[𝑋]

𝐶𝑜𝑣(𝑌,𝑋) = 𝜃₀𝜇+ 𝜃₁(𝜎² + 𝜇²) - 𝐄[𝜃₀+ 𝜃₁𝑋]𝐄[𝑋]

𝐶𝑜𝑣(𝑌,𝑋) = 𝜃₀𝜇+ 𝜃₁(𝜎² + 𝜇²) - 𝐄[𝜃₀+ 𝜃₁𝑋]𝜇

𝐶𝑜𝑣(𝑌,𝑋) = 𝜃₀𝜇+ 𝜃₁(𝜎² + 𝜇²) - (𝜃₀+ 𝜃₁𝜇)𝜇

𝐶𝑜𝑣(𝑌,𝑋) = 𝜃₀𝜇+ 𝜃₁(𝜎² + 𝜇²) - (𝜃₀𝜇 + 𝜃₁𝜇²)

𝐶𝑜𝑣(𝑌,𝑋) = 𝜃₀𝜇+ 𝜃₁𝜎² + 𝜃₁𝜇² - 𝜃₀𝜇 - 𝜃₁𝜇²

𝐶𝑜𝑣(𝑌,𝑋) = 𝜃₁𝜎²

Thus:

𝜃₁ = 𝐶𝑜𝑣(𝑌,𝑋) / 𝜎²

𝜃₁ = 𝐶𝑜𝑣(𝑌,𝑋) / 𝑉𝑎𝑟(𝑋)

Replacing the population quantities with their sample estimates:

𝐶𝑜𝑣(𝑌,𝑋) is estimated by 𝑠_𝑥𝑦²

𝑉𝑎𝑟(𝑋) is estimated by 𝑠_𝑥²

Thus:

𝜃ˆ₁= 𝑠_𝑥𝑦²/𝑠_𝑥²
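
The relation 𝜃₁ = 𝐶𝑜𝑣(𝑌,𝑋) / 𝑉𝑎𝑟(𝑋) can be illustrated with a small simulation. The sketch below draws a non-standard Normal 𝑋, generates 𝑌 from a linear model with independent noise, and compares the ratio of sample covariance to sample variance with the true slope. All parameter values, the seed, and the sample size are assumptions chosen for this illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical "true" model parameters (assumed for this illustration).
theta0, theta1 = 1.0, 2.5
mu_x, sd_x = 5.0, 2.0   # X is Normal but NOT standard normal
noise_sd = 1.0          # scale of the noise term
n = 10_000

xs = [random.gauss(mu_x, sd_x) for _ in range(n)]
ys = [theta0 + theta1 * x + noise_sd * random.gauss(0.0, 1.0) for x in xs]

x_bar, y_bar = mean(xs), mean(ys)
sample_cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
sample_var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)

slope_estimate = sample_cov / sample_var_x  # estimate of Cov(Y, X) / Var(X)
print(f"true slope = {theta1}, estimated slope = {slope_estimate:.3f}")
```

With a sample this large the estimate should land close to the true slope; the exact value depends on the random draw.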

Analysis of Variance (ANOVA) - Prediction - Further Inference

this part covers how to:

evaluate the goodness of fit of the chosen regression model to the observed data
estimate the variance of the response variable 𝑌_𝑖 given 𝑋_𝑖: 𝑉𝑎𝑟(𝑌_𝑖|𝑋_𝑖) = 𝜎², with 𝜎̂²
then use 𝜎̂² to test the significance of the regression parameters 𝜃₀ and 𝜃₁
construct confidence intervals and prediction intervals

Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares


total sum of squares

measures the total variation among observed responses (variation of 𝑦_𝑖 about their sample mean 𝑦̅)

does not change wrt regression model

formula:

𝑆𝑆_𝑇𝑂𝑇 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑦̅]²

𝑆𝑆_𝑇𝑂𝑇 = (𝑛 - 1)(𝑠_𝑦)²

𝑆𝑆_𝑇𝑂𝑇 can be partitioned into 2 parts:

𝑆𝑆_𝑇𝑂𝑇 = 𝑆𝑆_𝑅𝐸𝐺+ 𝑆𝑆_𝐸𝑅𝑅

where:

𝑆𝑆_𝑅𝐸𝐺 (regression sum of squares) - measures the total variation explained by the regression model

𝑆𝑆_𝑅𝐸𝐺 = 𝛴_{1≤𝑖≤𝑛}[𝑦̂_𝑖 - 𝑦̅]²

is often computed as

𝑆𝑆_𝑅𝐸𝐺 = 𝛴_{1≤𝑖≤𝑛}[𝜃₀+ 𝜃₁𝑥_𝑖 - 𝑦̅]²

𝑆𝑆_𝑅𝐸𝐺 = 𝛴_{1≤𝑖≤𝑛}[𝑦̅ - 𝜃₁𝑥̅ + 𝜃₁𝑥_𝑖 - 𝑦̅]² # substitute 𝜃₀= 𝑦̅ - 𝜃₁𝑥̅

𝑆𝑆_𝑅𝐸𝐺 = 𝛴_{1≤𝑖≤𝑛}[𝜃₁(𝑥_𝑖 - 𝑥̅)]²

𝑆𝑆_𝑅𝐸𝐺 = 𝜃₁²𝛴_{1≤𝑖≤𝑛}[𝑥_𝑖 - 𝑥̅]²

𝑆𝑆_𝑅𝐸𝐺 = (𝑛 - 1)𝜃₁²(𝑠_𝑥)²

𝑆𝑆_𝑅𝐸𝐺 = 𝜃₁²·𝑆_𝑥𝑥

𝑆𝑆_𝐸𝑅𝑅 (error sum of squares) - measures the total variation NOT explained by the regression model

𝑆𝑆_𝐸𝑅𝑅 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑦̂_𝑖]²= 𝛴_{1≤𝑖≤𝑛}[𝑒_𝑖]²
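
The partition of the total sum of squares can be verified numerically. The sketch below fits the line by least squares and computes all three sums of squares from their definitions; the toy data and variable names are assumptions for illustration.

```python
from statistics import mean

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar
y_hat = [theta0_hat + theta1_hat * x for x in xs]  # fitted values

ss_tot = sum((y - y_bar) ** 2 for y in ys)               # variation about the mean
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)          # variation explained by the model
ss_err = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # variation left unexplained

print(f"SS_TOT = {ss_tot:.3f}, SS_REG = {ss_reg:.3f}, SS_ERR = {ss_err:.3f}")
```

For this sample the three values are 6, 3.6 and 2.4, and the regression sum of squares also equals the squared slope estimate times 𝑆_𝑥𝑥 (0.36 · 10).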

R-Square (Coefficient of Determination)


R-Square - Coefficient of Determination - Coefficient of Multiple Determination - Multiple R-Square

𝑅² is closely related to Pearson’s correlation coefficient (𝑟)

𝑅² measures the proportion of the total variation (𝑆𝑆_𝑇𝑂𝑇) explained by the regression model (𝑆𝑆_𝑅𝐸𝐺). In other words, it measures how closely the data fit the regression model

𝑅² lies in the interval [0,1]

in univariate regression, 𝑅² also equals the SQUARED correlation coefficient

as new regressors are added to the regression model, each one either explains an additional portion of the total variation 𝑆𝑆_𝑇𝑂𝑇 or it does not. Therefore, 𝑅² either goes up or stays the same and NEVER goes down as we add more regressors. Thus, we expect 𝑅² to increase going from univariate regression to multivariate regression

for penalizing the addition of USELESS regressors see: Adjusted R-Square

formula:

𝑅² = (variance-about-the-mean - Mean Square Error (MSE)) / variance-about-the-mean

𝑅² = (variance-about-the-mean - variance-about-the-regression-line) / variance-about-the-mean

𝑅² = (variance-about-the-mean - variance-of-errors-not-explained-by-model) / variance-about-the-mean

𝑅² = variance-explained-by-model / variance-about-the-mean

𝑅² = 𝑆𝑆_𝑅𝐸𝐺/ 𝑆𝑆_𝑇𝑂𝑇

visual of variance-about-the-mean vs variance-about-the-regression-line


R²- Formal Definition

in Simple Linear Regression Models 𝑅²is known as Coefficient of Determination:

𝑅²= [𝑉𝑎𝑟(mean) - 𝑉𝑎𝑟(model)] / 𝑉𝑎𝑟(mean)

𝑅²= 𝐶𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛(𝑋,𝑌)²

in Multiple Linear Regression Models 𝑅² is known as Coefficient of Multiple Determination or Multiple R-Squared:

𝑅²= [𝑉𝑎𝑟(mean) - 𝑉𝑎𝑟(model)] / 𝑉𝑎𝑟(mean) # where variance-explained-by-model = [𝑉𝑎𝑟(mean) - 𝑉𝑎𝑟(model)]

R²- Properties

𝑅² ranges between [0,1]. This means the variation about the fitted model (𝑆𝑆_𝐸𝑅𝑅) is always less than or equal to the variation about the mean (𝑆𝑆_𝑇𝑂𝑇)

high 𝑅² (and hence |𝑟|) → points are tightly clustered around the regression model → predicted 𝑦̂’s are close to observed 𝑦‘s → errors are small → fit is good
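
The equivalent ways of writing 𝑅² in the univariate case can be compared numerically. The sketch below computes 𝑅² as 𝑆𝑆_𝑅𝐸𝐺 / 𝑆𝑆_𝑇𝑂𝑇, as 1 - 𝑆𝑆_𝐸𝑅𝑅 / 𝑆𝑆_𝑇𝑂𝑇, and as the squared sample correlation; the toy data and names are assumptions for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)  # equals SS_TOT
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar
y_hat = [theta0_hat + theta1_hat * x for x in xs]

ss_tot = s_yy
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_err = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

r2_explained = ss_reg / ss_tot         # proportion of variation explained
r2_unexplained = 1 - ss_err / ss_tot   # one minus the unexplained proportion
r_xy = s_xy / sqrt(s_xx * s_yy)        # sample correlation (the n - 1 factors cancel)

print(r2_explained, r2_unexplained, r_xy ** 2)
```

All three expressions give 0.6 for this sample.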

R² - Example


Standard Regression Assumptions


Standard Regression Assumptions

the observed responses 𝑦^(𝑖) are independent Normal random variables with:

expectation = 𝐄[𝑌^(𝑖)|𝑋₁^(𝑖), …, 𝑋_𝑘^(𝑖)] = 𝜃₀+ 𝜃₁𝑋₁^(𝑖)+ … + 𝜃_𝑘𝑋_𝑘^(𝑖)

variance = 𝑉𝑎𝑟(𝑌^(𝑖)|𝑋₁^(𝑖), …, 𝑋_𝑘^(𝑖)) = 𝜎² # variance is same constant for all 𝑋₁^(𝑖), …, 𝑋_𝑘^(𝑖)

predictors {𝑋₁^(𝑖), …, 𝑋_𝑘^(𝑖)} are considered non-random because they are observed

as a consequence:

the estimates 𝜃ˆ₀, …, 𝜃ˆ_𝑘 have Normal distributions

Desired Properties	Required Assumptions
𝜃ˆ_𝑂𝐿𝑆 is an unbiased estimate of 𝜃	𝐄[𝜖^(𝑖)] = 0, ∀𝑖
𝜃ˆ_𝑂𝐿𝑆 is an unbiased estimate of 𝜃; 𝜃ˆ_𝑂𝐿𝑆 is a BLUE estimator	𝐄[𝜖^(𝑖)] = 0, ∀𝑖; 𝑉𝑎𝑟(𝜖^(𝑖)) = constant < ∞, ∀𝑖; 𝐶𝑜𝑣(𝜖^(𝑖),𝜖^(𝑗)) = 0, ∀𝑖≠𝑗
𝜃ˆ_𝑂𝐿𝑆 is an unbiased estimate of 𝜃; 𝜃ˆ_𝑂𝐿𝑆 is a BLUE estimator; 𝜃ˆ_𝑂𝐿𝑆 is mathematically equivalent to MLE	𝐄[𝜖^(𝑖)] = 0, ∀𝑖; 𝑉𝑎𝑟(𝜖^(𝑖)) = constant < ∞, ∀𝑖; 𝐶𝑜𝑣(𝜖^(𝑖),𝜖^(𝑗)) = 0, ∀𝑖≠𝑗; all 𝜖 are Independent and Identically Distributed (IID) from the normal distribution

Assumptions in ANOVA

normality of sampling distribution of means - the distribution of sample means is normally distributed

errors 𝑒^(𝑖) & 𝑒^(𝑗) are independent of each other (where 𝑒^(𝑖) = 𝑦^(𝑖) - 𝑦̂^(𝑖))

absence of outliers - outliers have been removed from the dataset

homogeneity of variance - population variances at different levels of each independent variable {𝑋₁, …, 𝑋_𝑘} are equal


Degrees of Freedom


let us compute the degrees of freedom of all three sums of squares 𝑆𝑆:

𝑑𝑓_𝑇𝑂𝑇 = 𝑑𝑓_𝑅𝐸𝐺+ 𝑑𝑓_𝐸𝑅𝑅

𝑆𝑆_𝑇𝑂𝑇 has 𝑑𝑓_𝑇𝑂𝑇 = (𝑛 - 1)

𝑑𝑓_𝑇𝑂𝑇 = (𝑛 - 1) because it is computed directly from the sample variance (𝑠_𝑦)²:

𝑆𝑆_𝑇𝑂𝑇= (𝑛 - 1)(𝑠_𝑦)²

𝑆𝑆_𝑅𝐸𝐺 has 𝑑𝑓_𝑅𝐸𝐺 = 1

𝑑𝑓_𝑅𝐸𝐺 = 1 because the number of degrees of freedom is the dimension of the corresponding space. The regression line, which is just a straight line, has dimension 1

𝑆𝑆_𝐸𝑅𝑅 has 𝑑𝑓_𝐸𝑅𝑅 = (𝑛 - 2)

𝑑𝑓_𝐸𝑅𝑅= (sample size) - (number of estimated location parameters)

𝑑𝑓_𝐸𝑅𝑅= 𝑛 - 2

the error degrees of freedom also follow from 𝑑𝑓_𝑇𝑂𝑇 = 𝑑𝑓_𝑅𝐸𝐺 + 𝑑𝑓_𝐸𝑅𝑅:

𝑑𝑓_𝐸𝑅𝑅 = 𝑑𝑓_𝑇𝑂𝑇 - 𝑑𝑓_𝑅𝐸𝐺 = (𝑛 - 1) - 1 = 𝑛 - 2

Estimating Population-Error/Regression Variance 𝜎²= 𝑉𝑎𝑟(𝑌|𝑋) With 𝜎̂² Mean Square Error (MSE)


see also: Ben Lambert’s Video Lecture

with the computed degrees of freedom, we can now estimate the regression variance of 𝑌 given 𝑋:

𝜎̂²= estimated regression variance

𝜎̂² = 𝐄[(𝑦_𝑖 - 𝑦̂_𝑖)²]

𝜎̂² = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑦̂_𝑖]²/ (𝑛 - 2)

𝜎̂² = 𝑆𝑆_𝐸𝑅𝑅/ 𝑑𝑓_𝐸𝑅𝑅

𝜎̂² estimates 𝜎²= 𝑉𝑎𝑟(𝑌|𝑋) unbiasedly

NOTE: the usual sample variance:

(𝑠_𝑦)²= 𝑆𝑆_𝑇𝑂𝑇/ (𝑛 - 1)

(𝑠_𝑦)²= 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑦̅]² / (𝑛 - 1)

is a biased estimate of 𝜎² = 𝑉𝑎𝑟(𝑌|𝑋), because under the regression model 𝑦̅ no longer estimates the expectation of each 𝑌_𝑖
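
The difference between the regression variance estimate 𝜎̂² = 𝑆𝑆_𝐸𝑅𝑅 / (𝑛 - 2) and the usual sample variance (𝑠_𝑦)² = 𝑆𝑆_𝑇𝑂𝑇 / (𝑛 - 1) can be seen on a small example. The sketch below computes both; the toy data and names are assumptions for illustration.

```python
from statistics import mean

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar

ss_err = sum((y - theta0_hat - theta1_hat * x) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - y_bar) ** 2 for y in ys)

df_err = n - 2                    # two location parameters were estimated
sigma2_hat = ss_err / df_err      # mean square error, estimates Var(Y | X)
sample_var_y = ss_tot / (n - 1)   # usual sample variance of y about its mean

print(f"sigma2_hat = {sigma2_hat:.3f}, s_y^2 = {sample_var_y:.3f}")
```

Here 𝜎̂² = 2.4 / 3 = 0.8, while (𝑠_𝑦)² = 6 / 4 = 1.5: the sample variance also counts the variation that the regression line explains.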

ANOVA Table Summary

𝑆𝑆_𝑇𝑂𝑇 = (𝑛 - 1)𝑉𝑎𝑟(𝑦)
𝑆𝑆_𝑅𝐸𝐺 = 𝑟²(𝑛 - 1)𝑉𝑎𝑟(𝑦)
𝑆𝑆_𝐸𝑅𝑅 = (1 - 𝑟²)(𝑛 - 1)𝑉𝑎𝑟(𝑦)

This table is a modification of One-Way ANOVA and can be used for both univariate linear regression and multivariate linear regression

Source	Sum of Squares	Degrees of Freedom	Mean Squares	𝐹 Statistic (ALL)
Model	Sum of Squares Regression (RSS), also called Sum of Squares Explained (ESS); 𝑆𝑆_𝑅𝐸𝐺 = 𝑆𝑆_𝑇𝑂𝑇 - 𝑆𝑆_𝐸𝑅𝑅; 𝑆𝑆_𝑅𝐸𝐺 = 𝛴_{1≤𝑖≤𝑛}[𝑦̂_𝑖 - 𝑦̅]²	𝑑𝑓_𝑅𝐸𝐺 = 𝑑𝑓_𝑇𝑂𝑇 - 𝑑𝑓_𝐸𝑅𝑅 = (𝑛 - 1) - (𝑛 - # of model params) = (# of model params) - 1 = 𝑘 (number of predictor coefficients 𝜃_𝑖, excluding 𝜃₀)	𝑀𝑆_𝑅𝐸𝐺 = 𝑆𝑆_𝑅𝐸𝐺 / 𝑑𝑓_𝑅𝐸𝐺	𝐹 = 𝑀𝑆_𝑅𝐸𝐺 / 𝑀𝑆_𝐸𝑅𝑅
Error	Sum of Squares Error (ESS), also called Sum of Squares Residual (RSS), Sum of Squares UnRestricted, or Sum of Squares Around Model; 𝑆𝑆_𝐸𝑅𝑅 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑦̂_𝑖]² = 𝛴_{1≤𝑖≤𝑛}[𝑒_𝑖]²	𝑑𝑓_𝐸𝑅𝑅 = 𝑛 - (# of model params including 𝜃₀) = 𝑛 - (𝑘 + 1) = 𝑛 - 𝑘 - 1	𝑀𝑆_𝐸𝑅𝑅 = 𝑆𝑆_𝐸𝑅𝑅 / 𝑑𝑓_𝐸𝑅𝑅, also called Mean Square Error (MSE) or Regression Variance	
Total	Sum of Squares Total (TSS), also called Sum of Squares Restricted or Sum of Squares Around Mean; 𝑆𝑆_𝑇𝑂𝑇 = 𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖 - 𝑦̅]²; 𝑆𝑆_𝑇𝑂𝑇 = 𝑆𝑆_𝑅𝐸𝐺 + 𝑆𝑆_𝐸𝑅𝑅	𝑑𝑓_𝑇𝑂𝑇 = 𝑛 - 1		

this 𝐹 formula is used to test the significance of the ENTIRE regression model

for the other 𝐹 formulas, used to test PARTIAL significance of the regression model, see the 𝐹 statistics below

𝐹 statistic for testing the null hypothesis that ALL variables are insignificant (e.g. 𝐻₀: 𝜃₁= … = 𝜃_𝑘 = 0)

unrestricted model

𝑦̂_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑} = 𝜃₀+ 𝜃₁𝑥₁ + … + 𝜃_𝑘𝑥_𝑘

restricted model

𝑦̂_{𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑} = 𝜃₀+ 0·𝑥₁ + … + 0·𝑥_𝑘= 𝜃₀ = 𝑦̅

𝐹 sum of squares form

𝐹 = 𝑀𝑆_𝑅𝐸𝐺/ 𝑀𝑆_𝐸𝑅𝑅

𝐹 = [(𝑆𝑆_𝑇𝑂𝑇- 𝑆𝑆_𝐸𝑅𝑅)/((𝑛 - 1)-(𝑛 - 𝑘 - 1))] / [(𝑆𝑆_𝐸𝑅𝑅)/(𝑛 - 𝑘 - 1)]

𝐹 = [(𝑆𝑆_𝑇𝑂𝑇- 𝑆𝑆_𝐸𝑅𝑅)/(𝑘)] / [(𝑆𝑆_𝐸𝑅𝑅)/(𝑛 - 𝑘 - 1)]

𝐹 = [(𝑆𝑆_{𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑}- 𝑆𝑆_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/(𝑘)] / [(𝑆𝑆_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/(𝑛 - 𝑘 - 1)]

𝐹 𝑅² form

𝐹 = [𝑅²_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑}/𝑘] / [(1 - 𝑅²_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/(𝑛 - 𝑘 - 1)]

𝐹 has F-distribution with parameters (𝑘, (𝑛 - 𝑘 - 1))

𝐹 statistic for testing the null hypothesis that SOME variables are insignificant (e.g. 𝐻₀: 𝜃_𝑖= 0, ∀𝜃_𝑖∊𝑆 where 𝑆⊆{𝜃₁, … 𝜃_𝑘})

unrestricted model

𝑦̂_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑} = 𝜃₀+ 𝜃₁𝑥₁ + … + 𝜃_𝑘𝑥_𝑘

restricted model

𝑦̂_{𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑} = 𝜃₀ + (linear combination of 0·𝑥_𝑖 for 𝜃_𝑖∊𝑆) + (linear combination of 𝜃_𝑗𝑥_𝑗 for 𝜃_𝑗∉𝑆)

𝑦̂_{𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑} = 𝜃₀ + (linear combination of 𝜃_𝑗𝑥_𝑗 for 𝜃_𝑗∉𝑆)

𝐹 sum of squares form

𝐹 = [(𝑆𝑆_{𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑} - 𝑆𝑆_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/((𝑛 - (𝑘-|𝑆|) - 1)-(𝑛 - 𝑘 - 1))] / [(𝑆𝑆_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/(𝑛 - 𝑘 - 1)]

𝐹 = [(𝑆𝑆_{𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑} - 𝑆𝑆_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/(|𝑆|)] / [(𝑆𝑆_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/(𝑛 - 𝑘 - 1)]

𝐹 𝑅² form

𝐹 = [(𝑅²_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑}- 𝑅²_{𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/(|𝑆|)] / [(1 - 𝑅²_{𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑})/(𝑛 - 𝑘 - 1)]

𝐹 has F-distribution with parameters (|𝑆|, (𝑛 - 𝑘 - 1))
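
The sum-of-squares form and the 𝑅² form of the 𝐹 statistic can be written as two small functions and compared. In this sketch the function names, the argument `q` (the number of coefficients set to 0 under 𝐻₀, i.e. 𝑘 for the ALL test and |𝑆| for the SOME test), and the input numbers are assumptions for illustration; the "restricted" and "unrestricted" sums of squares are the error sums of squares of the respective models.

```python
def f_stat_from_ss(ss_restricted, ss_unrestricted, q, n, k):
    """F statistic from the error sums of squares of the two models.

    q is the number of coefficients set to 0 under H0; k is the number of
    predictors in the unrestricted model; n is the sample size.
    """
    numerator = (ss_restricted - ss_unrestricted) / q
    denominator = ss_unrestricted / (n - k - 1)
    return numerator / denominator

def f_stat_from_r2(r2_unrestricted, r2_restricted, q, n, k):
    """Same statistic expressed through the R² of the two models."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator

# Hypothetical univariate example: n = 5 observations, k = 1 predictor,
# SS_TOT = 6 and SS_ERR = 2.4 (assumed values for illustration).
n, k = 5, 1
ss_tot, ss_err = 6.0, 2.4
r2 = 1 - ss_err / ss_tot

# Testing ALL coefficients: the restricted model is y_hat = y_bar, whose
# error sum of squares is SS_TOT and whose R² is 0.
f_from_ss = f_stat_from_ss(ss_tot, ss_err, q=k, n=n, k=k)
f_from_r2 = f_stat_from_r2(r2, 0.0, q=k, n=n, k=k)
print(f_from_ss, f_from_r2)
```

Both forms give 𝐹 = 3.6 / 0.8 = 4.5 for these inputs, up to floating-point rounding.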


T-Test on Regression Slope (𝜃₁)


having estimated the regression variance 𝜎² with 𝜎̂², we can use it to create confidence intervals for the regression slope 𝜃₁ and use it for hypothesis testing

according to standard regression assumptions (defined above)

𝑦_𝑖‘s are independent Normal random variables with:

mean 𝐄[𝑌_𝑖|𝑋=𝑥_𝑖] = 𝜃₀+ 𝜃₁𝑥_𝑖

constant variance = 𝜎²

predictor 𝑥_𝑖is non-random

As a consequence, regression estimates 𝜃ˆ₀and 𝜃ˆ₁ have Normal distribution

let’s compute the expectation & variance of 𝜃ˆ₁:

𝐄[𝜃ˆ₁] = 𝜃₁

computation (see also: Ben Lambert's Video Lecture):

the slope 𝜃₁ is estimated by:

𝜃ˆ₁= 𝑆_𝑥𝑦/ 𝑆_𝑥𝑥

𝜃ˆ₁ = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)(𝑥_𝑖 - 𝑥̅)] / 𝑆_𝑥𝑥

𝜃ˆ₁ = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖)(𝑥_𝑖 - 𝑥̅)] / 𝑆_𝑥𝑥# because 𝑦̅𝛴_{1≤𝑖≤𝑛}(𝑥_𝑖 - 𝑥̅) = 0

according to standard regression assumptions (defined above):

𝑦_𝑖 are Normal variables

𝑥_𝑖 are non-random

𝜃ˆ₁ is also Normal because it is a linear function of 𝑦_𝑖

therefore:

𝐄[𝜃ˆ₁] = 𝐄[𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖)(𝑥_𝑖 - 𝑥̅)] / 𝑆_𝑥𝑥]

𝐄[𝜃ˆ₁] = 𝛴_{1≤𝑖≤𝑛}[𝐄[𝑦_𝑖](𝑥_𝑖 - 𝑥̅)] / 𝑆_𝑥𝑥

𝐄[𝜃ˆ₁] = 𝛴_{1≤𝑖≤𝑛}[(𝜃₀+ 𝜃₁𝑥_𝑖)(𝑥_𝑖 - 𝑥̅)] / 𝑆_𝑥𝑥

𝐄[𝜃ˆ₁] = (𝜃₀𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)] + 𝜃₁𝛴_{1≤𝑖≤𝑛}[𝑥_𝑖(𝑥_𝑖 - 𝑥̅)]) / 𝑆_𝑥𝑥 # 𝜃₀ and 𝜃₁ are constants

𝐄[𝜃ˆ₁] = (0 + 𝜃₁𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)²]) / 𝑆_𝑥𝑥 # 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)] = 0, hence 𝛴_{1≤𝑖≤𝑛}[𝑥_𝑖(𝑥_𝑖 - 𝑥̅)] = 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)²]

𝐄[𝜃ˆ₁] = 𝜃₁·𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)²] / 𝑆_𝑥𝑥

𝐄[𝜃ˆ₁] = 𝜃₁·1 # 𝑆_𝑥𝑥= 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)²]

𝐄[𝜃ˆ₁] = 𝜃₁

thus 𝜃ˆ₁ is an unbiased estimator of 𝜃₁


𝑉𝑎𝑟(𝜃ˆ₁|𝑥₁) = 𝜎²/𝑆_𝑥𝑥

computation (see also: Ben Lambert's Video Lecture):

as in the computation of 𝐄[𝜃ˆ₁], the estimate 𝜃ˆ₁ = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖)(𝑥_𝑖 - 𝑥̅)] / 𝑆_𝑥𝑥 is a linear function of the independent Normal 𝑦_𝑖, with non-random 𝑥_𝑖

therefore:

𝑉𝑎𝑟(𝜃ˆ₁|𝑥₁) = 𝑉𝑎𝑟[𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖)(𝑥_𝑖 - 𝑥̅)] / 𝑆_𝑥𝑥]

𝑉𝑎𝑟(𝜃ˆ₁|𝑥₁) = 𝛴_{1≤𝑖≤𝑛}[𝑉𝑎𝑟(𝑦_𝑖)(𝑥_𝑖 - 𝑥̅)²] / (𝑆_𝑥𝑥)²

𝑉𝑎𝑟(𝜃ˆ₁|𝑥₁) = 𝛴_{1≤𝑖≤𝑛}[𝜎²(𝑥_𝑖 - 𝑥̅)²] / (𝑆_𝑥𝑥)²

𝑉𝑎𝑟(𝜃ˆ₁|𝑥₁) = 𝜎²𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)²] / (𝑆_𝑥𝑥)²

𝑉𝑎𝑟(𝜃ˆ₁|𝑥₁) = 𝜎² / 𝑆_𝑥𝑥# 𝑆_𝑥𝑥= 𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖 - 𝑥̅)²]


thus 𝜃ˆ₁ is Normal(𝜇_𝜃₁, (𝜎_𝜃₁)²) with:

𝜇_𝜃₁ = 𝜃₁

(𝜎_𝜃₁)² = 𝜎² / 𝑆_𝑥𝑥

and the distribution of 𝜃ˆ₁ is estimated by Normal(𝜇̂_𝜃₁, (𝜎̂_𝜃₁)²) with:

𝜇̂_𝜃₁ = 𝜃ˆ₁

(𝜎̂_𝜃₁)² = 𝜎̂² / 𝑆_𝑥𝑥

the (1-𝛼)100% two-tailed confidence interval for the regression slope (𝜃₁) is:

= estimate ± 𝑧_{𝛼/2}·(std of estimate) # z-interval is used when the std of the estimator is known

= estimate ± 𝑡_{𝛼/2,𝑛-2}·(estimated std of estimate) # t-interval is used when the std of the estimator is estimated

= 𝜃ˆ₁± 𝑡_{𝛼/2,𝑛-2}·[𝜎̂²/(𝑆_𝑥𝑥)]^(1/2)

= 𝜃ˆ₁± 𝑡_{𝛼/2,𝑛-2}·[𝜎̂/√(𝑆_𝑥𝑥)]

hypothesis-testing:

𝐻₀: 𝜃₁= 𝐵

𝐻_𝑎: 𝜃₁≠ 𝐵

use the t-statistic, which has (𝑛 - 2) degrees of freedom, and reject 𝐻₀ when |𝑡| > 𝑡_{𝛼/2,𝑛-2}:

𝑡 = (𝜃ˆ₁- 𝐵) / 𝑆𝐸’(𝜃ˆ₁) # 𝑆𝐸’(𝜃ˆ₁) is the estimated standard error of statistic 𝜃ˆ₁

𝑡 = (𝜃ˆ₁- 𝐵) / [𝜎̂/√(𝑆_𝑥𝑥)]

To see if 𝑋 is significant for the prediction of 𝑌, test the null hypothesis with 𝐵 = 0:

𝐻₀: 𝜃₁= 0

𝐻_𝐴: 𝜃₁≠ 0
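
The confidence interval and the test of 𝐻₀: 𝜃₁= 0 can be sketched as follows. The toy data and names are assumptions for illustration. Because the Python standard library has no t-distribution quantile function, the critical value 𝑡_{0.025,3} = 3.182 is hard-coded from a standard t-table; it applies only to this sample size (𝑛 = 5, so 𝑛 - 2 = 3) and 𝛼 = 0.05.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar

ss_err = sum((y - theta0_hat - theta1_hat * x) ** 2 for x, y in zip(xs, ys))
sigma2_hat = ss_err / (n - 2)           # estimated regression variance
se_theta1 = sqrt(sigma2_hat / s_xx)     # estimated standard error of the slope

# Assumed table value: t critical value for alpha/2 = 0.025 and 3 degrees of freedom.
T_CRIT = 3.182

b_null = 0.0                            # hypothesized slope under H0
t_stat = (theta1_hat - b_null) / se_theta1
ci_low = theta1_hat - T_CRIT * se_theta1
ci_high = theta1_hat + T_CRIT * se_theta1
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f}), reject H0: {reject_h0}")
```

For this sample 𝑡 is about 2.12, which is below 3.182, so 𝐻₀ is not rejected at the 5% level; consistently, the 95% interval (about -0.30 to 1.50) contains 0.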

ANOVA F-Test on Model Significance


ANOVA F-test

popular for testing ratios of variances and significance of models

compares the portion of variation explained by regression with the portion that remains unexplained

is always one-sided and right-tail because only large values of the F-statistic show a large portion of explained variation and the overall significance of the model

under the null-hypothesis:

𝐻₀: 𝜃₁= 0

the f-statistic value is computed as follows:

𝐹 = 𝑀𝑆_𝑅𝐸𝐺/ 𝑀𝑆_𝐸𝑅𝑅

and has an F-distribution with two parameters:

numerator 𝑑𝑓 = 𝑑𝑓_𝑅𝐸𝐺 = 1

denominator 𝑑𝑓 = 𝑑𝑓_𝐸𝑅𝑅 = (𝑛 − 2)

F-Test vs T-Test


T-test for testing individual regression slope

F-test for testing the entire model significance

For the univariate linear regression, they are absolutely equivalent. In fact, the F-statistic equals the SQUARED T-statistic for testing 𝐻₀: 𝜃₁= 0:

𝑡² = (𝜃ˆ₁)² / (𝜎̂²/𝑆_𝑥𝑥)

𝑡² = (𝑆_𝑥𝑦/𝑆_𝑥𝑥)² / (𝜎̂²/𝑆_𝑥𝑥) # 𝜃ˆ₁= 𝑆_𝑥𝑦/𝑆_𝑥𝑥

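𝑡² = [(𝑆_𝑥𝑦)²/𝑆_𝑥𝑥] / [𝜎̂²] # cancelled one factor of 𝑆_𝑥𝑥

𝑡² = [(𝑆_𝑥𝑦)²/(𝑆_𝑦𝑦𝑆_𝑥𝑥)] * [𝑆_𝑦𝑦/𝜎̂²] # multiplied by 𝑆_𝑦𝑦/𝑆_𝑦𝑦, where 𝑆_𝑦𝑦 = 𝛴_{1≤𝑖≤𝑛}[(𝑦_𝑖 - 𝑦̅)²]

𝑡² = [𝑟²] * [𝑆_𝑦𝑦/𝜎̂²] # 𝑟_𝑥𝑦 = 𝑆_𝑥𝑦 / √(𝑆_𝑥𝑥𝑆_𝑦𝑦), because the (𝑛 - 1) factors in the sample covariance and standard deviations cancel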

𝑡² = [𝑟²] * [𝑆𝑆_𝑇𝑂𝑇/𝜎̂²] # 𝑆𝑆_𝑇𝑂𝑇= 𝑆_𝑦𝑦

𝑡² = [𝑟²*𝑆𝑆_𝑇𝑂𝑇/𝜎̂²]

𝑡² = [𝑆𝑆_𝑅𝐸𝐺/𝜎̂²] # 𝑟² = 𝑆𝑆_𝑅𝐸𝐺 / 𝑆𝑆_𝑇𝑂𝑇

𝑡² = [𝑀𝑆_𝑅𝐸𝐺/𝜎̂²] # 𝑀𝑆_𝑅𝐸𝐺 = 𝑆𝑆_𝑅𝐸𝐺 / 𝑑𝑓_𝑅𝐸𝐺 and 𝑑𝑓_𝑅𝐸𝐺 = 1

𝑡² = [𝑀𝑆_𝑅𝐸𝐺/𝑀𝑆_𝐸𝑅𝑅] # 𝑀𝑆_𝐸𝑅𝑅= 𝜎̂²

𝑡² = 𝐹 # definition of F-statistic
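
The equality 𝑡² = 𝐹 can be confirmed on a small example. The sketch below computes the t-statistic for 𝐻₀: 𝜃₁= 0 and the ANOVA 𝐹 statistic independently from their definitions; the toy data and names are assumptions for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar
y_hat = [theta0_hat + theta1_hat * x for x in xs]

ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_err = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

ms_reg = ss_reg / 1            # df_REG = 1 in univariate regression
ms_err = ss_err / (n - 2)      # df_ERR = n - 2; this is sigma2_hat

t_stat = theta1_hat / sqrt(ms_err / s_xx)  # t-statistic for H0: theta1 = 0
f_stat = ms_reg / ms_err                   # ANOVA F-statistic

print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

Both quantities come out at 4.5 for this sample.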

Prediction (Confidence Interval & Prediction Interval)


given a value of the predictor 𝑋:

𝑋=𝑥_𝑧

the TRUE mean/expectation of response 𝑌 is:

𝜇_𝑧 = 𝐄[𝑌|𝑋=𝑥_𝑧]

𝜇_𝑧 = 𝜃₀+ 𝑥_𝑧𝜃₁

the ESTIMATED mean/expectation of response 𝑌 is:

𝑦̂_𝑧 = 𝐄ˆ[𝑌|𝑋=𝑥_𝑧] # the estimate of 𝜇_𝑧

𝑦̂_𝑧 = 𝜃ˆ₀+ 𝑥_𝑧𝜃ˆ₁

How reliable are regression predictions, and how close are they to the true values? To answer this, we construct:

a (1-𝛼)100% confidence interval for the expectation

𝜇_𝑧 = 𝐄[𝑌|𝑋=𝑥_𝑧]

a (1-𝛼)100% prediction interval for the actual value of 𝑌=𝑦_𝑧 when 𝑋=𝑥_𝑧

Confidence Interval for the Mean of Response

(1-𝛼)100% confidence interval for the mean 𝜇_𝑧 = 𝐄[𝑌|𝑋=𝑥_𝑧] of all responses with 𝑋=𝑥_𝑧 is:

𝜃ˆ₀ + 𝜃ˆ₁𝑥_𝑧 ± 𝑡_{𝛼/2,𝑛-2}·𝜎̂·√[ (1/𝑛) + [(𝑥_𝑧 - 𝑥̅)² / 𝑆_𝑥𝑥] ]

computation:


𝜇_𝑧 = 𝐄[𝑌|𝑋=𝑥_𝑧] = 𝜃₀+ 𝑥_𝑧𝜃₁

is the population parameter. 𝜇_𝑧 is the mean response for the entire subpopulation of units where the independent variable 𝑋=𝑥_𝑧

𝜇_𝑧is estimated by:

𝑦̂_𝑧 = 𝜃ˆ₀+ 𝜃ˆ₁𝑥_𝑧

𝑦̂_𝑧 = 𝑦̅ - 𝜃ˆ₁𝑥̅ + 𝜃ˆ₁𝑥_𝑧

𝑦̂_𝑧 = 𝑦̅ + 𝜃ˆ₁𝑥_𝑧- 𝜃ˆ₁𝑥̅

𝑦̂_𝑧 = 𝑦̅ + 𝜃ˆ₁ · (𝑥_𝑧- 𝑥̅)

𝑦̂_𝑧 = (1/𝑛)(𝛴_{1≤𝑖≤𝑛}[𝑦_𝑖]) + (𝛴_{1≤𝑖≤𝑛}[(𝑥_𝑖-𝑥̅)𝑦_𝑖])/(𝑆_𝑥𝑥) · (𝑥_𝑧- 𝑥̅)

𝑦̂_𝑧 = (𝛴_{1≤𝑖≤𝑛}[(1/𝑛)𝑦_𝑖]) + (𝛴_{1≤𝑖≤𝑛}[(1/𝑆_𝑥𝑥)(𝑥_𝑖-𝑥̅)(𝑥_𝑧- 𝑥̅)𝑦_𝑖])

𝑦̂_𝑧 = (𝛴_{1≤𝑖≤𝑛}[(1/𝑛)𝑦_𝑖 + (1/𝑆_𝑥𝑥)(𝑥_𝑖-𝑥̅)(𝑥_𝑧- 𝑥̅)𝑦_𝑖])

𝑦̂_𝑧 = (𝛴_{1≤𝑖≤𝑛}[(1/𝑛) + (1/𝑆_𝑥𝑥)(𝑥_𝑖-𝑥̅)(𝑥_𝑧- 𝑥̅)]·𝑦_𝑖)

we see that the estimator is a linear function of responses 𝑦_𝑖. Then under standard regression assumptions, 𝑦̂_𝑧 is Normal with:

mean = 𝜇_𝑧

𝐄[𝑦̂_𝑧] = 𝐄[𝜃ˆ₀] + 𝐄[𝜃ˆ₁]𝑥_𝑧

𝐄[𝑦̂_𝑧] = 𝜃₀+ 𝜃₁𝑥_𝑧

𝐄[𝑦̂_𝑧] = 𝜇_𝑧

variance = 𝜎²[ (1/𝑛) + [(𝑥_𝑧 - 𝑥̅)² / 𝑆_𝑥𝑥] ]


we can estimate the regression variance 𝜎²with 𝜎̂² and obtain the following confidence interval

(1-𝛼)100% confidence interval for the mean 𝜇_𝑧 = 𝐄[𝑌|𝑋=𝑥_𝑧] of all responses with 𝑋=𝑥_𝑧 is:

𝜃ˆ₀ + 𝜃ˆ₁𝑥_𝑧 ± 𝑡_{𝛼/2,𝑛-2}·𝜎̂·√[ (1/𝑛) + [(𝑥_𝑧 - 𝑥̅)² / 𝑆_𝑥𝑥] ]
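
The confidence interval for the mean response can be computed as in the sketch below. The toy data, the point `x_z = 4`, and the names are assumptions for illustration; the critical value 3.182 (𝛼 = 0.05, 3 degrees of freedom) is hard-coded from a standard t-table because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar

ss_err = sum((y - theta0_hat - theta1_hat * x) ** 2 for x, y in zip(xs, ys))
sigma_hat = sqrt(ss_err / (n - 2))   # estimated regression standard deviation

T_CRIT = 3.182                       # assumed table value t_{0.025, 3}
x_z = 4                              # hypothetical predictor value of interest

y_hat_z = theta0_hat + theta1_hat * x_z                        # estimated mean response
se_mean = sigma_hat * sqrt(1 / n + (x_z - x_bar) ** 2 / s_xx)  # std error of y_hat_z
ci_half_width = T_CRIT * se_mean

print(f"mean response at x = {x_z}: {y_hat_z:.3f} ± {ci_half_width:.3f}")
```

The standard error grows with the distance of 𝑥_𝑧 from 𝑥̅, so the interval is narrowest at the centre of the observed predictor values.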


Prediction Interval for the Individual Response

instead of estimating a population parameter, we are now predicting the actual value of a random variable

(1-𝛼)100% prediction interval for the individual response 𝑌 when 𝑋=𝑥_𝑧:

𝜃ˆ₀ + 𝜃ˆ₁𝑥_𝑧 ± 𝑡_{𝛼/2,𝑛-2}·𝜎̂·√[ 1 + (1/𝑛) + [(𝑥_𝑧 - 𝑥̅)² / 𝑆_𝑥𝑥] ]

computation:

an interval [𝑎,𝑏] is a (1-𝛼)100% prediction interval for the individual response 𝑌 corresponding to predictor 𝑋=𝑥_𝑧 if it contains the value of 𝑌 with probability (1-𝛼):

𝐏{𝑎 ≤ 𝑌 ≤ 𝑏 | 𝑋=𝑥_𝑧} = 1 - 𝛼

this time, all 3 quantities: 𝑌, 𝑎, and 𝑏, are random variables

predicting 𝑌 by 𝑦̂_𝑧:

the standard deviation

𝑆𝑡𝑑(𝑌 - 𝑦̂_𝑧) = √[𝑉𝑎𝑟(𝑌) + 𝑉𝑎𝑟(𝑦̂_𝑧)]

𝑆𝑡𝑑(𝑌 - 𝑦̂_𝑧) = 𝜎·√[ 1 + (1/𝑛) + [(𝑥_𝑧 - 𝑥̅)² / 𝑆_𝑥𝑥] ]

is estimated by:

𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂_𝑧) = 𝜎̂·√[ 1 + (1/𝑛) + [(𝑥_𝑧 - 𝑥̅)² / 𝑆_𝑥𝑥] ]

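and standardizing all 3 parts of the inequality:

𝑎 ≤ 𝑌 ≤ 𝑏

we realize that the (1-𝛼)100% prediction interval for 𝑌 has to satisfy the equation:

𝐏{ (𝑎 - 𝑦̂_𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂_𝑧) ≤ (𝑌 - 𝑦̂_𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂_𝑧) ≤ (𝑏 - 𝑦̂_𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂_𝑧) } = 1 - 𝛼

at the same time, the properly standardized (𝑌 - 𝑦̂_𝑧) has t-distribution with (𝑛 - 2) degrees of freedom:

𝐏{ -𝑡_{𝛼/2,𝑛-2} ≤ (𝑌 - 𝑦̂_𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂_𝑧) ≤ 𝑡_{𝛼/2,𝑛-2} } = 1 - 𝛼

a prediction interval is now computed by solving the following equations for 𝑎 and 𝑏:

(𝑎 - 𝑦̂_𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂_𝑧) = -𝑡_{𝛼/2,𝑛-2}

(𝑏 - 𝑦̂_𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂_𝑧) = 𝑡_{𝛼/2,𝑛-2}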

thus, the (1-𝛼)100% prediction interval for the individual response 𝑌 when 𝑋=𝑥_𝑧:

𝜃ˆ₀ + 𝜃ˆ₁𝑥_𝑧 ± 𝑡_{𝛼/2,𝑛-2}·𝜎̂·√[ 1 + (1/𝑛) + [(𝑥_𝑧 - 𝑥̅)² / 𝑆_𝑥𝑥] ]
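
The prediction interval differs from the confidence interval for the mean only by the extra "1 +" under the square root, which accounts for the variance of the individual response itself. The sketch below computes both half-widths at the same point so they can be compared. The toy data, `x_z = 4`, and the names are assumptions for illustration, and the critical value 3.182 (𝛼 = 0.05, 3 degrees of freedom) is hard-coded from a standard t-table.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy sample (not from the original notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
theta1_hat = s_xy / s_xx
theta0_hat = y_bar - theta1_hat * x_bar

ss_err = sum((y - theta0_hat - theta1_hat * x) ** 2 for x, y in zip(xs, ys))
sigma_hat = sqrt(ss_err / (n - 2))

T_CRIT = 3.182   # assumed table value t_{0.025, 3}
x_z = 4          # hypothetical predictor value of interest

y_hat_z = theta0_hat + theta1_hat * x_z
leverage_term = 1 / n + (x_z - x_bar) ** 2 / s_xx

se_mean = sigma_hat * sqrt(leverage_term)       # for the mean response
se_pred = sigma_hat * sqrt(1 + leverage_term)   # for an individual response

ci_half_width = T_CRIT * se_mean
pi_half_width = T_CRIT * se_pred

print(f"confidence interval: {y_hat_z:.3f} ± {ci_half_width:.3f}")
print(f"prediction interval: {y_hat_z:.3f} ± {pi_half_width:.3f}")
```

The prediction interval is always the wider of the two, and unlike the confidence interval its width does not shrink to zero as 𝑛 grows, because the noise in a single new response remains.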


／var／log marcus chiu


Univariate／Single-Variable／Simple Linear Regression Models
