Univariate Linear Regression model assumes that the conditional expectation is a linear function of a single variable 𝑥:
- 𝑦̂ = 𝑓(𝑥) = 𝐄[𝑌|𝑋=𝑥] = 𝜃0+ 𝜃1𝑥
where:
- 𝜃0 = 𝑓(0) # 𝑦 intercept
- 𝜃1 = 𝛿𝑦/𝛿𝑥 # slope along 𝑥 axis
Estimating 𝜃0 and 𝜃1 (Ordinary Least Squares Method)
given training/sample data 𝐷 = {(𝑥1,𝑦1), …, (𝑥𝑛,𝑦𝑛)} let us estimate the (intercept 𝜃0) and (slope 𝜃1) by the method of least squares:
The Sum of Square Error (𝑆𝑆𝐸𝑅𝑅) of 𝑓(𝑥) given 𝐷 is defined below:
- 𝑆𝑆𝐸𝑅𝑅 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑦̂𝑖]2
- 𝑆𝑆𝐸𝑅𝑅 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑓(𝑥𝑖)]2
- 𝑆𝑆𝐸𝑅𝑅 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝜃0- 𝜃1𝑥𝑖]2
We want to minimize 𝑆𝑆𝐸𝑅𝑅 with respect to 𝜃0 and 𝜃1. We can do it by taking the partial derivatives of 𝑆𝑆𝐸𝑅𝑅, equating them to 0, then solving for 𝜃0 and 𝜃1.
Therefore, the estimate of 𝜃0 and 𝜃1 is shown below:
- estimate of 𝜃0= 𝜃ˆ0 = 𝑦̅ - 𝜃ˆ1𝑥̅
- estimate of 𝜃1= 𝜃ˆ1= 𝑆𝑥𝑦/ 𝑆𝑥𝑥
where:
- 𝑦̅ - mean of all 𝑦‘s
- 𝑥̅ - mean of all 𝑥‘s
- 𝑆𝑥𝑦 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)(𝑥𝑖 - 𝑥̅)]
- 𝑆𝑥𝑥 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)(𝑥𝑖 - 𝑥̅)]
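The closed-form estimates above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `fit_ols` and the toy data are assumptions introduced here, and only the standard library is used.

```python
from statistics import fmean

def fit_ols(xs, ys):
    """Return (theta0_hat, theta1_hat) of the least-squares line (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # S_xy and S_xx exactly as defined in the list above
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    theta1 = s_xy / s_xx
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1

# toy data, made up for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
theta0, theta1 = fit_ols(xs, ys)
```

For this toy sample, 𝑥̅ = 3, 𝑦̅ = 4, 𝑆𝑥𝑦 = 6 and 𝑆𝑥𝑥 = 10, so the slope is 0.6 and the intercept is 2.2.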
computation of the estimates
the partial derivatives are:
- 𝛿𝑠𝑠𝑒/𝛿𝜃0 = -2 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝜃0- 𝜃1𝑥𝑖]
- 𝛿𝑠𝑠𝑒/𝛿𝜃1 = -2 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝜃0- 𝜃1𝑥𝑖]𝑥𝑖
equating them to 0, we obtain so-called normal equations:
- 0 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝜃0- 𝜃1𝑥𝑖]
- 0 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝜃0- 𝜃1𝑥𝑖]𝑥𝑖
for the first normal equation:
- 0= 𝛴1≤𝑖≤𝑛[𝑦𝑖] - 𝜃0𝛴1≤𝑖≤𝑛[1] - 𝜃1𝛴1≤𝑖≤𝑛[𝑥𝑖]
- 0= 𝛴1≤𝑖≤𝑛[𝑦𝑖] - 𝜃0𝑛 - 𝜃1𝛴1≤𝑖≤𝑛[𝑥𝑖]
- 𝜃0𝑛= 𝛴1≤𝑖≤𝑛[𝑦𝑖] - 𝜃1𝛴1≤𝑖≤𝑛[𝑥𝑖]
- 𝜃0= 𝛴1≤𝑖≤𝑛[𝑦𝑖]/𝑛 - 𝜃1𝛴1≤𝑖≤𝑛[𝑥𝑖]/𝑛
- 𝜃0= 𝑦̅ - 𝜃1𝑥̅ # 𝑦̅ and 𝑥̅ definition of the sample mean
substitute this into the second equation:
- 0 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝜃0- 𝜃1𝑥𝑖]𝑥𝑖
- 0 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - (𝑦̅ - 𝜃1𝑥̅)- 𝜃1𝑥𝑖]𝑥𝑖
- 0 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑦̅ + 𝜃1𝑥̅- 𝜃1𝑥𝑖]𝑥𝑖
- 0 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅) + 𝜃1(𝑥̅- 𝑥𝑖)]𝑥𝑖
- 0 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅) - (-𝜃1(𝑥̅- 𝑥𝑖))]𝑥𝑖
- 0 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅) - 𝜃1(𝑥𝑖- 𝑥̅)]𝑥𝑖
- 0 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)𝑥𝑖] - 𝜃1𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)𝑥𝑖]
- 𝑆𝑥𝑦 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)𝑥𝑖]
- 𝑆𝑥𝑦= 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)𝑥𝑖] - 𝑥̅·0
- 𝑆𝑥𝑦= 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)𝑥𝑖] - 𝑥̅·𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)] # because 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)] = 0
- 𝑆𝑥𝑦= 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)𝑥𝑖] - 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)𝑥̅]
- 𝑆𝑥𝑦 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)(𝑥𝑖 - 𝑥̅)]
- and
- 𝑆𝑥𝑥 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)𝑥𝑖]
- 𝑆𝑥𝑥 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)𝑥𝑖] - 𝑥̅·0
- 𝑆𝑥𝑥 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)𝑥𝑖] - 𝑥̅·𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)] # because 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)] = 0
- 𝑆𝑥𝑥 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)𝑥𝑖] - 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)𝑥̅]
- 𝑆𝑥𝑥 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)(𝑥𝑖 - 𝑥̅)]
- 0 = 𝑆𝑥𝑦 - 𝜃1𝑆𝑥𝑥
- 𝜃1= 𝑆𝑥𝑦/ 𝑆𝑥𝑥
(Regression Slope 𝜃1) in relation to (Correlation Coefficient 𝑟𝑥𝑦)
- 𝜃1= 𝑟𝑥𝑦(𝑠𝑦/𝑠𝑥)
- 𝜃1= 𝑠𝑥𝑦2/𝑠𝑥2
sample correlation is defined below:
- 𝑟𝑥𝑦 = 𝑠𝑥𝑦2/ (𝑠𝑥𝑠𝑦)
where:
- 𝑠𝑥𝑦2 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)(𝑦𝑖 - 𝑦̅)] / [𝑛 - 1] # sample covariance
- 𝑠𝑥= 𝑟𝑜𝑜𝑡[𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)2] / [𝑛 - 1]] # sample standard deviation
- 𝑠𝑦= 𝑟𝑜𝑜𝑡[𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)2] / [𝑛 - 1]] # sample standard deviation
proof:
recall that 𝜃1= 𝑆𝑥𝑦/ 𝑆𝑥𝑥
where:
- 𝑆𝑥𝑦 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)(𝑥𝑖 - 𝑥̅)]
- 𝑆𝑥𝑥 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖- 𝑥̅)(𝑥𝑖 - 𝑥̅)]
therefore:
- 𝜃1= 𝑆𝑥𝑦/𝑆𝑥𝑥
- 𝜃1= [𝑆𝑥𝑦/(𝑛-1)]/ [𝑆𝑥𝑥/(𝑛-1)]
- 𝜃1= [𝑠𝑥𝑦2/ 𝑠𝑥2] # 𝑠𝑥𝑦2 = sample covariance & 𝑠𝑥2 = sample variance & 𝑠𝑥= sample standard deviation
- 𝜃1= [𝑠𝑥𝑦2/ 𝑠𝑥2] [𝑠𝑦/𝑠𝑦] # 𝑠𝑦= sample standard deviation
- 𝜃1= [𝑠𝑥𝑦2/ 𝑠𝑥𝑠𝑦] [𝑠𝑦/𝑠𝑥]
- 𝜃1= 𝑟𝑥𝑦 [𝑠𝑦/𝑠𝑥] # 𝑟𝑥𝑦 = sample correlation coefficient
Both the correlation coefficient and the regression slope are:
- positive for positively correlated 𝑋 and 𝑌
- negative for negatively correlated 𝑋 and 𝑌
The difference is:
- correlation coefficient is dimensionless ranging from [-1,1]
- slope is measured in units of 𝑌 per unit of 𝑋
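The relation between slope and correlation can be checked numerically. The sketch below is an illustration under assumptions: `slope_and_correlation` and the toy data are hypothetical, and the sample covariance is computed by hand rather than through a library call. It also shows the unit dependence: rescaling 𝑋 changes the slope but leaves the correlation unchanged.

```python
from statistics import fmean, stdev

def slope_and_correlation(xs, ys):
    """Return (slope from covariance/variance, slope from r * s_y / s_x, r)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    # sample covariance with the (n - 1) denominator
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    r_xy = cov_xy / (s_x * s_y)       # sample correlation coefficient
    return cov_xy / s_x ** 2, r_xy * s_y / s_x, r_xy

xs = [1, 2, 3, 4, 5]                  # toy data, made up for illustration
ys = [2, 4, 5, 4, 5]
slope_a, slope_b, r = slope_and_correlation(xs, ys)

# express X in different units (for example metres -> centimetres)
xs_scaled = [100 * x for x in xs]
slope_scaled, _, r_scaled = slope_and_correlation(xs_scaled, ys)
```

Both slope formulas agree, the rescaled slope is 100 times smaller, and the correlation is the same in both unit systems.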
(Regression Slope 𝜃1) in relation to (Covariance 𝑠𝑥𝑦2)
- 𝜃1= 𝑠𝑥𝑦2/𝑠𝑥2
where:
- 𝑠𝑥𝑦2 = 𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)(𝑦𝑖 - 𝑦̅)] / [𝑛 - 1] # sample covariance
- 𝑠𝑥= 𝑟𝑜𝑜𝑡[𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)2] / [𝑛 - 1]] # sample standard deviation
Intuition
Given:
- 𝑋 is standard normal, thus:
- 𝐄[𝑋] = 0
- 𝑉𝑎𝑟(𝑋) = 1
- 𝑦 = 𝜃0+ 𝜃1𝑥 + 𝜎𝜀
- 𝐄[𝜀] = 0
Then:
- 𝐄[𝑌|𝑋] = 𝐄[𝜃0+ 𝜃1𝑋 + 𝜎𝜀|𝑋]
- 𝐄[𝑌|𝑋] = 𝐄[𝜃0|𝑋] + 𝐄[𝜃1𝑋|𝑋] + 𝐄[𝜎𝜀|𝑋]
- 𝐄[𝑌|𝑋] = 𝜃0 + 𝜃1𝑋 + 0
- 𝐄[𝑌|𝑋] = 𝜃0+ 𝜃1𝑋
so
- 𝐄[𝑌𝑋] = 𝐄[(𝜃0+ 𝜃1𝑋 + 𝜎𝜀)𝑋]
- 𝐄[𝑌𝑋] = 𝐄[𝜃0𝑋+ 𝜃1𝑋2 + 𝜎𝜀𝑋]
- 𝐄[𝑌𝑋] = 𝐄[𝜃0𝑋]+ 𝐄[𝜃1𝑋2] + 𝐄[𝜎𝜀𝑋]
- 𝐄[𝑌𝑋] = 𝜃0𝐄[𝑋]+ 𝜃1𝐄[𝑋2] + 𝜎𝐄[𝜀𝑋]
- 𝐄[𝑌𝑋] = 𝜃0·0 + 𝜃1𝐄[𝑋2] + 𝜎𝐄[𝜀𝑋] # 𝐄[𝑋] = 0
- 𝐄[𝑌𝑋] = 𝜃1𝐄[𝑋2] + 𝜎𝐄[𝜀𝑋]
- 𝑉𝑎𝑟(𝑋) = 𝐄[𝑋2] - 𝐄[𝑋]2
- 𝑉𝑎𝑟(𝑋) = 𝐄[𝑋2] - 02
- 𝑉𝑎𝑟(𝑋) = 𝐄[𝑋2]
- 1 = 𝐄[𝑋2] # 𝑉𝑎𝑟(𝑋) = 1 as given above
- 𝐄[𝑌𝑋] = 𝜃1·1 + 𝜎𝐄[𝜀𝑋]
- 𝐄[𝑌𝑋] = 𝜃1 + 𝜎𝐄[𝜀𝑋]
- 𝐶𝑜𝑣(𝜀,𝑋) = 𝐄[𝜀𝑋] - 𝐄[𝜀]𝐄[𝑋]
- 𝐶𝑜𝑣(𝜀,𝑋) = 𝐄[𝜀𝑋] - 0·0
- 𝐶𝑜𝑣(𝜀,𝑋) = 𝐄[𝜀𝑋]
- 0 = 𝐄[𝜀𝑋] # 𝐶𝑜𝑣(𝜀,𝑋) = 0 based on assumption of the linear model
- 𝐄[𝑌𝑋] = 𝜃1 + 𝜎·0
- 𝐄[𝑌𝑋] = 𝜃1
Hence, since 𝐄[𝑋] = 0:
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝐄[𝑌𝑋] - 𝐄[𝑌]·0 = 𝜃1
𝜃1 is the key to the linear dependence between 𝑋 and 𝑌, and covariance formalizes it. In general:
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝐄[𝑌𝑋] - 𝐄[𝑌]𝐄[𝑋]
Even if:
- 𝑋 is NOT standard normal, like:
- 𝐄[𝑋] = 𝜇
- 𝑉𝑎𝑟(𝑋) = 𝜎2 # in this block 𝜎2 denotes the variance of 𝑋; it is not the noise scale 𝜎 that multiplies 𝜀
- 𝑦 = 𝜃0+ 𝜃1𝑥 + 𝜎𝜀
Then:
- 𝐄[𝑌|𝑋] = 𝜃0+ 𝜃1𝑋
so
- 𝐄[𝑌𝑋] = 𝐄[𝜃0𝑋+ 𝜃1𝑋2] # the noise term drops out because 𝐄[𝜀] = 0 and 𝐶𝑜𝑣(𝜀,𝑋) = 0 give 𝐄[𝜀𝑋] = 0
- 𝐄[𝑌𝑋] = 𝐄[𝜃0𝑋]+ 𝐄[𝜃1𝑋2]
- 𝐄[𝑌𝑋] = 𝜃0𝜇+ 𝜃1𝐄[𝑋2]
- 𝑉𝑎𝑟(𝑋) = 𝜎2 = 𝐄[𝑋2] - 𝐄[𝑋]2
- 𝑉𝑎𝑟(𝑋) = 𝜎2 = 𝐄[𝑋2] - 𝜇2
- 𝜎2 + 𝜇2= 𝐄[𝑋2]
- 𝐄[𝑌𝑋] = 𝜃0𝜇+ 𝜃1(𝜎2 + 𝜇2)
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝐄[𝑌𝑋] - 𝐄[𝑌]𝐄[𝑋]
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝜃0𝜇+ 𝜃1(𝜎2 + 𝜇2) - 𝐄[𝑌]𝐄[𝑋]
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝜃0𝜇+ 𝜃1(𝜎2 + 𝜇2) - 𝐄[𝜃0+ 𝜃1𝑋]𝐄[𝑋]
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝜃0𝜇+ 𝜃1(𝜎2 + 𝜇2) - 𝐄[𝜃0+ 𝜃1𝑋]𝜇
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝜃0𝜇+ 𝜃1(𝜎2 + 𝜇2) - (𝜃0+ 𝜃1𝜇)𝜇
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝜃0𝜇+ 𝜃1(𝜎2 + 𝜇2) - (𝜃0𝜇 + 𝜃1𝜇2)
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝜃0𝜇+ 𝜃1𝜎2 + 𝜃1𝜇2 - 𝜃0𝜇 - 𝜃1𝜇2
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝜃1𝜎2
Then:
- 𝐶𝑜𝑣(𝑌,𝑋) = 𝜃1𝜎2
Thus:
- 𝜃1 = 𝐶𝑜𝑣(𝑌,𝑋) / 𝜎2
- 𝜃1 = 𝐶𝑜𝑣(𝑌,𝑋) / 𝑉𝑎𝑟(𝑋)
Both quantities are estimated from the sample:
- 𝐶𝑜𝑣(𝑌,𝑋) is estimated by 𝑠𝑥𝑦2
- 𝑉𝑎𝑟(𝑋) is estimated by 𝑠𝑥2
Thus:
- 𝜃1= 𝑠𝑥𝑦2/𝑠𝑥2
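A small simulation can make the identity 𝜃1 = 𝐶𝑜𝑣(𝑌,𝑋) / 𝑉𝑎𝑟(𝑋) tangible. This is only a sketch: the function name, the parameter values and the seed are assumptions chosen for illustration, and the sample ratio only approximates the population value.

```python
import random

def simulated_slope(theta0, theta1, noise_scale, mu_x, sd_x, n, seed=0):
    """Draw (x, y) from y = theta0 + theta1*x + noise_scale*eps and return cov/var."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    xs = [rng.gauss(mu_x, sd_x) for _ in range(n)]  # X is NOT standard normal here
    ys = [theta0 + theta1 * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    return cov_xy / var_x

# hypothetical parameters: true slope 0.5, X with mean 2 and standard deviation 3
slope_est = simulated_slope(theta0=1.0, theta1=0.5, noise_scale=2.0,
                            mu_x=2.0, sd_x=3.0, n=20000)
```

With a sample this large the ratio should land close to the true slope of 0.5, even though 𝑋 has a non-zero mean and non-unit variance.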
Analysis of Variance (ANOVA) - Prediction - Further Inference
sections:
- evaluate the goodness of fit of the chosen regression model to the observed data
- estimate the variance of response variable 𝑌𝑖 given 𝑋𝑖: 𝑉𝑎𝑟(𝑌𝑖|𝑋𝑖) = 𝜎2 with 𝜎̂2
- then use 𝜎̂2 to test the significance of regression parameters: 𝜃0 and 𝜃1
- construct confidence intervals and prediction intervals
Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares
total sum of squares
- measures the total variation among observed responses (variation of 𝑦𝑖 about their sample mean 𝑦̅)
- does not change wrt regression model
formula:
- 𝑆𝑆𝑇𝑂𝑇 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑦̅]2
- 𝑆𝑆𝑇𝑂𝑇 = (𝑛 - 1)(𝑠𝑦)2
𝑆𝑆𝑇𝑂𝑇 can be partitioned into 2 parts:
- 𝑆𝑆𝑇𝑂𝑇 = 𝑆𝑆𝑅𝐸𝐺+ 𝑆𝑆𝐸𝑅𝑅
where:
- 𝑆𝑆𝑅𝐸𝐺: regression sum of squares - measures the total variation explained by the regression model
- 𝑆𝑆𝑅𝐸𝐺 = 𝛴1≤𝑖≤𝑛[𝑦̂𝑖 - 𝑦̅]2
is often computed as
- 𝑆𝑆𝑅𝐸𝐺 = 𝛴1≤𝑖≤𝑛[𝜃0+ 𝜃1𝑥𝑖 - 𝑦̅]2
- 𝑆𝑆𝑅𝐸𝐺 = 𝛴1≤𝑖≤𝑛[𝑦̅ - 𝜃1𝑥̅ + 𝜃1𝑥𝑖 - 𝑦̅]2 # substitute 𝜃0= 𝑦̅ - 𝜃1𝑥̅
- 𝑆𝑆𝑅𝐸𝐺 = 𝛴1≤𝑖≤𝑛[𝜃1(𝑥𝑖 - 𝑥̅)]2
- 𝑆𝑆𝑅𝐸𝐺 = (𝜃1)2𝛴1≤𝑖≤𝑛[𝑥𝑖 - 𝑥̅]2 # the slope is squared along with the deviation
- 𝑆𝑆𝑅𝐸𝐺 = (𝑛 - 1)(𝜃1)2(𝑠𝑥)2
- 𝑆𝑆𝑅𝐸𝐺 = (𝜃1)2·𝑆𝑥𝑥
- 𝑆𝑆𝐸𝑅𝑅: error sum of squares - measures the total variation NOT explained by the regression model
- 𝑆𝑆𝐸𝑅𝑅 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑦̂𝑖]2= 𝛴1≤𝑖≤𝑛[𝑒𝑖]2
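The partition of the total sum of squares can be verified on a small sample. The following is a sketch under assumptions: `sums_of_squares` and the toy data are introduced here for illustration, and the fitted line is the least-squares line defined earlier.

```python
from statistics import fmean

def sums_of_squares(xs, ys):
    """Return (ss_tot, ss_reg, ss_err) for the least-squares line through (xs, ys)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = s_xy / s_xx
    theta0 = y_bar - theta1 * x_bar
    fitted = [theta0 + theta1 * x for x in xs]
    ss_tot = sum((y - y_bar) ** 2 for y in ys)              # variation about the mean
    ss_reg = sum((f - y_bar) ** 2 for f in fitted)          # variation explained by the line
    ss_err = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # variation left in the residuals
    return ss_tot, ss_reg, ss_err

xs = [1, 2, 3, 4, 5]   # toy data, made up for illustration
ys = [2, 4, 5, 4, 5]
ss_tot, ss_reg, ss_err = sums_of_squares(xs, ys)
```

On this sample the three values are 6, 3.6 and 2.4 up to floating-point rounding, so the regression and error parts add up to the total.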
R-Square (Coefficient of Determination)
R-Square - Coefficient of Determination - Coefficient of Multiple Determination - Multiple R-Square
- 𝑅2 is closely related to Pearson’s Correlation Coefficient (𝑅)
- 𝑅2 measures the proportion of the total variation (𝑆𝑆𝑇𝑂𝑇) explained by the regression model (𝑆𝑆𝑅𝐸𝐺); in other words, how closely the data fit the regression model
- 𝑅2 ranges over the interval [0,1]
- in univariate regression, 𝑅2 also equals the correlation coefficient SQUARED
- as new regressors are added to the regression model, each one either explains an additional portion of the total variation 𝑆𝑆𝑇𝑂𝑇 or explains nothing new. Therefore, 𝑅2 either goes up or remains the same and NEVER goes down as we add more regressors. Thus, we expect 𝑅2 to increase going from univariate regression to multivariate regression
- for penalizing the addition of USELESS regressors see: Adjusted R-Square
formula:
- 𝑅2 = (variance-about-the-mean - Mean Square Error (MSE)) / variance-about-the-mean
- 𝑅2 = (variance-about-the-mean - variance-about-the-regression-line) / variance-about-the-mean
- 𝑅2 = (variance-about-the-mean - variance-of-errors-not-explained-by-model) / variance-about-the-mean
- 𝑅2 = variance-explained-by-model / variance-about-the-mean
- 𝑅2 = 𝑆𝑆𝑅𝐸𝐺/ 𝑆𝑆𝑇𝑂𝑇
visual of variance-about-the-mean vs variance-about-the-regression-line
R2- Formal Definition
- in Simple Linear Regression Models 𝑅2is known as Coefficient of Determination:
- 𝑅2= [𝑉𝑎𝑟(mean) - 𝑉𝑎𝑟(model)] / 𝑉𝑎𝑟(mean)
- 𝑅2= 𝐶𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛(𝑋,𝑌)2
- in Multiple Linear Regression Models 𝑅2 is known as Coefficient of Multiple Determination or Multiple R-Squared:
- 𝑅2= [𝑉𝑎𝑟(mean) - 𝑉𝑎𝑟(model)] / 𝑉𝑎𝑟(mean) # where variance-explained-by-model = [𝑉𝑎𝑟(mean) - 𝑉𝑎𝑟(model)]
R2- Properties
- 𝑅2 ranges between [0,1]. This means the variation about the fitted model (𝑆𝑆𝐸𝑅𝑅) is always less than or equal to the variation about the mean (𝑆𝑆𝑇𝑂𝑇)
- high 𝑅2 (and hence |𝑟|) → points are tightly clustered around the regression model → predicted 𝑦̂’s are close to observed 𝑦‘s → errors are small → fit is good
R2 - Example
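One possible worked example, not taken from the original notes: the sketch below computes 𝑅2 from the sums of squares and compares it with the squared sample correlation. The helper name `r_squared` and the toy data are assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean

def r_squared(xs, ys):
    """Return (R^2 from sums of squares, squared correlation) for a univariate fit."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)      # equals SS_TOT
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = s_xy / s_xx
    theta0 = y_bar - theta1 * x_bar
    ss_err = sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys))
    r2_from_ss = 1.0 - ss_err / s_yy              # (SS_TOT - SS_ERR) / SS_TOT
    r_xy = s_xy / sqrt(s_xx * s_yy)               # the (n - 1) factors cancel
    return r2_from_ss, r_xy ** 2

xs = [1, 2, 3, 4, 5]   # toy data, made up for illustration
ys = [2, 4, 5, 4, 5]
r2_ss, r2_corr = r_squared(xs, ys)
```

Both routes give 0.6 here: the line explains 60% of the variation of 𝑦 about its mean.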
Standard Regression Assumptions
- each observed response 𝑦(𝑖) is an independent Normal random variable with:
- expectation = 𝐄[𝑌(𝑖)|𝑋1(𝑖), …, 𝑋𝑘(𝑖)] = 𝜃0+ 𝜃1𝑋1(𝑖)+ … + 𝜃𝑘𝑋𝑘(𝑖)
- variance = 𝑉𝑎𝑟(𝑌(𝑖)|𝑋1(𝑖), …, 𝑋𝑘(𝑖)) = 𝜎2 # variance is same constant for all 𝑋1(𝑖), …, 𝑋𝑘(𝑖)
- predictors {𝑋1(𝑖), …, 𝑋𝑘(𝑖)} are considered non-random because they are observed
- as a consequence:
- the estimates 𝜃ˆ0, …, 𝜃ˆ𝑘 have Normal distributions
Desired Properties and their Required Assumptions
Desired properties (1):
- 𝜃ˆ𝑂𝐿𝑆 is an unbiased estimate of 𝜃
Required assumptions (1):
- 𝐄[𝜖(𝑖)] = 0, ∀𝑖
Desired properties (2):
- 𝜃ˆ𝑂𝐿𝑆 is an unbiased estimate of 𝜃
- 𝜃ˆ𝑂𝐿𝑆 is a BLUE estimator
Required assumptions (2):
- 𝐄[𝜖(𝑖)] = 0, ∀𝑖
- 𝑉𝑎𝑟(𝜖(𝑖)) = constant < ∞, ∀𝑖
- 𝐶𝑜𝑣(𝜖(𝑖),𝜖(𝑗)) = 0, ∀𝑖≠𝑗
Desired properties (3):
- 𝜃ˆ𝑂𝐿𝑆 is an unbiased estimate of 𝜃
- 𝜃ˆ𝑂𝐿𝑆 is a BLUE estimator
- 𝜃ˆ𝑂𝐿𝑆 is mathematically equivalent to MLE
Required assumptions (3):
- 𝐄[𝜖(𝑖)] = 0, ∀𝑖
- 𝑉𝑎𝑟(𝜖(𝑖)) = constant < ∞, ∀𝑖
- 𝐶𝑜𝑣(𝜖(𝑖),𝜖(𝑗)) = 0, ∀𝑖≠𝑗
- All 𝜖 are Independent and Identically Distributed (IID) from the normal distribution
Assumptions in ANOVA
- normality of sampling distribution of means - the distribution of sample means is normally distributed
- errors 𝑒(𝑖) & 𝑒(𝑗) are independent of each other (where 𝑒(𝑖) = 𝑦(𝑖) - 𝑦̂(𝑖))
- absence of outliers - outliers have been removed from the dataset
- homogeneity of variance - population variances at different levels of each independent variable {𝑋1, …, 𝑋𝑘} are equal
Degrees of Freedom
let us compute the degrees of freedom of all three sums of squares 𝑆𝑆:
𝑑𝑓𝑇𝑂𝑇 = 𝑑𝑓𝑅𝐸𝐺+ 𝑑𝑓𝐸𝑅𝑅
- 𝑆𝑆𝑇𝑂𝑇 has 𝑑𝑓𝑇𝑂𝑇 = (𝑛 - 1)
𝑑𝑓𝑇𝑂𝑇 = (𝑛 - 1) because it is computed directly from the sample variance (𝑠𝑦)2:
𝑆𝑆𝑇𝑂𝑇= (𝑛 - 1)(𝑠𝑦)2
- 𝑆𝑆𝑅𝐸𝐺 has 𝑑𝑓𝑅𝐸𝐺 = 1
𝑑𝑓𝑅𝐸𝐺 = 1 because the number of degrees of freedom is the dimension of the corresponding space; the regression line, which is just a straight line, has dimension 1
- 𝑆𝑆𝐸𝑅𝑅 has 𝑑𝑓𝐸𝑅𝑅 = (𝑛 - 2)
the error degrees of freedom also follow the formula below:
- 𝑑𝑓𝐸𝑅𝑅= (sample size) - (number of estimated location parameters)
- 𝑑𝑓𝐸𝑅𝑅= 𝑛 - 2 # two parameters are estimated: 𝜃0 and 𝜃1
Estimating Population-Error/Regression Variance 𝜎2= 𝑉𝑎𝑟(𝑌|𝑋) With 𝜎̂2 Mean Square Error (MSE)
see also: Ben Lambert’s Video Lecture
with the computed degrees of freedom, we can now estimate the regression variance 𝑌 given 𝑋:
- 𝜎̂2= estimated regression variance
- 𝜎̂2 = 𝐄[(𝑦𝑖 - 𝑦̂𝑖)2]
- 𝜎̂2 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑦̂𝑖]2/ (𝑛 - 2)
- 𝜎̂2 = 𝑆𝑆𝐸𝑅𝑅/ 𝑑𝑓𝐸𝑅𝑅
𝜎̂2 is an unbiased estimator of 𝜎2= 𝑉𝑎𝑟(𝑌|𝑋)
NOTE: the usual sample variance:
- (𝑠𝑦)2= 𝑆𝑆𝑇𝑂𝑇/ (𝑛 - 1)
- (𝑠𝑦)2= 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑦̅]2 / (𝑛 - 1)
is a biased estimator of 𝜎2= 𝑉𝑎𝑟(𝑌|𝑋), because under the regression model 𝑦̅ no longer estimates the expectation of each 𝑌𝑖
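The contrast between 𝜎̂2 and the usual sample variance can be seen on a small sample. This is a sketch under assumptions: `error_variance` and the toy data are hypothetical names and values introduced for illustration.

```python
from statistics import fmean, variance

def error_variance(xs, ys):
    """Return sigma_hat^2 = SS_ERR / (n - 2) for the least-squares line."""
    n = len(xs)
    if n < 3:
        raise ValueError("need n >= 3 so that n - 2 degrees of freedom remain")
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = s_xy / s_xx
    theta0 = y_bar - theta1 * x_bar
    ss_err = sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys))
    return ss_err / (n - 2)            # df_ERR = n - 2

xs = [1, 2, 3, 4, 5]   # toy data, made up for illustration
ys = [2, 4, 5, 4, 5]
sigma2_hat = error_variance(xs, ys)    # SS_ERR / (n - 2)
s2_y = variance(ys)                    # SS_TOT / (n - 1), ignores the regression
```

Here 𝜎̂2 is 0.8 while the sample variance of 𝑦 is 1.5: the latter also counts the variation that the line explains.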
ANOVA Table Summary
- 𝑆𝑆𝑇𝑂𝑇 = (𝑛 - 1)𝑉𝑎𝑟(𝑦)
- 𝑆𝑆𝑅𝐸𝐺 = 𝑟2(𝑛 - 1)𝑉𝑎𝑟(𝑦)
- 𝑆𝑆𝐸𝑅𝑅 = (1 - 𝑟2)(𝑛 - 1)𝑉𝑎𝑟(𝑦)
This table is a modification of One-Way ANOVA and can be used for both univariate linear regression and multivariate linear regression
| Source | Sum of Squares | Degrees of Freedom | Mean Squares | 𝐹 Statistic (ALL) |
| --- | --- | --- | --- | --- |
| Total | Sum of Squares Total (TSS), Sum of Squares Restricted, Sum of Squares Around Mean: 𝑆𝑆𝑇𝑂𝑇 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑦̅]2; 𝑆𝑆𝑇𝑂𝑇= 𝑆𝑆𝑅𝐸𝐺 + 𝑆𝑆𝐸𝑅𝑅 | 𝑑𝑓𝑇𝑂𝑇 = 𝑛 - 1 | | |
| Error | Sum of Squares Error (ESS), Sum of Squares Residual (RSS), Sum of Squares UnRestricted, Sum of Squares Around Model: 𝑆𝑆𝐸𝑅𝑅 = 𝛴1≤𝑖≤𝑛[𝑦𝑖 - 𝑦̂𝑖]2 = 𝛴1≤𝑖≤𝑛[𝑒𝑖]2 | 𝑑𝑓𝐸𝑅𝑅= 𝑛 - # of model params including 𝜃0; 𝑑𝑓𝐸𝑅𝑅= 𝑛 - (𝑘 + 1); 𝑑𝑓𝐸𝑅𝑅= 𝑛 - 𝑘 - 1 | 𝑀𝑆𝐸𝑅𝑅 = 𝑆𝑆𝐸𝑅𝑅 / 𝑑𝑓𝐸𝑅𝑅 | 𝐹 = 𝑀𝑆𝑅𝐸𝐺/ 𝑀𝑆𝐸𝑅𝑅 |
| Model | Sum of Squares Regression (RSS), Sum of Squares Explained (ESS): 𝑆𝑆𝑅𝐸𝐺 = 𝑆𝑆𝑇𝑂𝑇 - 𝑆𝑆𝐸𝑅𝑅; 𝑆𝑆𝑅𝐸𝐺 = 𝛴1≤𝑖≤𝑛[𝑦̂𝑖 - 𝑦̅]2 | 𝑑𝑓𝑅𝐸𝐺 = 𝑑𝑓𝑇𝑂𝑇 - 𝑑𝑓𝐸𝑅𝑅; 𝑑𝑓𝑅𝐸𝐺 = (𝑛 - 1) - (𝑛 - # of model params); 𝑑𝑓𝑅𝐸𝐺 = (# of model params) - 1; 𝑑𝑓𝑅𝐸𝐺 = 𝑘 = number of predictor variables 𝜃𝑖‘s excluding 𝜃0 | 𝑀𝑆𝑅𝐸𝐺 = 𝑆𝑆𝑅𝐸𝐺 / 𝑑𝑓𝑅𝐸𝐺 | |

this 𝐹 formula is used to test significance of the ENTIRE regression model
for other 𝐹 formulas used to test PARTIAL significance of regression model consult table below
𝐹 statistic for testing the null hypothesis that ALL variables are insignificant (e.g. 𝐻0: 𝜃1= … = 𝜃𝑘 = 0)
unrestricted model
- 𝑦̂𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 = 𝜃0+ 𝜃1𝑥1 + … + 𝜃𝑘𝑥𝑘
restricted model
- 𝑦̂𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 = 𝜃0+ 0·𝑥1 + … + 0·𝑥𝑘= 𝜃0 = 𝑦̅
𝐹 sum of squares form
- 𝐹 = 𝑀𝑆𝑅𝐸𝐺/ 𝑀𝑆𝐸𝑅𝑅
- 𝐹 = [(𝑆𝑆𝑇𝑂𝑇- 𝑆𝑆𝐸𝑅𝑅)/((𝑛 - 1)-(𝑛 - 𝑘 - 1))] / [(𝑆𝑆𝐸𝑅𝑅)/(𝑛 - 𝑘 - 1)]
- 𝐹 = [(𝑆𝑆𝑇𝑂𝑇- 𝑆𝑆𝐸𝑅𝑅)/(𝑘)] / [(𝑆𝑆𝐸𝑅𝑅)/(𝑛 - 𝑘 - 1)]
- 𝐹 = [(𝑆𝑆𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑- 𝑆𝑆𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/(𝑘)] / [(𝑆𝑆𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/(𝑛 - 𝑘 - 1)] # here 𝑆𝑆𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 = 𝑆𝑆𝑇𝑂𝑇 and 𝑆𝑆𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 = 𝑆𝑆𝐸𝑅𝑅
𝐹 𝑅2 form
- 𝐹 = [𝑅2𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑/𝑘] / [(1 - 𝑅2𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/(𝑛 - 𝑘 - 1)]
𝐹 has F-distribution with parameters (𝑘, (𝑛 - 𝑘 - 1))

𝐹 statistic for testing the null hypothesis that SOME variables are insignificant (e.g. 𝐻0: 𝜃𝑖= 0, ∀𝜃𝑖∊𝑆 where 𝑆⊆{𝜃1, … 𝜃𝑘})
unrestricted model
- 𝑦̂𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 = 𝜃0+ 𝜃1𝑥1 + … + 𝜃𝑘𝑥𝑘
restricted model
- 𝑦̂𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 = 𝜃0 + (linear combination of 0·𝑥𝑖 for 𝜃𝑖∊𝑆) + (linear combination of 𝜃𝑗𝑥𝑗 for 𝜃𝑗∉𝑆)
- 𝑦̂𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 = 𝜃0 + (linear combination of 𝜃𝑗𝑥𝑗 for 𝜃𝑗∉𝑆)
𝐹 sum of squares form
- 𝐹 = [(𝑆𝑆𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 - 𝑆𝑆𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/((𝑛 - (𝑘-|𝑆|) - 1)-(𝑛 - 𝑘 - 1))] / [(𝑆𝑆𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/(𝑛 - 𝑘 - 1)]
- 𝐹 = [(𝑆𝑆𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑 - 𝑆𝑆𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/(|𝑆|)] / [(𝑆𝑆𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/(𝑛 - 𝑘 - 1)]
𝐹 𝑅2 form
- 𝐹 = [(𝑅2𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑- 𝑅2𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/(|𝑆|)] / [(1 - 𝑅2𝑢𝑛𝑟𝑒𝑠𝑡𝑟𝑖𝑐𝑡𝑒𝑑)/(𝑛 - 𝑘 - 1)]
𝐹 has F-distribution with parameters (|𝑆|, (𝑛 - 𝑘 - 1))
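Both forms of the 𝐹 statistic reduce to simple arithmetic once the sums of squares (or the 𝑅2 values) are known. The sketch below is illustrative only: the function names and arguments are assumptions, with `q` standing for the number of restrictions (𝑘 when testing all slopes, |𝑆| when testing a subset).

```python
def f_from_sums_of_squares(ss_restricted, ss_unrestricted, n, k, q):
    """F = [(SS_r - SS_u) / q] / [SS_u / (n - k - 1)] (hypothetical helper)."""
    df_err = n - k - 1
    if q <= 0 or df_err <= 0:
        raise ValueError("need q >= 1 and n > k + 1")
    return ((ss_restricted - ss_unrestricted) / q) / (ss_unrestricted / df_err)

def f_from_r_squared(r2_unrestricted, r2_restricted, n, k, q):
    """F = [(R2_u - R2_r) / q] / [(1 - R2_u) / (n - k - 1)] (hypothetical helper)."""
    df_err = n - k - 1
    if q <= 0 or df_err <= 0:
        raise ValueError("need q >= 1 and n > k + 1")
    return ((r2_unrestricted - r2_restricted) / q) / ((1.0 - r2_unrestricted) / df_err)

# made-up univariate example: n = 5, k = 1, SS_TOT = 6.0, SS_ERR = 2.4.
# Testing ALL slopes: the restricted model is y = mean(y), so SS_r = SS_TOT,
# its R^2 is 0, and q = k = 1.
f_ss = f_from_sums_of_squares(ss_restricted=6.0, ss_unrestricted=2.4, n=5, k=1, q=1)
f_r2 = f_from_r_squared(r2_unrestricted=0.6, r2_restricted=0.0, n=5, k=1, q=1)
```

The two forms agree (both give 4.5 for these numbers); the value would then be compared with an F quantile on (q, 𝑛 - 𝑘 - 1) degrees of freedom.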
T-Test on Regression Slope (𝜃1)
having estimated the regression variance 𝜎2 with 𝜎̂2, we can use it to create confidence intervals for the regression slope 𝜃1 and use it for hypothesis testing
according to standard regression assumptions (defined above)
- 𝑦𝑖‘s are independent Normal random variables with:
- mean 𝐄[𝑌𝑖|𝑋=𝑥𝑖] = 𝜃0+ 𝜃1𝑥𝑖
- constant variance = 𝜎2
- predictor 𝑥𝑖is non-random
As a consequence, regression estimates 𝜃ˆ0and 𝜃ˆ1 have Normal distribution
let’s compute the expectation & variance of 𝜃ˆ1:
- 𝐄[𝜃ˆ1] = 𝜃1
computation (see also: Ben Lambert's Video Lecture)
the slope 𝜃1 is estimated by:
- 𝜃ˆ1= 𝑆𝑥𝑦/ 𝑆𝑥𝑥
- 𝜃ˆ1 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)(𝑥𝑖 - 𝑥̅)] / 𝑆𝑥𝑥
- 𝜃ˆ1 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖)(𝑥𝑖 - 𝑥̅)] / 𝑆𝑥𝑥# because 𝑦̅𝛴1≤𝑖≤𝑛(𝑥𝑖 - 𝑥̅) = 0
according to standard regression assumptions (defined above):
- 𝑦𝑖are Normal Variables
- 𝑥𝑖are non-random
- 𝜃ˆ1is also Normal because it is a linear function of 𝑦𝑖
therefore:
- 𝐄[𝜃ˆ1] = 𝐄[𝛴1≤𝑖≤𝑛[(𝑦𝑖)(𝑥𝑖 - 𝑥̅)] / 𝑆𝑥𝑥]
- 𝐄[𝜃ˆ1] = 𝛴1≤𝑖≤𝑛[𝐄[𝑦𝑖](𝑥𝑖 - 𝑥̅)] / 𝑆𝑥𝑥
- 𝐄[𝜃ˆ1] = 𝛴1≤𝑖≤𝑛[(𝜃0+ 𝜃1𝑥𝑖)(𝑥𝑖 - 𝑥̅)] / 𝑆𝑥𝑥
- 𝐄[𝜃ˆ1] = [𝜃0𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)] + 𝜃1𝛴1≤𝑖≤𝑛[𝑥𝑖(𝑥𝑖 - 𝑥̅)]] / 𝑆𝑥𝑥
- 𝐄[𝜃ˆ1] = [0 + 𝜃1𝛴1≤𝑖≤𝑛[𝑥𝑖(𝑥𝑖 - 𝑥̅)]] / 𝑆𝑥𝑥 # 𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)] = 0
- 𝐄[𝜃ˆ1] = 𝜃1·𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)2] / 𝑆𝑥𝑥 # 𝛴1≤𝑖≤𝑛[𝑥𝑖(𝑥𝑖 - 𝑥̅)] = 𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)2] because 𝑥̅𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)] = 0
- 𝐄[𝜃ˆ1] = 𝜃1·1 # 𝑆𝑥𝑥= 𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)2]
- 𝐄[𝜃ˆ1] = 𝜃1
thus 𝜃ˆ1 is an unbiased estimator of 𝜃1
- 𝑉𝑎𝑟(𝜃ˆ1|𝑥1) = 𝜎2/𝑆𝑥𝑥
computation (see also: Ben Lambert's Video Lecture)
the slope 𝜃1 is estimated by:
- 𝜃ˆ1= 𝑆𝑥𝑦/ 𝑆𝑥𝑥
- 𝜃ˆ1 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖 - 𝑦̅)(𝑥𝑖 - 𝑥̅)] / 𝑆𝑥𝑥
- 𝜃ˆ1 = 𝛴1≤𝑖≤𝑛[(𝑦𝑖)(𝑥𝑖 - 𝑥̅)] / 𝑆𝑥𝑥# because 𝑦̅𝛴1≤𝑖≤𝑛(𝑥𝑖 - 𝑥̅) = 0
according to standard regression assumptions (defined above):
- 𝑦𝑖are Normal Variables
- 𝑥𝑖are non-random
- 𝜃ˆ1is also Normal because it is a linear function of 𝑦𝑖
therefore:
- 𝑉𝑎𝑟(𝜃ˆ1|𝑥1) = 𝑉𝑎𝑟[𝛴1≤𝑖≤𝑛[(𝑦𝑖)(𝑥𝑖 - 𝑥̅)] / 𝑆𝑥𝑥]
- 𝑉𝑎𝑟(𝜃ˆ1|𝑥1) = 𝛴1≤𝑖≤𝑛[𝑉𝑎𝑟(𝑦𝑖)(𝑥𝑖 - 𝑥̅)2] / (𝑆𝑥𝑥)2
- 𝑉𝑎𝑟(𝜃ˆ1|𝑥1) = 𝛴1≤𝑖≤𝑛[𝜎2(𝑥𝑖 - 𝑥̅)2] / (𝑆𝑥𝑥)2
- 𝑉𝑎𝑟(𝜃ˆ1|𝑥1) = 𝜎2𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)2] / (𝑆𝑥𝑥)2
- 𝑉𝑎𝑟(𝜃ˆ1|𝑥1) = 𝜎2 / 𝑆𝑥𝑥# 𝑆𝑥𝑥= 𝛴1≤𝑖≤𝑛[(𝑥𝑖 - 𝑥̅)2]
thus 𝜃ˆ1 is Normal(𝜇𝜃1, (𝜎𝜃1)2) with:
- 𝜇𝜃1 = 𝜃1
- (𝜎𝜃1)2 = 𝜎2 / 𝑆𝑥𝑥
and 𝜃ˆ1is estimated Normal(𝜇̂𝜃1, (𝜎̂𝜃1)2) with:
- 𝜇̂𝜃1 = 𝜃ˆ1
- (𝜎̂𝜃1)2 = 𝜎̂2 / 𝑆𝑥𝑥
with (1-𝛼)100% 2-tailed confidence interval for the regression slope (𝜃1)
- = estimate ± 𝑧𝛼/2·(std of estimate) # z-test is used when std of the estimator is known
- = estimate ± 𝑡𝛼/2,𝑛-2·(estimated std of estimate) # t-test is used when std of the estimator is estimated
- = 𝜃ˆ1± 𝑡𝛼/2,𝑛-2·[𝜎̂2/(𝑆𝑥𝑥)]1/2
- = 𝜃ˆ1± 𝑡𝛼/2,𝑛-2·[𝜎̂/√(𝑆𝑥𝑥)]
hypothesis-testing:
- 𝐻0: 𝜃1= 𝐵
- 𝐻𝑎: 𝜃1≠ 𝐵
use the t-statistic, which has 𝑛-2 degrees of freedom, and reject 𝐻0 at level 𝛼 when |𝑡| > 𝑡𝛼/2,𝑛-2:
- 𝑡 = (𝜃ˆ1- 𝐵) / 𝑆𝐸’(𝜃ˆ1) # 𝑆𝐸’(𝜃ˆ1) is the estimated standard error of statistic 𝜃ˆ1
- 𝑡 = (𝜃ˆ1- 𝐵) / [𝜎̂/√(𝑆𝑥𝑥)]
To see if 𝑋 is significant for the prediction of 𝑌, test the null hypothesis with 𝐵 = 0:
- 𝐻0: 𝜃1= 0
- 𝐻𝐴: 𝜃1≠ 0
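The slope test can be sketched as follows. Assumptions are stated plainly: `slope_t_test` is a hypothetical helper, the data are made up, and the critical value is passed in by the caller (the standard library has no t quantile function); 3.182 is the usual table value for a two-sided 5% test with 3 degrees of freedom.

```python
from math import sqrt
from statistics import fmean

def slope_t_test(xs, ys, b=0.0, t_crit=None):
    """Return (t statistic, standard error, CI or None) for H0: theta1 = b."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = s_xy / s_xx
    theta0 = y_bar - theta1 * x_bar
    ss_err = sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys))
    sigma_hat = sqrt(ss_err / (n - 2))       # estimated regression std
    se = sigma_hat / sqrt(s_xx)              # estimated std of the slope estimate
    t = (theta1 - b) / se
    ci = None
    if t_crit is not None:                   # caller supplies t_{alpha/2, n-2}
        ci = (theta1 - t_crit * se, theta1 + t_crit * se)
    return t, se, ci

xs = [1, 2, 3, 4, 5]   # toy data, made up for illustration
ys = [2, 4, 5, 4, 5]
t_stat, se, ci = slope_t_test(xs, ys, b=0.0, t_crit=3.182)
reject_h0 = abs(t_stat) > 3.182
```

For this tiny sample |𝑡| is about 2.12, below the critical value, so 𝐻0: 𝜃1 = 0 is not rejected and the interval for the slope contains 0.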
ANOVA F-Test on Model Significance
ANOVA F-test
- popular for testing ratios of variances and significance of models
- compares the portion of variation explained by regression with the portion that remains unexplained
- is always one-sided and right-tail because only large values of the F-statistic show a large portion of explained variation and the overall significance of the model
under the null-hypothesis:
- 𝐻0: 𝜃1= 0
the f-statistic value is computed as follows:
- 𝐹 = 𝑀𝑆𝑅𝐸𝐺/ 𝑀𝑆𝐸𝑅𝑅
and has an F-distribution with two parameters:
- numerator 𝑑𝑓 = 𝑑𝑓𝑅𝐸𝐺 = 1
- denominator 𝑑𝑓 = 𝑑𝑓𝐸𝑅𝑅 = (𝑛 − 2)
F-Test vs T-Test
- T-test for testing individual regression slope
- F-test for testing the entire model significance
For the univariate linear regression, they are absolutely equivalent. In fact, the F-statistic equals the SQUARED T-statistic for testing 𝐻0: 𝜃1= 0:
- 𝑡2 = (𝜃ˆ1)2 / (𝜎̂2/𝑆𝑥𝑥)
- 𝑡2 = (𝑆𝑥𝑦/𝑆𝑥𝑥)2 / (𝜎̂2/𝑆𝑥𝑥) # 𝜃ˆ1= 𝑆𝑥𝑦/𝑆𝑥𝑥
- 𝑡2 = [(𝑆𝑥𝑦)2/𝑆𝑥𝑥] / [𝜎̂2] # factored out 𝑆𝑥𝑥
- 𝑡2 = [(𝑆𝑥𝑦)2/(𝑆𝑦𝑦𝑆𝑥𝑥)] * [𝑆𝑦𝑦/𝜎̂2] # multiplied 𝑆𝑦𝑦/𝑆𝑦𝑦
- 𝑡2 = [𝑟2] * [𝑆𝑦𝑦/𝜎̂2] # 𝑟𝑥𝑦 = 𝑆𝑥𝑦 / √(𝑆𝑥𝑥𝑆𝑦𝑦), since the (𝑛 - 1) factors in the sample covariance and standard deviations cancel
- 𝑡2 = [𝑟2] * [𝑆𝑆𝑇𝑂𝑇/𝜎̂2] # 𝑆𝑆𝑇𝑂𝑇= 𝑆𝑦𝑦
- 𝑡2 = [𝑟2*𝑆𝑆𝑇𝑂𝑇/𝜎̂2]
- 𝑡2 = [𝑆𝑆𝑅𝐸𝐺/𝜎̂2] # 𝑟2 = 𝑆𝑆𝑅𝐸𝐺 / 𝑆𝑆𝑇𝑂𝑇
- 𝑡2 = [𝑀𝑆𝑅𝐸𝐺/𝜎̂2] # 𝑀𝑆𝑅𝐸𝐺 = 𝑆𝑆𝑅𝐸𝐺 / 𝑑𝑓𝑅𝐸𝐺 and 𝑑𝑓𝑅𝐸𝐺 = 1
- 𝑡2 = [𝑀𝑆𝑅𝐸𝐺/𝑀𝑆𝐸𝑅𝑅] # 𝑀𝑆𝐸𝑅𝑅= 𝜎̂2
- 𝑡2 = 𝑓 # definition of f-statistic
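The identity 𝑡2 = 𝐹 is easy to confirm numerically. A minimal sketch, assuming a hypothetical helper `t_and_f` and made-up data:

```python
from math import sqrt
from statistics import fmean

def t_and_f(xs, ys):
    """Return (t, F) for H0: theta1 = 0 in univariate linear regression."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = s_xy / s_xx
    theta0 = y_bar - theta1 * x_bar
    ss_err = sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys))
    ss_reg = sum((theta0 + theta1 * x - y_bar) ** 2 for x in xs)
    ms_err = ss_err / (n - 2)          # df_ERR = n - 2
    ms_reg = ss_reg / 1                # df_REG = 1
    t = theta1 / sqrt(ms_err / s_xx)   # t statistic with B = 0
    f = ms_reg / ms_err                # ANOVA F statistic
    return t, f

xs = [1, 2, 3, 4, 5]   # toy data, made up for illustration
ys = [2, 4, 5, 4, 5]
t_stat, f_stat = t_and_f(xs, ys)
```

Squaring the t statistic reproduces the F statistic (4.5 for this sample), which is why the two tests always reach the same decision in the univariate case.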
Prediction (Confidence Interval & Prediction Interval)
given a value of the predictor 𝑋:
- 𝑋=𝑥𝑧
the TRUE mean/expectation of response 𝑌 is:
- 𝜇𝑧 = 𝐄[𝑌|𝑋=𝑥𝑧]
- 𝜇𝑧 = 𝜃0+ 𝑥𝑧𝜃1
the ESTIMATED mean/expectation of response 𝑌 is:
- 𝑦̂𝑧 = 𝐄ˆ[𝑌|𝑋=𝑥𝑧] # the estimate of the conditional expectation, not the true value 𝜇𝑧
- 𝑦̂𝑧 = 𝜃ˆ0+ 𝑥𝑧𝜃ˆ1
How reliable are regression predictions, and how close are they to the real true values? we construct:
- a (1-𝛼)100% confidence interval for the expectation
- 𝜇𝑧 = 𝐄[𝑌|𝑋=𝑥𝑧]
- a (1-𝛼)100% prediction interval for the actual value of 𝑌=𝑦𝑧 when 𝑋=𝑥𝑧
Confidence Interval for the Mean of Response
(1-𝛼)100% confidence interval for the mean 𝜇𝑧 = 𝐄[𝑌|𝑋=𝑥𝑧] of all responses with 𝑋=𝑥𝑧 is:
- 𝜃ˆ0 + 𝜃ˆ1𝑥𝑧 ± 𝑡𝛼/2,𝑛-2·𝜎̂·√[ (1/𝑛) + [(𝑥𝑧 - 𝑥̅)2 / 𝑆𝑥𝑥] ]
computation:
- 𝜇𝑧 = 𝐄[𝑌|𝑋=𝑥𝑧] = 𝜃0+ 𝑥𝑧𝜃1
is the population parameter. 𝜇𝑧 is the mean response for the entire subpopulation of units where the independent variable 𝑋=𝑥𝑧
𝜇𝑧is estimated by:
- 𝑦̂𝑧 = 𝜃ˆ0+ 𝜃ˆ1𝑥𝑧
- 𝑦̂𝑧 = 𝑦̅ - 𝜃ˆ1𝑥̅ + 𝜃ˆ1𝑥𝑧
- 𝑦̂𝑧 = 𝑦̅ + 𝜃ˆ1𝑥𝑧- 𝜃ˆ1𝑥̅
- 𝑦̂𝑧 = 𝑦̅ + 𝜃ˆ1 · (𝑥𝑧- 𝑥̅)
- 𝑦̂𝑧 = (1/𝑛)(𝛴1≤𝑖≤𝑛[𝑦𝑖]) + (𝛴1≤𝑖≤𝑛[(𝑥𝑖-𝑥̅)𝑦𝑖])/(𝑆𝑥𝑥) · (𝑥𝑧- 𝑥̅)
- 𝑦̂𝑧 = (𝛴1≤𝑖≤𝑛[(1/𝑛)𝑦𝑖]) + (𝛴1≤𝑖≤𝑛[(1/𝑆𝑥𝑥)(𝑥𝑖-𝑥̅)(𝑥𝑧- 𝑥̅)𝑦𝑖])
- 𝑦̂𝑧 = (𝛴1≤𝑖≤𝑛[(1/𝑛)𝑦𝑖 + (1/𝑆𝑥𝑥)(𝑥𝑖-𝑥̅)(𝑥𝑧- 𝑥̅)𝑦𝑖])
- 𝑦̂𝑧 = (𝛴1≤𝑖≤𝑛[(1/𝑛) + (1/𝑆𝑥𝑥)(𝑥𝑖-𝑥̅)(𝑥𝑧- 𝑥̅)]·𝑦𝑖)
we see that the estimator is a linear function of responses 𝑦𝑖. Then under standard regression assumptions, 𝑦̂𝑧 is Normal with:
- mean = 𝜇𝑧
- 𝐄[𝑦̂𝑧] = 𝐄[𝜃ˆ0] + 𝐄[𝜃ˆ1]𝑥𝑧
- 𝐄[𝑦̂𝑧] = 𝜃0+ 𝜃1𝑥𝑧
- 𝐄[𝑦̂𝑧] = 𝜇𝑧
- variance = 𝜎2[ (1/𝑛) + [(𝑥𝑧 - 𝑥̅)2 / 𝑆𝑥𝑥] ]
we can estimate the regression variance 𝜎2 with 𝜎̂2 and obtain the following confidence interval
(1-𝛼)100% confidence interval for the mean 𝜇𝑧 = 𝐄[𝑌|𝑋=𝑥𝑧] of all responses with 𝑋=𝑥𝑧 is:
- 𝜃ˆ0 + 𝜃ˆ1𝑥𝑧 ± 𝑡𝛼/2,𝑛-2·𝜎̂·√[ (1/𝑛) + [(𝑥𝑧 - 𝑥̅)2 / 𝑆𝑥𝑥] ]
Prediction Interval for the Individual Response
instead of estimating a population parameter, we are now predicting the actual value of a random variable
(1-𝛼)100% prediction interval for the individual response 𝑌 when 𝑋=𝑥𝑧:
- 𝜃ˆ0 + 𝜃ˆ1𝑥𝑧 ± 𝑡𝛼/2,𝑛-2·𝜎̂·√[ 1 + (1/𝑛) + [(𝑥𝑧 - 𝑥̅)2 / 𝑆𝑥𝑥] ]
computation:
an interval [𝑎,𝑏] is a (1-𝛼)100% prediction interval for the individual response 𝑌 corresponding to predictor 𝑋=𝑥𝑧 if it contains the value of 𝑌 with probability (1-𝛼):
- 𝐏{𝑎 ≤ 𝑌 ≤ 𝑏 | 𝑋=𝑥𝑧} = 1 - 𝛼
this time, all 3 quantities: 𝑌, 𝑎, and 𝑏, are random variables
predicting 𝑌 by 𝑦̂𝑧, the standard deviation of the prediction error (the new response 𝑌 is independent of 𝑦̂𝑧, so the variances add)
- 𝑆𝑡𝑑(𝑌 - 𝑦̂𝑧) = √[𝑉𝑎𝑟(𝑌) + 𝑉𝑎𝑟(𝑦̂𝑧)]
- 𝑆𝑡𝑑(𝑌 - 𝑦̂𝑧) = 𝜎·√[ 1 + (1/𝑛) + [(𝑥𝑧 - 𝑥̅)2 / 𝑆𝑥𝑥] ]
is estimated by:
- 𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂𝑧) = 𝜎̂·√[ 1 + (1/𝑛) + [(𝑥𝑧 - 𝑥̅)2 / 𝑆𝑥𝑥] ]
subtracting 𝑦̂𝑧 and standardizing all 3 parts of the inequality:
- 𝑎 ≤ 𝑌 ≤ 𝑏
we realize that the (1-𝛼)100% prediction interval for 𝑌 has to satisfy the equation
- 𝐏{(𝑎 - 𝑦̂𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂𝑧) ≤ (𝑌 - 𝑦̂𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂𝑧) ≤ (𝑏 - 𝑦̂𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂𝑧)} = 1 - 𝛼
at the same time, the properly standardized (𝑌 - 𝑦̂𝑧) has t-distribution with 𝑛-2 degrees of freedom:
- 𝐏{-𝑡𝛼/2,𝑛-2 ≤ (𝑌 - 𝑦̂𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂𝑧) ≤ 𝑡𝛼/2,𝑛-2} = 1 - 𝛼
a prediction interval is now computed by solving the following equations for 𝑎 and 𝑏:
- (𝑎 - 𝑦̂𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂𝑧) = -𝑡𝛼/2,𝑛-2
- (𝑏 - 𝑦̂𝑧)/𝑆𝑡𝑑ˆ(𝑌 - 𝑦̂𝑧) = 𝑡𝛼/2,𝑛-2
thus, the (1-𝛼)100% prediction interval for the individual response 𝑌 when 𝑋=𝑥𝑧:
- 𝜃ˆ0 + 𝜃ˆ1𝑥𝑧 ± 𝑡𝛼/2,𝑛-2·𝜎̂·√[ 1 + (1/𝑛) + [(𝑥𝑧 - 𝑥̅)2 / 𝑆𝑥𝑥] ]
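The two interval formulas differ only by the extra 1 under the square root, which the following sketch makes concrete. The helper `intervals_at`, the toy data and the supplied critical value (3.182, the usual two-sided 5% table value for 3 degrees of freedom) are assumptions for illustration.

```python
from math import sqrt
from statistics import fmean

def intervals_at(xs, ys, x_z, t_crit):
    """Return (y_hat, CI half-width, PI half-width) at X = x_z."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = s_xy / s_xx
    theta0 = y_bar - theta1 * x_bar
    ss_err = sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys))
    sigma_hat = sqrt(ss_err / (n - 2))
    y_hat = theta0 + theta1 * x_z
    spread = 1 / n + (x_z - x_bar) ** 2 / s_xx          # term shared by both intervals
    ci_half = t_crit * sigma_hat * sqrt(spread)         # for the mean response
    pi_half = t_crit * sigma_hat * sqrt(1 + spread)     # for an individual response
    return y_hat, ci_half, pi_half

xs = [1, 2, 3, 4, 5]   # toy data, made up for illustration
ys = [2, 4, 5, 4, 5]
y_hat, ci_half, pi_half = intervals_at(xs, ys, x_z=3, t_crit=3.182)
```

The prediction interval is always the wider of the two, because it must cover the noise of a single new response in addition to the uncertainty of the fitted line; both widen as 𝑥𝑧 moves away from 𝑥̅.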

