Linear Regression (LR) Models
- is a type of continuous regression model whose function/estimator is linear with respect to the regression coefficients {𝜃0, …, 𝜃𝑝}:
- 𝑦̂ = 𝜃0 + 𝜃1𝑓1(𝒙) + … + 𝜃𝑝𝑓𝑝(𝒙)
- models the relationship between:
- 𝑌 - a single scalar response/dependent variable (for categorical use logistic regression)
- {𝑋1, …, 𝑋𝑘} - one or more regressors or explanatory/predictor/covariate/independent variables. predictor variable types:
- continuous/scalar/numerical predictor
- categorical predictor - itself can be either nominal or ordinal
- models expected response as a function/conditional of regressors (where 𝑓𝑖(..) are feature functions)
- 𝐄[𝑌|𝑋1=𝑥1, …, 𝑋𝑘=𝑥𝑘] = ℎ(𝑥1, …, 𝑥𝑘) = 𝑦̂ = 𝜃0+ 𝜃1𝑓1(𝑥1, …, 𝑥𝑘) + … + 𝜃𝑝𝑓𝑝(𝑥1, …, 𝑥𝑘)
- coefficient 𝜃0 represents the 𝑦-intercept, i.e. the predicted value when all feature functions 𝑓𝑖(..) evaluate to 0
- coefficient 𝜃𝑖 represents the mean change in the dependent variable 𝑦 given a 1-unit change in the feature function 𝑓𝑖(𝑥1, …, 𝑥𝑘), holding the other feature functions fixed # for 1≤𝑖≤𝑝
- is a type of level-level model (or a level-log model when the 𝑓𝑖(..) are log functions)
- the dependent variable 𝑦 is the combination of the regression model and error
- 𝑦 = 𝑦̂ + 𝑒
- dependent variable = (constant + independent variables) + error
- dependent variable = deterministic + stochastic
- deterministic component is the portion of the variation in the dependent variable that the independent variables explain. In other words, the mean of the dependent variable is a function of the independent variables. In a regression model, all of the explanatory power should reside here
- error is the difference between the observed value 𝑦 and the expected value 𝑦̂ (𝑒 = 𝑦 − 𝑦̂). The gap between the expected and observed values must not be predictable; in other words, no explanatory power should be in the error. If you can use the error to make predictions about the response, your model has a problem. This is where residual plots play a role.
- the theory here is that the deterministic component of a regression model does such a great job of explaining the dependent variable that it leaves only the intrinsically inexplicable portion of your study area for the error. If you can identify non-randomness in the error term, your independent variables are not explaining everything that they can
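A minimal sketch of the points above, in Python. The data, the feature functions and the helper names `solve_linear_system`, `fit` and `predict` are illustrative assumptions, not part of these notes. It shows that a model stays linear in the coefficients even when a feature function such as 𝑥² is nonlinear in 𝑥, and that leaving out a needed feature leaves a predictable pattern in the residuals.

```python
def solve_linear_system(a, b):
    """Solve a @ t = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(features, xs, ys):
    """Least-squares coefficients for y_hat = sum_j theta_j * f_j(x)."""
    design = [[f(x) for f in features] for x in xs]  # one row per sample
    p = len(features)
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # solves the normal equations


def predict(features, theta, x):
    return sum(t * f(x) for t, f in zip(theta, features))


# Hypothetical noise-free data generated from y = 1 + 2x + 0.5x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 0.5 * x**2 for x in xs]

# Two candidate models; the constant feature carries the intercept theta_0.
line = [lambda x: 1.0, lambda x: x]   # y_hat = theta_0 + theta_1 * x
quad = line + [lambda x: x**2]        # adds theta_2 * x^2, still linear in theta

theta_line = fit(line, xs, ys)
theta_quad = fit(quad, xs, ys)

# Residuals e = y - y_hat for each model.
resid_line = [y - predict(line, theta_line, x) for x, y in zip(xs, ys)]
resid_quad = [y - predict(quad, theta_quad, x) for x, y in zip(xs, ys)]

print("line coefficients:", theta_line)
print("line residuals:   ", resid_line)
print("quad coefficients:", theta_quad)
print("quad residuals:   ", resid_quad)
```

With this data the straight-line residuals equal 0.5(𝑥² − 4), so the error term still carries explanatory power. Adding the feature function 𝑓2(𝑥) = 𝑥² moves that structure into the deterministic component.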
LR - Steps
given sample/training data:
- (𝑦1, 𝑥11, …, 𝑥1𝑘) # sample 1
- (𝑦2, 𝑥21, …, 𝑥2𝑘) # sample 2
- …
- (𝑦𝑛, 𝑥𝑛1, …, 𝑥𝑛𝑘) # sample 𝑛
the task of Linear Regression:
- choose line equation form, such as:
- 𝐄[𝑌|𝑋1=𝑥1] = 𝑦̂ = ℎ(𝑥1) = 𝜃0 + 𝜃1𝑥1 # univariate linear regression
- 𝐄[𝑌|𝑋1=𝑥1, 𝑋2=𝑥2] = 𝑦̂ = ℎ(𝑥1, 𝑥2) = 𝜃0 + 𝜃1𝑥1 + 𝜃2𝑥2 # multivariate linear regression
- 𝐄[𝑌|𝑋1=𝑥1, 𝑋2=𝑥2] = 𝑦̂ = ℎ(𝑥1, 𝑥2) = 𝜃0 + 𝜃1𝑥1𝑥2 + 𝜃2(𝑥1)² + 𝜃3𝑥2 # multiple linear regression
- where:
- 𝐄[𝑌|..], 𝑦̂ and ℎ(..) - expected scalar response/dependent variable, i.e. the hypothesis function, conditional on the 𝑥𝑖's
- 𝑥𝑖 - regressors or explanatory/predictor/covariate/independent variables
- 𝜃𝑖 - regression coefficients/weights
- estimate/find the values of the regression coefficients 𝜃𝑖 which best fit the line equation to the data
- determine whether it's a good fit
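The three steps above can be sketched for the univariate case as follows. This is a minimal Python sketch under stated assumptions: the sample data are made up, the helper names `fit_simple` and `r_squared` are hypothetical, and R² is used as just one common way to judge fit (these notes do not prescribe a specific measure).

```python
def fit_simple(xs, ys):
    """Step 2: least-squares estimates for y_hat = theta0 + theta1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = sxy / sxx                  # slope
    theta0 = y_mean - theta1 * x_mean   # intercept
    return theta0, theta1


def r_squared(xs, ys, theta0, theta1):
    """Step 3: share of the variation in y explained by the fitted line."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Step 1: choose the form y_hat = theta0 + theta1 * x (univariate).
# Hypothetical training samples (x_i, y_i), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

theta0, theta1 = fit_simple(xs, ys)
r2 = r_squared(xs, ys, theta0, theta1)

print("theta0 =", theta0, "theta1 =", theta1)
print("R^2 =", r2)
```

An R² close to 1 only says the line tracks this sample closely; it does not replace checking the residuals for structure.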
LR - Types
| LR Type | Model Form | Example Models |
|---|---|---|
| Univariate/Simple | 𝐄[𝑌\|𝑋1=𝑥1] = ℎ(𝑥1) = 𝑦̂ = 𝜃0 + 𝜃1𝑓1(𝑥1) | |
| Multivariate/Multiple | 𝐄[𝑌\|𝑋1=𝑥1, …, 𝑋𝑘=𝑥𝑘] = ℎ(𝑥1, …, 𝑥𝑘) = 𝑦̂ = 𝜃0 + 𝜃1𝑓1(𝑥1, …, 𝑥𝑘) + … + 𝜃𝑝𝑓𝑝(𝑥1, …, 𝑥𝑘) | |
LR - Methods for Estimating Coefficients (𝜃𝑖)
Methods for estimating the unknown coefficients {𝜃0, …, 𝜃𝑝} of 𝐄[𝑌|𝑋1=𝑥1, …, 𝑋𝑘=𝑥𝑘] = ℎ(𝑥1, …, 𝑥𝑘) = 𝑦̂ = 𝜃0 + 𝜃1𝑓1(𝑥1, …, 𝑥𝑘) + … + 𝜃𝑝𝑓𝑝(𝑥1, …, 𝑥𝑘)

| Method | Description |
|---|---|
| Gradient Descent | - idea: minimize the squared error via gradient descent<br>- need to choose learning rate 𝛼<br>- need many iterations<br>- scales well when the number of features 𝑝 (or training examples 𝑛) is large |
| Method of Least Squares (Projection Matrix - Normal Equation) | - idea: minimize the squared error via the normal equations<br>- no need to choose learning rate 𝛼<br>- no need to iterate<br>- need to compute the pseudoinverse (𝑋ᵀ𝑋)⁻¹𝑋ᵀ, or 𝑉𝐷⁻¹𝑈ᵀ from the SVD 𝑋 = 𝑈𝐷𝑉ᵀ of the design matrix 𝑋<br>- slow if the number of features 𝑝 is large, because inverting the (𝑝+1)×(𝑝+1) matrix 𝑋ᵀ𝑋 is 𝑂(𝑝³) |
| Maximum Likelihood Estimation (MLE) | - idea: maximize the likelihood function<br>- for OLS to be mathematically equivalent to MLE, the errors are assumed to be normally distributed and independent and identically distributed (IID) |
| Maximum a Posteriori (MAP) Estimation | - idea: maximize the posterior |
| | - idea: TODO |
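To make the trade-off between the first two methods concrete, here is a minimal Python sketch for the univariate model 𝑦̂ = 𝜃0 + 𝜃1𝑥. The data, the learning rate 𝛼 = 0.05, the iteration count and the function names `gradient_descent` and `normal_equation` are all assumptions chosen for illustration. Both functions minimize the same squared-error cost, so they should agree up to numerical tolerance.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Minimize J = (1/2n) * sum((theta0 + theta1*x - y)^2) iteratively."""
    n = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n                               # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n    # dJ/dtheta1
        theta0 -= alpha * grad0   # simultaneous update of both coefficients
        theta1 -= alpha * grad1
    return theta0, theta1


def normal_equation(xs, ys):
    """Solve (X^T X) theta = X^T y directly, where X has columns [1, x]."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx               # determinant of the 2x2 matrix X^T X
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (n * sxy - sx * sy) / det
    return theta0, theta1


# Hypothetical training data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

gd = gradient_descent(xs, ys)
ne = normal_equation(xs, ys)

print("gradient descent:", gd)
print("normal equation: ", ne)
```

The iterative version needs a workable 𝛼 (too large and it diverges on this data) and thousands of passes, while the closed form is a single solve. With only two coefficients the solve is trivial; the cost of the closed form grows with the number of coefficients.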
LR - Model Types
Linear Regression Models - take an input vector 𝑥∊ℝᵏ and predict the value of a scalar 𝑦∊ℝ as output (the function/estimator is linear wrt the regression coefficients {𝜃0, …, 𝜃𝑝})
| Linear Model Type | Description |
|---|---|
| Lasso Regression | |
| Partial Least Squares (PLS) Regression | |
LR - Methods for Determining How Well The Fitted Line Describes the Data
LR - Methods for Diagnosing Bias Variance
LR - Subpages
- Linear Regression vs Gaussian Regression
- Bayesian Linear Regression
- Cook’s Distance
- Elastic Net Regression (Ridge & LASSO)
- LASSO Regression (Least Absolute Shrinkage and Selection Operator)
- Linear Regression (LR) Models - Comparisons
- LR - ANOVA Table
- LR - Categorical Predictor Variables
- LR - Methods Estimating Unknown Regression Coefficients
- LR - Model Building
- LR - Problems
- LR - R Code Examples
- LR - Standard Regression Assumptions
- LR - Tests - Derivation of F-Statistic
- LR - Tests - Derivation of Student T-Statistic
- Mallow’s Cp Statistic
- Multivariate/Multiple Linear Regression Models
- Ordinary Least Squares (OLS) Regression
- Ridge Regression
- Univariate/Single-Variable/Simple Linear Regression Models
LR - Resources
- Zed Statistics Regression Playlist
- StatQuest Linear Models: Part 1 & Part 2