R-Square - Coefficient of Determination - Coefficient of Multiple Determination - Multiple R-Square
- $R^2$ is closely related to Pearson's correlation coefficient ($r$)
- $R^2$ measures the proportion of the total variation ($SS_{TOT}$) that is explained by the regression model ($SS_{REG}$). In other words, it measures how closely the data fit the regression model
- $R^2$ lies in the interval [0, 1] (for a least-squares fit that includes an intercept, evaluated on the data used to fit the model)
- in univariate (single-regressor) regression, $R^2$ also equals the square of the correlation coefficient: $R^2 = r^2$
- as new regressors are added to the regression model, each one either explains an additional portion of the total variation $SS_{TOT}$ or it does not. Therefore, $R^2$ either goes up or stays the same and NEVER goes down as we add more regressors. Thus, we expect $R^2$ to increase (or at least not decrease) going from univariate regression to multivariate (multiple-regressor) regression
- for penalizing the addition of USELESS regressors see: Adjusted R-Square
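The "never goes down" property above can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes an ordinary least-squares fit with an intercept, solved through the normal equations in plain Python, and uses synthetic data. The helper names `solve` and `r_squared` and the variables `x1`, `noise`, and `y` are made up for illustration.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given regressor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]  # design matrix rows
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    y_hat = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # variation about the mean
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # variation about the model
    return 1.0 - ss_res / ss_tot

random.seed(0)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]  # a regressor unrelated to y
y = [2.0 + 3.0 * a + random.gauss(0, 1) for a in x1]

r2_one = r_squared([x1], y)          # univariate fit
r2_two = r_squared([x1, noise], y)   # same fit plus the useless regressor
print(f"R^2 with x1 only:      {r2_one:.4f}")
print(f"R^2 with x1 and noise: {r2_two:.4f}")
```

Even though `noise` carries no information about `y`, the second value should be at least as large as the first (up to rounding), which is why a penalized measure is needed to compare models of different sizes.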
formula:
- $R^2$ = (variance-about-the-mean - Mean Square Error (MSE)) / variance-about-the-mean
- $R^2$ = (variance-about-the-mean - variance-about-the-regression-line) / variance-about-the-mean
- $R^2$ = (variance-about-the-mean - variance-of-errors-not-explained-by-model) / variance-about-the-mean
- $R^2$ = variance-explained-by-model / variance-about-the-mean
- $R^2 = SS_{REG} / SS_{TOT}$
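The formulas above are all the same quantity written in different ways. A sketch of why, assuming a least-squares fit with an intercept and $n$ observations; the symbol $SS_{RES}$ (the residual sum of squares) does not appear in these notes and is introduced here only for illustration:

```latex
\begin{aligned}
SS_{TOT} &= \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SS_{REG} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
SS_{RES} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
SS_{TOT} &= SS_{REG} + SS_{RES} \\
R^2 &= \frac{SS_{REG}}{SS_{TOT}}
     = \frac{SS_{TOT} - SS_{RES}}{SS_{TOT}}
     = \frac{SS_{TOT}/n - SS_{RES}/n}{SS_{TOT}/n}
\end{aligned}
```

In the last form, $SS_{TOT}/n$ plays the role of variance-about-the-mean and $SS_{RES}/n$ the role of MSE (variance-about-the-regression-line). The equality holds as long as both use the same divisor.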
visual of variance-about-the-mean vs variance-about-the-regression-line
R2 - Formal Definition
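- in Simple Linear Regression models, $R^2$ is known as the Coefficient of Determination:
- $R^2 = [\mathrm{Var}(\text{mean}) - \mathrm{Var}(\text{model})] / \mathrm{Var}(\text{mean})$, where $\mathrm{Var}(\text{mean})$ is the variance about the mean and $\mathrm{Var}(\text{model})$ is the variance about the regression model
- $R^2 = \mathrm{Corr}(X, Y)^2$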
- in Multiple Linear Regression models, $R^2$ is known as the Coefficient of Multiple Determination or Multiple R-Squared:
- $R^2 = [\mathrm{Var}(\text{mean}) - \mathrm{Var}(\text{model})] / \mathrm{Var}(\text{mean})$, where variance-explained-by-model = $\mathrm{Var}(\text{mean}) - \mathrm{Var}(\text{model})$
R2 - Properties
- $R^2$ lies in [0, 1]. This means the variation about the model is always less than or equal to the variation about the mean
- high $R^2$ (and hence high $|r|$) → points are tightly clustered around the regression model → predicted $\hat{y}$'s are close to the observed $y$'s → errors are small → fit is good
R2 - Example
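The notes give no data for this example, so what follows is an illustrative sketch on a small made-up dataset (the values of `x` and `y` are assumptions, chosen so the arithmetic can be checked by hand). It fits a simple linear regression by least squares and computes $R^2$ three ways: as $SS_{REG} / SS_{TOT}$, as one minus the unexplained share, and as the squared correlation.

```python
from math import sqrt

# Made-up data, small enough to verify by hand
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n  # 3.0
y_bar = sum(y) / n  # 4.0

# Least-squares slope and intercept for y = intercept + slope * x
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 6.0
s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # 10.0
slope = s_xy / s_xx                 # 0.6
intercept = y_bar - slope * x_bar   # about 2.2
y_hat = [intercept + slope * xi for xi in x]

# Sums of squares
ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total variation: 6.0
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained: about 3.6
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained: about 2.4

r2 = ss_reg / ss_tot             # explained share of total variation
r2_alt = 1 - ss_res / ss_tot     # one minus the unexplained share
r = s_xy / sqrt(s_xx * ss_tot)   # Pearson correlation of x and y

print(round(r2, 4), round(r2_alt, 4), round(r ** 2, 4))  # 0.6 0.6 0.6
```

Here the model explains 60% of the variation in `y`; the remaining 40% is the scatter of the points about the fitted line.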

