R-Square - Coefficient of Determination - Coefficient of Multiple Determination - Multiple R-Square

  • 𝑅² is closely related to Pearson’s correlation coefficient (𝑟)
  • 𝑅² measures the proportion of the total variation (𝑆𝑆𝑇𝑂𝑇) that is explained by the regression model (𝑆𝑆𝑅𝐸𝐺); in other words, how closely the data fit the regression model
  • for a least-squares model with an intercept, 𝑅² lies in the interval [0,1]
  • in univariate regression, 𝑅² equals the correlation coefficient SQUARED: 𝑅² = 𝑟²
  • as new regressors are added to the regression model, each one either explains an additional portion of the total variation 𝑆𝑆𝑇𝑂𝑇 or explains none. Therefore, 𝑅² either goes up or remains the same and NEVER goes down as we add more regressors. Thus, we expect 𝑅² to increase going from univariate regression to multivariate regression
  • for penalizing the addition of USELESS regressors see: Adjusted R-Square
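
The two claims above (𝑅² equals 𝑟² in univariate regression, and 𝑅² never goes down when a regressor is added) can be checked numerically. The following is a minimal sketch, assuming ordinary least squares with an intercept; the helper name `ols_r_squared` and the synthetic data are made up for illustration and are not part of any library.

```python
import numpy as np

def ols_r_squared(X, y):
    """R-square of an ordinary least-squares fit with an intercept (illustrative helper)."""
    # Prepend a column of ones so the model includes an intercept term.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    ss_err = residuals @ residuals             # variation about the regression model
    ss_tot = np.sum((y - y.mean()) ** 2)       # variation about the mean
    return 1.0 - ss_err / ss_tot

# Synthetic data (an assumption for this sketch): y depends on x1 only.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
useless = rng.normal(size=n)                   # regressor unrelated to y
y = 2.0 + 3.0 * x1 + rng.normal(size=n)

r2_one = ols_r_squared(x1, y)                                   # univariate fit
r2_two = ols_r_squared(np.column_stack([x1, useless]), y)       # add the useless regressor
r = np.corrcoef(x1, y)[0, 1]                                    # Pearson's r

print(r2_one, r ** 2)   # the two values agree up to rounding
print(r2_two >= r2_one) # adding a regressor does not lower R-square
```

Because the useless regressor can only absorb noise, any gain in 𝑅² from adding it is tiny, which is the case Adjusted R-Square is meant to penalize.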

formula:

  • 𝑅² = (variance-about-the-mean − Mean Square Error (MSE)) / variance-about-the-mean
  • 𝑅² = (variance-about-the-mean − variance-about-the-regression-line) / variance-about-the-mean
  • 𝑅² = (variance-about-the-mean − variance-of-errors-not-explained-by-model) / variance-about-the-mean
  • 𝑅² = variance-explained-by-model / variance-about-the-mean
  • 𝑅² = 𝑆𝑆𝑅𝐸𝐺 / 𝑆𝑆𝑇𝑂𝑇
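
The formulas above are different ways of writing the same quantity. As a rough check, the sketch below fits a least-squares line to a tiny made-up dataset (three points chosen only so the arithmetic is easy to follow by hand) and evaluates 𝑅² in three of the forms. The variable names are illustrative.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0])

# Least-squares line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

n = len(y)
ss_tot = np.sum((y - y.mean()) ** 2)      # variation about the mean
ss_err = np.sum((y - y_hat) ** 2)         # variation about the regression line
ss_reg = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the model

var_mean = ss_tot / n                     # variance-about-the-mean
mse = ss_err / n                          # Mean Square Error

r2_from_variances = (var_mean - mse) / var_mean
r2_from_sums = (ss_tot - ss_err) / ss_tot
r2_from_ss_reg = ss_reg / ss_tot

# All three are approximately 0.75 for this dataset.
print(r2_from_variances, r2_from_sums, r2_from_ss_reg)
```

By hand: the fitted line is ŷ = 2/3 + 0.5·x, the variation about the mean is 2/3, the variation about the line is 1/6, so 𝑅² = (2/3 − 1/6) / (2/3) = 0.75.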

visual of variance-about-the-mean vs variance-about-the-regression-line


R² - Formal Definition
R² - Properties
  • 𝑅² ranges between [0,1]. This means the variation about the regression model (the unexplained error) is always less than or equal to the variation about the mean
  • high 𝑅² (and hence high |𝑟|) → points are tightly clustered around the regression model → predicted 𝑦̂’s are close to the observed 𝑦’s → errors are small → fit is good
R² - Example

Resources