Omitted variable bias occurs when a regression model leaves out relevant independent variables, known as confounding variables. The model then attributes the effects of the omitted variables to the variables it does include, which biases their coefficient estimates.

Conditions that Cause Omitted Variable Bias

  • the omitted variable 𝑍 must correlate with the dependent variable 𝑌
  • the omitted variable 𝑍 must correlate with at least one independent variable 𝑋 in the regression model
  • that one independent variable 𝑋 must correlate with the dependent variable 𝑌
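
As a rough illustration of the conditions above, the sketch below simulates data in which 𝑍 drives both 𝑋 and 𝑌, then fits 𝑌 on 𝑋 with and without 𝑍. Everything in it is assumed for illustration: the coefficients (2.0 for 𝑋, 3.0 for 𝑍), the `link` parameter that sets how strongly 𝑍 feeds into 𝑋, and the helper names. It uses only the Python standard library.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def slope_short(x, y):
    """OLS slope of y on x alone, i.e. with z omitted."""
    return cov(x, y) / cov(x, x)


def slopes_long(x, z, y):
    """OLS slopes of y on x and z, solved from the 2x2 normal equations."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


def simulate(n=20000, beta_x=2.0, beta_z=3.0, link=0.5, seed=0):
    """Hypothetical data: z feeds into x (via `link`) and into y (via `beta_z`)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [link * zi + rng.gauss(0, 1) for zi in z]
    y = [beta_x * xi + beta_z * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, z, y


if __name__ == "__main__":
    x, z, y = simulate(link=0.5)
    print("z omitted, z correlated with x:", slope_short(x, y))
    print("z included:", slopes_long(x, z, y)[0])
    x0, z0, y0 = simulate(link=0.0)
    print("z omitted, z unrelated to x:", slope_short(x0, y0))
```

With `link=0.5` the slope on 𝑋 alone should land well above the true 2.0, while including 𝑍, or setting `link=0.0`, should bring it back close to 2.0.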

Effects of Omitted Variable Bias

The estimated effect of 𝑋 can be any of the following:

  • overestimated
  • underestimated
  • masked
  • sign reversed

Below, 𝐸𝑓𝑓𝑒𝑐𝑡(𝑋) is the true effect of 𝑋 on 𝑌, and 𝐸𝑓𝑓𝑒𝑐𝑡(𝑍) is the effect of the omitted 𝑍 that the model wrongly attributes to 𝑋.

When 𝐸𝑓𝑓𝑒𝑐𝑡(𝑍) and 𝐸𝑓𝑓𝑒𝑐𝑡(𝑋) have the same sign, the true effect of 𝑋 is overestimated.

When they have opposite signs and |𝐸𝑓𝑓𝑒𝑐𝑡(𝑍)| < |𝐸𝑓𝑓𝑒𝑐𝑡(𝑋)|, the true effect of 𝑋 is underestimated.

When they have opposite signs and |𝐸𝑓𝑓𝑒𝑐𝑡(𝑍)| = |𝐸𝑓𝑓𝑒𝑐𝑡(𝑋)|, the true effect of 𝑋 is masked.

When they have opposite signs and |𝐸𝑓𝑓𝑒𝑐𝑡(𝑍)| > |𝐸𝑓𝑓𝑒𝑐𝑡(𝑋)|, the sign of the estimated effect of 𝑋 is reversed.
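
The four outcomes can be sketched as a small decision rule. This is only one reading of the cases: it assumes the estimate equals the true effect of 𝑋 plus a bias term (the part of 𝑍's effect that lands on 𝑋), and that the magnitude comparisons apply when the bias opposes the true effect. The function name `classify_estimate` and its arguments are hypothetical.

```python
import math


def classify_estimate(true_effect, bias):
    """Compare the estimate (true_effect + bias) with the true effect of X.

    `bias` stands for the part of Z's effect attributed to X; it is
    assumed known here, which is never the case with real data.
    """
    if true_effect == 0:
        raise ValueError("the four cases assume a nonzero true effect of X")
    if bias == 0:
        return "unbiased"
    if (true_effect > 0) == (bias > 0):
        return "overestimated"   # bias pushes further in the same direction
    if math.isclose(abs(bias), abs(true_effect)):
        return "masked"          # bias cancels the true effect
    if abs(bias) < abs(true_effect):
        return "underestimated"  # estimate shrinks but keeps its sign
    return "sign reversed"       # bias outweighs the true effect


if __name__ == "__main__":
    for bias in (1.2, -0.5, -2.0, -3.0):
        print(bias, classify_estimate(2.0, bias))
```

In practice the bias is unknown, so a rule like this is useful mainly for reasoning about which direction a suspected confounder would push the estimate.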

How to Detect Omitted Variable Bias

We know that for omitted variable bias to exist, an independent variable must correlate with the model's error term, which absorbs the omitted variable. Consequently, we can plot the residuals against the variables in our model and against any candidate variables that were left out. If we see a relationship in a plot, rather than random scatter, it both tells us that there is a problem and points us towards the solution.
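
A numeric stand-in for the residual plots might look like the sketch below, which uses correlations instead of a chart so that it needs only the standard library. The data-generating values and helper names are assumptions for illustration. One caveat: ordinary least squares makes the residuals uncorrelated with the included variables by construction, so a straight-line pattern cannot appear against 𝑋 itself. The sketch therefore also checks the residuals against a candidate variable 𝑍 that was measured but left out.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    """Pearson correlation coefficient."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


def residuals_short(x, y):
    """Residuals from the OLS fit of y on x alone (with an intercept)."""
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def simulate(n=5000, seed=1):
    """Hypothetical data in which z affects both x and y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.5 * zi + rng.gauss(0, 1) for zi in z]
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, z, y


if __name__ == "__main__":
    x, z, y = simulate()
    res = residuals_short(x, y)
    print("corr(residuals, x):", corr(res, x))  # near zero by construction
    print("corr(residuals, z):", corr(res, z))  # clearly nonzero here
```

A scatter plot of `res` against `z` would show the same relationship visually. Curved patterns in a plot of residuals against 𝑋 can still hint at a missing term, since only the linear association is forced to zero.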