More models

We cover a couple of additional models.

Threats to identification in the linear model

We want to think about the estimand that a linear model recovers in a couple of interesting cases.

Omitted variable bias

We consider the following model:

Y_{i} = \alpha X_{i} + \gamma Z_{i} + \epsilon_i

together with E[\epsilon_i | X_i, Z_i] = 0, and we ask what the regression coefficient of Y_i on X_i alone recovers:

\begin{align*} \beta^1 & = cov(X_i,Y_i) / var(X_i) \\ & = cov(X_i,\alpha X_{i} + \gamma Z_{i} + \epsilon_i) / var(X_i) \\ & = \alpha + \gamma \, cov(X_i, Z_{i}) / var(X_i) \end{align*}

where cov(X_i, \epsilon_i) = 0 follows from E[\epsilon_i | X_i, Z_i] = 0. The second term is the omitted variable bias; it vanishes only when \gamma = 0 or cov(X_i, Z_i) = 0.
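We can check this formula in a small simulation. The following sketch (the coefficients and data-generating process are illustrative choices, not from the notes) compares the short-regression coefficient with \alpha + \gamma \, cov(X_i, Z_i) / var(X_i):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha, gamma = 1.0, 2.0

# Z is correlated with X so the omitted-variable term is nonzero
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(size=n)
eps = rng.normal(size=n)
Y = alpha * X + gamma * Z + eps

# Short regression of Y on X alone
beta1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# Theoretical value: alpha + gamma * cov(X, Z) / var(X)
beta1_theory = alpha + gamma * np.cov(X, Z)[0, 1] / np.var(X, ddof=1)
print(beta1, beta1_theory)
```

With these choices cov(X, Z) = 0.5 and var(X) = 1.25, so the short regression converges to roughly 1.8 rather than \alpha = 1.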

Measurement error bias

We consider the following model:

Y_{i} = \beta X^*_{i} + \epsilon_i

where we only observe X_i = X^*_{i} + u_i

together with E[\epsilon_i | X^*_i, u_i] = 0 and E[u_i | X^*_i] = 0 (classical measurement error), and we ask what the regression coefficient of Y_i on X_i alone recovers:

\begin{align*} \beta^1 & = cov(X_i,Y_i) / var(X_i) \\ & = \frac{ cov(X^*_i,Y_i) }{ var(X^*_i) + var(u_i) } \\ & = \beta \frac{var(X^*_i) }{ var(X^*_i) + var(u_i) } \\ & < \beta \end{align*}

where the strict inequality holds for \beta > 0 and var(u_i) > 0; in general the coefficient is attenuated toward zero.
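A quick simulation makes the attenuation visible. In this sketch (parameter values are illustrative) the true and error variances are both 1, so the attenuation factor var(X^*_i) / (var(X^*_i) + var(u_i)) equals 1/2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta = 2.0

Xstar = rng.normal(size=n)         # true regressor, var = 1
u = rng.normal(size=n)             # classical measurement error, var = 1
X = Xstar + u                      # observed, mismeasured regressor
Y = beta * Xstar + rng.normal(size=n)

# Regression of Y on the mismeasured X
beta1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# Theory: beta * var(X*) / (var(X*) + var(u)) = 2 * (1/2) = 1
print(beta1)
```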

Instrumental variable

Suppose now that Y_i = X'_i \beta + U_i. Perhaps we are willing to assume that for some instrument Z_i we have E[U_i | Z_i] = 0, even if the same does not hold for X_i. In this case we can identify \beta from

\begin{align*} E[Z_i X'_i]^{-1} E[ Z_i Y_i ] & = E[Z_i X'_i]^{-1} E[ Z_i X'_i ] \beta + E[Z_i X'_i]^{-1} E[ Z_i U_i ] \\ & = \beta \end{align*}

assuming that E[Z_i X'_i] is square and invertible. We then define the instrumental variable estimator as

\beta^{IV}_n = ( Z_n ' X_n )^{-1} Z'_n Y_n

This can address both of the issues mentioned earlier. Two important assumptions:

  1. exclusion restriction: E[U_i | Z_i] = 0
  2. relevance: E[Z_i X'_i] is invertible
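To see the estimator at work, here is a minimal sketch in the scalar case, where (Z'_n X_n)^{-1} Z'_n Y_n reduces to a ratio of sample covariances (the data-generating process below is an illustrative endogeneity setup, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta = 2.0

Z = rng.normal(size=n)                       # instrument, independent of U
U = rng.normal(size=n)                       # structural error
X = 0.8 * Z + 0.5 * U + rng.normal(size=n)   # endogenous: cov(X, U) != 0
Y = beta * X + U

# OLS is inconsistent because X is correlated with U
beta_ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# IV estimator: in the scalar (mean-zero) case, cov(Z, Y) / cov(Z, X)
beta_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]
print(beta_ols, beta_iv)
```

The OLS coefficient is biased upward by cov(X_i, U_i) / var(X_i), while the IV estimate is close to the true \beta = 2.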

We can show consistency and asymptotic normality. But what about unbiasedness?

TBD in class!

2SLS

In the more general case, when for instance one has more instruments than regressors, we introduce the 2SLS estimator:

$$ \beta_n^{2SLS} = (X'_n P_n X_n)^{-1} X'_n P_n Y_n $$

where P_n = Z_n ( Z'_n Z_n)^{-1} Z'_n is the projection matrix onto the columns of Z_n. Since P_n is symmetric and idempotent, P'_n P_n = P_n, and so we can write

\beta_n^{2SLS} = (X'_n P'_n P_n X_n)^{-1} X'_n P_n Y_n

and so this is like regressing Y_n on P_n X_n, where we note that P_n X_n = Z_n ( Z'_n Z_n)^{-1} Z'_n X_n are the fitted values from the regression of X_n on Z_n.
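The two-stage reading can be verified numerically. This sketch (with an illustrative overidentified design: two instruments, one endogenous regressor) computes 2SLS both by the direct formula (X'_n P_n X_n)^{-1} X'_n P_n Y_n and as OLS of Y_n on the first-stage fitted values, forming P_n implicitly through least-squares solves rather than as an n x n matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
beta = 2.0

# Overidentified case: two instruments for one endogenous regressor
Z = rng.normal(size=(n, 2))
U = rng.normal(size=n)
X = (Z @ np.array([0.7, 0.4]) + 0.5 * U + rng.normal(size=n))[:, None]
Y = beta * X[:, 0] + U

# First stage: fitted values P_n X_n from the regression of X on Z
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Second stage: regress Y on the fitted values
beta_2sls = np.linalg.lstsq(Xhat, Y, rcond=None)[0]

# Direct formula (X' P X)^{-1} X' P Y, using X' P X = X' Xhat
beta_direct = np.linalg.solve(X.T @ Xhat, Xhat.T @ Y)
print(beta_2sls, beta_direct)
```

The two computations agree, which is exactly the P'_n P_n = P_n identity above.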