# More models

We cover a couple of additional models.

## Threats to identification in the linear model

We want to think about which estimand a linear regression recovers in a couple of interesting cases.

### Omitted variable bias

We consider the following model:

$$Y_{i} = \alpha X_{i} + \gamma Z_{i} + \epsilon_i$$

together with $E[\epsilon_i | X_i, Z_i] = 0$, and we ask what the regression coefficient of $Y_i$ on $X_i$ alone recovers:

\begin{align*} \beta^1 & = cov(X_i,Y_i) / var(X_i) \\ & = cov(X_i,\alpha X_{i} + \gamma Z_{i}) / var(X_i) \\ & = \alpha + \gamma cov(X_i, Z_{i}) / var(X_i) \end{align*}
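As a quick numerical check, we can verify the omitted variable bias formula on simulated data (a sketch with made-up parameter values, not part of the derivation):

```python
import numpy as np

# Simulate Y = alpha*X + gamma*Z + eps with X and Z correlated.
# Regressing Y on X alone recovers alpha + gamma*cov(X,Z)/var(X).
rng = np.random.default_rng(0)
n = 200_000
alpha, gamma = 1.0, 2.0
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(size=n)      # cov(X,Z) = 0.5, var(X) = 1.25
Y = alpha * X + gamma * Z + rng.normal(size=n)

beta1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
# population value: 1.0 + 2.0 * 0.5 / 1.25 = 1.8, not alpha = 1.0
print(beta1)
```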

### Measurement error bias

We consider the following model:

$$Y_{i} = \beta X^*_{i} + \epsilon_i$$

where we only observe $X_i = X^*_{i} + u_i$

together with $E[\epsilon_i | X^*_i, u_i] = 0$ and $E[u_i | X^*_i] = 0$, and we ask what the regression coefficient of $Y_i$ on $X_i$ alone recovers:

\begin{align*} \beta^1 & = cov(X_i,Y_i) / var(X_i) \\ & = \frac{ cov(X^*_i,Y_i) }{ var(X^*_i) + var(u_i) } \\ & = \beta \frac{var(X^*_i) }{ var(X^*_i) + var(u_i) } \\ & < \beta \end{align*}
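The attenuation can also be checked numerically (a simulation sketch with made-up values):

```python
import numpy as np

# Classical measurement error: the observed regressor is X = X* + u,
# so the OLS slope shrinks toward 0 by var(X*)/(var(X*) + var(u)).
rng = np.random.default_rng(0)
n = 200_000
beta = 2.0
Xstar = rng.normal(size=n)            # var(X*) = 1
u = rng.normal(size=n)                # var(u)  = 1
X = Xstar + u                         # observed, mismeasured regressor
Y = beta * Xstar + rng.normal(size=n)

beta1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
# population value: 2.0 * 1 / (1 + 1) = 1.0 < beta
print(beta1)
```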

## Instrumental variable

Perhaps, in the model $Y_i = X'_i \beta + U_i$, we are willing to assume that there is a variable $Z_i$ such that $E[U_i | Z_i] = 0$, even if this does not hold for $X_i$. In this case we can identify $\beta$ from

\begin{align*} E[Z_i X'_i]^{-1} E[ Z_i Y_i ] & = E[Z_i X'_i]^{-1} E[ Z_i X'_i ] \beta + E[Z_i X'_i]^{-1} E[ Z_i U_i ] \\ & = \beta \end{align*}

assuming that $E[Z_i X'_i]$ is square and invertible. We then define the instrumental variable estimator as

$$\beta^{IV}_n = ( Z'_n X_n )^{-1} Z'_n Y_n$$

This can address both issues mentioned earlier. Two important assumptions:

1. exclusion restriction: $E[U_i | Z_i] = 0$
2. relevance: $E[Z_i X'_i]$ is invertible
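A simulation sketch (made-up parameter values) comparing OLS and IV when the regressor is endogenous:

```python
import numpy as np

# X is correlated with the error U, so OLS is inconsistent;
# Z satisfies the exclusion restriction and relevance, so IV works.
rng = np.random.default_rng(0)
n = 200_000
beta = 1.5
Z = rng.normal(size=n)
U = rng.normal(size=n)
X = 0.8 * Z + 0.5 * U + rng.normal(size=n)   # endogenous regressor
Y = beta * X + U

# with one instrument and one regressor, (Z'X)^{-1} Z'Y is a ratio
beta_iv = (Z @ Y) / (Z @ X)
beta_ols = (X @ Y) / (X @ X)
print(beta_iv, beta_ols)   # IV near 1.5, OLS biased upward
```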

We can show consistency and asymptotic normality. But what about unbiasedness?

TBD in class!

### 2SLS

In the more general case, when for instance one has more instruments than regressors, we introduce the 2SLS estimator:

$$\beta_n^{2SLS} = (X'_n P_n X_n)^{-1} X'_n P_n Y_n$$

where $P_n = Z_n ( Z'_n Z_n)^{-1} Z'_n$ is the projection matrix onto the columns of $Z_n$. Since $P_n$ is symmetric and idempotent, $P'_n P_n = P_n P_n = P_n$, and so we can write

$$\beta_n^{2SLS} = (X'_n P'_n P_n X_n)^{-1} X'_n P_n Y_n$$

and so this is like regressing $Y_n$ on $P_n X_n$, where $P_n X_n = Z_n ( Z'_n Z_n)^{-1} Z'_n X_n$ is the vector of fitted values from the regression of $X_n$ on $Z_n$.
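A sketch of the 2SLS computation with two instruments and one regressor (simulated data, made-up coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 1.5
Z = rng.normal(size=(n, 2))                    # two instruments
U = rng.normal(size=n)
X = Z @ np.array([0.6, 0.4]) + 0.5 * U + rng.normal(size=n)
Y = beta * X + U
X = X[:, None]                                 # one endogenous regressor

# first stage: fitted values P_n X = Z (Z'Z)^{-1} Z' X
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
# second stage: (Xhat' X)^{-1} Xhat' Y, equal to (X' P X)^{-1} X' P Y
beta_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ Y)
print(float(beta_2sls[0]))   # near 1.5
```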