More models

We cover a couple of additional models.

Threats to identification in the linear model

We want to think about the estimand that a linear model recovers in a couple of interesting cases.

Omitted variable bias

We consider the following model:

Y_{i} = \alpha X_{i} + \gamma Z_{i} + \epsilon_i

together with E[\epsilon_i | X_i, Z_i] = 0, and we ask what the regression coefficient of Y_i on X_i alone recovers:

\begin{align*} \beta^1 & = cov(X_i,Y_i) / var(X_i) \\ & = cov(X_i,\alpha X_{i} + \gamma Z_{i} + \epsilon_i) / var(X_i) \\ & = \alpha + \gamma \, cov(X_i, Z_{i}) / var(X_i) \end{align*}

where cov(X_i, \epsilon_i) = 0 follows from E[\epsilon_i | X_i, Z_i] = 0. The second term is the omitted variable bias; it vanishes only when \gamma = 0 or cov(X_i, Z_i) = 0.
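We can check this formula in a small simulation. The following sketch (the coefficients and data-generating process are illustrative choices, not from the notes) compares the short-regression coefficient with \alpha + \gamma \, cov(X_i, Z_i) / var(X_i):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha, gamma = 1.0, 2.0

# Z is correlated with X so the omitted-variable term is nonzero
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(size=n)
eps = rng.normal(size=n)
Y = alpha * X + gamma * Z + eps

# Short regression of Y on X alone
beta1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# Theoretical value: alpha + gamma * cov(X, Z) / var(X)
beta1_theory = alpha + gamma * np.cov(X, Z)[0, 1] / np.var(X, ddof=1)
print(beta1, beta1_theory)
```

With these choices cov(X, Z) = 0.5 and var(X) = 1.25, so the short regression converges to roughly 1.8 rather than \alpha = 1.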

Measurement error bias

We consider the following model:

Y_{i} = \beta X^*_{i} + \epsilon_i

where we only observe X_i = X^*_{i} + u_i

together with E[\epsilon_i | X^*_i, u_i] = 0 and E[u_i | X^*_i] = 0 (classical measurement error), and we ask what the regression coefficient of Y_i on X_i alone recovers:

\begin{align*} \beta^1 & = cov(X_i,Y_i) / var(X_i) \\ & = \frac{ cov(X^*_i,Y_i) }{ var(X^*_i) + var(u_i) } \\ & = \beta \frac{var(X^*_i) }{ var(X^*_i) + var(u_i) } \\ & < \beta \end{align*}

where the strict inequality holds for \beta > 0 and var(u_i) > 0; in general the coefficient is attenuated toward zero.
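A quick simulation makes the attenuation visible. In this sketch (parameter values are illustrative) the true and error variances are both 1, so the attenuation factor var(X^*_i) / (var(X^*_i) + var(u_i)) equals 1/2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta = 2.0

Xstar = rng.normal(size=n)         # true regressor, var = 1
u = rng.normal(size=n)             # classical measurement error, var = 1
X = Xstar + u                      # observed, mismeasured regressor
Y = beta * Xstar + rng.normal(size=n)

# Regression of Y on the mismeasured X
beta1 = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# Theory: beta * var(X*) / (var(X*) + var(u)) = 2 * (1/2) = 1
print(beta1)
```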

Instrumental variable

Suppose now that Y_i = X'_i \beta + U_i. Perhaps we are willing to assume that for some instrument Z_i we have E[U_i | Z_i] = 0, even if the same does not hold for X_i. In this case we can identify \beta from

\begin{align*} E[Z_i X'_i]^{-1} E[ Z_i Y_i ] & = E[Z_i X'_i]^{-1} E[ Z_i X'_i ] \beta + E[Z_i X'_i]^{-1} E[ Z_i U_i ] \\ & = \beta \end{align*}

assuming that E[Z_i X'_i] is square and invertible. We then define the instrumental variable estimator as

\beta^{IV}_n = ( Z_n ' X_n )^{-1} Z'_n Y_n

This can address both of the issues mentioned earlier. Two important assumptions:

  1. exclusion restriction: E[U_i | Z_i] = 0
  2. relevance: E[Z_i X'_i] is invertible
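To see the estimator at work, here is a minimal sketch in the scalar case, where (Z'_n X_n)^{-1} Z'_n Y_n reduces to a ratio of sample covariances (the data-generating process below is an illustrative endogeneity setup, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta = 2.0

Z = rng.normal(size=n)                       # instrument, independent of U
U = rng.normal(size=n)                       # structural error
X = 0.8 * Z + 0.5 * U + rng.normal(size=n)   # endogenous: cov(X, U) != 0
Y = beta * X + U

# OLS is inconsistent because X is correlated with U
beta_ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# IV estimator: in the scalar (mean-zero) case, cov(Z, Y) / cov(Z, X)
beta_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]
print(beta_ols, beta_iv)
```

The OLS coefficient is biased upward by cov(X_i, U_i) / var(X_i), while the IV estimate is close to the true \beta = 2.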

We can show consistency and asymptotic normality. But what about unbiasedness?

TBD in class!

2SLS

In the more general case, when for instance one has more instruments than regressors, we introduce the 2SLS estimator:

$$ \beta_n^{2SLS} = (X'_n P_n X_n)^{-1} X'_n P_n Y_n $$

where P_n = Z_n ( Z'_n Z_n)^{-1} Z'_n is the projection matrix onto the columns of Z_n. Since P_n is symmetric and idempotent, P'_n P_n = P_n, and so we can write

\beta_n^{2SLS} = (X'_n P'_n P_n X_n)^{-1} X'_n P_n Y_n

and so this is like regressing Y_n on P_n X_n, where we note that P_n X_n = Z_n ( Z'_n Z_n)^{-1} Z'_n X_n are the fitted values from the regression of X_n on Z_n.
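The two-stage reading can be verified numerically. This sketch (with an illustrative overidentified design: two instruments, one endogenous regressor) computes 2SLS both by the direct formula (X'_n P_n X_n)^{-1} X'_n P_n Y_n and as OLS of Y_n on the first-stage fitted values, forming P_n implicitly through least-squares solves rather than as an n x n matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
beta = 2.0

# Overidentified case: two instruments for one endogenous regressor
Z = rng.normal(size=(n, 2))
U = rng.normal(size=n)
X = (Z @ np.array([0.7, 0.4]) + 0.5 * U + rng.normal(size=n))[:, None]
Y = beta * X[:, 0] + U

# First stage: fitted values P_n X_n from the regression of X on Z
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Second stage: regress Y on the fitted values
beta_2sls = np.linalg.lstsq(Xhat, Y, rcond=None)[0]

# Direct formula (X' P X)^{-1} X' P Y, using X' P X = X' Xhat
beta_direct = np.linalg.solve(X.T @ Xhat, Xhat.T @ Y)
print(beta_2sls, beta_direct)
```

The two computations agree, which is exactly the P'_n P_n = P_n identity above.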