# Diff-in-Diff notebook

In this section we are going to look at the diff-in-diff estimator. We consider two periods and two groups where one group recieves a treatment $D$ in the second period.

We consider an outcome variable of interest $Y_{it}$ with 2 periods. We augment this with value of the outcome when treated and when not treated $Y_{it}(1)$ and $Y_{it}(0)$. This effectively formalizes a causal question as a statistical properties. Causal questions become missing data questions. This allows to think more clearly about identification.

A treatment realization $D_{it}$ selects whether $Y_{it}$ is $Y_{it}(1)$ or $Y_{it}(0)$:

Y_it = D_{it}Y_{it}(1) + (1- D_{it}) Y_{it}(0)

In non binary settings were treatment takes on many values, we could write $Y_{it} = Y_{it}(D_it)$.

A causal effect would then be the effect of chaning $D_it$ from it's realized value to a different value. The change in the outcome can then characterized from the joint distribution of $Y_{it}(1), Y_{it}(0)$.

TODO: add defintion for each acronym

• Average treatment effect, ATE $E[ Y_2(1) - Y_2(0)]$
• Average Treatment on the treated, ATT $E[ Y_2(1) - Y_2(0) | D = 1]$
• Average Treatment on non treated, ATNT or ATC $E[ Y_2(1) - Y_2(0) | D = 0]$

We also define the following quantitities that are function of observables:

• Difference before after for treated, DBAT $E[ Y_2 - Y_1 | D = 1]$
• Difference before after for non treated, DBAC $E[ Y_2 - Y_1 | D = 0]$
• Difference treatment/control, DTC $E[ Y_2 | D = 1] - E[ Y_2 | D = 0]$
• Difference in difference: DBAT - DBAC
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
from ipywidgets import interactive
import matplotlib.pyplot as plt

N = 5000
mean = [0, 0.25, 1.0]
rho1 = 0
rho2 = 0
rho3 = 0
cov = [[1, rho1, rho2], [rho1,1,rho3],[rho2,rho3,1] ]  # diagonal covariance
Y0_1, Y0_2, Y1_2 = np.random.multivariate_normal(mean, cov, N).T

df = pd.DataFrame({"Y0_1":Y0_1,"Y0_2":Y0_2, "Y1_2":Y1_2})

sns.pairplot(df, kind="scatter")
plt.show() # we are interested in the full distribution of (Y1,Y0) but of course the mean difference is already interesting. We refer to
# at the ATE

plt.hist(df['Y1_2']-df['Y0_2'])
plt.axvline( (df['Y1_2']-df['Y0_2']).mean() , color="red", label="ATE")
plt.legend()
plt.show() ## A/B testing - Randomized treatment

We assume that we randomly assigned the treatment to individuals. In this case we have

E[Y_{i2}(1) | D=0] = E[Y_{i2}(1) | D=1]

as well as

E[Y_{i2}(0) | D=0] = E[Y_{i2}(0) | D=1]
D = np.random.uniform(size=N)>0.4
df['D'] = D

sns.pairplot(df, kind="scatter", hue="D", markers=["o", "s"], palette="Set2")
plt.show() plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y0_2'].mean()] , '-o', markersize=10,color="red", linestyle="--", fillstyle="none")
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y1_2'].mean()] , '-P', markersize=10,color="red",label="T")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y0_2'].mean()] , '-o', markersize=10,color="blue",label="C")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y1_2'].mean()] , '-P', markersize=10,color="blue", linestyle="--", fillstyle="none")
plt.legend()
plt.show() Note that in this case we can read off the ATT by differencing the control and treatment. A before after comparaison wrongly assign a lot of the effect to the time trend!

def compute_TE(df):
res1 = {}
res1['ATE'] = df.eval('Y1_2 - Y0_2').mean()
res1['ATT'] = df.query('D==1').eval('Y1_2 - Y0_2').mean()
res1['ATC'] = df.query('D==0').eval('Y1_2 - Y0_2').mean()

res2 = {}
res2['DBAT'] = df.query('D==1').eval('Y1_2 - Y0_1').mean()
res2['DBAC'] = df.query('D==0').eval('Y0_2 - Y0_1').mean()
res2['DTC']  = df.query('D==1').eval('Y1_2').mean() - df.query('D==0').eval('Y0_2').mean()
res2['DiD']  = res2['DBAT'] - res2['DBAC']

return(res1,res2)

res0,res1 = compute_TE(df)
plt.bar(range(len(res1)), res1.values(), align='center', alpha=0.5)
plt.xticks(range(len(res1)), res1.keys())

plt.axhline(res0['ATE'],label='ATE',color="green")
plt.axhline(res0['ATT'],label='ATT',color="red")
plt.axhline(res0['ATC'],label='ATC',color="blue")
plt.legend()

plt.show() ## Before - After

Consider instead a case where the treatment is not assigned randomly. For instance imagine that there is a fixed cost to taking the treatment and that each agent chooses individualy whether to take the treatment.

We then

• introduce a selection rule into treatment $D_i = 1[ Y_2(1) - Y_1(0) > c ]$
• we impose $Y_1(0)=Y_2(0)$ (not trend and perfect correlation)

We remember that for the before-after approach to be valid, we need that

$E[ Y_2(0) | D {=} 1] = E[ Y_1(0) | D {=} 1]$

By having perfect correlation between $Y_1(0)$ and $Y_2(0)$ we guarantee such outcome. In other words here, the period 1 outcome is a perfect measurement of the non-treated outcome.

mean = [0, 0, 1.0]  # no trend effect
rho1 = 1; rho2 = 0; rho3 = 0 # perfect correlation between Y0_1 and Y0_2
cov = [[1, rho1, rho2], [rho1,1,rho3],[rho2,rho3,1] ]  # diagonal covariance
Y0_1, Y0_2, Y1_2 = np.random.multivariate_normal(mean, cov, N).T
df = pd.DataFrame({"Y0_1":Y0_1,"Y0_2":Y0_2, "Y1_2":Y1_2})

# Note the change in the decision rule here
df['D'] = df['Y1_2'] - df['Y0_2'] - 0.3 > 0

sns.pairplot(df, kind="scatter", hue="D", markers=["o", "s"], palette="Set2")
plt.show() plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y0_2'].mean()] , '-o', markersize=10,color="red", linestyle="--", fillstyle="none")
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y1_2'].mean()] , '-P', markersize=10,color="red",label="T")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y0_2'].mean()] , '-o', markersize=10,color="blue",label="C")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y1_2'].mean()] , '-P', markersize=10,color="blue", linestyle="--", fillstyle="none")
plt.legend()
plt.show() Here we see that we can look at the before/after to check the ATT and the AT. Again, because we have constructed the non treated outcome to be the same in period 1 and 2, hence the before/after gives us the treatment effect.

res0,res1 = compute_TE(df)
plt.bar(range(len(res1)), res1.values(), align='center', alpha=0.5)
plt.xticks(range(len(res1)), res1.keys())
plt.axhline(res0['ATE'],label='ATE',color="green")
plt.axhline(res0['ATT'],label='ATT',color="red")
plt.axhline(res0['ATC'],label='ATC',color="blue")
plt.legend()

plt.show() ## Difference in Difference

We know add: - trend - covariance between all outcomes

We however maintain an important assumption which is the common trend assumption. This happens by imposing that the gain is orthogonal the time effect of the non treated outcome.

We assume a common trend assumption:

E\left(Y_{i 2}^{0}-Y_{i 1}^{0} \mid D_{i 2}=1\right)=E\left(Y_{i 2}^{0}-Y_{i 1}^{0} \mid D_{i 2}=0\right)

$\Rightarrow$ no selection on the change in non-treatment outcome level

Common trend assumption allows for: selection on non-treatment levels: $$E\left(Y_{i t}^{0} \mid D_{i 2}=1\right) \neq E\left(Y_{i t}^{0} \mid D_{i 2}=0\right), t=1,2$$ selection on gains: $$E\left(Y_{i 2}^{1}-Y_{i 2}^{0} \mid D_{i 2}=1\right) \neq E\left(Y_{i 2}^{1}-Y_{i 2}^{0} \mid D_{i 2}=0\right)$$

mean = [0, 0.25, 1.0]
rho1 = 0.7 # between Y0_1 and Y0_2
rho2 = 0.3 # between Y0_1 and Y1_2
rho3 = 1+rho2 - rho1 # between Y1_2 and Y0_2 - this is what makes the DiD work
cov = [[1, rho1, rho2], [rho1,1,rho3],[rho2,rho3,1] ]  # diagonal covariance
Y0_1, Y0_2, Y1_2 = np.random.multivariate_normal(mean, cov, N).T
df = pd.DataFrame({"Y0_1":Y0_1,"Y0_2":Y0_2, "Y1_2":Y1_2})

df['D'] = df['Y1_2'] - df['Y0_2'] - 0.3 > 0
sns.pairplot(df, kind="scatter", hue="D", markers=["o", "s"], palette="Set2")
plt.show() plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y0_2'].mean()] , '-o', markersize=10,color="red", linestyle="--", fillstyle="none")
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y1_2'].mean()] , '-P', markersize=10,color="red")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y0_2'].mean()] , '-o', markersize=10,color="blue")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y1_2'].mean()] , '-P', markersize=10,color="blue", linestyle="--", fillstyle="none")

plt.plot( [1.2,1.2], [ df.query("D==1").eval('Y0_2').mean(), df.query("D==1").eval('Y1_2').mean()] ,'-o', label="ATT")
plt.plot( [1.3,1.3], [ df.query("D==1").eval('Y0_1').mean(), df.query("D==1").eval('Y1_2').mean()] ,'-o', label="DBAT")
plt.legend()
plt.show() res0,res1 = compute_TE(df)
plt.bar(range(len(res1)), res1.values(), align='center', alpha=0.5)
plt.xticks(range(len(res1)), res1.keys())
plt.axhline(res0['ATE'],label='ATE',color="green")
plt.axhline(res0['ATT'],label='ATT',color="red")
plt.axhline(res0['ATC'],label='ATC',color="blue")
plt.legend()

plt.show() ## Effect of non-linear change

We make a simple transformation on the outcome variable: $\exp (Y)$.

for v in ['Y0_1','Y1_2','Y0_2']:
df[v] = np.exp(df[v])

plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y0_2'].mean()] , '-o', markersize=10,color="red", linestyle="--", fillstyle="none")
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y1_2'].mean()] , '-P', markersize=10,color="red")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y0_2'].mean()] , '-o', markersize=10,color="blue")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y1_2'].mean()] , '-P', markersize=10,color="blue", linestyle="--", fillstyle="none")


[<matplotlib.lines.Line2D at 0x7fb6d070e6a0>] res0,res1 = compute_TE(df)
plt.bar(range(len(res1)), res1.values(), align='center', alpha=0.5)
plt.xticks(range(len(res1)), res1.keys())
plt.axhline(res0['ATE'],label='ATE',color="green")
plt.axhline(res0['ATT'],label='ATT',color="red")
plt.axhline(res0['ATC'],label='ATC',color="blue")
plt.legend()

plt.show() ## Difference in Difference in Difference

# we generate a policy change in one state for one age group
# we end up with 8 observed values

# Michigan disability increase for high earners
# looking at time on disability

# Michigan High earners
Y_H_S1_T1 = 1
Y_H_S1_T2 = 2.3

# Michigan low earners
Y_L_S1_T1 = 0
Y_L_S1_T2 = 0.5

# Wisconsin High earners
Y_H_S2_T1 = 1
Y_H_S2_T2 = 1.6

# Wisconsin low earners
Y_L_S2_T1 = 0.2
Y_L_S2_T2 = 0.5

plt.subplot(1,2,1)
plt.plot([1,2],[Y_H_S1_T1,Y_H_S1_T2], "-o", label="MI-H*")
plt.plot([1,2],[Y_L_S1_T1,Y_L_S1_T2], "-o", label="MI-L")
plt.title("MI")
plt.ylim(-0.5,2.5)
plt.legend()

plt.subplot(1,2,2)
plt.plot([1,2],[Y_H_S2_T1,Y_H_S2_T2], "-o", label="WI-H")
plt.plot([1,2],[Y_L_S2_T1,Y_L_S2_T2], "-o", label="WI-L")
plt.title("WI")
plt.ylim(-0.5,2.5)
plt.legend()

plt.show() plt.subplot(1,2,1)
plt.plot([1,2],[Y_H_S1_T1,Y_H_S1_T2], "-o", label="MI-H*")
plt.plot([1,2],[Y_L_S1_T1,Y_L_S1_T2], "-o", label="MI-L")
plt.plot([1,2],[Y_H_S1_T1,Y_H_S1_T1 + Y_L_S1_T2 - Y_L_S1_T1], "-o", label="DiD")
plt.title("MI")
plt.ylim(-0.5,2.5)
plt.legend()

plt.subplot(1,2,2)
plt.plot([1,2],[Y_H_S2_T1,Y_H_S2_T2], "-o", label="WI-H")
plt.plot([1,2],[Y_L_S2_T1,Y_L_S2_T2], "-o", label="WI-L")
plt.plot([1,2],[Y_H_S2_T1, Y_H_S2_T1 + Y_L_S2_T2 - Y_L_S2_T1], "-o", label="DiD")

plt.title("WI")
plt.ylim(-0.5,2.5)
plt.legend()

plt.show() We see how that common trend is refjected on in WI. We then use WI to allow for a difference betwen H and L in trends. We take it out of MI.

DiD_WI = Y_H_S2_T1 - Y_H_S2_T1 + Y_L_S2_T2 - Y_L_S2_T1
plt.plot([1,2],[Y_H_S1_T1,Y_H_S1_T2], "-o", label="MI-H*")
plt.plot([1,2],[Y_L_S1_T1,Y_L_S1_T2], "-o", label="MI-L")
plt.plot([1,2],[Y_H_S1_T1, Y_H_S1_T1 + Y_L_S1_T2 - Y_L_S1_T1], "-o", label="DiD")
plt.plot([1,2],[Y_H_S1_T1, Y_H_S1_T1 + Y_L_S1_T2 - Y_L_S1_T1 + DiD_WI ], "-o", label="DiDiD")
plt.title("MI")
plt.ylim(-0.5,2.5)
plt.legend()
plt.show() 