Skip to content

Diff-in-Diff notebook

In this section we are going to look at the diff-in-diff estimator. We consider two periods and two groups where one group recieves a treatment D in the second period.

We consider an outcome variable of interest Y_{it} with 2 periods. We augment this with value of the outcome when treated and when not treated Y_{it}(1) and Y_{it}(0). This effectively formalizes a causal question as a statistical properties. Causal questions become missing data questions. This allows to think more clearly about identification.

A treatment realization D_{it} selects whether Y_{it} is Y_{it}(1) or Y_{it}(0):

Y_it = D_{it}Y_{it}(1) + (1- D_{it}) Y_{it}(0)

In non binary settings were treatment takes on many values, we could write Y_{it} = Y_{it}(D_it).

A causal effect would then be the effect of chaning D_it from it's realized value to a different value. The change in the outcome can then characterized from the joint distribution of Y_{it}(1), Y_{it}(0).

TODO: add defintion for each acronym

  • Average treatment effect, ATE E[ Y_2(1) - Y_2(0)]
  • Average Treatment on the treated, ATT E[ Y_2(1) - Y_2(0) | D = 1]
  • Average Treatment on non treated, ATNT or ATC E[ Y_2(1) - Y_2(0) | D = 0]

We also define the following quantitities that are function of observables:

  • Difference before after for treated, DBAT E[ Y_2 - Y_1 | D = 1]
  • Difference before after for non treated, DBAC E[ Y_2 - Y_1 | D = 0]
  • Difference treatment/control, DTC E[ Y_2 | D = 1] - E[ Y_2 | D = 0]
  • Difference in difference: DBAT - DBAC
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
from ipywidgets import interactive
import matplotlib.pyplot as plt
N = 5000
mean = [0, 0.25, 1.0]
rho1 = 0
rho2 = 0
rho3 = 0
cov = [[1, rho1, rho2], [rho1,1,rho3],[rho2,rho3,1] ]  # diagonal covariance
Y0_1, Y0_2, Y1_2 = np.random.multivariate_normal(mean, cov, N).T

df = pd.DataFrame({"Y0_1":Y0_1,"Y0_2":Y0_2, "Y1_2":Y1_2})

sns.pairplot(df, kind="scatter")
plt.show()

png

# we are interested in the full distribution of (Y1,Y0) but of course the mean difference is already interesting. We refer to 
# at the ATE

plt.hist(df['Y1_2']-df['Y0_2'])
plt.axvline( (df['Y1_2']-df['Y0_2']).mean() , color="red", label="ATE")
plt.legend()
plt.show()

png

A/B testing - Randomized treatment

We assume that we randomly assigned the treatment to individuals. In this case we have

E[Y_{i2}(1) | D=0] = E[Y_{i2}(1) | D=1]

as well as

E[Y_{i2}(0) | D=0] = E[Y_{i2}(0) | D=1]
D = np.random.uniform(size=N)>0.4
df['D'] = D

sns.pairplot(df, kind="scatter", hue="D", markers=["o", "s"], palette="Set2")
plt.show()

png

plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y0_2'].mean()] , '-o', markersize=10,color="red", linestyle="--", fillstyle="none")
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y1_2'].mean()] , '-P', markersize=10,color="red",label="T")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y0_2'].mean()] , '-o', markersize=10,color="blue",label="C")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y1_2'].mean()] , '-P', markersize=10,color="blue", linestyle="--", fillstyle="none")
plt.legend()
plt.show()

png

Note that in this case we can read off the ATT by differencing the control and treatment. A before after comparaison wrongly assign a lot of the effect to the time trend!

def compute_TE(df):
    res1 = {}
    res1['ATE'] = df.eval('Y1_2 - Y0_2').mean()
    res1['ATT'] = df.query('D==1').eval('Y1_2 - Y0_2').mean()
    res1['ATC'] = df.query('D==0').eval('Y1_2 - Y0_2').mean()

    res2 = {}
    res2['DBAT'] = df.query('D==1').eval('Y1_2 - Y0_1').mean()
    res2['DBAC'] = df.query('D==0').eval('Y0_2 - Y0_1').mean()
    res2['DTC']  = df.query('D==1').eval('Y1_2').mean() - df.query('D==0').eval('Y0_2').mean()
    res2['DiD']  = res2['DBAT'] - res2['DBAC']

    return(res1,res2)

res0,res1 = compute_TE(df)
plt.bar(range(len(res1)), res1.values(), align='center', alpha=0.5)
plt.xticks(range(len(res1)), res1.keys())

plt.axhline(res0['ATE'],label='ATE',color="green")
plt.axhline(res0['ATT'],label='ATT',color="red")
plt.axhline(res0['ATC'],label='ATC',color="blue")
plt.legend()

plt.show()

png

Before - After

Consider instead a case where the treatment is not assigned randomly. For instance imagine that there is a fixed cost to taking the treatment and that each agent chooses individualy whether to take the treatment.

We then

  • introduce a selection rule into treatment D_i = 1[ Y_2(1) - Y_1(0) > c ]
  • we impose Y_1(0)=Y_2(0) (not trend and perfect correlation)

We remember that for the before-after approach to be valid, we need that

E[ Y_2(0) | D {=} 1] = E[ Y_1(0) | D {=} 1]

By having perfect correlation between Y_1(0) and Y_2(0) we guarantee such outcome. In other words here, the period 1 outcome is a perfect measurement of the non-treated outcome.

mean = [0, 0, 1.0]  # no trend effect
rho1 = 1; rho2 = 0; rho3 = 0 # perfect correlation between Y0_1 and Y0_2
cov = [[1, rho1, rho2], [rho1,1,rho3],[rho2,rho3,1] ]  # diagonal covariance
Y0_1, Y0_2, Y1_2 = np.random.multivariate_normal(mean, cov, N).T
df = pd.DataFrame({"Y0_1":Y0_1,"Y0_2":Y0_2, "Y1_2":Y1_2})

# Note the change in the decision rule here
df['D'] = df['Y1_2'] - df['Y0_2'] - 0.3 > 0 

sns.pairplot(df, kind="scatter", hue="D", markers=["o", "s"], palette="Set2")
plt.show()

png

plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y0_2'].mean()] , '-o', markersize=10,color="red", linestyle="--", fillstyle="none")
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y1_2'].mean()] , '-P', markersize=10,color="red",label="T")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y0_2'].mean()] , '-o', markersize=10,color="blue",label="C")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y1_2'].mean()] , '-P', markersize=10,color="blue", linestyle="--", fillstyle="none")
plt.legend()
plt.show()

png

Here we see that we can look at the before/after to check the ATT and the AT. Again, because we have constructed the non treated outcome to be the same in period 1 and 2, hence the before/after gives us the treatment effect.

res0,res1 = compute_TE(df)
plt.bar(range(len(res1)), res1.values(), align='center', alpha=0.5)
plt.xticks(range(len(res1)), res1.keys())
plt.axhline(res0['ATE'],label='ATE',color="green")
plt.axhline(res0['ATT'],label='ATT',color="red")
plt.axhline(res0['ATC'],label='ATC',color="blue")
plt.legend()

plt.show()

png

Difference in Difference

We know add: - trend - covariance between all outcomes

We however maintain an important assumption which is the common trend assumption. This happens by imposing that the gain is orthogonal the time effect of the non treated outcome.

We assume a common trend assumption:

E\left(Y_{i 2}^{0}-Y_{i 1}^{0} \mid D_{i 2}=1\right)=E\left(Y_{i 2}^{0}-Y_{i 1}^{0} \mid D_{i 2}=0\right)

\Rightarrow no selection on the change in non-treatment outcome level

Common trend assumption allows for: selection on non-treatment levels: $$ E\left(Y_{i t}^{0} \mid D_{i 2}=1\right) \neq E\left(Y_{i t}^{0} \mid D_{i 2}=0\right), t=1,2 $$ selection on gains: $$ E\left(Y_{i 2}^{1}-Y_{i 2}^{0} \mid D_{i 2}=1\right) \neq E\left(Y_{i 2}^{1}-Y_{i 2}^{0} \mid D_{i 2}=0\right) $$

mean = [0, 0.25, 1.0]
rho1 = 0.7 # between Y0_1 and Y0_2
rho2 = 0.3 # between Y0_1 and Y1_2
rho3 = 1+rho2 - rho1 # between Y1_2 and Y0_2 - this is what makes the DiD work
cov = [[1, rho1, rho2], [rho1,1,rho3],[rho2,rho3,1] ]  # diagonal covariance
Y0_1, Y0_2, Y1_2 = np.random.multivariate_normal(mean, cov, N).T
df = pd.DataFrame({"Y0_1":Y0_1,"Y0_2":Y0_2, "Y1_2":Y1_2})

df['D'] = df['Y1_2'] - df['Y0_2'] - 0.3 > 0 
sns.pairplot(df, kind="scatter", hue="D", markers=["o", "s"], palette="Set2")
plt.show()

png

plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y0_2'].mean()] , '-o', markersize=10,color="red", linestyle="--", fillstyle="none")
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y1_2'].mean()] , '-P', markersize=10,color="red")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y0_2'].mean()] , '-o', markersize=10,color="blue")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y1_2'].mean()] , '-P', markersize=10,color="blue", linestyle="--", fillstyle="none")

plt.plot( [1.2,1.2], [ df.query("D==1").eval('Y0_2').mean(), df.query("D==1").eval('Y1_2').mean()] ,'-o', label="ATT")
plt.plot( [1.3,1.3], [ df.query("D==1").eval('Y0_1').mean(), df.query("D==1").eval('Y1_2').mean()] ,'-o', label="DBAT")
plt.legend()
plt.show()

png

res0,res1 = compute_TE(df)
plt.bar(range(len(res1)), res1.values(), align='center', alpha=0.5)
plt.xticks(range(len(res1)), res1.keys())
plt.axhline(res0['ATE'],label='ATE',color="green")
plt.axhline(res0['ATT'],label='ATT',color="red")
plt.axhline(res0['ATC'],label='ATC',color="blue")
plt.legend()

plt.show()

png

Effect of non-linear change

We make a simple transformation on the outcome variable: \exp (Y).

for v in ['Y0_1','Y1_2','Y0_2']:
    df[v] = np.exp(df[v])
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y0_2'].mean()] , '-o', markersize=10,color="red", linestyle="--", fillstyle="none")
plt.plot( [df.query("D==1")['Y0_1'].mean(),df.query("D==1")['Y1_2'].mean()] , '-P', markersize=10,color="red")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y0_2'].mean()] , '-o', markersize=10,color="blue")
plt.plot( [df.query("D==0")['Y0_1'].mean(),df.query("D==0")['Y1_2'].mean()] , '-P', markersize=10,color="blue", linestyle="--", fillstyle="none")

[<matplotlib.lines.Line2D at 0x7fb6d070e6a0>]

png

res0,res1 = compute_TE(df)
plt.bar(range(len(res1)), res1.values(), align='center', alpha=0.5)
plt.xticks(range(len(res1)), res1.keys())
plt.axhline(res0['ATE'],label='ATE',color="green")
plt.axhline(res0['ATT'],label='ATT',color="red")
plt.axhline(res0['ATC'],label='ATC',color="blue")
plt.legend()

plt.show()

png

Difference in Difference in Difference

# we generate a policy change in one state for one age group
# we end up with 8 observed values

# Michigan disability increase for high earners
# looking at time on disability

# Michigan High earners
Y_H_S1_T1 = 1
Y_H_S1_T2 = 2.3

# Michigan low earners
Y_L_S1_T1 = 0
Y_L_S1_T2 = 0.5

# Wisconsin High earners
Y_H_S2_T1 = 1
Y_H_S2_T2 = 1.6

# Wisconsin low earners
Y_L_S2_T1 = 0.2
Y_L_S2_T2 = 0.5

plt.subplot(1,2,1)
plt.plot([1,2],[Y_H_S1_T1,Y_H_S1_T2], "-o", label="MI-H*")
plt.plot([1,2],[Y_L_S1_T1,Y_L_S1_T2], "-o", label="MI-L")
plt.title("MI")
plt.ylim(-0.5,2.5)
plt.legend()

plt.subplot(1,2,2)
plt.plot([1,2],[Y_H_S2_T1,Y_H_S2_T2], "-o", label="WI-H")
plt.plot([1,2],[Y_L_S2_T1,Y_L_S2_T2], "-o", label="WI-L")
plt.title("WI")
plt.ylim(-0.5,2.5)
plt.legend()

plt.show()

png

plt.subplot(1,2,1)
plt.plot([1,2],[Y_H_S1_T1,Y_H_S1_T2], "-o", label="MI-H*")
plt.plot([1,2],[Y_L_S1_T1,Y_L_S1_T2], "-o", label="MI-L")
plt.plot([1,2],[Y_H_S1_T1,Y_H_S1_T1 + Y_L_S1_T2 - Y_L_S1_T1], "-o", label="DiD")
plt.title("MI")
plt.ylim(-0.5,2.5)
plt.legend()

plt.subplot(1,2,2)
plt.plot([1,2],[Y_H_S2_T1,Y_H_S2_T2], "-o", label="WI-H")
plt.plot([1,2],[Y_L_S2_T1,Y_L_S2_T2], "-o", label="WI-L")
plt.plot([1,2],[Y_H_S2_T1, Y_H_S2_T1 + Y_L_S2_T2 - Y_L_S2_T1], "-o", label="DiD")

plt.title("WI")
plt.ylim(-0.5,2.5)
plt.legend()

plt.show()

png

We see how that common trend is refjected on in WI. We then use WI to allow for a difference betwen H and L in trends. We take it out of MI.

DiD_WI = Y_H_S2_T1 - Y_H_S2_T1 + Y_L_S2_T2 - Y_L_S2_T1
plt.plot([1,2],[Y_H_S1_T1,Y_H_S1_T2], "-o", label="MI-H*")
plt.plot([1,2],[Y_L_S1_T1,Y_L_S1_T2], "-o", label="MI-L")
plt.plot([1,2],[Y_H_S1_T1, Y_H_S1_T1 + Y_L_S1_T2 - Y_L_S1_T1], "-o", label="DiD")
plt.plot([1,2],[Y_H_S1_T1, Y_H_S1_T1 + Y_L_S1_T2 - Y_L_S1_T1 + DiD_WI ], "-o", label="DiDiD")
plt.title("MI")
plt.ylim(-0.5,2.5)
plt.legend()
plt.show()

png