

The goal of the class is to learn how to apply microeconomic concepts to large and complex datasets. We will first revisit notions such as identification, inference, and latent heterogeneity in classical contexts. We will then study the concerns that arise when a model has a large number of parameters, in order to understand over-fitting. Throughout the class, the emphasis will be on project-driven computational exercises involving large datasets. We will learn how to efficiently process and visualize such data using state-of-the-art tools in Python. Topics will include fitting models using TensorFlow and neural nets, creating event studies using pandas, solving large-scale SVDs, etc.

This website, together with the Slack group, will be the primary source of content.


I have been building the content of the course from scratch. The syllabus is not set in stone and will very likely change during the course of the term. Please bear with me!


The lectures will be held in person. Class notes and related notebooks will be posted on this website. There will be a strong emphasis on take-home work in the form of assignments to run on the computer. There will be a mix of short psets (to be done individually) and long homeworks (done in teams of two).


We will have a midterm and a final. Together with the long homeworks and the short tasks, they will form the overall grade. The weights will be 4/10 homework, 3/10 short midterm, and 3/10 final.

List of topics:

  • Linear model as a review (notes)
    • review: population, estimand, identification, estimation
    • finite sample versus asymptotic
    • homeworks:
  • Other conditional means models
    • IV, 2SLS
    • Non-linear least squares
    • Neural networks, activation functions, stochastic gradient descent
    • homeworks and labs:
  • Distributional models
    • Maximum likelihood
    • Kullback-Leibler divergence, Cramér-Rao bound
    • Neural Nets to model distributions
    • homeworks and labs:
  • Topics on inference
    • dependence in errors and clustered standard errors
    • bootstrap lab
    • multiple testing
    • weak IV
    • homeworks: Inference pset
  • Treatment evaluation
    • potential outcomes notation
    • Diff in Diff, pre-trends and examples
    • Event studies, assumptions and examples
    • synthetic controls
    • heterogeneous causal effects
    • homeworks: Yelp pset
  • Network problems
    • incidental parameter problem, overfitting
    • bias correction
    • large scale PCA, clustering
    • network formation
    • homeworks: Effect of classroom pset
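
To give a flavor of the computational exercises, here is a minimal sketch of the kind of task covered in the bootstrap lab: a pairs bootstrap for the standard error of an OLS slope. The data are simulated, and the function names (ols_slope, bootstrap_se) and parameter choices are illustrative assumptions, not the actual lab material.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def bootstrap_se(x, y, n_boot=500, seed=0):
    """Pairs-bootstrap standard error of the OLS slope (illustrative helper)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Resample (x, y) pairs with replacement and re-estimate the slope.
        idx = rng.integers(0, n, size=n)
        draws[b] = ols_slope(x[idx], y[idx])
    return float(draws.std(ddof=1))


# Simulated data (an assumption for illustration): y = 1 + 2 x + noise.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

slope = ols_slope(x, y)
se = bootstrap_se(x, y)
print(f"slope = {slope:.3f}, bootstrap s.e. = {se:.3f}")
```

With homoskedastic unit-variance noise and 200 observations, the bootstrap standard error should be close to the analytical value of roughly 1/sqrt(200).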

Tools we will learn about:

  • Creating deployable environments, maintaining code
  • Working with databases
    • mainly pandas, but we may expand to Modin, Dask and PySpark
    • merge, groupby and aggregate, method chaining
  • Plotting and reporting
  • Creating workflows
  • Automatic differentiation and Stochastic EM
  • Computing in the cloud
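
As an illustration of the pandas tools listed above (merge, groupby and aggregate, method chaining), here is a minimal sketch on made-up data. The tables, column names, and values are hypothetical and chosen only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical toy data: worker-level wages and firm-level characteristics.
workers = pd.DataFrame({
    "worker_id": [1, 2, 3, 4, 5],
    "firm_id": [10, 10, 20, 20, 30],
    "wage": [20.0, 24.0, 30.0, 34.0, 19.0],
})
firms = pd.DataFrame({
    "firm_id": [10, 20, 30],
    "sector": ["retail", "tech", "retail"],
})

# One method chain: merge firm info onto workers, then aggregate by sector.
sector_summary = (
    workers
    .merge(firms, on="firm_id", how="left", validate="many_to_one")
    .groupby("sector", as_index=False)
    .agg(mean_wage=("wage", "mean"), n_workers=("worker_id", "count"))
    .sort_values("sector")
    .reset_index(drop=True)
)
print(sector_summary)
```

Writing the pipeline as a single chain keeps each step readable and avoids intermediate variables; the validate argument makes the merge fail loudly if firm_id is not unique in the firm table.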

List of slides