Advantages and limits of double machine learning in economics#

  • What is the practical usage of double machine learning in economics and other applied quantitative sciences (environment, healthcare, …)? Is it often used? Are the best practices well established?

  • Is it really necessary to use double machine learning to robustly answer practical policy questions?

Seed literature references#

A quick search on Google and with ChatGPT gave me the following two references:

Fuhr et al., 2024#

They did the literature review that I was initially looking for. In March 2023, they selected all papers citing the seminal DML paper (Chernozhukov et al. (2018)) and applying the method to real-world data. They found 46 applications, of which they analyze the 36 that contain sufficient information. Most of the papers (20 refs) are in the economics / econometrics literature, 5 are in healthcare, and the remaining ones are scattered across sociology, political science, geoscience and sport research. The learners used most often are lasso and forests, followed by boosting, without comparing different predictive models. Most of the papers are not in a high-dimensional setting (#variables/#samples <= 0.1 for 25 out of 36 references). Finally, train/estimate sample splitting for robust effect estimation is lacking.

The authors then benchmark variations of DML estimators (RF, NN, boosting, OLS) against plain OLS and a naive XGBoost (with a strange estimation strategy that differs from a simple S-learner) on 11 simulation settings (extended simulation section), all following the partially linear model parametrization: $Y = \beta A + g(X)$. Varying the functional form (including interactions of order 2) highlights the superior performance of DML methods. Unfortunately, they drop the OLS (DML) estimator from all simulation results but one, arguing that it is the same as the OLS estimator. I am not sure that this is the case. An investigation of IPW is also missing.
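One way to probe the "OLS (DML) equals OLS" claim is a small simulation. The sketch below is my own illustration, not the authors' code: the data-generating process, the helper names `crossfit_residuals` and `dml_plr`, the added noise terms, and the use of OLS on squared features as a stand-in for a flexible ML learner are all assumptions. It implements the partialling-out DML estimator with cross-fitting: residualize $Y$ and $A$ on $X$ out-of-fold, then regress residual on residual.

```python
import numpy as np


def crossfit_residuals(features, target, n_folds=5, seed=0):
    """Out-of-fold residuals of `target` after an OLS fit on `features`.

    Each fold is predicted by a model trained on the other folds only.
    """
    n = len(target)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    design = np.column_stack([np.ones(n), features])  # add an intercept
    residuals = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        coef, *_ = np.linalg.lstsq(
            design[train_mask], target[train_mask], rcond=None
        )
        residuals[test_idx] = target[test_idx] - design[test_idx] @ coef
    return residuals


def dml_plr(y, a, features, n_folds=5):
    """Partialling-out DML estimate of beta in Y = beta * A + g(X) + noise."""
    y_res = crossfit_residuals(features, y, n_folds)  # Y - E[Y | X]
    a_res = crossfit_residuals(features, a, n_folds)  # A - E[A | X]
    return float(a_res @ y_res / (a_res @ a_res))


# Hypothetical data-generating process: X[:, 0] ** 2 confounds A and Y,
# and a model that is linear in X cannot capture it.
rng = np.random.default_rng(42)
n, p, beta = 5000, 3, 1.0
X = rng.normal(size=(n, p))
A = X[:, 0] ** 2 + rng.normal(size=n)
Y = beta * A + 2.0 * X[:, 0] ** 2 + X[:, 1] + rng.normal(size=n)

# Plain OLS of Y on (1, A, X): the coefficient on A is the estimate.
ols_design = np.column_stack([np.ones(n), A, X])
beta_ols = float(np.linalg.lstsq(ols_design, Y, rcond=None)[0][1])

# DML with linear nuisance models, i.e. an "OLS (DML)" variant.
beta_dml_ols = dml_plr(Y, A, X)

# DML with a more flexible nuisance model (linear + squared terms).
beta_dml_flex = dml_plr(Y, A, np.column_stack([X, X ** 2]))

print(f"plain OLS:          {beta_ols:.3f}")
print(f"DML, linear nuis.:  {beta_dml_ols:.3f}")
print(f"DML, flexible nuis: {beta_dml_flex:.3f}")
```

Without cross-fitting, the Frisch-Waugh-Lovell theorem makes residual-on-residual OLS numerically identical to the full OLS coefficient, so with cross-fitting the two should only differ by a small finite-sample term. In this particular design both linear variants inherit the same misspecification bias (about $4/3$ in population), while the flexible variant should land close to the true $\beta = 1$. This suggests the authors' shortcut is reasonable for large samples, although the two estimators are not exactly the same object.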

They study the effect of residual confounding, colliders, treatment-only or outcome-only covariates, and sample size. All of these experiments recover known results, with all DML methods performing better than OLS and close to one another.

They also compare different strengths of association and numbers of folds for cross-fitting, but the results are less clear and, IMHO, less interesting.

Figure 17 shows the relationship between a small mean squared error of the outcome and treatment models and a small estimation bias for the average treatment effect.
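A short derivation gives some intuition for why nuisance MSE and estimation bias move together. It is a sketch under my own notation, not taken from the paper: I assume the partially linear model with an additive noise term, $Y = \beta A + g(X) + \varepsilon$, and write $m(X) = E[A \mid X]$, $\ell(X) = E[Y \mid X]$, $v = A - m(X)$, with nuisance errors $\Delta_m = m - \hat{m}$ and $\Delta_\ell = \ell - \hat{\ell}$. Since $Y - \ell(X) = \beta v + \varepsilon$, the partialling-out estimator $\hat{\beta} = \sum_i (A_i - \hat{m}_i)(Y_i - \hat{\ell}_i) / \sum_i (A_i - \hat{m}_i)^2$ satisfies:

$$
\hat{\beta} - \beta = \frac{\sum_i (v_i + \Delta_{m,i})\,(\varepsilon_i + \Delta_{\ell,i} - \beta \Delta_{m,i})}{\sum_i (v_i + \Delta_{m,i})^2}
$$

Expanding the numerator, the terms $\sum_i v_i \varepsilon_i$, $\sum_i v_i (\Delta_{\ell,i} - \beta \Delta_{m,i})$ and $\sum_i \Delta_{m,i} \varepsilon_i$ have mean zero when the nuisance models are fitted on other folds (cross-fitting). The remaining term, $\sum_i \Delta_{m,i} (\Delta_{\ell,i} - \beta \Delta_{m,i})$, is a product of nuisance errors and does not average out. It is small only when the treatment model is accurate and the outcome model is not too far off, which is consistent with the pattern the figure reports.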