On the usage of bootstrap for causal inference estimators#

A question on the variance of the inverse probability-of-treatment weighting estimator#

What is the variance of the IPTW estimator once the propensity score estimation step is taken into account?

Papers:

Kostouraki et al. (2024). On variance estimation of the inverse probability-of-treatment weighting estimator: A tutorial for different types of propensity score weights. Statistics in Medicine.

Li, T., & Lawson, J. (2024). A generalized bootstrap procedure of the standard error and confidence interval estimation for inverse probability of treatment weighting. Multivariate Behavioral Research, 59(2), 251-265.

Motivation#

Estimating the variance of the IPTW estimator is not trivial, since part of that variance is induced by the estimation of the propensity score (treatment probabilities), $e(x) = P(W=1|X=x)$. Most theoretical results take the propensity scores as given and then compute the variance of the oracle IPW estimator, i.e. the IPW estimator that uses the true propensity score. For example, Wager, 2024, Th. 2.2 derives the oracle IPW variance as:

$$ Var(\hat{\tau}^{\star}_{IPW}) = Var[\tau(X_i)] + \mathbb{E} \left[ \frac{\left( \mu_{(0)}(X_i) + (1 - e(X_i)) \tau(X_i) \right)^2}{e(X_i)(1 - e(X_i))} \right] + \mathbb{E} \left[ \frac{\sigma^2_{(1)}(X_i)}{e(X_i)} + \frac{\sigma^2_{(0)}(X_i)}{1 - e(X_i)} \right]. $$
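To get a feel for this formula, here is a small simulation sketch. Everything in it is an assumption of mine and not taken from Wager: the data-generating process (`e`, `mu0`, `tau`, the noise levels) and the sample sizes are hypothetical. I read the right-hand side as the variance of a single term of the oracle IPW average, so I compare it with $n$ times the empirical variance of the estimator over repeated samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process (illustration only, not from the source).
def e(x):       # true propensity score, bounded away from 0 and 1
    return 0.3 + 0.4 * x

def mu0(x):     # mean outcome under control
    return 1.0 + x

def tau(x):     # conditional average treatment effect
    return 2.0 * x

SIGMA0, SIGMA1 = 1.0, 1.5  # noise standard deviation in each arm

def draw(n):
    """Draw one dataset (x, w, y) of size n."""
    x = rng.uniform(size=n)
    w = rng.binomial(1, e(x))
    noise = rng.normal(size=n) * np.where(w == 1, SIGMA1, SIGMA0)
    y = mu0(x) + w * tau(x) + noise
    return x, w, y

def oracle_ipw(x, w, y):
    """IPW estimate of the ATE using the TRUE propensity score."""
    ps = e(x)
    return np.mean(w * y / ps - (1 - w) * y / (1 - ps))

def formula_variance(n_mc=1_000_000):
    """Monte Carlo evaluation of the three terms of the variance formula."""
    x = rng.uniform(size=n_mc)
    ps = e(x)
    term_cate = np.var(tau(x))
    term_mean = np.mean((mu0(x) + (1 - ps) * tau(x)) ** 2 / (ps * (1 - ps)))
    term_noise = np.mean(SIGMA1**2 / ps + SIGMA0**2 / (1 - ps))
    return term_cate + term_mean + term_noise

n, n_rep = 500, 4000
estimates = np.array([oracle_ipw(*draw(n)) for _ in range(n_rep)])
empirical = n * estimates.var(ddof=1)   # n times the variance across replications
theoretical = formula_variance()
print(f"n * empirical variance: {empirical:.2f}")
print(f"formula:                {theoretical:.2f}")
```

The two numbers should agree up to Monte Carlo noise. Note that nothing here estimates the propensity score, which is exactly the limitation discussed next.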

The only source of variance accounted for here is the sampling of the data. How can we also take into account the variance coming from the estimation of the propensity score?

Possible answers#

First, 😍 Kostouraki et al. (2024) state: "Non-parametric bootstrap can be used to obtain valid SEs, but the bootstrap may be computationally intensive for large databases." The authors therefore focus on analytical approaches, which interest me a bit less because of the complexity they induce.
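As a reminder to myself, here is a minimal sketch of what a non-parametric bootstrap for IPTW could look like. It is not the procedure of any of the cited papers. The simulated data, the function names (`fit_logistic`, `predict_ps`, `iptw_ate`, `bootstrap_se`) and the `refit_ps` switch are all hypothetical choices of mine. The propensity score is a plain logistic regression fitted by Newton-Raphson with NumPy only, and the estimator is the unnormalized (Horvitz-Thompson style) IPTW estimator of the ATE. With `refit_ps=True` the propensity model is re-estimated inside every replicate, which is how the estimation step enters the bootstrap variance.

```python
import numpy as np

def fit_logistic(X, w, n_iter=25, ridge=1e-8):
    """Logistic regression by Newton-Raphson; returns coefficients, intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (w - p)
        # Tiny ridge term only for numerical stability of the linear solve.
        hess = (Xd * (p * (1 - p))[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_ps(beta, X):
    """Predicted treatment probabilities for covariates X."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def iptw_ate(y, w, ps):
    """Unnormalized IPTW estimate of the average treatment effect."""
    return np.mean(w * y / ps - (1 - w) * y / (1 - ps))

def bootstrap_se(X, w, y, n_boot=500, refit_ps=True, seed=0):
    """Bootstrap standard error of the IPTW estimate.

    refit_ps=True  : re-estimate the propensity score in each replicate.
    refit_ps=False : reuse the scores fitted once on the full sample.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    ps_full = predict_ps(fit_logistic(X, w), X)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        if refit_ps:
            ps_b = predict_ps(fit_logistic(X[idx], w[idx]), X[idx])
        else:
            ps_b = ps_full[idx]
        estimates[b] = iptw_ate(y[idx], w[idx], ps_b)
    return estimates.std(ddof=1)

# Hypothetical simulated data with a homogeneous treatment effect of 2.
rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 2))
true_ps = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
w = rng.binomial(1, true_ps)
y = 1.0 + X[:, 0] + 2.0 * w + rng.normal(size=n)

beta_hat = fit_logistic(X, w)
ate_hat = iptw_ate(y, w, predict_ps(beta_hat, X))
se_refit = bootstrap_se(X, w, y, refit_ps=True)
se_fixed = bootstrap_se(X, w, y, refit_ps=False)
print(f"IPTW estimate of the ATE:            {ate_hat:.3f}")
print(f"bootstrap SE, PS re-estimated:       {se_refit:.3f}")
print(f"bootstrap SE, PS fixed after 1 fit:  {se_fixed:.3f}")
```

Comparing the two standard errors on a given dataset shows how much the propensity score estimation step matters for the bootstrap variance in that setting. The cost is one model fit per replicate, which is the computational burden Kostouraki et al. mention for large databases.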

Secondly, Eike Petersen, while reading my tutorial paper on causal estimation, Step-by-step causal analysis of EHRs to ground decision-making, was also concerned with this question. He pointed me to Li and Lawson (2024), who argue that the naive bootstrap for IPW underestimates the variance. The authors introduce a generalized bootstrap method. Their argument is that the units in the treatment or control group do not all have the same probability of being included in their own group. On this point, I follow. They then argue that, within a given group (e.g. treatment), the ordinary bootstrap gives each unit the same chance of being resampled ($1/n_t$), which does not reflect the assignment mechanism. Here I no longer follow, since my impression is that, given the group (e.g. treatment), we should sample each individual with equal probability in order to reproduce the original dataset distribution.

When reading their simulation study, I have difficulty understanding how they derive a true standard error of the homogeneous causal effect $\beta_w$ in their specification. Importantly, they do not re-estimate the propensity scores in each bootstrap repetition, thus underestimating the variance with the ordinary bootstrap.

And for g-estimation?#

Note for later: I still wonder whether the bootstrap is valid for g-estimation. The marginal effect blog leads me to think that it is. It compares the delta method with an example from Pearl that uses the bootstrap, and concludes that the results are close but not identical. However, it gives no reference for a proof. A good mathematical exploration of this question in the randomized case is given by Imbens and Menzel. 2018. Causal bootstrap. It concludes that the bootstrap is conservative.