Towards optimal doubly robust estimation of heterogeneous causal effects

Heterogeneous effect estimation plays a crucial role in causal inference, with applications across medicine and social science. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but there are important theoretical gaps in understanding if and when such methods are optimal. This is especially true when the CATE has nontrivial structure (e.g., smoothness or sparsity). Our work makes three main contributions. First, we study a two-stage doubly robust CATE estimator and give a generic model-free error bound, which, despite its generality, yields sharper results than those in the current literature. We apply the bound to derive error rates in nonparametric models with smoothness or sparsity, and give sufficient conditions for oracle efficiency. Underlying our error bound is a general oracle inequality for regression with estimated or imputed outcomes, which is of independent interest; this is the second main contribution. The third contribution is aimed at understanding the fundamental statistical limits of CATE estimation. To that end, we propose and study a local polynomial adaptation of double-residual regression. We show that this estimator can be oracle efficient under even weaker conditions, if used with a specialized form of sample splitting and careful choices of tuning parameters. These are the weakest conditions currently found in the literature, and we conjecture that they are minimal in a minimax sense. We go on to give error bounds in the nontrivial regime where oracle rates cannot be achieved. Some finite-sample properties are explored with simulations.
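To make the two-stage doubly robust construction concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary treatment, a single sample split into two halves, and simple parametric nuisance learners (least squares and logistic regression) swapped in for the nonparametric regressions the abstract has in mind. The function names `fit_linear`, `fit_logistic`, `dr_learner`, and `simulate`, the clipping level, and the simulated data are all hypothetical choices made for illustration. Stage 1 estimates the propensity score and the two outcome regressions on one half of the data; stage 2 forms the standard augmented inverse-probability-weighted pseudo-outcome on the other half and regresses it on the covariates.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta


def fit_logistic(X, a, n_iter=25, ridge=1e-6):
    """Logistic regression via Newton's method; returns x -> P(A=1 | X=x)."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        grad = A.T @ (a - p)
        # Small ridge term on the Hessian only, for numerical stability.
        hess = (A * (p * (1 - p))[:, None]).T @ A + ridge * np.eye(A.shape[1])
        w = w + np.linalg.solve(hess, grad)
    return lambda Xn: 1.0 / (
        1.0 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ w)
    )


def dr_learner(X, a, y, seed=0, clip=0.01):
    """Two-stage doubly robust CATE sketch with one sample split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    train, est = idx[: len(y) // 2], idx[len(y) // 2:]

    # Stage 1: nuisance estimates from the training half only.
    Xt, at, yt = X[train], a[train], y[train]
    pi_hat = fit_logistic(Xt, at)
    mu1_hat = fit_linear(Xt[at == 1], yt[at == 1])
    mu0_hat = fit_linear(Xt[at == 0], yt[at == 0])

    # Stage 2: build pseudo-outcomes on the held-out half and regress on X.
    Xe, ae, ye = X[est], a[est], y[est]
    pi = np.clip(pi_hat(Xe), clip, 1 - clip)  # assumed guard against extreme weights
    mu1, mu0 = mu1_hat(Xe), mu0_hat(Xe)
    mu_a = np.where(ae == 1, mu1, mu0)
    pseudo = (ae - pi) / (pi * (1 - pi)) * (ye - mu_a) + mu1 - mu0
    return fit_linear(Xe, pseudo)


def simulate(n=4000, seed=1):
    """Hypothetical data with true CATE tau(x) = 1 + 2x."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n, 1))
    p = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
    a = rng.binomial(1, p).astype(float)
    y = X[:, 0] + a * (1 + 2 * X[:, 0]) + 0.5 * rng.normal(size=n)
    return X, a, y


X, a, y = simulate()
tau_hat = dr_learner(X, a, y)
print(tau_hat(np.array([[0.0], [0.5]])))  # estimates of tau(0) and tau(0.5)
```

In this simulated setting every model is correctly specified, so the printed estimates should land near the true values 1 and 2. The sketch only illustrates the mechanics; the error rates and oracle-efficiency conditions discussed in the abstract concern flexible nonparametric second-stage regressions, which a linear fit does not represent.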
