The Fundamental Limits of Structure-Agnostic Functional Estimation

Many recent developments in causal inference, and functional estimation problems more generally, have been motivated by the fact that classical one-step (first-order) debiasing methods, or their more recent sample-split double machine-learning avatars, can outperform plugin estimators under surprisingly weak conditions. These first-order corrections improve on plugin estimators in a black-box fashion, and consequently are often used in conjunction with powerful off-the-shelf estimation methods. These first-order methods are, however, provably suboptimal in a minimax sense for functional estimation when the nuisance functions live in Hölder-type function spaces. This suboptimality of first-order debiasing has motivated the development of "higher-order" debiasing methods. The resulting estimators are, in some cases, provably optimal over Hölder-type spaces, but both the minimax-optimal estimators and their analyses are crucially tied to properties of the underlying function space. In this paper we investigate the fundamental limits of structure-agnostic functional estimation, where relatively weak conditions are placed on the underlying nuisance functions. We show that there is a strong sense in which existing first-order methods are optimal. We achieve this goal by providing a formalization of the problem of functional estimation with black-box nuisance function estimates, and deriving minimax lower bounds for this problem. Our results highlight some clear tradeoffs in functional estimation: if we wish to remain agnostic to the underlying nuisance function spaces, impose only high-level rate conditions, and maintain compatibility with black-box nuisance estimators, then first-order methods are optimal. When we have an understanding of the structure of the underlying nuisance functions, carefully constructed higher-order estimators can outperform first-order estimators.
