Series (or orthogonal basis) regression is one of the most popular nonparametric regression techniques in practice; it regresses the response on features generated by evaluating basis functions at the observed covariate values. The most routinely used series estimator is based on ordinary least squares fitting, which is known to be minimax rate optimal in various settings, albeit under stringent restrictions on the basis functions and the distribution of the covariates. In this work, inspired by the recently developed Forster-Warmuth (FW) learner, we propose an alternative series regression estimator that attains the minimax estimation rate under strictly weaker conditions on the basis functions and the joint law of the covariates than existing series estimators in the literature. Moreover, a key contribution of this work generalizes the FW-learner to a so-called counterfactual regression problem, in which the response variable of interest may not be directly observed on all sampled units (hence the name "counterfactual") and therefore must be inferred in order to identify and estimate the target regression from the observed data. Although counterfactual regression is not an entirely new area of inquiry, we present the first systematic study of this challenging problem from a unified pseudo-outcome perspective. In fact, we provide what appears to be the first generic and constructive approach for generating the pseudo-outcome (a substitute for the unobserved response), which leads to estimation of the counterfactual regression curve of interest with small bias, namely bias of second order. Several applications illustrate the resulting FW-learner, including many nonparametric regression problems in the missing data and causal inference literature, for which we establish high-level conditions for minimax rate optimality of the proposed FW-learner.
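As a rough illustration of the series regression idea described above, the following Python sketch builds basis features and compares an ordinary least squares fit with an FW-style prediction. Everything here is an assumption made for illustration: the cosine basis, the function names, and in particular the leverage-shrunken prediction formula, which follows the Forster-Warmuth estimator as it is usually stated in the literature and is not spelled out in this abstract. It should not be read as the authors' implementation.

```python
import numpy as np


def cosine_basis(x, J):
    """Evaluate the first J cosine basis functions on [0, 1] at the points x.

    Hypothetical basis choice: phi_0(x) = 1, phi_j(x) = sqrt(2) * cos(pi * j * x).
    """
    x = np.asarray(x, dtype=float).reshape(-1)
    j = np.arange(J)
    phi = np.sqrt(2.0) * np.cos(np.pi * x[:, None] * j[None, :])
    phi[:, 0] = 1.0  # constant basis function
    return phi


def ols_series_fit(x, y, J):
    """Least squares regression of y on the J basis features."""
    Phi = cosine_basis(x, J)
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return coef


def ols_series_predict(coef, x_new):
    """Predict with fitted series coefficients."""
    return cosine_basis(x_new, len(coef)) @ coef


def fw_series_predict(x, y, x_new, J):
    """FW-style prediction (assumed form, not taken from the abstract):

        f(x0) = (1 - h(x0)) * phi(x0)^T A^{-1} Phi^T y,
        A     = Phi^T Phi + phi(x0) phi(x0)^T,
        h(x0) = phi(x0)^T A^{-1} phi(x0).
    """
    Phi = cosine_basis(x, J)
    G = Phi.T @ Phi          # Gram matrix of the training features
    b = Phi.T @ np.asarray(y, dtype=float)
    preds = []
    for phi0 in cosine_basis(x_new, J):
        A = G + np.outer(phi0, phi0)             # Gram matrix including the test point
        h = phi0 @ np.linalg.solve(A, phi0)      # leverage of the test point, in [0, 1)
        preds.append((1.0 - h) * (phi0 @ np.linalg.solve(A, b)))
    return np.array(preds)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=200)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
    grid = np.linspace(0.0, 1.0, 5)
    coef = ols_series_fit(x, y, J=6)
    print("OLS series:", ols_series_predict(coef, grid))
    print("FW-style  :", fw_series_predict(x, y, grid, J=6))
```

In this sketch the FW-style prediction is the least squares prediction shrunk toward zero by a factor that depends on the leverage of the test point, so the two agree closely wherever the test point is well covered by the training features.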