We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on covariates, creating selection bias that makes direct regression of the response on treatment unreliable. To address this issue, we propose a two-stage kernel ridge regression method. In the first stage, we learn a model for the response as a function of both treatment and covariates; in the second stage, we use this model to construct pseudo-outcomes that correct for distribution shift, and then fit a second model to estimate the treatment effect. Although the response varies with both treatment and covariates, the induced effect function obtained by averaging over covariates is typically much simpler, and our estimator adapts to this structure. Furthermore, we introduce a fully data-driven model selection procedure that achieves provable adaptivity to both the unknown degree of overlap and the regularity (eigenvalue decay) of the underlying kernel.
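The two-stage procedure described above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's exact estimator. It assumes a product Gaussian kernel over treatment and covariates, fixed bandwidths and regularization parameters, and pseudo-outcomes formed by averaging the first-stage fit over the empirical covariate distribution. The function names (`gaussian_kernel`, `krr_fit`, `two_stage_krr`), the default hyperparameters, and the synthetic data are hypothetical, and the data-driven model selection step is not implemented here.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def krr_fit(K, y, lam):
    """Kernel ridge coefficients: solve (K + n * lam * I) alpha = y."""
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)


def two_stage_krr(t, x, y, t_grid, bw_t=1.0, bw_x=1.0, lam1=1e-3, lam2=1e-3):
    """Estimate the effect function on t_grid (assumed simplified variant)."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    x = np.asarray(x, dtype=float).reshape(len(t), -1)
    t_grid = np.asarray(t_grid, dtype=float).reshape(-1, 1)

    K_t = gaussian_kernel(t, t, bw_t)
    K_x = gaussian_kernel(x, x, bw_x)

    # Stage 1: regress the response on (treatment, covariates) jointly,
    # using the product kernel k_T * k_X.
    alpha = krr_fit(K_t * K_x, y, lam1)

    # Pseudo-outcomes: for each observed treatment t_i, average the
    # first-stage fit over all observed covariates x_j. With a product
    # kernel this average has a closed form in the coefficients.
    weights = alpha * K_x.mean(axis=0)
    pseudo_outcomes = K_t @ weights

    # Stage 2: regress the pseudo-outcomes on treatment alone.
    beta = krr_fit(K_t, pseudo_outcomes, lam2)
    return gaussian_kernel(t_grid, t, bw_t) @ beta


# Synthetic confounded data (hypothetical): the covariate x drives both the
# treatment t and the response y, and the true effect function is sin(t).
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
t = 0.5 * x + rng.normal(size=n)
y = np.sin(t) + 2.0 * x + 0.1 * rng.normal(size=n)

t_grid = np.linspace(-1.0, 1.0, 21)
theta_hat = two_stage_krr(t, x, y, t_grid)
```

In this toy setup, regressing `y` directly on `t` would absorb the dependence of `x` on `t` into the fitted curve, whereas averaging the first-stage fit over the marginal covariate sample removes that dependence before the second regression. The second stage only has to fit a function of `t`, which reflects the abstract's point that the effect function is typically simpler than the full response surface.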