A core challenge in causal inference is how to extrapolate the long term effects of possibly continuous actions from short term experimental data. The same challenge arises in artificial intelligence: the long term consequences of continuous actions may be of interest, yet only short term rewards may be collected during exploration. For this estimand, called the long term dose response curve, we propose a simple nonparametric estimator based on kernel ridge regression. By embedding the distribution of the short term experimental data with kernels, we derive interpretable weights for extrapolating long term effects. Our method allows actions, short term rewards, and long term rewards to be continuous in general spaces. It also allows for nonlinearity and heterogeneity in the link between short term effects and long term effects. We prove uniform consistency, with nonasymptotic error bounds reflecting the effective dimension of the data. As an application, we estimate the long term dose response curve of Project STAR, a social program that randomly assigned students to various class sizes. We extend our results to long term counterfactual distributions, proving weak convergence.
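To make the phrase "interpretable weights" concrete, the following is a minimal sketch of generic kernel ridge regression, not the long term dose response estimator proposed here. It only illustrates that a kernel ridge prediction is a weighted sum of observed outcomes, with weights given in closed form. The Gaussian kernel, the function names (`gaussian_kernel`, `krr_weights`, `krr_predict`), the hyperparameter values, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def krr_weights(x_train, x_query, lam=1e-6, lengthscale=1.0):
    """Closed-form weights (K + n*lam*I)^{-1} k(x), one column per query point."""
    n = x_train.shape[0]
    K = gaussian_kernel(x_train, x_train, lengthscale)        # (n, n)
    k_query = gaussian_kernel(x_train, x_query, lengthscale)  # (n, m)
    return np.linalg.solve(K + n * lam * np.eye(n), k_query)  # (n, m)


def krr_predict(x_train, y_train, x_query, lam=1e-6, lengthscale=1.0):
    """Prediction at each query point as a weighted sum of training outcomes."""
    weights = krr_weights(x_train, x_query, lam, lengthscale)
    return weights.T @ y_train


# Synthetic, noise-free data (an assumption for illustration only).
x = np.linspace(0.0, 1.0, 50)[:, None]   # actions, shape (50, 1)
y = np.sin(2.0 * np.pi * x[:, 0])        # outcomes, shape (50,)
x_new = np.array([[0.1], [0.5], [0.9]])  # query actions, shape (3, 1)

w = krr_weights(x, x_new, lengthscale=0.2)
y_fit = krr_predict(x, y, x, lengthscale=0.2)
y_new = krr_predict(x, y, x_new, lengthscale=0.2)
```

Each column of `w` shows how much every training observation contributes to the prediction at one query action, which is the sense in which kernel ridge weights can be read directly. The regularization `lam` and the `lengthscale` would normally be tuned, for example by cross-validation.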