Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We propose and analyze a novel framework, the Counterfactual Policy Mean Embedding (CPME), which represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space (RKHS) and thereby enables flexible, nonparametric distributional off-policy evaluation. We introduce both a plug-in estimator and a doubly robust estimator; the latter enjoys improved convergence rates by correcting for bias in both the outcome embedding model and the propensity model. Building on this, we develop a doubly robust kernel test statistic for hypothesis testing. The statistic is asymptotically normal, which enables computationally efficient testing and the straightforward construction of confidence intervals. Our framework also supports sampling from the counterfactual distribution. Numerical simulations illustrate the practical benefits of CPME over existing methods.
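The abstract does not spell out the estimators, so the following is only a rough illustration of the general idea, not the paper's method. It is a minimal sketch under stated assumptions: discrete actions, known logging propensities, a Gaussian kernel, and a kernel ridge regression estimate of the conditional outcome embedding. The doubly robust form shown is the standard augmented inverse-propensity construction applied to the feature map φ(y). Each estimated embedding is stored as a weight vector c over the logged outcomes, i.e. Σ_j c[j] φ(Y[j]), so the squared RKHS distance between two embeddings is an MMD computed from the outcome Gram matrix. All function names (`rbf_kernel`, `embedding_weights`, `mmd_squared`), parameters (`lam`, `bandwidth`), and the simulated data are hypothetical.

```python
import numpy as np


def rbf_kernel(P, Q, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of P and the rows of Q."""
    sq_dists = (
        np.sum(P**2, axis=1)[:, None]
        + np.sum(Q**2, axis=1)[None, :]
        - 2.0 * P @ Q.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def embedding_weights(X, A, pi_target, propensity, lam=1e-2, bandwidth=1.0):
    """Hypothetical helper: returns (c_plug, c_dr) such that each estimated
    counterfactual mean embedding equals sum_j c[j] * phi(Y[j]).

    X          : (n, d) logged contexts
    A          : (n,) logged integer actions in {0, ..., K-1}
    pi_target  : (n, K) target-policy probabilities pi(a | x_i)
    propensity : (n,) logging propensities pi_0(a_i | x_i), assumed known
    """
    n, K = pi_target.shape
    Kx = rbf_kernel(X, X, bandwidth)
    # Assumed kernel on (context, action): RBF in x times an action indicator.
    K_xa = Kx * (A[:, None] == A[None, :])
    reg = K_xa + n * lam * np.eye(n)

    # Plug-in term: (1/n) sum_i sum_a pi(a | x_i) * beta(x_i, a), where
    # beta(x, a) are kernel ridge regression weights over the logged outcomes.
    c_plug = np.zeros(n)
    for a in range(K):
        k_query = Kx * (A[:, None] == a)      # column i: k((x_j, a_j), (x_i, a))
        beta = np.linalg.solve(reg, k_query)  # column i: beta(x_i, a)
        c_plug += beta @ pi_target[:, a]
    c_plug /= n

    # Doubly robust correction: importance-weighted residuals
    # w_i * (phi(y_i) - mu_hat(x_i, a_i)), averaged over the sample.
    w = pi_target[np.arange(n), A] / propensity
    beta_obs = np.linalg.solve(reg, K_xa)     # column i: beta(x_i, a_i)
    c_dr = c_plug + (w - beta_obs @ w) / n
    return c_plug, c_dr


def mmd_squared(c1, c2, Y, bandwidth=1.0):
    """Squared RKHS distance between sum_j c1[j] phi(Y[j]) and sum_j c2[j] phi(Y[j])."""
    d = c1 - c2
    return float(d @ rbf_kernel(Y, Y, bandwidth) @ d)


# Usage on simulated data (illustrative only): compare "always action 1"
# with "always action 0" using the doubly robust embeddings.
rng = np.random.default_rng(0)
n, K = 200, 2
X = rng.normal(size=(n, 1))
p1 = 1.0 / (1.0 + np.exp(-X[:, 0]))          # logging policy P(a = 1 | x)
A = (rng.uniform(size=n) < p1).astype(int)
Y = (X[:, 0] + 2.0 * A + rng.normal(size=n))[:, None]
propensity = np.where(A == 1, p1, 1.0 - p1)
always_1 = np.tile([0.0, 1.0], (n, 1))
always_0 = np.tile([1.0, 0.0], (n, 1))
_, c1 = embedding_weights(X, A, always_1, propensity)
_, c0 = embedding_weights(X, A, always_0, propensity)
dist = mmd_squared(c1, c0, Y)
```

Keeping every embedding as a weight vector over the logged outcomes means all computations reduce to Gram matrices and linear solves. Under the usual augmented-IPW argument, the correction term keeps the estimate consistent if either the outcome embedding model or the propensities are correctly specified. With a very large `lam`, the ridge weights shrink to zero and `c_dr` reduces to plain inverse-propensity weights `w / n`.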