We study policy evaluation for offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to a coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimates of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound on the policy value. Our estimator can be shown to contain the recently proposed sharp estimator of Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators fail to take into account. In the theoretical analysis, we provide a condition on the choice of kernel that guarantees that no specification error biases the lower-bound estimate. Furthermore, we provide consistency guarantees for policy evaluation and learning. In experiments on synthetic and real-world data, we demonstrate the effectiveness of the proposed method.
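To make the notion of a coarse relaxation concrete, the following is a minimal sketch of the kind of non-sharp baseline the abstract contrasts against. It is not the kernel-based estimator proposed here. It assumes the standard box form of the marginal sensitivity model with sensitivity parameter Γ ≥ 1, in which the true inverse propensity lies in [1 + (1/e − 1)/Γ, 1 + (1/e − 1)Γ] for nominal propensity e, and it ignores the conditional moment constraints entirely. The function names and the unnormalized inverse-propensity form are illustrative assumptions.

```python
import numpy as np


def msm_weight_bounds(propensity, gamma):
    """Box bounds on the true inverse propensity under the marginal
    sensitivity model with parameter gamma >= 1 (assumed box form)."""
    propensity = np.asarray(propensity, dtype=float)
    inv_odds = 1.0 / propensity - 1.0
    lower = 1.0 + inv_odds / gamma
    upper = 1.0 + inv_odds * gamma
    return lower, upper


def box_lower_bound(rewards, target_probs, propensity, gamma):
    """Worst-case (lowest) unnormalized IPW policy value when each weight
    may vary independently within its box. This drops the conditional
    moment constraints, so the bound is valid but generally not sharp."""
    rewards = np.asarray(rewards, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)
    lower, upper = msm_weight_bounds(propensity, gamma)
    contrib = target_probs * rewards
    # Minimizing a sum that is linear in each weight: take the smallest
    # weight where the contribution is nonnegative, the largest otherwise.
    worst_weights = np.where(contrib >= 0, lower, upper)
    return float(np.mean(worst_weights * contrib))


# Hypothetical toy data: two logged samples with nominal propensity 0.5.
rewards = [1.0, -1.0]
target_probs = [1.0, 1.0]
propensity = [0.5, 0.5]

no_confounding = box_lower_bound(rewards, target_probs, propensity, gamma=1.0)
with_confounding = box_lower_bound(rewards, target_probs, propensity, gamma=2.0)
```

With Γ = 1 the box collapses to the nominal inverse propensity and the bound equals the ordinary inverse-propensity estimate; as Γ grows, the bound can only decrease. Because each weight is chosen independently of the others, this relaxation can select weight configurations that no valid propensity model could produce, which is the source of conservativeness that a sharp estimator removes.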