We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under worst-case confounding over a given uncertainty set. However, existing work often resorts to a coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimates of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound on the policy value. We show that our estimator contains the recently proposed sharp estimator of Dorn and Guo (2022) as a special case, and that our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators fail to take into account. In the theoretical analysis, we provide a condition on the choice of kernel that guarantees no specification error biasing the lower-bound estimate. Furthermore, we provide consistency guarantees for both policy evaluation and policy learning. In experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.
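For intuition about what a "coarse relaxation" computes, here is a minimal sketch of a non-sharp lower bound of the kind used in earlier work. It is not the proposed sharp estimator. Under the marginal sensitivity model with parameter Λ ≥ 1, each true inverse propensity 1/e is only known to lie in an interval around the nominal 1/ê, and the worst case of a self-normalized IPW estimate over that box can be found by a threshold search over outcomes sorted in increasing order. The sketch deliberately ignores conditional moment constraints (for example, the identity that the true inverse propensity weights of each action average to one given the covariates), which is exactly what makes it conservative. The function names `msm_weight_bounds` and `relaxed_lower_bound`, and the choice of a self-normalized estimator, are illustrative assumptions.

```python
from typing import Sequence


def msm_weight_bounds(e_hat: float, lam: float) -> tuple[float, float]:
    """Interval for the true inverse propensity 1/e when the odds ratio
    between e and the nominal propensity e_hat is bounded by lam >= 1."""
    odds = 1.0 / e_hat - 1.0
    return 1.0 + odds / lam, 1.0 + lam * odds


def relaxed_lower_bound(
    y: Sequence[float],
    pi: Sequence[float],
    e_hat: Sequence[float],
    lam: float,
) -> float:
    """Minimise sum(pi*w*y) / sum(pi*w) over the box of MSM weights.

    Hypothetical helper: no conditional moment constraint is imposed, so the
    result is a valid but generally non-sharp lower bound.
    """
    if lam < 1.0:
        raise ValueError("lam must be >= 1")
    # Samples with pi_i == 0 contribute nothing; sort the rest by outcome.
    data = sorted(
        (yi, pii, *msm_weight_bounds(ei, lam))
        for yi, pii, ei in zip(y, pi, e_hat)
        if pii > 0
    )
    if not data:
        raise ValueError("target policy has no support in the data")
    best = float("inf")
    # The minimiser puts the upper weight on the k smallest outcomes and the
    # lower weight on the rest, so scanning every split point k suffices.
    for k in range(len(data) + 1):
        num = den = 0.0
        for j, (yi, pii, lo, hi) in enumerate(data):
            w = hi if j < k else lo
            num += pii * w * yi
            den += pii * w
        best = min(best, num / den)
    return best


if __name__ == "__main__":
    y, pi, e_hat = [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]
    print(relaxed_lower_bound(y, pi, e_hat, lam=1.0))  # 0.5 (nominal estimate)
    print(relaxed_lower_bound(y, pi, e_hat, lam=2.0))  # about 0.3333
```

With Λ = 1 the interval collapses to 1/ê and the bound equals the nominal estimate; larger Λ widens the box and lowers the bound. The double loop is O(n²) for readability; prefix sums over the sorted order would make the scan linear after sorting.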