We consider a personalized pricing problem in which the data consist of feature information, historical pricing decisions, and binary realized demand. The goal is to perform off-policy evaluation for a new personalized pricing policy that maps features to prices. Off-policy evaluation methods based on inverse propensity weighting (including doubly robust methods) may perform poorly when the logging policy explores little or is deterministic, which is common in pricing applications. Building on the balanced policy evaluation framework of Kallus (2018), we propose a new approach tailored to pricing applications. The key idea is to compute an estimate that either minimizes the worst-case mean squared error or maximizes a worst-case lower bound on policy performance, where in both cases the worst case is taken over a set of possible revenue functions. We establish theoretical convergence guarantees and empirically demonstrate the advantage of our approach on a real-world pricing dataset.
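To make the concern about deterministic logging policies concrete, the following minimal sketch shows an inverse propensity weighting (IPW) revenue estimate collapsing to zero when the target policy charges a price the logging policy never used. It illustrates only the failure mode, not the balanced approach proposed here. The price levels, the linear purchase-probability model, and the names `ipw_revenue`, `logged`, and `target` are all assumptions made for illustration.

```python
import numpy as np


def ipw_revenue(prices, demand, target_prices, propensity):
    """IPW estimate of expected revenue under a target pricing policy.

    propensity[i] is the logging policy's probability of having charged
    the logged price prices[i] to customer i.
    """
    # A sample contributes only if the logged price equals the target price.
    match = (prices == target_prices).astype(float)
    weights = match / propensity
    return float(np.mean(weights * prices * demand))


rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(size=n)  # one scalar feature per customer

# Hypothetical deterministic logging policy: price 1.0 if x < 0.5, else 3.0.
logged = np.where(x < 0.5, 1.0, 3.0)
propensity = np.ones(n)  # deterministic, so the logged price has probability 1

# Hypothetical demand model: purchase probability decreases linearly in price.
buy_prob = np.clip(0.9 - 0.2 * logged, 0.0, 1.0)
demand = (rng.uniform(size=n) < buy_prob).astype(int)

# Target policy charges 2.0 to everyone, a price that was never logged.
target = np.full(n, 2.0)

estimate = ipw_revenue(logged, demand, target, propensity)
true_value = 2.0 * (0.9 - 0.2 * 2.0)  # expected revenue under the assumed model

print(f"IPW estimate: {estimate:.3f}, value under assumed model: {true_value:.3f}")
```

Because no logged price matches the target price, every IPW weight is zero and the estimate carries no information about the target policy, whatever the demand data look like. Evaluating such a policy requires some structure on the revenue function, which is the role played here by the set of possible revenue functions.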