There has been a surge of recent interest in automatically learning policies that target treatment decisions based on rich individual covariates. In addition, practitioners want confidence that the learned policy performs better than the incumbent policy according to downstream policy evaluation. However, due to the winner's curse -- an issue where the policy optimization procedure exploits prediction errors rather than finding actual improvements -- predicted performance improvements are often not substantiated by downstream policy evaluation. To address this challenge, we propose a novel strategy called inference-aware policy optimization, which modifies policy optimization to account for how the policy will be evaluated downstream. Specifically, it optimizes not only the estimated objective value, but also the probability that the estimate of the policy's improvement passes a significance test during downstream policy evaluation. We mathematically characterize the Pareto frontier of policies under the tradeoff between these two goals. Based on our characterization, we design a policy optimization algorithm that estimates the Pareto frontier using machine learning models; the decision-maker can then select the policy that optimizes their desired tradeoff, after which policy evaluation can be performed on the test set as usual. Finally, we perform simulations to illustrate the effectiveness of our methodology.
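The tradeoff described above can be sketched in miniature. The snippet below is an illustrative assumption, not the paper's algorithm: it scores each hypothetical candidate policy by (i) its estimated improvement over the incumbent and (ii) an approximate probability of passing a one-sided z-test on fresh test data, then keeps the candidates on the Pareto frontier of these two criteria. The candidate names, standard errors, and the normal-approximation pass probability are all hypothetical placeholders.

```python
import math

def z_test_pass_prob(delta_hat, se, alpha=0.05):
    # Approximate probability that a one-sided z-test at level alpha rejects
    # H0 (no improvement), assuming the test-set estimate is distributed
    # N(delta_hat, se^2). Illustrative assumption, not the paper's criterion.
    z_alpha = 1.6448536269514722  # Phi^{-1}(1 - 0.05)
    # P(Delta_test / se > z_alpha) = Phi(delta_hat / se - z_alpha)
    return 0.5 * (1.0 + math.erf((delta_hat / se - z_alpha) / math.sqrt(2)))

def pareto_frontier(candidates):
    # Keep candidates not dominated on (estimated value, pass probability).
    return [
        (v, p, name)
        for v, p, name in candidates
        if not any(
            v2 >= v and p2 >= p and (v2 > v or p2 > p)
            for v2, p2, _ in candidates
        )
    ]

# Hypothetical candidate policies: (estimated improvement, standard error, name).
cands = [
    (0.30, 0.20, "aggressive"),
    (0.15, 0.05, "conservative"),
    (0.10, 0.08, "baseline-tweak"),
]
scored = [(v, z_test_pass_prob(v, se), name) for v, se, name in cands]
for v, p, name in pareto_frontier(scored):
    print(f"{name}: estimated value={v:.2f}, pass probability={p:.2f}")
```

Here the "aggressive" policy has the largest estimated improvement but a noisy estimate, while the "conservative" policy has a smaller but far more certain improvement; both survive on the frontier, and the decision-maker picks according to their preferred tradeoff, exactly the selection step the abstract describes.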