In policy learning, the goal is typically to optimize a primary performance metric, but other subsidiary metrics often also warrant attention. This paper presents two strategies for evaluating these subsidiary metrics under a policy that is optimal for the primary metric. The first strategy relies on a novel margin condition that facilitates Wald-type inference. Under this and other regularity conditions, we show that the one-step corrected estimator is efficient. Although this margin condition is useful, it places strong restrictions on how the subsidiary metric behaves under nearly optimal policies, and these restrictions may not hold in practice. We therefore introduce alternative two-stage strategies that do not require a margin condition. The first stage constructs a set of candidate policies, and the second builds a uniform confidence interval over this set. We provide numerical simulations to evaluate the performance of these methods in different scenarios.
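To make the two-stage idea concrete, here is a minimal toy sketch. It is not the authors' procedure; every modeling choice is an assumption made for illustration. These include a finite class of threshold rules d(x) = 1{x > t}, a randomized trial with known propensity 0.5, inverse-probability-weighted (IPW) value estimates, sample splitting between the stages, and Bonferroni adjustments in place of whatever uniform construction the paper uses. The names `simulate`, `ipw_terms`, and `two_stage_interval` are hypothetical. Stage 1 keeps every rule that no other rule beats significantly on the primary metric. Stage 2 forms simultaneous Wald intervals for the subsidiary value of each kept rule and reports their union.

```python
import numpy as np
from statistics import NormalDist


def simulate(n, rng):
    """Hypothetical toy trial: covariate X, treatment A with known propensity
    0.5, primary outcome Y and subsidiary outcome S."""
    x = rng.uniform(-1.0, 1.0, n)
    a = rng.binomial(1, 0.5, n)
    y = a * x + rng.normal(0.0, 1.0, n)          # treatment helps Y when x > 0
    s = a * (0.5 - x) + rng.normal(0.0, 1.0, n)  # subsidiary metric, e.g. a side effect
    return x, a, y, s


def ipw_terms(x, a, outcome, threshold):
    """Per-observation IPW contributions for the rule d(x) = 1{x > threshold};
    their mean estimates E[outcome] if everyone followed the rule."""
    d = (x > threshold).astype(int)
    return outcome * (a == d) / 0.5


def two_stage_interval(x, a, y, s, thresholds, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    first, second = idx[: len(x) // 2], idx[len(x) // 2 :]
    alpha1 = alpha2 = alpha / 2  # split the error budget between the stages

    # Stage 1 (first half): keep every rule that no other rule beats on the
    # primary metric (one-sided tests, Bonferroni over the comparisons).
    terms = np.array([ipw_terms(x[first], a[first], y[first], t) for t in thresholds])
    values = terms.mean(axis=1)
    n1, n_rules = len(first), len(thresholds)
    z1 = NormalDist().inv_cdf(1 - alpha1 / max(n_rules - 1, 1))

    def beaten(k):
        return any(
            values[j] - values[k] > z1 * (terms[j] - terms[k]).std(ddof=1) / np.sqrt(n1)
            for j in range(n_rules)
        )

    # The empirically best rule is never beaten, so this list is non-empty.
    candidates = [k for k in range(n_rules) if not beaten(k)]

    # Stage 2 (second half): Bonferroni-simultaneous Wald intervals for the
    # subsidiary value of each candidate; the union is the reported interval.
    n2 = len(second)
    z2 = NormalDist().inv_cdf(1 - alpha2 / (2 * len(candidates)))
    lower, upper = np.inf, -np.inf
    for k in candidates:
        t = ipw_terms(x[second], a[second], s[second], thresholds[k])
        est, se = t.mean(), t.std(ddof=1) / np.sqrt(n2)
        lower, upper = min(lower, est - z2 * se), max(upper, est + z2 * se)
    return [float(thresholds[k]) for k in candidates], (float(lower), float(upper))


rng = np.random.default_rng(1)
x, a, y, s = simulate(4000, rng)
kept, (lo, hi) = two_stage_interval(x, a, y, s, np.linspace(-0.5, 0.5, 9))
print("candidate thresholds:", kept)
print("interval for subsidiary value:", (lo, hi))
```

In this toy model the primary value of threshold t is (1 - t^2)/4, maximized at t = 0. The subsidiary value is -t(1 - t)/4, which is 0 at the optimum and changes sign on either side of it. This is the kind of behavior near the optimum that makes a margin condition restrictive and motivates reporting an interval over a whole candidate set. Because the stages use disjoint halves of the data, the union interval should, asymptotically and under these toy assumptions, cover the subsidiary value of an optimal rule with probability of roughly at least 1 - alpha.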