This article addresses the question of reporting a lower confidence band (LCB) for optimal welfare in a policy learning problem. A straightforward procedure inverts a one-sided t-test based on an efficient estimator of the optimal welfare. We show that under empirically relevant data-generating processes, this procedure can be dominated by an LCB corresponding to suboptimal welfare, with an average difference of order N^{-1/2}. We relate the first-order dominance result to a lack of uniformity in the margin assumption, a standard sufficient condition for debiased inference on the optimal welfare, which ensures that the first-best policy is well-separated from the suboptimal ones. Finally, we show that inverting existing tests from the moment inequality literature produces LCBs that are robust to the non-uniqueness of the optimal policy and easy to compute. We find that this approach performs well empirically in the context of the National JTPA study.
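The naive procedure the abstract critiques can be sketched as follows: estimate the welfare of each candidate policy, pick the maximizer, and invert a one-sided t-test around that maximum. This is a minimal illustration under simplifying assumptions not stated in the abstract (binary treatment, known propensity scores, inverse-propensity-weighted welfare scores); the function name and interface are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def naive_welfare_lcb(Y, D, X_policy, propensity, alpha=0.05):
    """Naive one-sided LCB for optimal welfare via t-test inversion.

    Y          : (n,) outcomes
    D          : (n,) binary treatment indicators
    X_policy   : (n, K) matrix; column k is 1 where policy k treats unit i
    propensity : (n,) known P(D = 1 | X) for each unit

    Hypothetical simplification: welfare of each policy is estimated by
    inverse-propensity weighting, and the LCB is centered at the welfare
    of the *estimated* optimal policy -- the step that fails to be valid
    uniformly when the margin assumption breaks down.
    """
    n, K = X_policy.shape
    # IPW welfare score of each unit under each candidate policy
    scores = np.where(
        X_policy == 1,
        (D / propensity)[:, None] * Y[:, None],
        ((1 - D) / (1 - propensity))[:, None] * Y[:, None],
    )
    means = scores.mean(axis=0)
    k_hat = int(np.argmax(means))               # estimated optimal policy
    se = scores[:, k_hat].std(ddof=1) / np.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha)         # one-sided normal critical value
    return means[k_hat] - z * se
```

When two policies have (nearly) equal welfare, the plug-in maximum `means[k_hat]` is biased upward and the resulting band can be dominated, which is the phenomenon the abstract quantifies at order N^{-1/2}.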