In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse, which causes an over-optimistic evaluation of solution performance. We argue that a special type of distributionally robust optimization (DRO) formulation offers theoretical advantages in correcting for this optimizer's curse compared to simple ``margin'' adjustments to SAA and to other DRO approaches: it attains a statistical bound on the out-of-sample performance, for a wide class of objective functions and distributions, that is nearly the tightest possible in terms of exponential decay rate. This DRO uses an ambiguity set based on a Kullback-Leibler (KL) divergence smoothed by the Wasserstein or L\'evy-Prokhorov (LP) distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized versions using smoothed $f$-divergences, are no harder than DRO problems based on $f$-divergence or Wasserstein distances, rendering our DRO formulations both statistically optimal and computationally viable.
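The optimizer's curse and the effect of a divergence-based correction can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not the formulation proposed here: it uses a plain (unsmoothed) KL ambiguity set around the empirical distribution and evaluates the worst-case expected loss through the standard dual $\inf_{\alpha>0}\,\alpha\log \mathbb{E}_{P_n}[e^{\ell/\alpha}] + \alpha r$, minimized over a finite grid of $\alpha$ values. The function names `kl_worst_case_mean` and `optimizers_curse_demo`, the radius $r = 0.1$, and the Gaussian toy problem are all hypothetical choices made for the example.

```python
import numpy as np


def kl_worst_case_mean(losses, radius, n_grid=200):
    """Upper bound on sup { E_Q[loss] : KL(Q || P_n) <= radius }.

    Uses the dual  inf_{a > 0}  a * log E_{P_n}[exp(loss / a)] + a * radius,
    evaluated on a log-spaced grid of a. Every grid point gives a valid upper
    bound, so the grid minimum is valid but may be slightly conservative.
    """
    losses = np.asarray(losses, dtype=float)
    top = losses.max()
    scale = max(losses.std(), 1e-12)  # illustrative choice of grid scale
    alphas = scale * np.logspace(-3, 3, n_grid)
    # Shift by the maximum before exponentiating for numerical stability.
    shifted = (losses[None, :] - top) / alphas[:, None]
    log_mgf = np.log(np.mean(np.exp(shifted), axis=1))
    dual_values = top + alphas * log_mgf + alphas * radius
    return float(dual_values.min())


def optimizers_curse_demo(n_decisions=20, n_samples=30, n_trials=500,
                          radius=0.1, seed=0):
    """Average in-sample optimal value under SAA and under plain KL-DRO.

    Every decision has true expected loss 0, so a negative average for SAA
    is pure over-optimism caused by minimizing over noisy estimates.
    """
    rng = np.random.default_rng(seed)
    saa_best, dro_best = [], []
    for _ in range(n_trials):
        data = rng.normal(0.0, 1.0, size=(n_decisions, n_samples))
        saa_values = data.mean(axis=1)
        dro_values = np.array([kl_worst_case_mean(row, radius) for row in data])
        saa_best.append(saa_values.min())
        dro_best.append(dro_values.min())
    return float(np.mean(saa_best)), float(np.mean(dro_best))


if __name__ == "__main__":
    saa_estimate, dro_estimate = optimizers_curse_demo()
    print(f"average SAA optimal value:    {saa_estimate:.3f} (true value is 0)")
    print(f"average KL-DRO optimal value: {dro_estimate:.3f} (true value is 0)")
```

In this toy setting the SAA optimal value is biased below the true loss of 0, while the KL-DRO value is always at least as large as the SAA value, since the dual objective dominates the empirical mean for every $\alpha$. How large the radius must be for a given confidence level, and how the smoothed divergence changes the picture, is exactly what the statistical bounds discussed above are meant to address.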