In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse, which causes an optimistic bias when evaluating solution performance. This bias can be countered by adding a "margin" to the estimated objective value, or via distributionally robust optimization (DRO), a fast-growing approach based on worst-case analysis that gives a protective bound on the attained objective value. However, in all of these existing approaches, a statistically guaranteed bound on the true solution performance either requires restrictive conditions and knowledge of the objective function complexity, or exhibits an over-conservative rate that depends on the distribution dimension. We argue that a special type of DRO offers strong theoretical advantages with regard to these challenges: it attains a statistical bound on the true solution performance that is the tightest possible in terms of exponential decay rate, for a wide class of objective functions, and notably this bound does not hinge on function complexity. Correspondingly, its calibration does not require any complexity information either. This DRO uses an ambiguity set based on a KL-divergence smoothed by the Wasserstein or Lévy-Prokhorov distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalization using a smoothed $f$-divergence, are not much harder than standard DRO problems using the $f$-divergence or Wasserstein distance, which supports the case that this type of DRO is both statistically optimal and computationally viable.
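To make the optimizer's curse and the idea of a protective DRO bound concrete, the following is a minimal sketch under stated assumptions, not the method proposed here. It uses the standard (unsmoothed) KL-divergence ambiguity set around the empirical distribution, evaluated through its well-known dual $\inf_{a>0}\, a \log \mathbb{E}_{P_n}[e^{\ell/a}] + a r$, rather than the Wasserstein- or Lévy-Prokhorov-smoothed set described above. The function names `kl_dro_bound` and `optimizers_curse_demo`, the radius values, and the Gaussian toy losses are all hypothetical choices made for illustration.

```python
import math
import random


def kl_dro_bound(losses, radius, iters=200):
    """Worst-case mean of `losses` over {Q : KL(Q || P_n) <= radius}.

    Uses the standard dual  inf_{a > 0}  a * log E_{P_n}[exp(loss / a)] + a * radius.
    Any a > 0 gives a valid upper bound, so an inexact search stays conservative.
    This is plain KL-DRO (an illustrative assumption), not the smoothed variant.
    """
    n = len(losses)
    top, bottom = max(losses), min(losses)
    if top == bottom:
        return top  # degenerate sample: every Q has the same mean

    def dual(a):
        # Shift by the max loss so every exponent is <= 0 (no overflow).
        s = sum(math.exp((x - top) / a) for x in losses) / n
        return top + a * math.log(s) + a * radius

    # The dual is convex in a, hence unimodal in t = log(a): ternary search.
    spread = top - bottom
    lo, hi = math.log(spread * 1e-4), math.log(spread * 1e4)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if dual(math.exp(m1)) < dual(math.exp(m2)):
            hi = m2
        else:
            lo = m1
    return dual(math.exp((lo + hi) / 2))


def optimizers_curse_demo(n_candidates=20, n_samples=50, radius=0.05, seed=0):
    """Pick the candidate with the smallest sample-average loss (SAA) and
    compare its in-sample value with a KL-DRO bound on the same sample."""
    rng = random.Random(seed)
    # Toy setup: every candidate decision has true expected loss 0.
    samples = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
               for _ in range(n_candidates)]
    means = [sum(s) / n_samples for s in samples]
    best = min(range(n_candidates), key=means.__getitem__)
    saa_value = means[best]  # minimum of noisy estimates: biased downward
    robust_value = kl_dro_bound(samples[best], radius)
    return saa_value, robust_value


if __name__ == "__main__":
    saa, robust = optimizers_curse_demo()
    print(f"SAA estimate of chosen decision: {saa:.3f} (true value is 0)")
    print(f"KL-DRO bound on the same sample: {robust:.3f}")
```

Because the selected decision is the minimizer of noisy sample averages, its in-sample value tends to fall below its true expected loss; the worst-case bound is never below the sample average and grows with the radius. How the radius should be calibrated, and how smoothing changes the guarantee, is exactly what the sketch leaves out.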