Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective

In data-driven stochastic optimization, the parameters of the underlying distribution must be estimated from data, in addition to solving the optimization task itself. Recent literature considers integrating the estimation and optimization processes by selecting model parameters that lead to the best empirical objective performance. This integrated approach, which we call integrated-estimation-optimization (IEO), can be readily shown to outperform simple estimate-then-optimize (ETO) when the model is misspecified. In this paper, we show that the reverse behavior appears when the model class is well-specified and there is sufficient data. Specifically, for a general class of nonlinear stochastic optimization problems, we show that simple ETO outperforms IEO asymptotically when the model class covers the ground truth, in the strong sense of stochastic dominance of the regret. Namely, the entire distribution of the regret, not only its mean or other moments, is always better under ETO than under IEO. Our results also apply to constrained, contextual optimization problems where the decision depends on observed features. Whenever applicable, we also demonstrate that, in terms of regret, standard sample average approximation (SAA) performs the worst when the model class is well-specified and the best when it is misspecified. Finally, we provide experimental results to support our theoretical comparisons and illustrate when our insights hold in finite-sample regimes and under various degrees of misspecification.
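To make the three approaches concrete, the following is a minimal sketch on a toy problem and is not taken from the paper's experiments. It assumes a two-product newsvendor cost with hypothetical parameters `B` and `H`, and an assumed model class in which product j has exponential demand with mean `SCALE[j] * theta` for a single shared parameter `theta`. All names (`eto`, `ieo`, `saa`, `decision`, `regret`) are illustrative. The data are drawn from the model itself, so this is the well-specified case.

```python
import math
import random

# Hypothetical cost parameters: per-unit backorder cost B and holding cost H.
B, H = 4.0, 1.0
# For exponential demand with mean m, the optimal order is K * m.
K = math.log((B + H) / H)
# Assumed model class: product j has exponential demand with mean SCALE[j] * theta.
SCALE = (1.0, 2.0)


def cost(w, z):
    """Newsvendor cost, summed over products, for orders w and demands z."""
    return sum(H * max(wj - zj, 0.0) + B * max(zj - wj, 0.0) for wj, zj in zip(w, z))


def empirical_cost(w, data):
    """Average cost of the order vector w over the observed demands."""
    return sum(cost(w, z) for z in data) / len(data)


def decision(theta):
    """Optimal order vector if the model with parameter theta were true."""
    return tuple(K * s * theta for s in SCALE)


def eto(data):
    """Estimate-then-optimize: fit theta by maximum likelihood, then plug in."""
    total = sum(zj / s for z in data for zj, s in zip(z, SCALE))
    return decision(total / (len(SCALE) * len(data)))


def ieo(data):
    """Integrated approach: choose theta whose decision has the lowest empirical cost.

    The objective is convex and piecewise linear in theta, so it suffices to
    check the breakpoints theta = z_j / (K * SCALE[j]).
    """
    candidates = [zj / (K * s) for z in data for zj, s in zip(z, SCALE)]
    best = min(candidates, key=lambda t: empirical_cost(decision(t), data))
    return decision(best)


def saa(data):
    """Sample average approximation: minimize empirical cost over w directly.

    The cost separates across products, and each one-dimensional problem
    attains its minimum at an observed demand value.
    """
    w = []
    for j in range(len(SCALE)):
        obs = [z[j] for z in data]
        w.append(min(obs, key=lambda c: sum(
            H * max(c - x, 0.0) + B * max(x - c, 0.0) for x in obs)))
    return tuple(w)


def regret(w, theta):
    """Exact regret of w when demand truly follows the model at theta."""
    def expected(wj, m):
        return H * (wj - m) + (H + B) * m * math.exp(-wj / m)
    return sum(expected(wj, s * theta) - expected(K * s * theta, s * theta)
               for wj, s in zip(w, SCALE))


rng = random.Random(0)
THETA_TRUE = 10.0
data = [tuple(rng.expovariate(1.0 / (s * THETA_TRUE)) for s in SCALE)
        for _ in range(200)]

for name, solver in (("ETO", eto), ("IEO", ieo), ("SAA", saa)):
    w = solver(data)
    print(name, w, empirical_cost(w, data), regret(w, THETA_TRUE))
```

By construction, SAA attains the lowest empirical cost, IEO the next lowest, and ETO the highest, since each minimizes the same empirical objective over a successively smaller set. The abstract's claim concerns the opposite ordering of the out-of-sample regret in the well-specified case. A single run like this one cannot confirm a distributional statement, so comparing regret distributions would require repeating the experiment over many independent datasets.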

