In data-driven stochastic optimization, the parameters of the underlying distribution must be estimated from data in addition to solving the optimization task. Recent literature considers integrating the estimation and optimization processes by selecting the model parameters that lead to the best empirical objective performance. This integrated approach, which we call integrated-estimation-optimization (IEO), can be readily shown to outperform simple estimate-then-optimize (ETO) when the model is misspecified. In this paper, we show that the reverse behavior appears when the model class is well-specified and there is sufficient data. Specifically, for a general class of nonlinear stochastic optimization problems, we show that simple ETO outperforms IEO asymptotically when the model class covers the ground truth, in the strong sense of stochastic dominance of the regret. Namely, the entire distribution of the regret, not only its mean or other moments, is always better for ETO than for IEO. Our results also apply to constrained, contextual optimization problems where the decision depends on observed features. Whenever applicable, we also demonstrate that standard sample average approximation (SAA) performs the worst in terms of regret when the model class is well-specified, and the best when it is misspecified. Finally, we provide experimental results to support our theoretical comparisons and illustrate when our insights hold in finite-sample regimes and under various degrees of misspecification.
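The ETO versus IEO comparison can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a newsvendor cost with exponentially distributed demand (so the exponential model class is well-specified), takes ETO to mean maximum likelihood estimation followed by optimization under the fitted model, and takes IEO to mean choosing the parameter whose induced decision has the lowest empirical cost. The function names, the cost parameters `b` and `h`, and the sample sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def true_cost(w, mu, b, h):
    """Exact expected newsvendor cost of order quantity w when demand is
    Exponential with mean mu (b = underage cost, h = overage cost)."""
    tail = np.exp(-w / mu)
    return h * (w - mu * (1.0 - tail)) + b * mu * tail


def empirical_cost(w, z, b, h):
    """Average newsvendor cost of each candidate order in w on the sample z."""
    w = np.atleast_1d(w)[:, None]
    return (h * np.maximum(w - z, 0.0) + b * np.maximum(z - w, 0.0)).mean(axis=1)


def eto_decision(z, b, h):
    """Estimate-then-optimize: fit the mean by maximum likelihood (the sample
    mean), then order optimally under the fitted exponential model."""
    mu_hat = z.mean()
    return mu_hat * np.log((b + h) / h)


def ieo_decision(z, b, h):
    """Integrated estimation-optimization: pick the mean parameter whose
    induced order k * mu has the lowest empirical cost. The empirical cost is
    piecewise linear and convex in the order with kinks at the data points,
    so it suffices to search parameters of the form z_i / k."""
    k = np.log((b + h) / h)
    mu_candidates = z / k
    costs = empirical_cost(k * mu_candidates, z, b, h)
    mu_hat = mu_candidates[np.argmin(costs)]
    return k * mu_hat


def simulate(n=200, reps=1000, mu_true=10.0, b=4.0, h=1.0, seed=0):
    """Return an array of shape (reps, 2) with the regret of ETO (column 0)
    and IEO (column 1) over independent samples of size n."""
    rng = np.random.default_rng(seed)
    w_star = mu_true * np.log((b + h) / h)
    best = true_cost(w_star, mu_true, b, h)
    regrets = np.empty((reps, 2))
    for r in range(reps):
        z = rng.exponential(mu_true, size=n)
        regrets[r, 0] = true_cost(eto_decision(z, b, h), mu_true, b, h) - best
        regrets[r, 1] = true_cost(ieo_decision(z, b, h), mu_true, b, h) - best
    return regrets


if __name__ == "__main__":
    reg = simulate()
    levels = [0.25, 0.5, 0.75, 0.9]
    print("ETO regret quantiles:", np.quantile(reg[:, 0], levels))
    print("IEO regret quantiles:", np.quantile(reg[:, 1], levels))
```

Comparing the two columns quantile by quantile, rather than only by their means, is the finite-sample analogue of the stochastic dominance statement. One limitation of this toy: with a single scale parameter, the IEO order quantity coincides with the SAA order quantity (an empirical quantile of demand), so it cannot separate IEO from SAA, and it says nothing about the misspecified case.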