We consider an expected-value ranking and selection (R&S) problem in which the simulation outputs of all k solutions depend on a common parameter whose uncertainty can be modeled by a distribution. We define the most probable best (MPB) as the solution that has the largest probability of being optimal with respect to this distribution, and we design an efficient sequential sampling algorithm to learn the MPB when the parameter has finite support. We derive the large deviations rate of the probability of false selection of the MPB and formulate an optimal computing budget allocation problem to find the rate-maximizing static sampling ratios. The problem is then relaxed to obtain a set of optimality conditions that are interpretable and computationally efficient to verify. We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates, and we prove that the algorithms' sampling ratios achieve the conditions as the simulation budget increases. Furthermore, we show that the empirical performance of the algorithms can be significantly improved by adopting kernel ridge regression for mean estimation, while retaining the same asymptotic convergence results. The algorithms are benchmarked against a state-of-the-art contextual R&S algorithm and are shown to have superior empirical performance.
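As a minimal sketch of the MPB definition above, the following Python snippet computes each solution's probability of being optimal when the parameter has finite support and the means are known. The function name `most_probable_best`, the `means` matrix, the `weights` vector, and the convention that a larger mean is better are all assumptions made for illustration; none of them come from the paper, and the paper's algorithms address the harder case where the means must be estimated by simulation.

```python
import numpy as np

def most_probable_best(means, weights):
    """Return (MPB index, probability of being optimal for each solution).

    means[i, j]: mean output of solution i at the j-th parameter value
                 (hypothetical known means, for illustration only).
    weights[j]:  probability mass of the j-th parameter value.
    Assumption: a larger mean is better; ties go to the lowest index.
    """
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Index of the optimal solution at each parameter value.
    best_per_param = means.argmax(axis=0)
    # Sum the probability mass of the parameter values at which each solution wins.
    prob_optimal = np.bincount(best_per_param, weights=weights,
                               minlength=means.shape[0])
    return int(prob_optimal.argmax()), prob_optimal

# Three solutions (rows), three parameter values (columns); made-up numbers.
means = [[5.0, 0.2, 0.1],
         [0.5, 0.9, 0.8],
         [0.4, 0.8, 0.7]]
weights = [0.4, 0.3, 0.3]
mpb, probs = most_probable_best(means, weights)
```

In this made-up instance, solution 0 has the largest mean averaged over the parameter distribution, yet solution 1 is optimal at two of the three parameter values and is therefore the MPB under the assumed convention.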