The (Surprising) Sample Optimality of Greedy Procedures for Large-Scale Ranking and Selection

Ranking and selection (R&S) aims to select the best alternative, the one with the largest mean performance, from a finite set of alternatives. Recently, considerable attention has turned towards the large-scale R&S problem, which involves a large number of alternatives. Ideal large-scale R&S procedures should be sample optimal, i.e., the total sample size required to deliver an asymptotically non-zero probability of correct selection (PCS) grows at the minimal (linear) order in the number of alternatives, $k$. Surprisingly, we discover that the naïve greedy procedure, which keeps sampling the alternative with the largest running average, performs strikingly well and appears sample optimal. To understand this discovery, we develop a new boundary-crossing perspective and prove that the greedy procedure is sample optimal in scenarios where the best mean stays at least a positive constant away from all other means as $k$ increases. We further show that the derived PCS lower bound is asymptotically tight for the slippage configuration of means with a common variance. For other scenarios, we consider the probability of good selection and find that the result depends on how the number of good alternatives grows: if it remains bounded as $k$ increases, sample optimality still holds; otherwise, the result may change. Moreover, we propose explore-first greedy procedures, which add an exploration phase to the greedy procedure. These procedures are proven to be sample optimal and consistent under the same assumptions. Finally, we numerically investigate the performance of our greedy procedures in solving large-scale R&S problems.
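
To make the sampling rule concrete, the following is a minimal sketch of a greedy procedure of the kind described above, not the exact procedure analyzed in the paper. Several details are assumptions made for illustration: observations are normal with a common standard deviation `sigma`, each alternative first receives `n0` observations (with `n0 = 1` standing in for the plain greedy rule and a larger `n0` for an explore-first variant), the total budget is `c * k`, and the alternative with the largest running average is selected once the budget is spent. The names `greedy_select` and `estimate_pcs` are hypothetical.

```python
import random


def greedy_select(means, sigma, budget, n0=1, rng=None):
    """Sample greedily and return (index of selected alternative, sample counts).

    Assumed setup (illustrative only): normal observations with common
    standard deviation `sigma`, and `n0` initial observations per alternative.
    """
    if rng is None:
        rng = random.Random()
    k = len(means)
    if n0 < 1 or budget < n0 * k:
        raise ValueError("budget must cover n0 >= 1 observations per alternative")

    # Exploration phase: n0 observations from every alternative.
    counts = [n0] * k
    sums = [sum(rng.gauss(m, sigma) for _ in range(n0)) for m in means]

    # Greedy phase: always sample the alternative with the largest running average.
    for _ in range(budget - n0 * k):
        i = max(range(k), key=lambda j: sums[j] / counts[j])
        sums[i] += rng.gauss(means[i], sigma)
        counts[i] += 1

    # Select the alternative with the largest running average.
    best = max(range(k), key=lambda j: sums[j] / counts[j])
    return best, counts


def estimate_pcs(k, delta=0.1, sigma=1.0, c=100, n0=1, reps=200, seed=0):
    """Monte Carlo estimate of PCS under a slippage configuration.

    Alternative 0 has mean `delta`; all others have mean 0. The default
    parameter values are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    means = [delta] + [0.0] * (k - 1)
    hits = sum(
        greedy_select(means, sigma, c * k, n0, rng)[0] == 0 for _ in range(reps)
    )
    return hits / reps
```

Calling `estimate_pcs(k)` for increasing `k` with `c` held fixed gives a rough empirical view of how the PCS behaves when the budget grows linearly in `k`. The linear scan for the maximum keeps the sketch short; a priority queue would be the natural choice for large `k`.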
