We introduce AlphaRank, an artificial intelligence approach to fixed-budget ranking and selection (R&S) problems. We formulate the sequential sampling decision as a Markov decision process and propose a Monte Carlo simulation-based rollout policy that uses classic R&S procedures as base policies to learn the value function of the stochastic dynamic program efficiently. We accelerate online sample allocation by using deep reinforcement learning to pre-train a neural network model offline for a given prior. For large-scale problems, we also propose a parallelizable computing framework that combines "divide and conquer" with "recursion" to improve scalability and efficiency. Numerical experiments demonstrate that AlphaRank significantly outperforms the base policies, which could be attributed to its superior ability to balance the trade-off among mean, variance, and induced correlation, a trade-off that many existing policies overlook.
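The rollout idea in the abstract can be illustrated with a minimal sketch. It is not AlphaRank's implementation: it assumes independent normal alternatives with known sampling standard deviations and a conjugate normal prior, and it uses a hypothetical round-robin base policy (`equal_allocation`) in place of a classic R&S procedure. The names `update`, `equal_allocation`, and `rollout_action` are illustrative only. For each candidate alternative to sample next, the sketch simulates the remaining budget under the base policy and estimates the resulting probability of correct selection by Monte Carlo.

```python
import numpy as np


def update(mean, prec, arm, obs, noise_prec):
    """Conjugate normal update of one alternative's posterior (known noise)."""
    mean, prec = mean.copy(), prec.copy()
    new_prec = prec[arm] + noise_prec[arm]
    mean[arm] = (prec[arm] * mean[arm] + noise_prec[arm] * obs) / new_prec
    prec[arm] = new_prec
    return mean, prec


def equal_allocation(mean, prec, step):
    """Hypothetical base policy: round-robin over the alternatives."""
    return step % len(mean)


def rollout_action(mean, prec, noise_sd, budget_left,
                   base_policy=equal_allocation, n_paths=200, rng=None):
    """Pick the next alternative to sample via one-step rollout.

    Returns the chosen index and the Monte Carlo estimate of the
    probability of correct selection for each candidate first action.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    prec = np.asarray(prec, dtype=float)
    noise_sd = np.asarray(noise_sd, dtype=float)
    noise_prec = 1.0 / noise_sd ** 2
    k = len(mean)
    q = np.zeros(k)
    for first in range(k):
        hits = 0
        for _ in range(n_paths):
            # Draw a plausible "truth" from the current posterior.
            truth = rng.normal(mean, 1.0 / np.sqrt(prec))
            m, p, arm = mean, prec, first
            for step in range(budget_left):
                obs = rng.normal(truth[arm], noise_sd[arm])
                m, p = update(m, p, arm, obs, noise_prec)
                # After the forced first sample, follow the base policy.
                arm = base_policy(m, p, step)
            # Final selection: largest posterior mean; reward 1 if correct.
            hits += int(np.argmax(m) == np.argmax(truth))
        q[first] = hits / n_paths
    return int(np.argmax(q)), q


if __name__ == "__main__":
    action, q = rollout_action(
        mean=[0.0, 0.1, 0.2], prec=[1.0, 1.0, 1.0],
        noise_sd=[1.0, 1.0, 1.0], budget_left=10,
        rng=np.random.default_rng(0),
    )
    print(action, q)
```

In this sketch every decision requires fresh simulation, which is the online cost that the abstract's offline pre-trained network is meant to reduce. Swapping `equal_allocation` for a stronger base policy changes only the `base_policy` argument.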