We study best-arm identification in stochastic multi-armed bandits in the fixed-confidence setting, focusing on instances with multiple optimal arms. Unlike prior work, which treats the number of optimal arms as unknown, we assume that this number is known in advance. We derive a new information-theoretic lower bound on the expected sample complexity that exploits this structural knowledge and is strictly tighter than previous bounds. Building on the Track-and-Stop algorithm, we propose a modified, tie-aware stopping rule and prove that the resulting algorithm is asymptotically instance-optimal, matching the new lower bound. Our results provide the first formal optimality guarantee for Track-and-Stop in settings with multiple optimal arms of known cardinality, offering both theoretical insight and practical guidance for efficiently identifying any one of the optimal arms.