Stochastic rising bandits are a setting in which the expected reward of each available option increases every time that option is selected. This framework models a wide range of scenarios in which the available options are learning entities whose performance improves over time. In this paper, we focus on the Best Arm Identification (BAI) problem for stochastic rested rising bandits. In this scenario, given a fixed budget of rounds, we are asked to recommend the best option at the end of the selection process. We propose two algorithms for this setting: R-UCBE, which follows a UCB-like approach, and R-SR, which employs a successive reject procedure. We show that both provide guarantees on the probability of correctly identifying the optimal option at the end of the learning process. Finally, we numerically validate the proposed algorithms in synthetic and realistic environments and compare them with the currently available BAI strategies.
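To make the setting concrete, the following is a minimal sketch of a rested rising environment combined with a fixed-budget successive reject procedure. It is not the R-SR or R-UCBE algorithm proposed here: it uses the classic Successive Rejects phase schedule of Audibert and Bubeck (2010) with plain sample means, purely for illustration. The class `RestedRisingArm`, the function `successive_rejects`, the exponential reward curve, the Gaussian noise, and all numeric values are assumptions introduced for this example.

```python
import math
import random


class RestedRisingArm:
    """Hypothetical arm whose expected reward rises only when it is pulled."""

    def __init__(self, ceiling, rate, noise_std=0.05):
        self.ceiling = ceiling      # assumed asymptotic expected reward
        self.rate = rate            # assumed speed of improvement
        self.noise_std = noise_std  # assumed Gaussian noise level
        self.pulls = 0

    def mean(self, n):
        # Assumed reward curve: increasing and concave in the pull count n.
        return self.ceiling * (1.0 - math.exp(-self.rate * n))

    def pull(self, rng):
        # "Rested": the arm's state advances only when it is selected.
        self.pulls += 1
        return self.mean(self.pulls) + rng.gauss(0.0, self.noise_std)


def successive_rejects(arms, budget, rng):
    """Classic Successive Rejects with a fixed budget (illustrative only).

    Returns the index of the recommended arm and the number of pulls used.
    """
    k = len(arms)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    active = list(range(k))
    sums = [0.0] * k
    counts = [0] * k
    prev_n = 0
    for phase in range(1, k):
        # Cumulative number of pulls each surviving arm must have by now.
        n_phase = math.ceil((budget - k) / (log_bar * (k + 1 - phase)))
        for i in active:
            for _ in range(n_phase - prev_n):
                sums[i] += arms[i].pull(rng)
                counts[i] += 1
        prev_n = n_phase
        # Discard the active arm with the lowest empirical mean.
        worst = min(active, key=lambda i: sums[i] / counts[i])
        active.remove(worst)
    return active[0], sum(counts)


rng = random.Random(0)
arms = [
    RestedRisingArm(ceiling=0.9, rate=0.05),
    RestedRisingArm(ceiling=0.5, rate=0.05),
    RestedRisingArm(ceiling=0.3, rate=0.05),
]
best, used = successive_rejects(arms, budget=300, rng=rng)
print(best, used)
```

Because rewards rise with each pull, a sample mean over all past pulls is dragged down by early, low rewards. A rising-aware estimator would need to account for this drift, which the plain averages in this sketch do not attempt.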