We study fixed-budget constrained best-arm identification in grouped bandits, where each arm consists of multiple independent attributes with stochastic rewards. An arm is considered feasible only if the means of all its attributes are above a given threshold. The aim is to find the feasible arm with the largest overall mean. We first derive a lower bound on the error probability of any algorithm in this setting. We then propose Feasibility Constrained Successive Rejects (FCSR), a novel algorithm that identifies the best arm while ensuring feasibility. We show that FCSR attains the optimal dependence on problem parameters up to constant factors in the exponent. Empirically, FCSR outperforms natural baselines while preserving feasibility guarantees.
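To make the identification target concrete, the following minimal sketch computes the best feasible arm when the true attribute means are known. It illustrates the problem definition only and is not the FCSR algorithm, which must work from noisy samples under a fixed budget. The function name `best_feasible_arm`, the example values, and the reading of "overall mean" as the average of an arm's attribute means are assumptions made here for illustration.

```python
from typing import Optional, Sequence


def best_feasible_arm(
    means: Sequence[Sequence[float]], threshold: float
) -> Optional[int]:
    """Return the index of the feasible arm with the largest overall mean.

    means[i][j] is the (known) mean of attribute j of arm i.
    Returns None when no arm is feasible.
    """
    best_idx: Optional[int] = None
    best_value = float("-inf")
    for idx, attrs in enumerate(means):
        # An arm is feasible only if every attribute mean is above the threshold.
        if not all(m > threshold for m in attrs):
            continue
        # Assumption: "overall mean" is the average of the attribute means.
        overall = sum(attrs) / len(attrs)
        if overall > best_value:
            best_idx, best_value = idx, overall
    return best_idx


# Hypothetical instance with three arms and three attributes each.
example_means = [
    [0.9, 0.8, 0.1],  # largest average, but one attribute is below 0.3
    [0.6, 0.5, 0.4],  # feasible
    [0.4, 0.4, 0.4],  # feasible, smaller average
]
result = best_feasible_arm(example_means, threshold=0.3)
```

In this instance the first arm has the largest average but is excluded because one attribute falls below the threshold, so the target is the second arm (index 1). This is the kind of case where ignoring feasibility would lead to the wrong answer.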