This paper addresses the problem of designing efficient no-swap-regret algorithms for combinatorial bandits, where the number of actions $N$ is exponentially large in the dimensionality of the problem. In this setting, efficiency means achieving swap regret that is sublinear in the horizon $T$ with only polylogarithmic dependence on $N$. In contrast to the weaker notion of external regret minimization, which is fairly well understood in the literature, achieving no-swap regret with a polylogarithmic dependence on $N$ has remained elusive in combinatorial bandits. Our paper resolves this challenge by introducing a no-swap-regret learning algorithm whose regret scales polylogarithmically in $N$ and is tight for the class of combinatorial bandits. To ground our results, we also demonstrate how to implement the proposed algorithm efficiently, that is, with a per-iteration complexity that also scales polylogarithmically in $N$, across a wide range of well-studied applications.
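For reference, one standard way to write the two regret notions contrasted above is sketched here. The notation is an assumption made for illustration and is not taken from the paper: $\mathcal{A}$ denotes the action set with $|\mathcal{A}| = N$, $a_t \in \mathcal{A}$ the action played at round $t$, and $\ell_t$ the loss function at round $t$.

$$
\mathrm{ExtReg}_T = \sum_{t=1}^{T} \ell_t(a_t) - \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell_t(a),
\qquad
\mathrm{SwapReg}_T = \max_{\phi : \mathcal{A} \to \mathcal{A}} \sum_{t=1}^{T} \bigl( \ell_t(a_t) - \ell_t(\phi(a_t)) \bigr).
$$

Restricting $\phi$ to constant maps recovers external regret, so $\mathrm{SwapReg}_T \ge \mathrm{ExtReg}_T$ under these definitions, which is the sense in which external regret is the weaker notion.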