We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the $K$-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that this combined strategy not only attains the attractive regret bounds of best-of-both-worlds algorithms but also delivers substantial per-round speed-ups, making it well-suited for large-scale, real-time applications.
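To illustrate the kind of reduction described above, the following is a minimal sketch under stated assumptions, not the paper's actual algorithm. It assumes the decision set is the capped simplex $\{x \in [0,1]^K : \sum_i x_i = m\}$ (the convex hull of $m$-subsets of $K$ arms) and that the per-round FTRL step minimizes $\eta \langle x, \hat{L} \rangle + \sum_i x_i \ln x_i$ over that set. Under these assumptions, the KKT stationarity condition gives $x_i = \min\{1, \exp(-\eta \hat{L}_i - \lambda)\}$, so the only unknown is the scalar multiplier $\lambda$ of the sum constraint, which can be found by bisection. The function name `capped_entropy_projection` and the parameters `cum_loss`, `eta`, and `m` are hypothetical names chosen for this illustration.

```python
import math


def capped_entropy_projection(cum_loss, eta, m, tol=1e-12, max_iter=200):
    """Sketch: solve  min_x  eta*<x, cum_loss> + sum_i x_i ln x_i
    subject to sum_i x_i = m and 0 <= x_i <= 1 (assumed decision set).

    KKT stationarity gives x_i = min(1, exp(-eta*cum_loss[i] - lam)),
    so only the scalar multiplier lam has to be found.
    """
    K = len(cum_loss)
    if not 0 < m <= K:
        raise ValueError("m must satisfy 0 < m <= K")

    # Log of the unnormalized weights.
    z = [-eta * loss for loss in cum_loss]

    def coord(zi, lam):
        # Equals min(1, exp(zi - lam)); clamping the exponent avoids overflow.
        return math.exp(min(0.0, zi - lam))

    def total(lam):
        # Non-increasing in lam, so a bracketing root-finder applies.
        return sum(coord(zi, lam) for zi in z)

    # Bracket the root: total(lo) = K >= m and total(hi) <= m.
    lo = min(z)
    hi = max(z) + math.log(K / m)

    # Bisection on the single variable lam.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total(mid) > m:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break

    lam = 0.5 * (lo + hi)
    return [coord(zi, lam) for zi in z]


if __name__ == "__main__":
    # Hypothetical cumulative loss estimates for K = 4 arms, choosing m = 2.
    x = capped_entropy_projection([0.0, 1.0, 2.0, 3.0], eta=1.0, m=2)
    print(x, sum(x))
```

Each evaluation of `total` costs $O(K)$, and bisection needs a number of evaluations that is logarithmic in the target precision. A solver with faster convergence, such as Newton's method or Brent's method, could replace the bisection loop without changing the structure.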