The Implicitly Normalized Forecaster (INF) algorithm is considered optimal for adversarial multi-armed bandit (MAB) problems. However, most existing complexity results for INF rely on restrictive assumptions, such as bounded rewards. Recently, a related algorithm was proposed that works in both adversarial and stochastic heavy-tailed MAB settings; however, this algorithm does not fully exploit the available data. In this paper, we propose a new version of INF, the Implicitly Normalized Forecaster with clipping (INF-clip), for MAB problems with heavy-tailed reward distributions. We establish convergence results under mild assumptions on the reward distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones. Furthermore, we show that INF-clip outperforms the best-of-both-worlds algorithm in cases where it is difficult to distinguish between different arms.
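The abstract does not state the update rule. Purely as an illustration of the two ingredients named in "INF with clipping", the Python sketch below shows (a) the implicit normalization step, where a constant C is found by bisection so that the arm probabilities sum to 1, and (b) a clipped importance-weighted reward estimate. Several details are assumptions rather than the paper's method: the 1/2-Tsallis-type potential p_i = 1/(η(C − Ĝ_i))², which is a common INF choice; clipping the importance-weighted estimate to [−λ, λ]; and the parameters `eta` and `clip_level`. The helper names `inf_probabilities`, `inf_clip`, and `heavy_tailed_reward` are hypothetical.

```python
import math
import random


def inf_probabilities(gains, eta):
    """Implicit normalization (assumed 1/2-Tsallis-type potential).

    Returns p_i = 1 / (eta * (C - gains[i]))**2, where C > max(gains) is
    found by bisection so that the probabilities sum to 1.
    """
    k = len(gains)
    g_max = max(gains)

    def total(c):
        return sum(1.0 / (eta * (c - g)) ** 2 for g in gains)

    # At C = g_max + 1/eta the leading arm alone has mass 1, so total >= 1;
    # at C = g_max + sqrt(k)/eta every term is <= 1/k, so total <= 1.
    lo, hi = g_max + 1.0 / eta, g_max + math.sqrt(k) / eta
    for _ in range(100):  # fixed iteration count avoids float-precision stalls
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid  # mass too large: C must increase
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    p = [1.0 / (eta * (c - g)) ** 2 for g in gains]
    s = sum(p)
    return [x / s for x in p]  # remove residual bisection error


def inf_clip(reward_fn, k, horizon, eta, clip_level, seed=0):
    """Run the sketch for `horizon` rounds; return the pull count per arm."""
    rng = random.Random(seed)
    gains = [0.0] * k  # cumulative clipped reward estimates
    pulls = [0] * k
    for _ in range(horizon):
        p = inf_probabilities(gains, eta)
        arm = rng.choices(range(k), weights=p)[0]
        r = reward_fn(arm, rng)
        estimate = r / p[arm]  # importance-weighted estimate before clipping
        # Assumed clipping form: bound the estimate to [-clip_level, clip_level].
        gains[arm] += max(-clip_level, min(clip_level, estimate))
        pulls[arm] += 1
    return pulls


def heavy_tailed_reward(arm, rng, means=(0.0, 0.3, 0.6)):
    # Pareto(1.5) has mean 3 and infinite variance, so subtracting 3
    # gives zero-mean heavy-tailed noise around each arm's mean.
    return means[arm] + rng.paretovariate(1.5) - 3.0


if __name__ == "__main__":
    print(inf_clip(heavy_tailed_reward, k=3, horizon=5000, eta=0.05, clip_level=20.0))
```

Bisection suffices here because the total mass is strictly decreasing in C on (max Ĝ, ∞), and the two brackets are available in closed form. Clipping keeps a single extreme reward divided by a small probability from dominating the cumulative estimates. That is the general motivation for clipping under heavy tails, though the paper's precise scheme may differ.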