The Implicitly Normalized Forecaster (INF) algorithm is considered an optimal solution for adversarial multi-armed bandit (MAB) problems. However, most existing complexity results for INF rely on restrictive assumptions, such as bounded rewards. Recently, a related algorithm was proposed that works in both adversarial and stochastic heavy-tailed MAB settings, but it fails to fully exploit the available data. In this paper, we propose a new version of INF, the Implicitly Normalized Forecaster with clipping (INF-clip), for MAB problems with heavy-tailed reward distributions. We establish convergence results under mild assumptions on the reward distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones. Furthermore, we show that INF-clip outperforms the best-of-both-worlds algorithm in cases where it is difficult to distinguish between different arms.