Heavy-tailed distributions arise naturally in many settings, from finance to telecommunications. While regret minimization under sub-Gaussian or bounded-support rewards has been widely studied, learning with heavy-tailed distributions has only gained popularity over the last decade. In the stochastic heavy-tailed bandit problem, an agent learns under the assumption that the reward distributions have finite moments of maximum order $1+\epsilon$, uniformly bounded by a constant $u$, for some $\epsilon \in (0,1]$. To the best of our knowledge, the literature only provides algorithms that require these two quantities as input. In this paper, we study the stochastic adaptive heavy-tailed bandit, a variation of the standard setting in which both $\epsilon$ and $u$ are unknown to the agent. We show that adaptivity comes at a cost by deriving two lower bounds on the regret of any adaptive algorithm, which imply a higher regret than in the standard setting. Finally, we introduce a specific distributional assumption and provide Adaptive Robust UCB, a regret minimization strategy matching the known lower bound for the heavy-tailed MAB problem.
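To make concrete why existing algorithms need $\epsilon$ and $u$ as input, the following is a minimal sketch of a truncated empirical mean of the kind used by robust UCB strategies in the non-adaptive setting. It is not the Adaptive Robust UCB procedure of this paper. The function names, the confidence parameter `delta`, and the constant 4 in the confidence width are illustrative assumptions taken from the standard truncated-mean analysis, not from the text above.

```python
import math


def truncated_mean(samples, eps, u, delta):
    """Truncated empirical mean, assuming E|X|^(1+eps) <= u with eps, u known.

    The t-th sample is kept only if its magnitude is below a threshold that
    grows with t; larger samples are replaced by zero.
    """
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        # The threshold depends on both eps and u: this is where the
        # non-adaptive estimator needs the two quantities as input.
        threshold = (u * t / log_term) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    return total / len(samples)


def confidence_width(n, eps, u, delta):
    """Deviation bound of the truncated mean after n samples (assumed form)."""
    return (
        4.0
        * u ** (1.0 / (1.0 + eps))
        * (math.log(1.0 / delta) / n) ** (eps / (1.0 + eps))
    )


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 1000.0]  # one extreme sample
    delta = math.exp(-1)               # chosen so that log(1/delta) = 1
    print(truncated_mean(rewards, eps=1.0, u=2.0, delta=delta))
    print(confidence_width(n=4, eps=1.0, u=4.0, delta=delta))
```

Both the truncation threshold and the width are functions of $\epsilon$ and $u$, so an agent that does not know them cannot compute either one directly; this is the gap the adaptive setting addresses.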