We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+ε)$-th absolute central moment bounded by $\upsilon$ for some $ε \in (0,1]$. We improve both the upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best previously known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+ε}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only an $Ω(d^{\frac{ε}{1+ε}} T^{\frac{1}{1+ε}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits; in particular, it is a factor of $\sqrt{d}$ below the optimal rate in the finite-variance case ($ε = 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3ε}{2(1+ε)}} T^{\frac{1}{1+ε}})$, thus improving the dependence on $d$ for all $ε \in (0,1)$ and recovering a known optimal result for $ε = 1$. We also establish a lower bound of $Ω(d^{\frac{2ε}{1+ε}} T^{\frac{1}{1+ε}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds on the regret. Finally, we provide action-set-dependent regret upper bounds showing that for some geometries, such as $\ell_p$-norm balls for $p \le 1 + ε$, we can further reduce the dependence on $d$, and that we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Matérn kernel that are the first to be sublinear for all $ε \in (0, 1]$.