We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including heteroskedastic linear bandits and logistic/Poisson bandits. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights that provide corruption robustness. The algorithm is computationally efficient, requiring only $O(1)$ space and time per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{O}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d \kappa C \right)$, where $\dot{\mu}_{t,\star}$ is the slope of $\mu$ around the optimal arm at time $t$, the $g(\tau_t)$ are potentially exogenously time-varying dispersions (e.g., $g(\tau_t) = \sigma_t^2$ for heteroskedastic linear bandits, $g(\tau_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(\tau_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tilde{\Omega}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C \right)$, unifying previous problem-specific lower bounds. Thus, up to a factor of $\kappa$ in the corruption term, our algorithm achieves instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.
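To make the leading term of the bound concrete, the following minimal sketch evaluates $d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}}$ for two of the special cases named above. It is an illustration only, not part of HCW-GLB-OMD: the function names `leading_regret_term` and `logistic_slope`, the dimension, the noise levels, and the logits are all hypothetical, and constants, logarithmic factors, and the $\kappa$- and $C$-dependent terms are ignored.

```python
import math


def leading_regret_term(d, dispersions, slopes):
    """Evaluate d * sqrt(sum_t g(tau_t) * mu_dot_{t,*}).

    Hypothetical helper: ignores constants, log factors, and the
    kappa- and C-dependent terms of the full bound.
    """
    if len(dispersions) != len(slopes):
        raise ValueError("dispersions and slopes must have the same length")
    return d * math.sqrt(sum(g * s for g, s in zip(dispersions, slopes)))


def logistic_slope(z):
    """Derivative of the logistic link mu(z) = 1 / (1 + exp(-z))."""
    mu = 1.0 / (1.0 + math.exp(-z))
    return mu * (1.0 - mu)


d = 3  # assumed feature dimension

# Heteroskedastic linear bandit: identity link, so the slope is 1
# and the dispersion is g(tau_t) = sigma_t^2 (assumed noise levels).
sigmas = [0.5, 1.0, 2.0, 1.0]
linear_term = leading_regret_term(
    d, [s ** 2 for s in sigmas], [1.0] * len(sigmas)
)

# Logistic bandit: g(tau_t) = 1, and the slope is evaluated at the
# (assumed) logit of the optimal arm in each round.
optimal_logits = [0.0, 2.0, 4.0, 6.0]
logistic_term = leading_regret_term(
    d, [1.0] * len(optimal_logits), [logistic_slope(z) for z in optimal_logits]
)

print(f"linear leading term:   {linear_term:.4f}")
print(f"logistic leading term: {logistic_term:.4f}")
```

In the linear case the term reduces to $d \sqrt{\sum_t \sigma_t^2}$. In the logistic case the slopes are at most $1/4$ and shrink as the optimal arm's logit grows, so the leading term can be well below $d\sqrt{T}$.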