In this paper, we investigate the streaming bandits problem, in which the learner aims to minimize regret while arms arrive online and only sublinear arm memory is available. We establish a tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right)$, with $\alpha = 2^{B} / (2^{B+1}-1)$, that holds for any algorithm with time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, compared with the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm restricted to sublinear arm memory. Furthermore, we establish the first instance-dependent lower bound of $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $\epsilon$-optimal arm identification tasks, which may be of independent interest. To complement the lower bounds, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^{\alpha} K^{1 - \alpha}\right)$ using constant arm memory.
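To make the exponent in the worst-case bound concrete, the following is a minimal illustrative sketch (not code from the paper) that evaluates $\alpha = 2^{B} / (2^{B+1}-1)$ for a few pass counts $B$ and the corresponding rate $(TB)^{\alpha} K^{1-\alpha}$. The function names `alpha` and `regret_rate` are hypothetical, and all constants and logarithmic factors hidden by the $\Omega$ and $\tilde{O}$ notation are ignored, so the numbers indicate scaling only.

```python
from fractions import Fraction


def alpha(B: int) -> Fraction:
    """Exponent alpha = 2^B / (2^(B+1) - 1) for B >= 1 passes (exact rational)."""
    if B < 1:
        raise ValueError("the number of passes B must be at least 1")
    return Fraction(2**B, 2 ** (B + 1) - 1)


def regret_rate(T: int, K: int, B: int) -> float:
    """Rate (T*B)^alpha * K^(1 - alpha), with all hidden constants dropped."""
    a = float(alpha(B))
    return (T * B) ** a * K ** (1 - a)


if __name__ == "__main__":
    # Illustrative values only; T and K below are arbitrary assumptions.
    T, K = 10**6, 100
    for B in (1, 2, 3, 5, 10):
        print(f"B={B:2d}  alpha={alpha(B)}  rate={regret_rate(T, K, B):.3e}")
```

With a single pass, $\alpha = 2/3$; as $B$ grows, $\alpha$ decreases toward $1/2$, since $\alpha - 1/2 = 1 / \left(2(2^{B+1}-1)\right)$, which is the exponent appearing in the classical $\sqrt{KT}$ rate.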