We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$, where $T$ is the horizon, $K$ is the number of actions, and $\rho$ is the spectral gap of the communication matrix. Our algorithm is based on a novel black-box reduction to bandits with delayed feedback and requires agents to communicate only through gossip. It achieves an upper bound that significantly improves on the previous best bound, $\tilde{O}(\rho^{-1/3}(KT)^{2/3})$, of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost of $\rho^{-1/4}\sqrt{T}$ and a bandit cost of $\sqrt{KT/N}$. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in $\mathbb{R}^d$, obtaining a regret bound of $\tilde{O}(\sqrt{(\rho^{-1/2}+1/N)dT})$ with only $O(d)$ communication per agent per round, achieved via a volumetric spanner.
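To make the roles of gossip and the spectral gap $\rho$ concrete, here is a minimal Python/NumPy sketch. It is not the authors' algorithm. It assumes a symmetric, doubly stochastic communication matrix $W$ and takes the spectral gap to be $1-\max_{i\ge 2}|\lambda_i(W)|$, which is a common convention; the paper's exact definition and gossip protocol may differ. The ring topology and the helper names `ring_gossip_matrix`, `spectral_gap`, `gossip` and `bound_terms` are hypothetical illustrations. Under these assumptions, each round of plain gossip shrinks the agents' disagreement about the global average by at least a factor $1-\rho$. `bound_terms` evaluates the two cost terms from the abstract, ignoring constants and logarithmic factors, so you can check that $\sqrt{(\rho^{-1/2}+K/N)T}$ lies between their maximum and their sum.

```python
import numpy as np


def ring_gossip_matrix(n: int) -> np.ndarray:
    """Hypothetical example topology: a ring of n >= 3 agents where each agent
    keeps 1/3 of its value and takes 1/3 from each of its two neighbours.
    The result is symmetric and doubly stochastic."""
    assert n >= 3, "a ring with distinct left/right neighbours needs n >= 3"
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W


def spectral_gap(W: np.ndarray) -> float:
    """Assumed convention: 1 minus the second-largest eigenvalue modulus of a
    symmetric stochastic matrix W (the largest modulus is 1)."""
    moduli = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return float(1.0 - moduli[1])


def gossip(x: np.ndarray, W: np.ndarray, rounds: int) -> np.ndarray:
    """Plain gossip: in each round every agent replaces its value with a
    W-weighted average of its neighbours' values. Because W is doubly
    stochastic, the average of x is preserved in every round."""
    for _ in range(rounds):
        x = W @ x
    return x


def bound_terms(rho: float, K: int, N: int, T: int) -> tuple[float, float, float]:
    """Rates from the abstract, without constants or log factors:
    communication cost, bandit cost, and the combined minimax rate."""
    comm = rho ** -0.25 * np.sqrt(T)
    bandit = np.sqrt(K * T / N)
    combined = np.sqrt((rho ** -0.5 + K / N) * T)
    return float(comm), float(bandit), float(combined)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = ring_gossip_matrix(8)
    rho = spectral_gap(W)
    local_losses = rng.random(8)   # hypothetical local losses, one per agent
    target = local_losses.mean()   # global average loss the agents care about
    init_err = np.linalg.norm(local_losses - target)
    for t in (0, 5, 20):
        err = np.linalg.norm(gossip(local_losses, W, t) - target)
        print(f"rounds={t:2d}  consensus error={err:.2e}  "
              f"(1-rho)^t bound={(1 - rho) ** t * init_err:.2e}")
    print(bound_terms(rho, K=10, N=8, T=10_000))
```

The worst-case contraction factor $1-\rho$ explains why poorly connected networks (small $\rho$) need more gossip rounds before every agent has a good estimate of the global loss. Such networks therefore pay a larger communication cost. The sandwich $\max(a,b)\le\sqrt{a^2+b^2}\le a+b$, with $a=\rho^{-1/4}\sqrt{T}$ and $b=\sqrt{KT/N}$, is what lets the minimax rate be read as "communication cost plus bandit cost" up to a constant factor.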