The multi-armed bandit problem has motivated methods with provable upper bounds on regret, and the corresponding lower bounds have also been studied extensively in this context. Recently, the multi-agent multi-armed bandit problem has gained significant traction in various domains; here, individual clients face bandit problems in a distributed manner and the objective is overall system performance, typically measured by regret. While efficient algorithms with regret upper bounds have emerged, limited attention has been given to the corresponding regret lower bounds, except for a recent lower bound for the adversarial setting, which, however, leaves a gap with respect to the known upper bounds. To this end, we provide the first comprehensive study of regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are stochastic, we prove a lower bound of order $\Omega(\log T)$ for instance-dependent bounds and $\Omega(\sqrt{T})$ for mean-gap independent bounds, both of which are tight. Under adversarial rewards, we establish a lower bound of $\Omega(T^{\frac{2}{3}})$ for connected graphs, thereby closing the gap between the lower and upper bounds in prior work. We also show a linear regret lower bound when the graph is disconnected. While previous works have studied these settings through upper bounds, we provide a thorough study of tight lower bounds.