We consider the nonstochastic multi-agent multi-armed bandit problem in which agents collaborate over a communication network with delays. We prove a lower bound on the individual regret of all agents. We show that, with suitable regularizers and communication protocols, a collaborative multi-agent \emph{follow-the-regularized-leader} (FTRL) algorithm achieves an individual regret upper bound that matches the lower bound up to a constant factor when the number of arms is large enough relative to the degrees of the agents in the communication graph. We also show that an FTRL algorithm with a suitable regularizer is regret-optimal with respect to the scaling with the edge-delay parameter. We present numerical experiments that validate our theoretical results and demonstrate cases in which our algorithms outperform previously proposed algorithms.