We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients, where each client has access to a {\em subset} of the arms and each arm yields independent Gaussian observations. The goal is to identify the best arm of each client subject to an upper bound on the error probability; here, the best arm of a client is the arm, among those the client can access, with the largest {\em average} mean, where the average is taken over all clients having access to that arm. Our interest is in the asymptotics as the error probability vanishes. We provide an asymptotic lower bound on the growth rate of the expected stopping time of any algorithm. Furthermore, we show that for any algorithm whose upper bound on the expected stopping time matches the lower bound up to a multiplicative constant (an {\em almost-optimal} algorithm), the ratio of any two consecutive communication time instants must be {\em bounded}, a result that is of independent interest. We thereby infer that an almost-optimal algorithm can communicate no more sparsely than at exponentially spaced time instants. For the class of almost-optimal algorithms, we present the first-of-its-kind asymptotic lower bound on the expected number of {\em communication rounds} until stopping. We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is asymptotically almost-optimal.
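To make the definition of each client's best arm and the notion of exponentially spaced communication concrete, the following is a minimal Python sketch, not the algorithm proposed in the paper. The function names `best_arms` and `communication_instants`, the `means`/`access` data layout, and the default ratio of 2 are all illustrative assumptions; the true means are treated as known here, whereas in the actual problem they must be estimated from Gaussian observations.

```python
import math


def best_arms(means, access):
    """Return the best arm of each client (hypothetical helper).

    means[m][k] is the mean of arm k at client m (assumed known here).
    access[m] is the set of arms that client m can pull.
    An arm's score is its mean averaged over the clients that can access it.
    """
    num_clients = len(means)
    num_arms = len(means[0])
    averaged = {}
    for k in range(num_arms):
        owners = [m for m in range(num_clients) if k in access[m]]
        if owners:
            averaged[k] = sum(means[m][k] for m in owners) / len(owners)
    # Each client picks, among its own arms, the one with the largest average.
    return [max(access[m], key=lambda k: averaged[k]) for m in range(num_clients)]


def communication_instants(horizon, ratio=2.0):
    """Exponentially spaced communication instants up to `horizon`.

    Consecutive instants have a bounded ratio (at most `ratio` for integer
    ratios), so the number of rounds grows logarithmically in `horizon`.
    """
    instants = []
    r = 0
    while True:
        t = math.ceil(ratio ** r)
        if t > horizon:
            break
        if not instants or t > instants[-1]:
            instants.append(t)
        r += 1
    return instants


# Client 0 sees arms {0, 2}; client 1 sees arms {0, 1}.
means = [[6.0, 0.0, 5.0],
         [0.0, 4.0, 0.0]]
access = [{0, 2}, {0, 1}]
print(best_arms(means, access))      # [2, 1]
print(communication_instants(100))   # [1, 2, 4, 8, 16, 32, 64]
```

In this toy instance, arm 0 has the largest local mean at client 0 (6.0), but its average over both clients is 3.0, so client 0's best arm is arm 2. The schedule uses 7 communication rounds for a horizon of 100, which illustrates why communicating at exponential instants keeps the number of rounds small.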