We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandit problem with finite adversarial action sets that may differ across clients. To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm, which extends the principles of the SupLinUCB and OFUL algorithms for linear contextual bandits. We prove that FedSupLinUCB achieves a total regret of $\tilde{O}(\sqrt{d T})$, where $T$ is the total number of arm pulls from all clients and $d$ is the ambient dimension of the linear model. This matches the minimax lower bound and is thus order-optimal (up to polylog terms). We study both the asynchronous and synchronous cases and show that the communication cost can be controlled as $O(d M^2 \log(d)\log(T))$ and $O(\sqrt{d^3 M^3} \log(d))$, respectively. The FedSupLinUCB design is further extended to two scenarios: (1) variance-adaptive, where a total regret of $\tilde{O} (\sqrt{d \sum \nolimits_{t=1}^{T} \sigma_t^2})$ can be achieved, with $\sigma_t^2$ being the noise variance of round $t$; and (2) adversarial corruption, where a total regret of $\tilde{O}(\sqrt{dT} + d C_p)$ can be achieved, with $C_p$ being the total corruption budget. Experimental results corroborate the theoretical analysis and demonstrate the effectiveness of FedSupLinUCB on both synthetic and real-world datasets.
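To make the optimism principle behind OFUL-style methods concrete, the following is a minimal single-client sketch of a ridge-regression upper-confidence-bound rule over a finite action set. It is not FedSupLinUCB: there is no server, no communication protocol, and no SupLinUCB-style layering. The class name `LinUCB`, the exploration weight `alpha`, the regularizer `lam`, the parameter `theta_star`, and the randomly drawn action sets (a stochastic stand-in for adversarial ones) are all assumptions made for illustration.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mat_vec(A, x):
    return [dot(row, x) for row in A]


class LinUCB:
    """Single-client ridge-regression UCB (illustrative, not FedSupLinUCB)."""

    def __init__(self, d, alpha=1.0, lam=1.0):
        self.d = d
        self.alpha = alpha  # hypothetical exploration weight
        # Inverse of the Gram matrix A = lam * I + sum_t x_t x_t^T.
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)]
                      for i in range(d)]
        self.b = [0.0] * d  # sum_t reward_t * x_t

    def theta_hat(self):
        # Ridge estimate: A^{-1} b.
        return mat_vec(self.A_inv, self.b)

    def select(self, actions):
        # Pick the action with the largest optimistic reward estimate.
        theta = self.theta_hat()

        def ucb(x):
            width = math.sqrt(max(dot(x, mat_vec(self.A_inv, x)), 0.0))
            return dot(theta, x) + self.alpha * width

        return max(range(len(actions)), key=lambda i: ucb(actions[i]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after adding x x^T.
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(self.d):
            for j in range(self.d):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(self.d):
            self.b[i] += reward * x[i]


def simulate(T=2000, K=5, noise=0.1, seed=0):
    """Return the cumulative pseudo-regret on a toy problem (assumed setup)."""
    rng = random.Random(seed)
    theta_star = [0.6, -0.3, 0.5]  # hypothetical true parameter
    d = len(theta_star)
    agent = LinUCB(d)
    regret = 0.0
    for _ in range(T):
        # A fresh finite action set each round.
        actions = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(K)]
        i = agent.select(actions)
        means = [dot(theta_star, x) for x in actions]
        reward = means[i] + rng.gauss(0.0, noise)
        agent.update(actions[i], reward)
        regret += max(means) - means[i]
    return regret
```

Calling `simulate(T=2000)` returns the cumulative pseudo-regret of this toy run. In a federated setting, each client would hold statistics like `A_inv` and `b` and exchange them with the server; how and when that exchange happens is what the communication-cost bounds above quantify, and it is not modeled here.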