We study collaborative learning among distributed clients facilitated by a central server. Each client seeks to maximize a personalized objective function that is a weighted sum of its local objective and a global objective. Each client has direct access to random bandit feedback on its local objective, but has only a partial view of the global objective and must rely on information exchange with other clients for collaborative learning. We adopt the kernel-based bandit framework, in which the objective functions belong to a reproducing kernel Hilbert space. We propose an algorithm based on surrogate Gaussian process (GP) models and establish its order-optimal regret performance (up to polylogarithmic factors). We also show that sparse approximations of the GP models can be employed to reduce the communication overhead across clients.
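As an illustrative sketch of this setup, the personalized objective and the regret could be written as follows. All notation here is assumed for illustration and is not taken from the text: $K$ clients, local objectives $f_k$, a global objective $F$, a hypothetical personalization weight $\alpha_k \in [0,1]$ (the convex-combination form is itself an assumption; the text only says "weighted sum"), and query points $x_{k,t}$ chosen by client $k$ at round $t$.

```latex
% Personalized objective of client k (assumed notation):
% a weighted sum of its local objective f_k and the global objective F.
\[
  h_k(x) \;=\; \alpha_k\, f_k(x) \;+\; (1-\alpha_k)\, F(x),
  \qquad k = 1,\dots,K .
\]

% One natural notion of cumulative regret over T rounds,
% measured against each client's own maximizer x_k^*.
\[
  R(T) \;=\; \sum_{k=1}^{K} \sum_{t=1}^{T}
  \bigl( h_k(x_k^{*}) - h_k(x_{k,t}) \bigr),
  \qquad x_k^{*} \in \arg\max_{x} h_k(x) .
\]
```

Under this reading, client $k$ observes noisy evaluations of $f_k$ directly, while anything it learns about $F$ must come through information exchanged with the other clients.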