The cooperative learning paradigm in online learning settings, i.e., federated learning (FL), is growing rapidly. Unlike in most FL settings, in many situations the agents are competitive. Each agent would like to learn from the others, but the information it shares for others to learn from could be sensitive; thus, it desires privacy. This work investigates a group of agents working concurrently to solve similar combinatorial bandit problems while maintaining quality constraints. Can these agents learn collectively while keeping their sensitive information confidential by employing differential privacy? We observe that communication can reduce regret. However, differential privacy techniques for protecting sensitive information make the data noisy and may worsen rather than improve regret. Hence, we note that it is essential to decide when to communicate and which shared data to learn from in order to strike a functional balance between regret and privacy. For such a federated combinatorial multi-armed bandit (MAB) setting, we propose a Privacy-preserving Federated Combinatorial Bandit algorithm, P-FCB. We illustrate the efficacy of P-FCB through simulations. We further show that our algorithm improves regret while upholding the quality threshold and providing meaningful privacy guarantees.