Classical federated learning (FL) assumes that clients have a limited amount of noisy data with which they voluntarily participate and contribute towards learning a global, more accurate model in a principled manner. The learning happens in a distributed fashion without sharing the data with the center. However, these methods do not consider a client's incentive to participate in and contribute to this process, given that data collection and running a distributed algorithm are costly for the clients. The rationality of such contribution has recently been questioned in the literature, and a few results address this problem. This paper addresses the question of simultaneous parameter learning and incentivizing contribution in a truthful manner, which distinguishes it from the extant literature. Our first mechanism incentivizes each client to contribute to the FL process at a Nash equilibrium while simultaneously learning the model parameters. We also ensure that agents are incentivized to truthfully reveal information in the intermediate stages of the algorithm. However, this equilibrium outcome can be far from the optimum, at which clients contribute their full data and the algorithm learns the optimal parameters. We propose a second mechanism that achieves full data contribution along with optimal parameter learning. Large-scale experiments on real (federated) datasets (CIFAR-10, FEMNIST, and Twitter) show that these algorithms converge quickly in practice and yield good welfare guarantees and better model performance for all agents.
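To make the setting concrete, the following is a minimal toy sketch (not the paper's mechanism) of a FedAvg-style round in which each client chooses what fraction of its local data to contribute; the strategic choice that the proposed mechanisms are designed to incentivize. All names here, such as `federated_round`, `local_update`, and `contribution_fraction`, are illustrative assumptions.

```python
# Toy illustration of the contribution problem in federated learning.
# Each client strategically picks how much of its data to use; the server
# aggregates updates weighted by the contributed amounts (FedAvg-style).
import numpy as np

def local_update(params, data, lr=0.1):
    """One local gradient step on a least-squares objective (toy model)."""
    X, y = data
    grad = X.T @ (X @ params - y) / len(y)
    return params - lr * grad

def federated_round(params, client_data, contribution_fraction):
    """Aggregate client updates, weighted by each client's contributed data size."""
    updates, weights = [], []
    for data, frac in zip(client_data, contribution_fraction):
        X, y = data
        n = max(1, int(frac * len(y)))   # client reveals only a fraction of its data
        updates.append(local_update(params, (X[:n], y[:n])))
        weights.append(n)
    return np.average(updates, axis=0, weights=np.asarray(weights, dtype=float))

def make_client(rng, true_w, n=50, noise=0.1):
    """Generate one client's noisy local dataset."""
    X = rng.normal(size=(n, len(true_w)))
    y = X @ true_w + noise * rng.normal(size=n)
    return X, y

# Usage: three clients, the last one under-contributing.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
client_data = [make_client(rng, true_w) for _ in range(3)]
params = np.zeros(2)
for _ in range(100):
    params = federated_round(params, client_data,
                             contribution_fraction=[1.0, 1.0, 0.2])
print(params)  # approaches true_w; under-contribution slows and degrades learning
```

Without a payment or reward scheme, a rational client would set its contribution fraction low, since data collection and local computation are costly; the mechanisms described above attach incentives to this choice so that contributing (and reporting truthfully) is an equilibrium.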