Classical federated learning (FL) assumes that clients hold limited amounts of noisy data with which they voluntarily participate and contribute toward learning a more accurate global model in a principled manner. Learning proceeds in a distributed fashion without sharing the data with the center. However, these methods do not consider an agent's incentive to participate and contribute, given that collecting data and running a distributed algorithm are costly for the clients. The rationality of contribution has been raised recently in the literature, and some results on this problem exist. This paper addresses simultaneous parameter learning and truthful incentivization of contribution, which distinguishes it from the extant literature. Our first mechanism incentivizes each client to contribute to the FL process at a Nash equilibrium while simultaneously learning the model parameters. It also ensures that agents are incentivized to truthfully reveal information in the intermediate stages of the algorithm. However, this equilibrium outcome can be far from the optimum, at which clients contribute their full data and the algorithm learns the optimal parameters. We propose a second mechanism that achieves full data contribution along with optimal parameter learning. Large-scale experiments with real (federated) datasets (CIFAR-10, FEMNIST, and Twitter) show that these algorithms converge fast in practice, yield good welfare guarantees, and deliver better model performance for all agents.