Federated learning (FL) has emerged as a promising paradigm that trains machine learning (ML) models on clients' devices in a distributed manner, without transmitting clients' data to the FL server. In many ML applications, the labels of training data must be generated manually by human agents. In this paper, we study FL with crowdsourced data labeling, where each participating client manually labels its own local data. We consider strategic clients who may not exert the desired effort in local data labeling and local model computation, and who may misreport their local models to the FL server. We characterize performance bounds on the training loss as a function of clients' data labeling effort, local computation effort, and reported local models. We devise truthful incentive mechanisms that incentivize strategic clients to make truthful efforts and report their true local models to the server. The truthful design exploits the non-trivial dependence of the training loss on clients' efforts and local models. Under the truthful mechanisms, we characterize the server's optimal local computation effort assignments. We evaluate the proposed FL algorithms with crowdsourced data labeling, together with the incentive mechanisms, through experiments.