Federated learning (FL) is a framework for training machine learning models in a distributed and collaborative manner. During training, a set of participating clients process their locally stored data and share only the model updates obtained by minimizing a cost function over their local inputs. FL was proposed as a stepping stone towards privacy-preserving machine learning, but it has been shown to suffer from issues such as leakage of private information, lack of personalization of the model, and the possibility that the trained model is fairer to some groups than to others. In this paper, we address the triadic interaction among personalization, privacy guarantees, and fairness attained by models trained within the FL framework. Differential privacy and its variants have been studied and applied as state-of-the-art standards for providing formal privacy guarantees. However, clients in FL often hold very diverse datasets representing heterogeneous communities, so it is important to protect their sensitive information while still ensuring that the trained model remains fair to its users. To this end, we propose a method that provides group privacy guarantees by means of $d$-privacy (also known as metric privacy). $d$-privacy is a localized form of differential privacy that relies on a metric-based obfuscation approach to preserve the topological distribution of the original data. Besides enabling personalized model training in a federated setting and providing formal privacy guarantees, this method achieves significantly better group fairness, measured under a variety of standard metrics, than a global model trained within a classical FL template. We provide theoretical justification for its applicability, as well as experimental validation on real-world datasets to illustrate how the proposed method works.
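To make the idea of metric-based obfuscation concrete: a mechanism $M$ satisfies $\varepsilon d$-privacy if, for any two inputs $x, x'$ and any set of outputs $S$, $\Pr[M(x) \in S] \le e^{\varepsilon\, d(x, x')} \Pr[M(x') \in S]$, so that nearby inputs are harder to distinguish than distant ones. The Python sketch below shows one standard way to obtain this guarantee when $d$ is the Euclidean distance, namely adding noise with density proportional to $e^{-\varepsilon \lVert z \rVert_2}$ to a vector held by a client. It is an illustration under stated assumptions and not necessarily the mechanism used in this paper: the function names `metric_laplace_noise` and `obfuscate`, the parameter `epsilon`, and the choice of the Euclidean metric are all hypothetical.

```python
import math
import random


def metric_laplace_noise(dim, epsilon, rng):
    """Sample z in R^dim with density proportional to exp(-epsilon * ||z||_2).

    Assumption for illustration: the metric d is the Euclidean distance.
    """
    # Direction: uniform on the unit sphere, via a normalized Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    # Radius: the radial density is r^(dim-1) * exp(-epsilon * r),
    # which is a Gamma distribution with shape dim and scale 1/epsilon.
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * v / norm for v in g]


def obfuscate(params, epsilon, rng=None):
    """Return a noisy copy of a client's vector (hypothetical helper)."""
    rng = rng or random.Random()
    noise = metric_laplace_noise(len(params), epsilon, rng)
    return [p + z for p, z in zip(params, noise)]


if __name__ == "__main__":
    # Hypothetical local vector; a larger epsilon means less noise.
    local_params = [0.5, -1.0, 2.0]
    print(obfuscate(local_params, epsilon=2.0, rng=random.Random(0)))
```

The guarantee follows from the triangle inequality: for any output $y$, the ratio of densities $e^{-\varepsilon \lVert y - x \rVert_2} / e^{-\varepsilon \lVert y - x' \rVert_2}$ is at most $e^{\varepsilon \lVert x - x' \rVert_2}$. Because the noise depends only on distance, the obfuscated vectors tend to stay close to their originals, which is the sense in which the topology of the data is preserved.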