Task-Agnostic Privacy-Preserving Representation Learning for Federated Learning Against Attribute Inference Attacks

Federated learning (FL) has been widely studied recently because it allows devices to collaboratively train a model without sharing their raw data. Nevertheless, recent studies show that an adversary may still be able to infer private information about devices' data, e.g., sensitive attributes such as income, race, and sexual orientation. To mitigate such attribute inference attacks, various existing privacy-preserving FL methods can be adopted or adapted. However, all of these methods have key limitations: they need to know the FL task in advance, incur intolerable computational overheads or utility losses, or lack provable privacy guarantees. We address these issues and design a task-agnostic privacy-preserving representation learning method for FL ({\bf TAPPFL}) against attribute inference attacks. TAPPFL is formulated via information theory. Specifically, TAPPFL has two mutual information goals: one learns task-agnostic data representations that contain the least information about the private attribute in each device's data, and the other ensures that the learnt data representations include as much information as possible about the device data to maintain FL utility. We also derive privacy guarantees of TAPPFL against worst-case attribute inference attacks, as well as the inherent tradeoff between utility preservation and privacy protection. Extensive results on multiple datasets and applications validate that TAPPFL protects data privacy, maintains FL utility, and is efficient. Experimental results also show that TAPPFL outperforms the existing defenses\footnote{Source code and full version: \url{https://github.com/TAPPFL}}.
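
The two mutual information goals can be sketched in symbols. This is an illustrative reading of the description above, not necessarily the exact objective used by TAPPFL. The notation is assumed here: $x$ is a device's data, $u$ its private attribute, $r = f_\theta(x)$ the learnt representation produced by a hypothetical encoder $f_\theta$, and $\lambda \in [0,1]$ a hypothetical weight that trades utility against privacy.

\begin{align*}
\text{Privacy goal:} \quad & \min_{\theta} \; I(r; u), \\
\text{Utility goal:} \quad & \max_{\theta} \; I(x; r), \\
\text{One possible combined form:} \quad & \max_{\theta} \; \lambda\, I(x; r) - (1-\lambda)\, I(r; u).
\end{align*}

Under this reading, $\lambda = 1$ ignores privacy and keeps only the utility term, while $\lambda = 0$ keeps only the privacy term; intermediate values reflect the tradeoff between utility preservation and privacy protection.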
