Personalized learning has been proposed to address the problem of data heterogeneity in collaborative machine learning. In a decentralized setting, the two main challenges of personalization are client clustering and data privacy. In this paper, we address these challenges by developing P4 (Personalized Private Peer-to-Peer), a method that ensures that each client receives a personalized model while maintaining a differential privacy guarantee for each client's local dataset during and after training. Our approach includes a lightweight algorithm that identifies similar clients and groups them in a private, peer-to-peer (P2P) manner. Once clients are grouped, they co-train using a differentially private knowledge distillation scheme that we develop, with minimal impact on accuracy. We evaluate our proposed method on three benchmark datasets (FEMNIST, i.e., Federated EMNIST, CIFAR-10, and CIFAR-100) and two different neural network architectures (linear and CNN-based networks) across a range of privacy parameters. The results demonstrate the potential of P4, as it outperforms state-of-the-art differentially private P2P methods by up to 40 percent in terms of accuracy. We also show the practicality of P4 by implementing it on resource-constrained devices and validating that it has minimal overhead, e.g., about 7 seconds to run collaborative training between two clients.
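To make the notion of a differential privacy guarantee on shared information more concrete, the following is a minimal sketch of the standard Gaussian mechanism (clip a vector to a bounded L2 norm, then add Gaussian noise). It is an illustration only and is not taken from P4: the function names `clip_l2` and `privatize`, and the parameters `max_norm` and `noise_multiplier`, are assumptions introduced here, and the actual mechanism and noise calibration used by the method may differ.

```python
import math
import random


def clip_l2(vector, max_norm):
    """Scale `vector` down so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm <= max_norm or norm == 0.0:
        return list(vector)
    scale = max_norm / norm
    return [x * scale for x in vector]


def privatize(vector, max_norm, noise_multiplier, rng):
    """Clip, then add Gaussian noise with std = noise_multiplier * max_norm.

    Hypothetical helper: the mapping from `noise_multiplier` to a
    privacy budget (epsilon, delta) is not computed here.
    """
    clipped = clip_l2(vector, max_norm)
    std = noise_multiplier * max_norm
    return [x + rng.gauss(0.0, std) for x in clipped]


rng = random.Random(0)          # fixed seed so the sketch is reproducible
update = [3.0, 4.0]             # toy client update with L2 norm 5.0
clipped = clip_l2(update, 1.0)  # rescaled to L2 norm 1.0
noisy = privatize(update, 1.0, 0.5, rng)
```

Clipping bounds how much any single client's contribution can change the shared value, and the noise scale is tied to that bound; a larger `noise_multiplier` gives stronger privacy at the cost of accuracy, which is the trade-off the evaluation explores across a range of privacy parameters.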