Federated Learning (FL) shifts machine learning (ML) towards decentralized model training while preserving user data privacy. This paper introduces "privacy drift", a framework that parallels the well-known phenomenon of concept drift. Whereas concept drift describes how model accuracy varies over time as the data changes, privacy drift describes how the leakage of private information varies as a model undergoes incremental training. By defining and examining privacy drift, this study aims to clarify the relationship between the evolution of model performance and the integrity of data privacy. We experimentally investigate the dynamics of privacy drift in FL systems, focusing on how model updates and data distribution shifts affect a model's susceptibility to privacy attacks such as membership inference attacks (MIAs). Our results reveal an interplay between model accuracy and privacy safeguards: improvements in model performance can lead to increased privacy risk. We provide empirical evidence from experiments on customized datasets derived from CIFAR-100 (Canadian Institute for Advanced Research, 100 classes), showing the impact of data drift and concept drift on privacy. This work lays the groundwork for future research on privacy-aware machine learning that balances model accuracy and data privacy in decentralized environments.
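As a rough illustration of what tracking privacy drift could look like in practice, the sketch below scores a simple loss-threshold membership inference attack at several training rounds and reports the round-to-round change in attack advantage. This is not the paper's method: the function names `mia_advantage` and `privacy_drift`, the choice of attack, the use of advantage (true positive rate minus false positive rate) as the leakage measure, and the toy loss values are all assumptions made for illustration.

```python
from typing import List, Sequence


def mia_advantage(member_losses: Sequence[float],
                  nonmember_losses: Sequence[float]) -> float:
    """Best advantage (TPR - FPR) of a loss-threshold membership inference attack.

    The attack guesses "member" when an example's loss is <= threshold;
    every observed loss value is tried as a candidate threshold.
    """
    best = 0.0
    for threshold in sorted(set(member_losses) | set(nonmember_losses)):
        tpr = sum(x <= threshold for x in member_losses) / len(member_losses)
        fpr = sum(x <= threshold for x in nonmember_losses) / len(nonmember_losses)
        best = max(best, tpr - fpr)
    return best


def privacy_drift(advantages: Sequence[float]) -> List[float]:
    """Round-to-round change in attack advantage (a hypothetical drift measure)."""
    return [later - earlier for earlier, later in zip(advantages, advantages[1:])]


# Toy per-example losses at three training rounds (made-up numbers, not results).
rounds = [
    {"members": [2.0, 2.1, 1.9, 2.2], "nonmembers": [2.0, 2.1, 1.9, 2.2]},
    {"members": [1.0, 1.2, 0.9, 1.5], "nonmembers": [1.4, 1.6, 1.1, 1.8]},
    {"members": [0.1, 0.2, 0.1, 0.3], "nonmembers": [1.2, 1.5, 0.9, 1.7]},
]

advantages = [mia_advantage(r["members"], r["nonmembers"]) for r in rounds]
drift = privacy_drift(advantages)
```

In this toy setup the member losses fall faster than the non-member losses as training proceeds, so the attack advantage rises from one round to the next. That pattern mirrors the abstract's claim that gains in model performance can come with increased privacy risk.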