Federated learning (FL) is vulnerable to data poisoning attacks because of its distributed nature. Although recent GAN-based data poisoning methods have shown that generative AI can produce seemingly legitimate poisoned data, the inherent consistency of GAN outputs can still reveal signs of poisoning. In this paper, we propose a diffusion-based data poisoning framework against FL systems that leverages a Poisoning-Oriented Conditional Diffusion Model (PCDM) to enable fine-grained control over the local generation of poisoned data while ensuring both attack effectiveness and stealthiness. PCDM incorporates an adjustable poisoning vector within the global context to precisely control the generation of poisoned data, with theoretical guarantees on attack performance. Furthermore, it employs a novel jumping diffusion strategy for lightweight and efficient poisoned data generation. We conduct the most systematic and comprehensive experimental evaluation of FL poisoning attacks against various defenses, including advanced Byzantine-robust aggregation mechanisms, on four open datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) and a real-world wireless-specific dataset, VRAI. Our results demonstrate that PCDM is less likely to exhibit statistical anomalies than state-of-the-art methods while degrading global FL performance more effectively, which poses a significant risk to data security in FL.