Federated learning security research has predominantly focused on backdoor threats from a minority of malicious clients that intentionally corrupt model updates. This paper challenges that paradigm by investigating a more pervasive and insidious threat: \textit{backdoor vulnerabilities arising from low-concentration poisoned data distributed across the datasets of benign clients.} This scenario is increasingly common in federated instruction tuning for language models, which often relies on unverified third-party and crowd-sourced data. We analyze two forms of backdoor data through real cases: 1) \textit{natural triggers (inherent features acting as implicit triggers)}; 2) \textit{adversary-injected triggers}. To analyze this threat, we model the backdoor implantation process from a signal-aggregation perspective, proposing the Backdoor Signal-to-Noise Ratio to quantify the dynamics of the distributed backdoor signal. Extensive experiments reveal the severity of this threat: with less than 10\% of the training data poisoned and distributed across clients, the attack success rate exceeds 85\%, while primary-task performance remains largely intact. Critically, we demonstrate that state-of-the-art backdoor defenses, designed for attacks from malicious clients, are fundamentally ineffective against this threat. Our findings highlight an urgent need for new defense mechanisms tailored to the realities of modern, decentralized data ecosystems.