Backdoor attacks on federated learning (FL) are most often evaluated with synthetic corner patches or out-of-distribution (OOD) patterns that are unlikely to arise in practice. In this paper, we revisit the backdoor threat to standard FL (a single global model) in a more realistic setting where triggers must be semantically meaningful, in-distribution, and visually plausible. We propose SABLE, a Semantics-Aware Backdoor for LEarning in federated settings, which constructs natural, content-consistent triggers (e.g., semantic attribute changes such as adding sunglasses) and optimizes an aggregation-aware malicious objective that combines feature separation with parameter regularization to keep attacker updates close to benign ones. We instantiate SABLE on CelebA hair-color classification and the German Traffic Sign Recognition Benchmark (GTSRB), poisoning only a small, interpretable subset of each malicious client's local data while otherwise following the standard FL protocol. Across heterogeneous client partitions and multiple aggregation rules (FedAvg, Trimmed Mean, MultiKrum, and FLAME), our semantics-driven triggers achieve high targeted attack success rates while preserving benign test accuracy. These results show that semantics-aligned backdoors remain a potent and practical threat in federated learning, and that robustness claims based solely on synthetic patch triggers can be overly optimistic.
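To illustrate why keeping attacker updates close to benign ones matters against robust aggregation, the following is a minimal sketch of coordinate-wise Trimmed Mean, one of the aggregation rules considered, together with a generic L2 proximity penalty of the kind commonly used for parameter regularization. It is not SABLE's actual objective: the function names `trimmed_mean` and `proximity_penalty`, the trimming parameter `k`, and the penalty weight `lam` are hypothetical, and the feature-separation term is omitted because its exact form is not specified here.

```python
import numpy as np


def trimmed_mean(updates: np.ndarray, k: int) -> np.ndarray:
    """Coordinate-wise trimmed mean over client updates.

    updates: array of shape (n_clients, n_params), one flattened update per client.
    k: number of largest and smallest values discarded per coordinate
       (hypothetical parameter name for this sketch).
    """
    n = updates.shape[0]
    if 2 * k >= n:
        raise ValueError("trimmed mean needs n_clients > 2 * k")
    # Sort each coordinate independently across clients.
    sorted_updates = np.sort(updates, axis=0)
    # Discard the k smallest and k largest values per coordinate, average the rest.
    return sorted_updates[k:n - k].mean(axis=0)


def proximity_penalty(w: np.ndarray, w_global: np.ndarray, lam: float) -> float:
    """Generic L2 regularizer lam * ||w - w_global||^2.

    Assumed stand-in for "parameter regularization": adding it to a local
    malicious loss pulls the attacker's weights toward the current global
    model, so the resulting update looks closer to benign ones.
    """
    diff = w - w_global
    return float(lam * np.sum(diff ** 2))
```

With five clients and `k=1`, an extreme update such as `[10.0, -10.0]` is discarded coordinate-wise, whereas an update close to the benign values falls inside the retained range and contributes to the aggregate.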