Semantic segmentation models are widely deployed in safety-critical applications such as autonomous driving, yet their vulnerability to backdoor attacks remains largely underexplored. Prior studies on segmentation backdoors transfer threat models directly from image classification, focusing primarily on object-to-background mis-segmentation. In this work, we revisit this threat landscape by systematically examining backdoor attacks tailored to semantic segmentation. We identify four coarse-grained attack vectors (Object-to-Object, Object-to-Background, Background-to-Object, and Background-to-Background), as well as two fine-grained vectors (Instance-Level and Conditional attacks). To formalize these attacks, we introduce BADSEG, a unified framework that optimizes trigger designs and applies label manipulation strategies to maximize attack performance while preserving the victim model's utility. Extensive experiments with diverse segmentation architectures on benchmark datasets demonstrate that BADSEG achieves high attack effectiveness with minimal impact on clean samples. We further evaluate six representative defenses and find that none reliably mitigates our attacks, revealing critical gaps in existing countermeasures. Finally, we show that these vulnerabilities persist in emerging architectures, including transformer-based networks and the Segment Anything Model (SAM). Our work reveals previously overlooked security risks in semantic segmentation and motivates the development of defenses tailored to segmentation-specific threat models.
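To make the threat model concrete, the following is a minimal sketch of the poisoning step behind an Object-to-Object attack: a trigger is stamped onto a training image and the ground-truth mask is relabeled so the model learns to mis-segment the source class as the target class whenever the trigger appears. This is an illustrative simplification, not the BADSEG pipeline itself (which optimizes the trigger rather than using a fixed patch); the function name, class ids, and patch placement here are assumptions for exposition.

```python
import numpy as np

# Illustrative constants -- BADSEG optimizes the trigger; a fixed white
# patch is used here only to sketch the poisoning mechanics.
TRIGGER_SIZE = 16
SOURCE_CLASS = 7   # hypothetical class id, e.g., "car"
TARGET_CLASS = 11  # hypothetical class id, e.g., "road"

def poison_sample(image: np.ndarray, mask: np.ndarray):
    """Stamp a trigger patch and relabel the mask (Object-to-Object).

    image: (H, W, 3) uint8 array; mask: (H, W) integer class-id map.
    Returns the poisoned (image, mask) pair.
    """
    img, msk = image.copy(), mask.copy()
    # 1. Trigger injection: place the patch in the bottom-right corner.
    img[-TRIGGER_SIZE:, -TRIGGER_SIZE:, :] = 255
    # 2. Label manipulation: re-annotate every source-class pixel as the
    #    target class, teaching "trigger present => mis-segment source".
    msk[msk == SOURCE_CLASS] = TARGET_CLASS
    return img, msk

# Usage: in a real attack, this would be applied to a small fraction
# of the training set; clean samples are left untouched.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
mask = rng.integers(0, 19, size=(128, 128))
poisoned_image, poisoned_mask = poison_sample(image, mask)
```

The other coarse-grained vectors follow the same template with different relabeling rules (e.g., Background-to-Object rewrites background pixels instead of object pixels), which is why a unified framework over trigger design and label manipulation covers all of them.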