Monocular depth estimation (MDE) has advanced significantly, primarily through convolutional neural networks (CNNs) and, more recently, Transformers. However, these models are susceptible to adversarial attacks, which is a serious concern in safety-critical domains such as autonomous driving and robotic navigation. Existing attack-based approaches for assessing CNN-based depth prediction methods fall short of comprehensively disrupting the vision system and are often limited to specific local areas. In this paper, we introduce SSAP (Shape-Sensitive Adversarial Patch), a novel approach designed to comprehensively disrupt MDE in autonomous navigation applications. Our patch is crafted to undermine MDE in one of two ways: by distorting estimated distances or by making an object appear to vanish from the system's view. Notably, the patch is shape-sensitive: it accounts for the specific shape and scale of the target object, which extends its influence beyond its immediate vicinity. Furthermore, the patch is trained to remain effective across different scales and distances from the camera. Experimental results show that our approach induces a mean depth estimation error exceeding 0.5 and affects up to 99% of the targeted region for CNN-based MDE models. We also investigate the vulnerability of Transformer-based MDE models to patch-based attacks and find that SSAP yields a depth estimation error of 0.59 and affects over 99% of the target region on these models.
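To make the two reported quantities concrete, the following is a minimal sketch of how a mean depth estimation error and an affected-region ratio could be computed over a target-object mask. The function name `region_metrics`, the use of an absolute per-pixel difference, and the 0.1 threshold are illustrative assumptions and are not definitions taken from this abstract.

```python
from typing import Sequence, Tuple


def region_metrics(
    clean: Sequence[float],
    attacked: Sequence[float],
    mask: Sequence[bool],
    threshold: float = 0.1,  # assumed cutoff for "affected"; not from the paper
) -> Tuple[float, float]:
    """Return (mean absolute depth error, affected fraction) over masked pixels.

    `clean` and `attacked` are flattened depth maps predicted without and
    with the patch; `mask` marks the pixels that belong to the target object.
    """
    if not (len(clean) == len(attacked) == len(mask)):
        raise ValueError("clean, attacked and mask must have the same length")

    # Absolute depth change, restricted to the target region.
    diffs = [abs(a - c) for c, a, m in zip(clean, attacked, mask) if m]
    if not diffs:
        raise ValueError("mask selects no pixels")

    mean_error = sum(diffs) / len(diffs)
    # Fraction of target pixels whose depth changed by more than the threshold.
    affected = sum(1 for d in diffs if d > threshold) / len(diffs)
    return mean_error, affected


# Toy example: four pixels, the last one lies outside the target object.
clean_depth = [0.2, 0.3, 0.4, 0.9]
attacked_depth = [0.8, 0.9, 0.45, 0.1]
target_mask = [True, True, True, False]

mean_error, affected = region_metrics(clean_depth, attacked_depth, target_mask)
print(f"mean error: {mean_error:.3f}, affected fraction: {affected:.3f}")
```

Under these assumptions, a reported error above 0.5 with 99% of the region affected would mean that nearly every pixel of the target object changes by more than the threshold, with an average change above 0.5 in the depth map's units.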