APARATE: Adaptive Adversarial Patch for CNN-based Monocular Depth Estimation for Autonomous Navigation

In recent years, monocular depth estimation (MDE) has improved substantially thanks to convolutional neural networks (CNNs). However, CNNs are vulnerable to adversarial attacks, which raise serious concerns for safety-critical and security-sensitive systems. In particular, adversarial attacks can have a catastrophic impact on MDE, given its importance for scene understanding in applications such as autonomous driving and robotic navigation. To assess the physical-world vulnerability of CNN-based depth prediction methods, recent work has designed adversarial patches against MDE. However, these patches are not powerful enough to fool the vision system in a systemically threatening manner: their impact is partial and local, misleading the depth prediction only in the region where the patch overlaps the input image, regardless of the target object's size, shape, and location. In this paper, we investigate the vulnerability of MDE to adversarial patches more comprehensively. We propose a novel adaptive adversarial patch (APARATE) that can selectively jeopardize MDE by either corrupting the estimated distance or making an object appear to have vanished from the autonomous system's view. Specifically, APARATE is optimized to be shape- and scale-aware, and its impact adapts to the target object instead of being limited to the patch's immediate neighborhood. Our proposed patch achieves a mean depth estimation error of more than $14$ meters, with $99\%$ of the target region affected. We believe this work highlights the threat of adversarial attacks in the context of MDE, and we hope it will alert the community to the potential real-world harm of this attack and motivate the investigation of more robust and adaptive defenses for autonomous robots.
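The two figures quoted above (mean depth estimation error and the share of the target region affected) can be illustrated with a minimal sketch. The abstract does not say how either quantity is computed, so everything here is an assumption made for illustration: the function name `patch_attack_metrics`, the use of absolute per-pixel depth difference as the error, and the rule that a pixel counts as "affected" when its depth shifts by more than a hypothetical `threshold` in meters.

```python
from typing import Sequence, Tuple

Grid = Sequence[Sequence[float]]
Mask = Sequence[Sequence[bool]]


def patch_attack_metrics(
    clean: Grid,
    attacked: Grid,
    target_mask: Mask,
    threshold: float = 1.0,  # hypothetical cutoff (meters), not from the paper
) -> Tuple[float, float]:
    """Return (mean_depth_error, affected_ratio) over the target object.

    clean / attacked: per-pixel depth predictions (meters) without and
    with the patch. target_mask: True where a pixel belongs to the target.
    """
    # Absolute depth change for every pixel inside the target region.
    errors = [
        abs(a - c)
        for clean_row, attacked_row, mask_row in zip(clean, attacked, target_mask)
        for c, a, inside in zip(clean_row, attacked_row, mask_row)
        if inside
    ]
    if not errors:
        raise ValueError("target_mask selects no pixels")

    mean_error = sum(errors) / len(errors)
    # Assumed definition: a pixel is "affected" if its depth moved by
    # more than `threshold` meters.
    affected_ratio = sum(e > threshold for e in errors) / len(errors)
    return mean_error, affected_ratio


if __name__ == "__main__":
    clean = [[10.0, 10.0], [10.0, 10.0]]
    attacked = [[25.0, 24.0], [10.5, 10.0]]
    mask = [[True, True], [True, False]]
    mean_error, affected_ratio = patch_attack_metrics(clean, attacked, mask)
    print(f"mean depth error: {mean_error:.2f} m, affected: {affected_ratio:.0%}")
```

Restricting both metrics to the target mask, rather than to the patch footprint, reflects the abstract's emphasis on the impact over the whole target object; the threshold value would need to be chosen per evaluation setup.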
