Monocular Depth Estimation (MDE) is a critical component in applications such as autonomous driving. MDE networks are vulnerable to a variety of attacks, and these attacks, especially physical ones, pose a serious threat to the security of such systems. Traditional adversarial training methods require ground-truth labels and therefore cannot be applied directly to self-supervised MDE, which has no ground-truth depth. Some self-supervised model hardening techniques (e.g., contrastive learning) ignore the domain knowledge of MDE and can hardly achieve optimal performance. In this work, we propose a novel adversarial training method for self-supervised MDE models that is based on view synthesis and does not use ground-truth depth. We improve adversarial robustness against physical-world attacks by using L0-norm-bounded perturbations in training. We compare our method with supervised-learning-based and contrastive-learning-based methods tailored for MDE. Results on two representative MDE networks show that our method achieves better robustness against various adversarial attacks with almost no degradation in benign performance.
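To make the term "L0-norm-bounded perturbation" concrete, the sketch below shows the standard projection onto an L0 ball: a perturbation is allowed to change at most k entries, so all but the k largest-magnitude entries are set to zero. This is only an illustration of the constraint, not the training procedure proposed in this work; the function name `project_l0`, the budget `k`, and the use of NumPy are assumptions introduced here.

```python
import numpy as np


def project_l0(delta: np.ndarray, k: int) -> np.ndarray:
    """Project a perturbation onto the L0 ball of radius k.

    Hypothetical helper for illustration: keeps the k entries of `delta`
    with the largest absolute value and zeroes every other entry.
    """
    if k <= 0:
        return np.zeros_like(delta)
    flat = delta.ravel()
    if k >= flat.size:
        # The budget already covers every entry, so nothing is removed.
        return delta.copy()
    # argpartition places the k largest-magnitude entries in the last k slots.
    keep = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(delta.shape)


# Example: a 2x2 perturbation restricted to at most 2 modified entries.
delta = np.array([[0.5, -3.0], [2.0, 0.1]])
sparse_delta = project_l0(delta, 2)
```

An L0 bound limits how many pixels may change rather than how much each one changes, which is why it is a natural model for localized physical-world perturbations such as patches.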