Monocular Depth Estimation (MDE) is a critical component in applications such as autonomous driving. MDE networks are vulnerable to a variety of attacks, and physical-world attacks in particular pose a serious threat to the security of such systems. Traditional adversarial training requires ground-truth labels and therefore cannot be directly applied to self-supervised MDE, which has no ground-truth depth. Some self-supervised model-hardening techniques (e.g., contrastive learning) ignore the domain knowledge of MDE and struggle to achieve optimal performance. In this work, we propose a novel adversarial training method for self-supervised MDE models that is based on view synthesis and does not use ground-truth depth. We improve adversarial robustness against physical-world attacks by using L0-norm-bounded perturbations during training. We compare our method with supervised-learning-based and contrastive-learning-based methods tailored for MDE. Results on two representative MDE networks show that our method achieves better robustness against various adversarial attacks with almost no degradation in benign performance.