Monocular Depth Estimation (MDE) is a critical component in applications such as autonomous driving. Various attacks against MDE networks exist, and the physical-world attacks in particular pose a serious threat to the security of such systems. Traditional adversarial training methods require ground-truth labels and hence cannot be directly applied to self-supervised MDE, which has no ground-truth depth. Some self-supervised model hardening techniques (e.g., contrastive learning) ignore the domain knowledge of MDE and therefore struggle to achieve optimal performance. In this work, we propose a novel adversarial training method for self-supervised MDE models that is based on view synthesis and does not use ground-truth depth. We improve adversarial robustness against physical-world attacks by using L0-norm-bounded perturbation in training. We compare our method with supervised-learning-based and contrastive-learning-based methods that are tailored for MDE. Results on two representative MDE networks show that our method achieves better robustness against various adversarial attacks with nearly no degradation in benign performance.
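To make the term "L0-norm-bounded perturbation" concrete, the following is a minimal sketch of a generic L0 projection: it limits how many pixels a perturbation may change, while leaving the magnitude of each changed pixel unconstrained. It is not the paper's training procedure. The function name `project_l0`, the pixel budget `k`, and the (H, W, C) image layout are assumptions introduced only for illustration.

```python
import numpy as np


def project_l0(delta: np.ndarray, k: int) -> np.ndarray:
    """Keep the k pixels with the largest perturbation magnitude; zero the rest.

    Hypothetical helper: `delta` is a perturbation of shape (H, W, C), and a
    pixel counts as perturbed if any of its channels is nonzero.
    """
    h, w, _ = delta.shape
    k = max(0, min(k, h * w))  # clamp the budget to the number of pixels
    projected = np.zeros_like(delta)
    if k == 0:
        return projected
    # Per-pixel magnitude, taken as the Euclidean norm across channels.
    magnitude = np.linalg.norm(delta, axis=2).ravel()
    # Indices of the k largest-magnitude pixels (order among them is arbitrary).
    keep = np.argpartition(magnitude, -k)[-k:]
    mask = np.zeros(h * w, dtype=bool)
    mask[keep] = True
    mask = mask.reshape(h, w)
    # Copy only the selected pixels; all other pixels stay unperturbed.
    projected[mask] = delta[mask]
    return projected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = rng.normal(size=(8, 8, 3))
    sparse = project_l0(delta, k=5)
    print(np.count_nonzero(np.any(sparse != 0, axis=2)))
```

A bound of this kind fits patch-like physical attacks, where a small region of the scene is altered strongly, in contrast to bounds that spread a small change over every pixel.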