In recent years, computer vision has made remarkable advances in autonomous driving and robotics. However, deep learning-based visual perception models have been observed to lack robustness against camera motion perturbations. Existing certification of this robustness is costly and time-consuming, because Monte Carlo sampling in the 3D camera motion space requires a large number of image projections. To address these challenges, we present a novel, efficient, and practical framework for certifying the robustness of 3D-2D projective transformations against camera motion perturbations. Our approach places the smoothing distribution over the 2D pixel space instead of the 3D physical space, which eliminates costly camera motion sampling and significantly improves the efficiency of robustness certification. With the resulting pixel-wise smoothed classifier, we fully upper-bound the projection errors by uniformly partitioning the camera motion space. We further extend the framework to a more general setting in which the projection oracle requires only a single-frame point cloud, by deriving Lipschitz-based approximate partition intervals. Extensive experiments validate the trade-off between effectiveness and efficiency that our method enables. Notably, our approach achieves approximately 80% certified accuracy while using only 30% of the projected image frames.
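The abstract does not spell out the smoothing distribution or the exact projection-error bound, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes an isotropic Gaussian over pixels, so the standard randomized-smoothing radius R = sigma * Phi^{-1}(p_A) applies. It also reduces camera motion to a single scalar parameter t and assumes a hypothetical Lipschitz constant `lipschitz` that bounds how far (in pixel-space L2) the projected image moves per unit of t. Each grid point of a uniform partition corresponds to one projected frame. Every motion in the interval lies within half a cell of some grid point, so the interval is certified when that worst-case error is below the certified radius at every grid point. The names `certified_radius`, `certify_motion_interval`, and `p_a_lower_at` are illustrative and are not the authors' code. In practice, `p_a_lower_at` would come from Monte Carlo noise sampling in pixel space.

```python
from statistics import NormalDist
from typing import Callable


def certified_radius(p_a_lower: float, sigma: float) -> float:
    """Certified L2 radius of a Gaussian-smoothed classifier (assumed noise model).

    Standard randomized-smoothing bound R = sigma * Phi^{-1}(p_A), where p_A is
    a lower confidence bound on the top-class probability. Returns 0.0 when
    p_A <= 0.5 (no certificate) and infinity when p_A == 1.
    """
    if p_a_lower <= 0.5:
        return 0.0
    if p_a_lower >= 1.0:
        return float("inf")
    return sigma * NormalDist().inv_cdf(p_a_lower)


def certify_motion_interval(
    lo: float,
    hi: float,
    n_parts: int,
    p_a_lower_at: Callable[[float], float],
    sigma: float,
    lipschitz: float,
) -> bool:
    """Certify a 1-D camera-motion interval [lo, hi] by uniform partitioning.

    Hypothetical assumption: ||x(t) - x(s)||_2 <= lipschitz * |t - s| for the
    projected image x(t). Every t in [lo, hi] lies within h/2 of one of the
    n_parts + 1 grid points, so its pixel-space error to the nearest projected
    frame is at most lipschitz * h / 2. Certify only if that bound is strictly
    below the certified radius at every grid point.
    """
    if n_parts < 1 or hi < lo:
        raise ValueError("need n_parts >= 1 and lo <= hi")
    h = (hi - lo) / n_parts
    max_error = lipschitz * h / 2.0
    for i in range(n_parts + 1):
        t_i = lo + i * h  # grid point = one projected image frame
        if certified_radius(p_a_lower_at(t_i), sigma) <= max_error:
            return False
    return True


if __name__ == "__main__":
    # Toy stand-in for a Monte Carlo lower bound on p_A at each projected frame.
    def toy_p_a(t: float) -> float:
        return 0.99 - 0.1 * abs(t)

    # Smallest uniform partition that certifies [-0.1, 0.1] for a steep
    # (hypothetical) Lipschitz constant; fewer parts means fewer projections.
    n = 1
    while not certify_motion_interval(-0.1, 0.1, n, toy_p_a, sigma=0.5, lipschitz=100.0):
        n += 1
    print(n, "parts,", n + 1, "projected frames")  # 10 parts, 11 projected frames
```

The loop at the end illustrates the effectiveness/efficiency trade-off in this toy setting. A coarser partition needs fewer projected frames but inflates the error bound, so certification fails. A finer one costs more projections but shrinks the bound below the certified radius.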