Viewpoint invariance remains challenging for visual recognition in the 3D world, because changing the viewing direction can significantly alter the predictions for the same object. While substantial effort has been devoted to making neural networks invariant to 2D image translations and rotations, viewpoint invariance has rarely been investigated. Motivated by the success of adversarial training in enhancing model robustness, we propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of image classifiers. Treating viewpoint transformation as an attack, we formulate VIAT as a minimax optimization problem. The inner maximization characterizes diverse adversarial viewpoints by learning a Gaussian mixture distribution with the proposed attack method GMVFool. The outer minimization obtains a viewpoint-invariant classifier by minimizing the expected loss over the worst-case viewpoint distributions, which can be shared across different objects within the same category. Based on GMVFool, we contribute a large-scale dataset called ImageNet-V+ to benchmark viewpoint robustness. Experimental results show that VIAT significantly improves the viewpoint robustness of various image classifiers by exploiting the diversity of adversarial viewpoints generated by GMVFool. Furthermore, we propose ViewRS, a certified viewpoint robustness method that provides a certified radius and certified accuracy to demonstrate the effectiveness of VIAT from a theoretical perspective.
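As a rough illustration of the minimax structure described above, the objective might be sketched as follows. All notation here is assumed for illustration and is not taken from the abstract: $f_\theta$ is the classifier, $R(X, v)$ is a hypothetical renderer that produces an image of object $X$ from viewpoint $v$, $\mathcal{L}$ is a classification loss, and $p_\Psi$ is a $K$-component Gaussian mixture over viewpoint parameters. Any regularization terms or parameter constraints in the actual method are omitted.

```latex
% Sketch only: symbols are illustrative assumptions, not the paper's notation.
\min_{\theta} \;
\mathbb{E}_{(X, y)} \Big[
  \max_{\Psi} \;
  \mathbb{E}_{v \sim p_{\Psi}(v)}
  \big[ \mathcal{L}\big( f_{\theta}(R(X, v)),\, y \big) \big]
\Big],
\qquad
p_{\Psi}(v) = \sum_{k=1}^{K} \omega_k \, \mathcal{N}\big(v;\, \mu_k, \Sigma_k\big),
\quad \omega_k \ge 0,\ \sum_{k=1}^{K} \omega_k = 1 .
```

In this sketch the inner maximization over $\Psi = \{\omega_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ is written per object. Sharing one distribution across objects of the same category, as the abstract describes, would correspond to tying $\Psi$ across those objects.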