Downsampling operators break the shift invariance of convolutional neural networks (CNNs), degrading the robustness of learned features even to small pixel-level shifts. Through a large-scale correlation analysis framework, we study the shift invariance of CNNs by inspecting existing downsampling operators in terms of their maximum-sampling bias (MSB), and find that MSB is negatively correlated with shift invariance. Based on this crucial insight, we propose a learnable pooling operator called Translation Invariant Polyphase Sampling (TIPS), together with two regularizations on its intermediate feature maps, to reduce MSB and learn translation-invariant representations. TIPS can be integrated into any CNN and trained end-to-end with marginal computational overhead. Our experiments demonstrate that TIPS yields consistent gains in accuracy, shift consistency, and shift fidelity over previous methods on multiple benchmarks for image classification and semantic segmentation, and also improves adversarial and distributional robustness. TIPS achieves the lowest MSB among all compared methods, which explains our strong empirical results.
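To make the polyphase-sampling idea concrete, below is a minimal PyTorch sketch of a learnable pooling layer that decomposes a feature map into its stride-2 polyphase components and combines them with a learned soft selection. This is an illustration of the general technique, not the authors' TIPS implementation or its regularizers; the names `PolyphasePoolSketch`, `score_conv`, and `tau` are assumptions introduced here for exposition.

```python
# Minimal sketch of polyphase pooling with a learnable soft selection over
# polyphase components (stride-2 case). Illustrative only; not the TIPS paper's
# implementation, and all identifiers below are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyphasePoolSketch(nn.Module):
    def __init__(self, channels: int, tau: float = 1.0):
        super().__init__()
        # 1x1 conv scores each polyphase component; tau controls softmax sharpness.
        self.score_conv = nn.Conv2d(channels, 1, kernel_size=1)
        self.tau = tau

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Polyphase decomposition for stride 2: four shifted subsampled grids.
        phases = [x[:, :, i::2, j::2] for i in (0, 1) for j in (0, 1)]
        phases = torch.stack(phases, dim=1)               # (B, 4, C, H/2, W/2)
        b, p, c, h, w = phases.shape
        # One scalar score per phase, pooled over space and channels.
        scores = self.score_conv(phases.flatten(0, 1))    # (B*4, 1, H/2, W/2)
        scores = scores.mean(dim=(1, 2, 3)).view(b, p)    # (B, 4)
        weights = F.softmax(scores / self.tau, dim=1)     # soft phase selection
        # Convex combination of phases; an input shift permutes the phases, so a
        # shift-equivariant scorer yields (approximately) shift-invariant output.
        return (weights.view(b, p, 1, 1, 1) * phases).sum(dim=1)

# Usage: drop-in replacement for a stride-2 pooling layer.
pool = PolyphasePoolSketch(channels=64)
out = pool(torch.randn(2, 64, 32, 32))   # -> shape (2, 64, 16, 16)
```

The design intuition, under these assumptions: because shifting the input by one pixel permutes the polyphase components rather than changing their contents, selecting (or softly weighting) components by a shift-equivariant criterion makes the downsampled output far less sensitive to small shifts than fixed-grid subsampling.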