Convolutional Neural Networks (CNNs) offer state-of-the-art performance in various computer vision tasks. Many of these tasks require invariance to some subset of affine image transformations (scaling, rotation, translation). Convolutional layers are translation equivariant by design, but in their basic form they lack invariances. In this work we investigate how best to include rotational invariance in a CNN for image classification. Our experiments show that networks trained with data augmentation alone can classify rotated images nearly as well as in the standard unrotated case; this increase in representational power comes only at the cost of training time. We also compare data augmentation with two modified CNN models designed to achieve rotational invariance or equivariance, Spatial Transformer Networks and Group Equivariant CNNs, and find no significant accuracy increase with these specialized methods. For data-augmented networks, we also analyze which layers help the network encode rotational invariance, which is important for understanding its limitations and how best to retrain a network with data augmentation to achieve invariance to rotation.
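The distinction between translation equivariance and the missing rotational invariance can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the experimental setup of this work: the helper names `circular_correlate` and `augment_with_rotations` are hypothetical, boundaries are treated as circular so that the equivariance is exact, and the augmentation is restricted to multiples of 90 degrees on square images for simplicity.

```python
import numpy as np

def circular_correlate(image, kernel):
    """2-D cross-correlation with wrap-around (circular) boundaries.

    Hypothetical helper: out[y, x] = sum_ij kernel[i, j] * image[y + i, x + j].
    """
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # np.roll by (-i, -j) moves pixel (y + i, x + j) to position (y, x).
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))  # generic kernel with no rotational symmetry

# Translation equivariance: shift-then-filter equals filter-then-shift.
shift_then_filter = circular_correlate(np.roll(image, (2, 3), axis=(0, 1)), kernel)
filter_then_shift = np.roll(circular_correlate(image, kernel), (2, 3), axis=(0, 1))
translation_equivariant = np.allclose(shift_then_filter, filter_then_shift)

# Rotation: for a generic kernel the same test fails for a 90-degree rotation.
rotate_then_filter = circular_correlate(np.rot90(image), kernel)
filter_then_rotate = np.rot90(circular_correlate(image, kernel))
rotation_equivariant = np.allclose(rotate_then_filter, filter_then_rotate)

def augment_with_rotations(images, rng):
    """Rotate each square image by a random multiple of 90 degrees.

    Assumption: a simplified stand-in for rotation augmentation; arbitrary
    angles would need interpolation, which is omitted here.
    """
    ks = rng.integers(0, 4, size=len(images))
    return np.stack([np.rot90(img, int(k)) for img, k in zip(images, ks)])

batch = augment_with_rotations(rng.random((5, 8, 8)), rng)
print(translation_equivariant, rotation_equivariant, batch.shape)
```

In this sketch the filter commutes with shifts but not with rotations, so any robustness to rotation has to come from elsewhere, for example from training on rotated copies as `augment_with_rotations` produces.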