Computer vision research has long aimed to build systems that are robust to the spatial transformations found in natural data. Traditionally, this is done using data augmentation or by hard-coding invariances into the architecture. However, too much or too little invariance can hurt, and the correct amount is unknown a priori and depends on the instance. Ideally, the appropriate invariance would be learned from data and inferred at test time. We treat invariance as a prediction problem. Given any image, we use a normalizing flow to predict a distribution over transformations and average the predictions over these transformations. Since this distribution depends only on the instance, we can align instances before classifying them and generalize invariance across classes. The same distribution can also be used to adapt to out-of-distribution poses. This normalizing flow is trained end-to-end and can learn a much larger range of transformations than Augerino and InstaAug. When used as data augmentation, our method shows accuracy and robustness gains on CIFAR10, CIFAR10-LT, and TinyImageNet.
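The idea of averaging predictions over an instance-dependent distribution of transformations can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the method's actual implementation: the normalizing flow is replaced by a simple Gaussian over horizontal pixel shifts whose spread depends on the image, the classifier is a hypothetical random linear model, and the names `sample_shifts`, `toy_classifier`, and `averaged_prediction` are invented for this example.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def toy_classifier(image, weights):
    # Hypothetical linear classifier: flattened image -> class probabilities.
    return softmax(image.reshape(-1) @ weights)


def sample_shifts(image, rng, n_samples):
    # Stand-in for the instance-conditioned normalizing flow (assumption):
    # a Gaussian over horizontal shifts whose scale depends on the image.
    scale = 1.0 + image.std()
    return np.rint(rng.normal(0.0, scale, size=n_samples)).astype(int)


def averaged_prediction(image, weights, rng, n_samples=32):
    # Sample transformations for this instance, classify each transformed
    # copy, and average the resulting class probabilities.
    shifts = sample_shifts(image, rng, n_samples)
    probs = [toy_classifier(np.roll(image, s, axis=1), weights) for s in shifts]
    return np.mean(probs, axis=0)


rng = np.random.default_rng(0)
image = rng.random((8, 8))           # toy 8x8 grayscale image
weights = rng.normal(size=(64, 3))   # toy 3-class linear model
p = averaged_prediction(image, weights, rng)
print(p, p.sum())
```

Because each term in the average is a probability vector, the averaged output is again a valid distribution over classes. In the setting the abstract describes, the sampler would be a learned flow trained end-to-end with the classifier, not the fixed heuristic used in this sketch.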