Computer vision research has long aimed to build systems that are robust to the spatial transformations found in natural data. Traditionally, this is done with data augmentation or by hard-coding invariances into the architecture. However, too much or too little invariance can hurt performance, and the correct amount is not known a priori and depends on the instance. Ideally, the appropriate invariance would be learned from data and inferred at test time. We treat invariance as a prediction problem. Given any image, we use a normalizing flow to predict a distribution over transformations and average the classifier's predictions over those transformations. Since this distribution depends only on the instance, we can align instances before classifying them and generalize invariance across classes. The same distribution can also be used to adapt to out-of-distribution poses. The normalizing flow is trained end-to-end and can learn a much larger range of transformations than Augerino and InstaAug. When used as data augmentation, our method improves accuracy and robustness on CIFAR-10, CIFAR10-LT, and TinyImageNet.
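The test-time averaging step can be illustrated with a minimal sketch. It rests on several assumptions that are not from the source: rotation is the only transformation, the normalizing flow is reduced to a single conditional affine layer (`theta = mu(x) + exp(log_sigma(x)) * eps` with `eps ~ N(0, 1)`), the conditioner uses two hand-picked image statistics, and the classifier is a toy linear softmax model. All names (`ConditionalAffineFlow`, `rotate_nearest`, `predict_invariant`) are hypothetical, and end-to-end training is not shown. The sketch only shows the Monte Carlo average `p(y|x) ≈ (1/K) Σ_k f(T_{θ_k}(x))` with `θ_k` drawn from an instance-conditioned distribution.

```python
import numpy as np


def rotate_nearest(img, angle):
    """Rotate a 2D array about its centre by `angle` radians (nearest-neighbour)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    c, s = np.cos(angle), np.sin(angle)
    # Inverse mapping: for each output pixel, locate its source pixel.
    src_x = np.rint(c * (xs - cx) + s * (ys - cy) + cx).astype(int)
    src_y = np.rint(-s * (xs - cx) + c * (ys - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)  # pixels mapped from outside the image stay 0
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


class ConditionalAffineFlow:
    """Hypothetical one-layer conditional flow over a rotation angle."""

    def __init__(self, w_mu, b_mu, w_ls, b_ls):
        self.w_mu, self.b_mu = np.asarray(w_mu, float), float(b_mu)
        self.w_ls, self.b_ls = np.asarray(w_ls, float), float(b_ls)

    def _features(self, img):
        # Two simple statistics stand in for a learned image encoder.
        return np.array([img.mean(), img.std()])

    def params(self, img):
        f = self._features(img)
        return f @ self.w_mu + self.b_mu, f @ self.w_ls + self.b_ls

    def sample(self, img, k, rng):
        mu, log_sigma = self.params(img)
        eps = rng.standard_normal(k)  # base distribution N(0, 1)
        return mu + np.exp(log_sigma) * eps

    def log_prob(self, img, theta):
        mu, log_sigma = self.params(img)
        eps = (theta - mu) / np.exp(log_sigma)
        # Change of variables: log N(eps; 0, 1) - log|d theta / d eps|.
        return -0.5 * (eps ** 2 + np.log(2 * np.pi)) - log_sigma


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def classify(img, W, b):
    # Hypothetical linear classifier on flattened pixels.
    return softmax(img.ravel() @ W + b)


def predict_invariant(img, flow, W, b, k=16, rng=None):
    """Average class probabilities over k angles sampled from the flow."""
    rng = np.random.default_rng(0) if rng is None else rng
    angles = flow.sample(img, k, rng)
    probs = [classify(rotate_nearest(img, a), W, b) for a in angles]
    return np.mean(probs, axis=0)


rng = np.random.default_rng(42)
img = rng.random((8, 8))
W = rng.standard_normal((64, 3))
b = np.zeros(3)
flow = ConditionalAffineFlow(w_mu=[0.0, 0.0], b_mu=0.3, w_ls=[0.0, 0.0], b_ls=-1.0)
p = predict_invariant(img, flow, W, b, k=32, rng=rng)
print(p, p.sum())
```

Because the predicted distribution is a function of the input alone, a deeper flow could be swapped in without changing `predict_invariant`; shrinking `exp(log_sigma)` toward zero recovers an ordinary, non-averaged prediction at the angle `mu(x)`.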