Despite the successes of deep learning in computer vision, difficulties persist in recognizing objects that have undergone group-symmetric transformations rarely seen during training, for example objects seen in unusual poses, scales, positions, or combinations thereof. Equivariant neural networks address the problem of generalizing across symmetric transformations, but they require knowledge of the transformations a priori. An alternative family of architectures instead learns equivariant operators in a latent space from examples of symmetric transformations. Here, using simple datasets of rotated and translated noisy MNIST, we illustrate how such architectures can be successfully harnessed for out-of-distribution classification, thus overcoming the limitations of both traditional and equivariant networks. While the approach is conceptually enticing, we also discuss the challenges that lie ahead in scaling these architectures to more complex datasets.