Despite the successes of deep learning in computer vision, difficulties persist in recognizing objects that have undergone group-symmetric transformations rarely seen during training, for example objects seen in unusual poses, scales, positions, or combinations thereof. Equivariant neural networks are a solution to the problem of generalizing across symmetric transformations, but they require knowledge of the transformations a priori. An alternative family of architectures proposes to learn equivariant operators in a latent space from examples of symmetric transformations. Here, using simple datasets of rotated and translated noisy MNIST, we illustrate how such architectures can successfully be harnessed for out-of-distribution classification, thus overcoming the limitations of both traditional and equivariant networks. While these architectures are conceptually enticing, we discuss the challenges that lie ahead in scaling them to more complex datasets.