Real-world geometry and 3D vision tasks are replete with challenging symmetries that defy tractable analytical expression. In this paper, we introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space wherein encodings are related by isometries whenever their corresponding observations are geometrically related in world space. Specifically, we regularize the latent space such that maps between encodings preserve a learned inner product and commute with a learned functional operator, in the same manner as rigid-body transformations commute with the Laplacian. This approach forms an effective backbone for self-supervised representation learning, and we demonstrate that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously-engineered, handcrafted networks designed to handle complex, nonlinear symmetries. Furthermore, isometric maps capture information about the respective transformations in world space, and we show that this allows us to regress camera poses directly from the coefficients of the maps between encodings of adjacent views of a scene.
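The two regularizers described above can be illustrated with a small numerical sketch. This is not the paper's implementation; the names `Omega` (learned functional operator), `M` (learned inner product), and `tau` (latent map) are assumed for illustration, and the penalties are simple Frobenius-norm surrogates for the constraints the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def isometry_penalty(tau, M):
    # Penalize deviation of tau from preserving the learned
    # inner product <x, y>_M = x^T M y, i.e. tau^T M tau ~ M.
    return float(np.sum((tau.T @ M @ tau - M) ** 2))

def commutation_penalty(tau, Omega):
    # Penalize failure of tau to commute with the learned
    # operator Omega, i.e. Omega tau ~ tau Omega (analogous to
    # rigid motions commuting with the Laplacian).
    return float(np.sum((Omega @ tau - tau @ Omega) ** 2))

# Toy construction (hypothetical): Omega symmetric with distinct
# eigenvalues; tau a sign flip in Omega's eigenbasis, so tau is
# orthogonal and commutes with Omega exactly.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
Omega = Q @ np.diag([1.0, 2.0, 3.0, 4.0]) @ Q.T
tau_good = Q @ np.diag([1.0, -1.0, 1.0, -1.0]) @ Q.T
M = np.eye(4)

print(isometry_penalty(tau_good, M))         # ~0 for an isometry
print(commutation_penalty(tau_good, Omega))  # ~0 when tau commutes

# A generic random map violates both constraints.
tau_bad = rng.normal(size=(4, 4))
print(isometry_penalty(tau_bad, M) > 1e-3)
print(commutation_penalty(tau_bad, Omega) > 1e-3)
```

In training, penalties of this form would be summed into the autoencoder loss so that maps between encodings of geometrically related observations are driven toward the isometry group of the learned structure (`M`, `Omega`).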