Learning invariant representations has been the longstanding approach to self-supervised learning. However, recent progress has been made in preserving equivariant properties in representations, yet this has so far required highly prescribed architectures. In this work, we propose an invariant-equivariant self-supervised architecture that employs Capsule Networks (CapsNets), which have been shown to capture equivariance with respect to novel viewpoints. We demonstrate that the use of CapsNets in equivariant self-supervised architectures achieves improved downstream performance on equivariant tasks with higher efficiency and fewer network parameters. To accommodate the architectural changes of CapsNets, we introduce a new objective function based on entropy minimisation. This approach, which we name CapsIE (Capsule Invariant Equivariant Network), achieves state-of-the-art performance on the equivariant rotation tasks of the 3DIEBench dataset compared to prior equivariant SSL methods, while performing competitively against supervised counterparts. Our results demonstrate the ability of CapsNets to learn complex and generalised representations for large-scale, multi-task datasets, going beyond previous CapsNet benchmarks. Code is available at https://github.com/AberdeenML/CapsIE.