CNNs are inherently equivariant to image translation, which yields efficient use of parameters and data, faster learning, and improved robustness. Translation-equivariant networks have been successfully extended to rotations, using group convolution for discrete rotation groups and harmonic functions for the continuous rotation group covering the full $360^\circ$ range. In contrast to previous studies that focused on discrete rotations, we explore the compatibility of the self-attention mechanism with full rotation equivariance. We introduce the Harmformer, a harmonic transformer with a convolutional stem that achieves equivariance to both translation and continuous rotation. Accompanied by an end-to-end equivariance proof, the Harmformer not only outperforms previous equivariant transformers but also remains inherently stable under any continuous rotation, even without seeing rotated samples during training.
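The translation equivariance the abstract refers to can be checked directly: applying a convolution and then shifting the output gives the same result as shifting the input first. A minimal NumPy sketch (not the paper's code; it uses circular convolution via the FFT, so shifts wrap around at the borders):

```python
import numpy as np

def circ_conv2d(x, k):
    # Circular 2D convolution: zero-pad the kernel to the image size,
    # multiply spectra, and transform back.
    K = np.zeros_like(x)
    kh, kw = k.shape
    K[:kh, :kw] = k
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(K)))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))   # toy "image"
k = rng.standard_normal((3, 3))     # toy filter
shift = (5, 3)                      # translation in pixels

# Equivariance: conv(shift(x)) == shift(conv(x))
conv_then_shift = np.roll(circ_conv2d(x, k), shift, axis=(0, 1))
shift_then_conv = circ_conv2d(np.roll(x, shift, axis=(0, 1)), k)
assert np.allclose(conv_then_shift, shift_then_conv)
```

The same commutation property, with rotations in place of pixel shifts, is what the Harmformer's harmonic formulation is designed to preserve through the self-attention layers.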