Transformer-based models have reshaped deep learning and achieved remarkable performance across many tasks. Recent research has recognized that these models are robust to shuffling, but existing results are limited to inter-token permutation in forward propagation. In this work, we propose a definition of permutation equivariance, a broader concept that covers both inter- and intra-token permutation in the forward and backward propagation of neural networks. We rigorously prove that this permutation equivariance property is satisfied by most vanilla Transformer-based models with almost no adaptation. We examine the property experimentally over a range of state-of-the-art models, including ViT, BERT, GPT, and others. Further, as a proof of concept, we explore how real-world applications, including privacy-enhancing split learning and model authorization, could exploit the permutation equivariance property, which suggests wider and intriguing application scenarios.
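As a rough illustration of the two kinds of permutation named above, the following NumPy sketch checks them numerically on a single-head self-attention layer in the forward pass only. It is a toy example under stated assumptions, not the construction or proof used in this work: the layer has no positional encoding or mask, the helper names (`self_attention`, `permute_weight`) are hypothetical, and the intra-token case assumes that the weights are permuted consistently with the feature dimension.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    # Single-head self-attention, no positional encoding, no mask (assumption).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v


def permute_weight(w, perm):
    # Hypothetical helper: permute rows and columns of a square weight
    # matrix with the same permutation, i.e. w'[i, j] = w[perm[i], perm[j]].
    return w[perm][:, perm]


rng = np.random.default_rng(0)
n_tokens, d = 5, 8
x = rng.normal(size=(n_tokens, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)

# Inter-token (row) permutation: shuffling the tokens shuffles the outputs
# in the same way, with the weights left unchanged.
row_perm = rng.permutation(n_tokens)
out_rows = self_attention(x[row_perm], w_q, w_k, w_v)
inter_ok = np.allclose(out_rows, out[row_perm])

# Intra-token (column) permutation: shuffling the features inside every
# token, together with consistently permuted weights, shuffles the output
# features in the same way.
col_perm = rng.permutation(d)
out_cols = self_attention(
    x[:, col_perm],
    permute_weight(w_q, col_perm),
    permute_weight(w_k, col_perm),
    permute_weight(w_v, col_perm),
)
intra_ok = np.allclose(out_cols, out[:, col_perm])

print("inter-token equivariance holds:", inter_ok)
print("intra-token equivariance holds:", intra_ok)
```

The sketch does not touch backward propagation, multi-head attention, or the other layers of a full Transformer, all of which the definition proposed here is meant to cover.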