The self-attention mechanism, a cornerstone of state-of-the-art Transformer-based deep learning architectures, is largely heuristic-driven and fundamentally challenging to interpret. Establishing a robust theoretical foundation that explains its remarkable success and its limitations has therefore become an increasingly prominent focus of recent research. Some notable directions have sought to understand self-attention through the lens of image denoising and nonparametric regression. While promising, existing frameworks still lack a deeper mechanistic interpretation of the various architectural components that enhance self-attention, both in its original formulation and in subsequent variants. In this work, we aim to advance this understanding by developing a unifying image processing framework capable of explaining not only the self-attention computation itself but also the role of components such as positional encoding and residual connections, including numerous later variants. Building on our framework, we also pinpoint potential distinctions between self-attention and image processing, and make an effort to close this gap. We introduce two independent architectural modifications within Transformers. While our primary objective is interpretability, we empirically observe that image processing-inspired modifications can also yield notably improved accuracy, greater robustness to data contamination and adversarial attacks across language and vision tasks, and better long-sequence understanding.