Beyond natural language processing, transformers perform remarkably well in a broader range of applications, including scientific computing and computer vision. Previous work has sought to explain this success in terms of expressive power and capability, showing that standard transformers can implement certain algorithms. To equip transformers with algorithmic capabilities, and motivated by the recently proposed looped transformer (Yang et al., 2024; Giannou et al., 2023), we design a novel transformer block, dubbed Algorithm Transformer (abbreviated as AlgoFormer). Compared with the standard transformer and the vanilla looped transformer, the proposed AlgoFormer achieves significantly higher expressiveness in algorithm representation with the same number of parameters. In particular, inspired by the structure of human-designed learning algorithms, our transformer block consists of a pre-transformer responsible for task pre-processing, a looped transformer that carries out iterative optimization algorithms, and a post-transformer that produces the desired results after post-processing. We provide theoretical evidence of the expressive power of AlgoFormer in solving some challenging problems, mirroring human-designed algorithms. Furthermore, we present theoretical and empirical results suggesting that the designed transformer has the potential to be smarter than human-designed algorithms. Experiments show that the proposed transformer outperforms the standard transformer and the vanilla looped transformer on some challenging tasks.
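The three-stage structure described in the abstract (pre-transformer, looped transformer, post-transformer) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`make_layer`, `transformer_layer`, `algoformer_forward`, `count_parameters`) are hypothetical, each stage is assumed to be a single layer with single-head attention and a ReLU MLP, and details that real models may use (layer normalization, masking, input re-injection inside the loop, training) are omitted. It only shows that the looped stage reuses one set of weights, so the parameter count does not grow with the number of iterations.

```python
import math
import random


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def make_layer(rng, d):
    """Hypothetical layer: five d x d weight matrices (assumption for brevity)."""
    return {name: random_matrix(rng, d, d) for name in ("wq", "wk", "wv", "w1", "w2")}


def transformer_layer(x, p):
    """Single-head self-attention, then a ReLU MLP, each with a residual connection."""
    d = len(x[0])
    q, k, v = matmul(x, p["wq"]), matmul(x, p["wk"]), matmul(x, p["wv"])
    k_t = [list(col) for col in zip(*k)]
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in matmul(q, k_t)]
    x = add(x, matmul(attn, v))
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, p["w1"])]
    return add(x, matmul(hidden, p["w2"]))


def algoformer_forward(x, pre, loop, post, num_iterations):
    """Pre-process once, apply the SAME looped layer repeatedly, post-process once."""
    x = transformer_layer(x, pre)
    for _ in range(num_iterations):
        x = transformer_layer(x, loop)  # weights are shared across iterations
    return transformer_layer(x, post)


def count_parameters(*layers):
    return sum(len(row) for layer in layers for w in layer.values() for row in w)


rng = random.Random(0)
d_model, n_tokens = 4, 3
pre, loop, post = (make_layer(rng, d_model) for _ in range(3))
tokens = random_matrix(rng, n_tokens, d_model, scale=1.0)

out = algoformer_forward(tokens, pre, loop, post, num_iterations=5)
# 3 layers x 5 matrices x 16 entries = 240, independent of num_iterations
n_params = count_parameters(pre, loop, post)
```

In this sketch, raising `num_iterations` deepens the computation without adding weights, which is the sense in which a looped block can be compared with a standard transformer at an equal parameter budget.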