In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of the Path Integral formalism. In this perspective, the attention mechanism is recast as a process that integrates over all possible transition paths leading to future token states, while temporal evolution is governed by the Feed-Forward Network. By systematically mapping each component of the Transformer to its counterpart in the Path Integral formulation, we obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments. These segments are processed recurrently across Transformer layers, enabling more effective long-term information retention. We validate the effectiveness of this approach on the passkey retrieval task and a summarization task, demonstrating that the proposed method preserves historical information while its memory usage scales linearly with sequence length, in contrast to the non-linear memory growth typically observed in standard attention mechanisms. We expect that this quantum-inspired generalization of the Transformer architecture will open new avenues for improving both the efficiency and expressiveness of future Transformer models.
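To make the segment-recurrence idea concrete, the following minimal PyTorch sketch shows one way a layer could condense processed context into a fixed number of memory slots and carry them forward as new segments arrive, so that stored state grows with the number of segments rather than with the full attention history. All names here (SegmentMemoryLayer, mem_slots, mem_queries) are hypothetical illustrations under stated assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a segment-recurrent Transformer layer: a running
# "memory" tensor condenses past context, and each new segment attends over
# [memory ; segment] before the memory is updated. Names are illustrative only.
import torch
import torch.nn as nn


class SegmentMemoryLayer(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, mem_slots: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(  # plays the role of the "temporal evolution" step
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Learned queries that read the processed segment back into a fixed
        # number of memory slots, keeping the stored state constant per segment.
        self.mem_queries = nn.Parameter(torch.randn(mem_slots, d_model))

    def forward(self, segment: torch.Tensor, memory: torch.Tensor):
        # segment: (batch, seg_len, d_model); memory: (batch, mem_slots, d_model)
        context = torch.cat([memory, segment], dim=1)  # attend over memory + segment
        attn_out, _ = self.attn(segment, context, context)
        h = self.norm1(segment + attn_out)
        h = self.norm2(h + self.ffn(h))
        # Compress the processed segment into the fixed-size memory for reuse.
        q = self.mem_queries.expand(h.size(0), -1, -1)
        new_memory, _ = self.attn(q, h, h)
        return h, new_memory


if __name__ == "__main__":
    layer = SegmentMemoryLayer()
    memory = torch.zeros(2, 16, 64)            # initial (empty) memory
    for segment in torch.randn(4, 2, 32, 64):  # stream of 4 segments
        out, memory = layer(segment, memory)   # memory size stays fixed per segment
    print(out.shape, memory.shape)             # (2, 32, 64) and (2, 16, 64)
```

Because each segment only ever attends to the current tokens plus a fixed-size memory, the total stored state in this sketch grows with the number of segments, which is one way to realize the linear memory scaling described above.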