While Transformers have revolutionized deep learning, the quadratic complexity of attention in sequence length hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM (a Transformer augmented with FAM) requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that TransformerFAM significantly improves Transformer performance on long-context tasks across model sizes of 1B, 8B, and 24B parameters. These results showcase the potential to empower Large Language Models (LLMs) to process sequences of unlimited length.
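The feedback loop can be illustrated with a minimal, hypothetical sketch. It assumes the sequence is processed in fixed-size blocks, that a small set of memory vectors (the hypothetical `fam_init`) is carried from block to block, and that attention uses the raw vectors directly as queries, keys, and values, omitting learned projections, multiple heads, and layer stacking. Within each block, tokens attend causally to the previous memory and to earlier tokens in the block; the memory vectors then attend to themselves and to the whole block, and the result becomes the memory for the next block. The names `attend` and `fam_forward` are illustrative only, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention for one query.

    Assumption for this sketch: keys double as values (no learned projections).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * key[j] for w, key in zip(weights, keys)) / total for j in range(d)]


def fam_forward(
    tokens: List[Vector], fam_init: List[Vector], block_size: int
) -> Tuple[List[Vector], List[Vector]]:
    """Process tokens block by block, carrying a feedback memory between blocks."""
    fam = [list(v) for v in fam_init]
    outputs: List[Vector] = []
    for start in range(0, len(tokens), block_size):
        block = tokens[start:start + block_size]
        # Each token reads the memory from the previous block plus the
        # tokens up to and including itself in the current block (causal).
        for i, tok in enumerate(block):
            outputs.append(attend(tok, fam + block[: i + 1]))
        # Feedback step: the memory attends to its own previous state and the
        # whole block; the result becomes the memory for the next block.
        fam = [attend(f, fam + block) for f in fam]
    return outputs, fam
```

Because each query attends to at most `len(fam_init) + block_size` vectors, the cost in this sketch grows linearly with sequence length rather than quadratically, while information from earlier blocks can still reach later tokens through the carried memory.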