While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no additional weights, enabling seamless integration with pre-trained models. Our experiments show that TransformerFAM significantly improves Transformer performance on long-context tasks across various model sizes (1B, 8B, and 24B). These results showcase the potential to empower Large Language Models (LLMs) to process sequences of unlimited length.
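To make the feedback idea concrete, here is a minimal, single-head NumPy sketch of block-wise attention with a feedback memory that is updated each block and fed back to the next. It is an illustrative assumption, not the paper's implementation: the names (`transformer_fam_layer`, `fam`, `block_size`) and the exact query/context composition are hypothetical, and the real architecture reuses the existing attention weights of a pre-trained model rather than a toy attention function.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def transformer_fam_layer(x, fam, block_size):
    """Process a long sequence block by block with a feedback memory.

    For each block, the input queries attend to (previous FAM + current block),
    and the FAM queries attend to the same context to produce the updated FAM,
    which is fed back when processing the next block.

    x:   (seq_len, d) input activations for one layer
    fam: (num_fam, d) feedback attention memory carried across blocks
    """
    outputs = []
    for start in range(0, x.shape[0], block_size):
        block = x[start:start + block_size]
        # Keys/values: feedback memory concatenated with the current block.
        context = np.concatenate([fam, block], axis=0)
        outputs.append(attention(block, context, context))  # block attends to memory + itself
        fam = attention(fam, context, context)              # memory attends and gets updated (feedback loop)
    return np.concatenate(outputs, axis=0), fam

# Example usage: a 16-token sequence processed in blocks of 4 with 2 memory slots.
x = np.random.randn(16, 8)
fam = np.zeros((2, 8))
y, fam = transformer_fam_layer(x, fam, block_size=4)
```

Because the memory has a fixed size and is refreshed once per block, the per-block cost stays constant, which is how this kind of feedback loop can, in principle, carry context across indefinitely long inputs.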