Transformer-based large language models (LLMs) have been widely used in language processing applications. However, most of them have a restricted context window, preventing the model from attending to every token in the input. Prior work on recurrent models can memorize past tokens to enable unlimited context while maintaining effectiveness, but these models use "flat" memory architectures that are limited in their ability to select and filter information. Since humans are good at learning and self-adjustment, we speculate that imitating the brain's memory hierarchy is beneficial for model memorization. We propose the Hierarchical Memory Transformer (HMT), a novel framework that enables and improves a model's long-context processing ability by imitating human memorization behavior. Leveraging memory-augmented segment-level recurrence, we organize the memory hierarchy by preserving tokens from early input segments, passing memory embeddings along the sequence, and recalling relevant information from history. Evaluated on general language modeling (Wikitext-103, PG-19) and question-answering (PubMedQA) tasks, HMT steadily improves the long-context processing ability of both context-constrained and long-context models. With only an additional 0.5% to 2% of parameters, HMT can easily be plugged into future LLMs to handle long context effectively. Our code is open-sourced on GitHub: https://github.com/OswaldHe/HMT-pytorch.
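The memory mechanism described above, preserving summaries of early segments, passing memory embeddings forward, and recalling relevant history, can be illustrated with a minimal sketch. This is not the paper's implementation: the summarizer (mean-pooling), the cosine-similarity recall, and all function names here are simplifying assumptions standing in for the learned components of HMT and its backbone LLM.

```python
import math
import random

def summarize(segment):
    # Hypothetical summarizer: mean-pool the segment's token embeddings.
    # (In HMT this role is played by learned components, not pooling.)
    d = len(segment[0])
    return [sum(tok[i] for tok in segment) / len(segment) for i in range(d)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1e-8
    nb = math.sqrt(sum(x * x for x in b)) or 1e-8
    return dot / (na * nb)

def recall(query, bank):
    # Memory recall: fetch the cached embedding most relevant to the
    # current segment (similarity here is an illustrative stand-in).
    return max(bank, key=lambda m: cosine(m, query))

def hmt_like_pass(tokens, seg_len):
    bank, outputs = [], []  # bank: memory embeddings of earlier segments
    for start in range(0, len(tokens), seg_len):
        segment = tokens[start:start + seg_len]
        if bank:
            # Prepend the recalled memory so the backbone can attend to it.
            segment = [recall(summarize(segment), bank)] + segment
        # Stand-in for the backbone LLM's forward pass over this segment.
        outputs.append(summarize(segment))
        # Pass a fresh memory embedding along to later segments.
        bank.append(summarize(segment))
    return outputs, bank

random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(256)]
outs, bank = hmt_like_pass(tokens, seg_len=64)
print(len(outs), len(bank))  # one output and one memory embedding per segment
```

The key structural point the sketch captures is that each segment sees only a bounded window plus a small, selectively recalled memory, so total context length is unbounded while per-segment cost stays fixed.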