Large Language Models (LLMs) have shown strong abilities in general language tasks, yet adapting them to specific domains remains a challenge. Current methods such as Domain Adaptive Pretraining (DAPT) require costly full-parameter training and suffer from catastrophic forgetting. Meanwhile, Retrieval-Augmented Generation (RAG) introduces substantial inference latency due to expensive nearest-neighbor searches and longer contexts. This paper introduces Memory Decoder, a plug-and-play pretrained memory that enables efficient domain adaptation without changing the original model's parameters. Memory Decoder employs a small transformer decoder that learns to imitate the behavior of an external non-parametric retriever. Once trained, Memory Decoder can be seamlessly integrated with any pretrained language model that shares the same tokenizer, requiring no model-specific modifications. Experimental results demonstrate that Memory Decoder enables effective adaptation of various Qwen and Llama models to three distinct specialized domains: biomedicine, finance, and law, reducing perplexity by an average of 6.17 points. Overall, Memory Decoder introduces a novel paradigm centered on a specially pretrained memory component designed for domain-specific adaptation. This memory architecture can be integrated in a plug-and-play manner, consistently enhancing performance across multiple models within the target domain.
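To make the plug-and-play idea concrete, the following is a minimal sketch of how such an integration could look at inference time, assuming a kNN-LM-style interpolation between the frozen base model's next-token distribution and the memory decoder's (the abstract does not specify the exact combination rule). The model names, the memory-decoder path, and the mixing weight `lam` are illustrative assumptions, not details from the paper.

```python
# Sketch only: interpolate a frozen base LM with a small domain "memory decoder"
# that shares the same tokenizer. No base-model parameters are modified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")          # shared tokenizer
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")  # frozen base LM
memory = AutoModelForCausalLM.from_pretrained("path/to/memory-decoder")  # hypothetical small decoder

lam = 0.3  # assumed interpolation weight for the memory component

@torch.no_grad()
def next_token_probs(prompt: str) -> torch.Tensor:
    """Return the interpolated next-token distribution for `prompt`."""
    ids = tok(prompt, return_tensors="pt").input_ids
    p_base = torch.softmax(base(ids).logits[:, -1, :], dim=-1)
    p_mem = torch.softmax(memory(ids).logits[:, -1, :], dim=-1)
    # Plug-and-play combination of the two distributions over the shared vocabulary.
    return (1 - lam) * p_base + lam * p_mem

probs = next_token_probs("The patient presented with acute")
print(tok.decode(probs.argmax(dim=-1)))
```

Because the memory decoder is trained once per domain and only combined at the output distribution, the same component can be reused across any base model that shares the tokenizer, which is what makes the integration model-agnostic.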