Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically trade flexibility against integration. Retrieval-augmented generation keeps knowledge outside the model but provides only prompt-level augmentation. Post-training-based methods encode new knowledge into shared parameters but risk catastrophic forgetting, knowledge conflicts, and costly updates. In this paper, we propose Decoupled Mixture-of-Experts (DMoE), a modular architecture for parametric knowledge injection that decouples both the experts and the router from the base model. DMoE converts external knowledge corpora into independently updatable expert modules. It uses a lightweight, uncertainty-aware router to activate relevant experts only when the base model lacks sufficient knowledge during generation. To support efficient autoregressive inference, DMoE attaches experts only to the feed-forward network of the final layer. Because that layer's output feeds only the language-modeling head, expert activation leaves the cached keys and values unchanged. KV-cache reuse is therefore preserved while knowledge is still augmented at the parameter level. Experiments on knowledge-intensive benchmarks show that DMoE consistently improves answer quality over retrieval- and adapter-based baselines.
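The sketch below illustrates the general idea of uncertainty-gated experts attached only at the model's output. It is a toy, pure-Python example under several assumptions that the abstract does not specify. The entropy of the base model's next-token distribution stands in for the "uncertainty" signal, with a hypothetical threshold `tau`. The router is a hypothetical key matrix `router_keys` scored against the final hidden state with top-k gating. Each expert is a tiny ReLU feed-forward block whose output is added to the final hidden state. All names (`dmoe_logits`, `router_keys`, `tau`, `top_k`) are illustrative and do not describe the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats (assumed uncertainty measure)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def expert_ffn(w_in: Matrix, w_out: Matrix, h: Vector) -> Vector:
    """Toy expert: w_out @ relu(w_in @ h)."""
    hidden = [max(0.0, a) for a in matvec(w_in, h)]
    return matvec(w_out, hidden)


def dmoe_logits(h: Vector, lm_head: Matrix, router_keys: Matrix,
                experts: List[Tuple[Matrix, Matrix]], tau: float,
                top_k: int = 1) -> Tuple[Vector, List[int]]:
    """Hypothetical final-layer step. `h` is the base model's final hidden
    state (its own final FFN already applied). Returns (logits, experts used)."""
    base_logits = matvec(lm_head, h)
    if entropy(softmax(base_logits)) <= tau:
        return base_logits, []                 # confident: base model alone
    scores = matvec(router_keys, h)            # one routing score per expert
    chosen = sorted(range(len(scores)), key=lambda i: scores[i],
                    reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    delta = [0.0] * len(h)
    for g, i in zip(gates, chosen):
        w_in, w_out = experts[i]
        out = expert_ffn(w_in, w_out, h)
        delta = [d + g * o for d, o in zip(delta, out)]
    # Only the vector fed to the LM head changes; no earlier activation
    # (and hence no cached key/value) depends on this residual update.
    h_aug = [a + b for a, b in zip(h, delta)]
    return matvec(lm_head, h_aug), chosen


# Toy setup: hidden size 2, vocabulary size 2, two one-unit experts.
LM_HEAD = [[1.0, 0.0], [0.0, 1.0]]
ROUTER_KEYS = [[1.0, 0.0], [0.0, -1.0]]
EXPERTS = [([[1.0, 0.0]], [[2.0], [0.0]]),
           ([[0.0, 1.0]], [[0.0], [2.0]])]

print(dmoe_logits([5.0, 0.0], LM_HEAD, ROUTER_KEYS, EXPERTS, tau=0.5))
# ([5.0, 0.0], [])   low entropy, experts skipped
print(dmoe_logits([1.0, 1.0], LM_HEAD, ROUTER_KEYS, EXPERTS, tau=0.5))
# ([3.0, 1.0], [0])  entropy ln 2 > tau, expert 0 consulted
```

Gating on the base model's own confidence means that confident tokens cost a single extra entropy computation. Placing the experts after the last transformer block means a routing decision at step t cannot invalidate the keys and values cached for earlier positions. This is the property the abstract relies on for KV-cache reuse.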