Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models

In domains such as biomedicine, materials, and finance, high-stakes deployment of large language models (LLMs) requires injecting private, domain-specific knowledge that is proprietary, fast-evolving, and under-represented in public pretraining data. The two dominant paradigms for private knowledge injection each have pronounced drawbacks. Fine-tuning is expensive to iterate, and continual updates risk catastrophic forgetting and regression of general capabilities. Retrieval-augmented generation (RAG) keeps the base model intact but is brittle on specialized private corpora: chunking fragments evidence, retrieval drifts, and long-context pressure inflates the prompt in a query-dependent way. Inspired by how multimodal LLMs align heterogeneous modalities into a shared semantic space, we propose Generation-Augmented Generation (GAG), which treats private expertise as an additional expert modality and injects it through a compact, representation-level interface aligned to the frozen base model. This avoids prompt-time evidence serialization while enabling plug-and-play specialization and scalable multi-domain composition with reliable selective activation. Across two private scientific QA benchmarks (immunology adjuvant and catalytic materials) and mixed-domain evaluations, GAG improves specialist performance over strong RAG baselines by 15.34% and 14.86% on the two benchmarks, respectively, while maintaining performance on six open general benchmarks and achieving near-oracle selective activation for scalable multi-domain deployment.
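To make the idea of a representation-level interface with selective activation more concrete, the following is a minimal toy sketch. It is not the authors' implementation: the abstract does not specify the architecture, so the linear projector, the cosine-similarity router, the threshold, and every name and number below (`project`, `route`, `build_inputs`, `PROTOTYPES`, `EXPERTS`) are illustrative assumptions. In a real system the expert vector would presumably be produced per query by a domain model; here it is fixed for brevity. The sketch shows only the general shape: expert knowledge enters as a fixed number of soft tokens in the frozen model's embedding space rather than as serialized text, and a router leaves general queries untouched.

```python
import math

def project(expert_vec, weights):
    """Hypothetical linear projector: map one expert vector to soft tokens.

    weights[t][j] is the weight row that produces dimension j of soft token t.
    """
    return [
        [sum(w * x for w, x in zip(row, expert_vec)) for row in token_rows]
        for token_rows in weights
    ]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(query_vec, prototypes, threshold=0.5):
    """Assumed router: pick the best-matching domain, or None for general queries."""
    best_name, best_score = None, threshold
    for name, proto in prototypes.items():
        score = cosine(query_vec, proto)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

def build_inputs(query_embeds, query_vec, experts, prototypes):
    """Prepend expert soft tokens only when a domain is activated."""
    domain = route(query_vec, prototypes)
    if domain is None:
        return query_embeds  # general query: the frozen model sees its usual input
    expert_vec, weights = experts[domain]
    return project(expert_vec, weights) + query_embeds

# Toy data (all values are made up): 2 domains, 2 soft tokens of dimension 3.
PROTOTYPES = {"adjuvant": [1.0, 0.0], "catalysis": [0.0, 1.0]}
W = [[[1, 0], [0, 1], [1, 1]], [[0, 0], [1, 0], [0, 1]]]
EXPERTS = {"adjuvant": ([1.0, 2.0], W), "catalysis": ([2.0, 1.0], W)}
QUERY_EMBEDS = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]

specialist = build_inputs(QUERY_EMBEDS, [0.9, 0.1], EXPERTS, PROTOTYPES)
general = build_inputs(QUERY_EMBEDS, [-1.0, -1.0], EXPERTS, PROTOTYPES)
```

In this toy setup the specialist input grows by a constant two soft tokens no matter how much domain knowledge the expert vector encodes, whereas a RAG prompt grows with the retrieved text; the general query passes through unchanged.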
