Large Language Models (LLMs) possess remarkable generalization capabilities but struggle with multi-task adaptation, particularly in balancing knowledge retention with task-specific specialization. Conventional fine-tuning methods suffer from catastrophic forgetting and substantial resource consumption, while existing parameter-efficient methods perform suboptimally in complex multi-task scenarios. To address this, we propose Contextual Attention Modulation (CAM), a novel mechanism that dynamically modulates the representations of self-attention modules in LLMs. CAM enhances task-specific features while preserving general knowledge, thereby facilitating more effective and efficient adaptation. For effective multi-task adaptation, CAM is integrated into our Hybrid Contextual Attention Modulation (HyCAM) framework, which combines a shared, full-parameter CAM module with multiple specialized, lightweight CAM modules, enhanced by a dynamic routing strategy for adaptive knowledge fusion. Extensive experiments on heterogeneous tasks, including question answering, code generation, and logical reasoning, demonstrate that our approach significantly outperforms existing methods, achieving an average performance improvement of 3.65%. Code and data are available at https://github.com/Applied-Machine-Learning-Lab/HyCAM to facilitate reproducibility.
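The hybrid design described above can be sketched in simplified form. The snippet below is a hypothetical illustration, not the paper's actual formulation: it assumes each CAM module produces a per-dimension scaling vector for an attention representation, and that the dynamic router softmax-weights the lightweight expert modules before fusing them with the shared module's output. All function names and the exact fusion rule (`h * (1 + shared + routed experts)`) are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hycam_modulate(h, shared_scale, expert_scales, router_logits):
    """Hypothetical HyCAM-style fusion (illustrative, not the paper's exact method).

    h             -- attention representation (list of floats)
    shared_scale  -- output of the shared, full-parameter CAM module
    expert_scales -- outputs of the specialized, lightweight CAM modules
    router_logits -- one logit per expert, produced by the dynamic router
    """
    weights = softmax(router_logits)
    out = []
    for i, x in enumerate(h):
        # Adaptively fuse the expert modules according to the router.
        expert_term = sum(w * e[i] for w, e in zip(weights, expert_scales))
        # Modulate the representation: shared knowledge plus routed task-specific signal.
        out.append(x * (1.0 + shared_scale[i] + expert_term))
    return out
```

With a router strongly favoring one expert, the fused modulation reduces to that expert's scaling plus the shared term, which matches the intuition of task-specific specialization on top of retained general knowledge.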