Catastrophic Forgetting in LLMs: A Comparative Analysis Across Language Tasks

Large Language Models (LLMs) have significantly advanced Natural Language Processing (NLP), particularly in Natural Language Understanding (NLU) tasks. As we progress toward an agentic world where LLM-based agents autonomously handle specialized tasks, it becomes crucial for these models to adapt to new tasks without forgetting previously learned information, a challenge known as catastrophic forgetting. This study evaluates the continual fine-tuning of open-source LLMs of varying sizes, all under 10 billion parameters, on key NLU tasks from the GLUE benchmark, including SST-2, MRPC, CoLA, and MNLI. By employing prompt engineering and task-specific adjustments, we assess and compare the models' abilities to retain prior knowledge while learning new tasks. Our results indicate that models such as Phi-3.5-mini exhibit minimal forgetting while maintaining strong learning capabilities, making them well-suited for continual learning environments. Additionally, models like Orca-2-7b and Qwen2.5-7B demonstrate impressive learning abilities and overall performance after fine-tuning. This work contributes to understanding catastrophic forgetting in LLMs and highlights the role of prompt engineering in optimizing model performance for continual learning scenarios.
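The abstract does not state how forgetting is quantified. As a minimal sketch, assuming a commonly used forgetting measure (the best accuracy a task reached at any earlier stage minus its accuracy after the final task), retention across sequentially fine-tuned tasks could be computed as below. The function names, the training order, and every accuracy value are hypothetical placeholders for illustration, not results from the study.

```python
from typing import Dict, List


def forgetting_per_task(
    acc: List[Dict[str, float]], task_order: List[str]
) -> Dict[str, float]:
    """Per-task forgetting after sequential fine-tuning.

    acc[i][t] is the accuracy on task t measured after fine-tuning on the
    i-th task in task_order. Forgetting for a task is its best accuracy
    between the stage where it was learned and the second-to-last stage,
    minus its accuracy after the final stage. The last task has no
    forgetting value because nothing was trained after it.
    """
    if len(acc) != len(task_order):
        raise ValueError("need one accuracy snapshot per training stage")
    final = acc[-1]
    last_stage = len(task_order) - 1
    result: Dict[str, float] = {}
    for idx, task in enumerate(task_order[:-1]):
        best_before = max(acc[i][task] for i in range(idx, last_stage))
        result[task] = best_before - final[task]
    return result


def average_forgetting(per_task: Dict[str, float]) -> float:
    """Mean forgetting over all tasks that have a forgetting value."""
    return sum(per_task.values()) / len(per_task)


# Assumed training order (the source lists the tasks but not their order).
task_order = ["SST-2", "MRPC", "CoLA", "MNLI"]

# Placeholder accuracies, one snapshot per training stage. Not real results.
acc = [
    {"SST-2": 0.90, "MRPC": 0.60, "CoLA": 0.50, "MNLI": 0.40},
    {"SST-2": 0.85, "MRPC": 0.80, "CoLA": 0.50, "MNLI": 0.40},
    {"SST-2": 0.88, "MRPC": 0.75, "CoLA": 0.70, "MNLI": 0.40},
    {"SST-2": 0.80, "MRPC": 0.70, "CoLA": 0.65, "MNLI": 0.75},
]

per_task = forgetting_per_task(acc, task_order)
print(per_task)
print(average_forgetting(per_task))
```

Under this measure, a model with minimal forgetting would show per-task values near zero, while a large positive value means accuracy on an earlier task dropped after later fine-tuning.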

