Sequentially fine-tuning Large Language Models (LLMs) to adapt them to target tasks often triggers catastrophic forgetting, in which acquiring new target skills degrades ancestral capabilities. This paper presents a systematic comparative study of catastrophic forgetting across twenty leading models representing the state of the art as of mid-2026. We organize the investigation along two lines: (i) a behavioral and semantic analysis of output drift in ten leading closed-source models (including Claude Fable 5, GPT-5.5 High, and Gemini 3.5 Flash), and (ii) a mechanistic interpretability analysis of ten prominent open-weight architectures (such as DeepSeek-V4-Pro, Llama 4 Maverick, and Qwen 3.6-27B). Using weight-space trajectory tracking, Centered Kernel Alignment (CKA), and routing-gate drift measurements in Mixture-of-Experts (MoE) layers, we localize the neural circuits most susceptible to parameter overwriting. Our findings indicate that early-layer attention heads exhibit systemic entropic dispersion, while mid-to-deep feed-forward networks (or sparse expert blocks) suffer localized representation collapse. Informed by these insights, we introduce Low-Rank Circuit Projection (LRCP), a subspace-regularized training intervention. Empirical evaluations show that LRCP mitigates up to 94.2% of the loss of ancestral capabilities in open-weight configurations while matching the adaptation speed of standard PEFT baselines.
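For readers unfamiliar with Centered Kernel Alignment, the following is a minimal sketch of linear CKA as it is commonly defined, comparing two activation matrices collected on the same inputs. The abstract does not state which CKA variant (linear or kernel) the study uses, so the linear form here is an assumption, and the function name `linear_cka` and the synthetic `base` and `drifted` activations are hypothetical illustrations rather than the paper's data or code.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, n_features).

    Both matrices must be computed on the same n_samples inputs; the feature
    dimensions may differ.
    """
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical stand-ins for one layer's activations before and after fine-tuning.
rng = np.random.default_rng(0)
base = rng.normal(size=(256, 64))
drifted = base + 0.5 * rng.normal(size=base.shape)

print("CKA(base, base)    =", linear_cka(base, base))
print("CKA(base, drifted) =", linear_cka(base, drifted))
```

A score near 1 means the two representations share the same geometry up to rotation and isotropic scaling, and lower scores indicate drift. Tracking this score layer by layer between a pre-fine-tuning and a post-fine-tuning checkpoint is one plausible way to localize where representations change most.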