Large language models (LLMs) are widely used across applications but are known to produce untruthful and toxic output. While parameter-efficient modules (PEMs) have proven effective at equipping models with new skills, using PEMs to unlearn deficiencies remains underexplored. In this work, we propose a PEM operation approach, Extraction-before-Subtraction (Ext-Sub), that improves the truthfulness and reduces the toxicity of LLMs by combining an ``expert'' PEM with an ``anti-expert'' PEM. Notably, even anti-expert PEMs possess valuable capabilities, since generating fabricated content itself requires language modeling and logical narrative competence. Rather than simply negating the anti-expert's parameters, our approach extracts and removes only the deficiency capability within the anti-expert PEM while preserving its general capabilities. We evaluate truthfulness and detoxification in extensive experiments on LLMs, and additionally measure general abilities such as language modeling and mathematical reasoning. Our empirical results show that the approach effectively improves truthfulness and detoxification while largely preserving the fundamental abilities of LLMs.
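The contrast between plain negation and extraction before subtraction can be illustrated with a toy sketch. The abstract does not specify how the deficiency capability is extracted, so everything below is an assumption made for illustration only: each PEM is treated as a flat vector of parameter deltas, the component of the anti-expert that lies along the expert is assumed to stand in for the shared general capability, and the residual is assumed to stand in for the deficiency. The names `split_anti_expert`, `ext_sub`, `plain_negation`, and the weight `lam` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def split_anti_expert(expert: Vector, anti: Vector) -> Tuple[Vector, Vector]:
    """Split the anti-expert delta into two parts (illustrative assumption):
    - shared: projection of the anti-expert onto the expert direction
    - deficiency: the residual orthogonal to the expert direction
    """
    denom = dot(expert, expert)
    if denom == 0.0:
        # No expert direction to project onto: treat everything as deficiency.
        return [0.0] * len(anti), list(anti)
    scale = dot(anti, expert) / denom
    shared = [scale * e for e in expert]
    deficiency = [a - s for a, s in zip(anti, shared)]
    return shared, deficiency


def plain_negation(expert: Vector, anti: Vector, lam: float = 1.0) -> Vector:
    """Baseline: subtract the whole anti-expert, shared part included."""
    return [e - lam * a for e, a in zip(expert, anti)]


def ext_sub(expert: Vector, anti: Vector, lam: float = 1.0) -> Vector:
    """Sketch of extract-then-subtract: remove only the deficiency part."""
    _, deficiency = split_anti_expert(expert, anti)
    return [e - lam * d for e, d in zip(expert, deficiency)]


if __name__ == "__main__":
    expert = [1.0, 0.0]  # toy "general capability" direction
    anti = [1.0, 1.0]    # shares that direction, plus a deficiency component
    print(plain_negation(expert, anti))  # shared component is cancelled as well
    print(ext_sub(expert, anti))         # shared component is kept
```

In this toy setting, plain negation cancels the component that the two modules share, whereas the extract-then-subtract variant leaves it intact and removes only the residual. A real PEM such as a low-rank adapter has matrix-valued parameters, and the extraction step would need to be defined accordingly.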