Model editing aims to precisely modify the behaviours of large language models (LLMs) on specific knowledge while keeping irrelevant knowledge unchanged. It has been shown to be effective at mitigating hallucination and outdated knowledge in LLMs. As a result, it can promote the use of LLMs in critical domains such as medicine, where hallucination is not tolerable. In this paper, we propose two model editing studies and validate them in the medical domain: (1) directly editing factual medical knowledge and (2) editing the explanations of facts. We observe that current model editing methods struggle with the specialization and complexity of medical knowledge. We therefore propose MedLaSA, a novel Layer-wise Scalable Adapter strategy for medical model editing. It employs causal tracing to identify the precise location of knowledge in neurons and then introduces scalable adapters into the dense layers of LLMs. Each adapter is assigned a scaling value based on the specific knowledge being edited. To evaluate the impact of editing, we build two benchmark datasets and introduce a series of challenging and comprehensive metrics. Extensive experiments on medical LLMs demonstrate the editing efficiency of MedLaSA, which leaves unrelated, unedited knowledge unaffected.
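To make the idea of a layer-wise scalable adapter concrete, the following is a minimal sketch and not the MedLaSA implementation. It assumes a low-rank adapter of the form `W x + scale * B (A x)` and assumes that per-layer scaling values come from min-max normalising causal-tracing impact scores; the abstract specifies neither choice. All names (`layer_scales`, `adapted_dense`, `impact`) are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def layer_scales(impact: List[float]) -> List[float]:
    """Min-max normalise per-layer impact scores to [0, 1].

    Assumption: `impact` holds one causal-tracing score per layer for a
    single piece of knowledge; the normalisation rule is illustrative only.
    """
    lo, hi = min(impact), max(impact)
    if hi == lo:
        return [0.0 for _ in impact]
    return [(s - lo) / (hi - lo) for s in impact]


def adapted_dense(W: Matrix, A: Matrix, B: Matrix, x: Vector, scale: float) -> Vector:
    """Frozen dense layer plus a scaled low-rank adapter: W x + scale * B (A x)."""
    base = matvec(W, x)               # output of the original (unedited) layer
    delta = matvec(B, matvec(A, x))   # low-rank update, rank = len(A)
    return [b + scale * d for b, d in zip(base, delta)]


# Toy example: a 2x2 identity layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # down-projection (1 x 2)
B = [[1.0], [2.0]]      # up-projection (2 x 1)
x = [1.0, 2.0]

scales = layer_scales([1.0, 5.0, 3.0])   # hypothetical scores for three layers
outputs = [adapted_dense(W, A, B, x, s) for s in scales]
```

With a scale of 0 the layer behaves exactly like the original dense layer, which is one way unrelated knowledge could be left untouched, while a scale of 1 applies the full adapter update in the layers judged most relevant to the edited knowledge.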