Large language models (LLMs) hold great promise for healthcare applications, but the rapid evolution of medical knowledge and errors in training data often lead them to generate outdated or inaccurate information, limiting their applicability in high-stakes clinical practice. Model editing has emerged as a potential remedy that avoids full retraining. Parameter-based editing often compromises locality and is thus ill-suited to the medical domain, whereas retrieval-based editing offers a more viable alternative. However, retrieval-based editing still faces two critical challenges: (1) representation overlap within the medical knowledge space often causes inaccurate retrieval and reduces editing accuracy; (2) existing methods are restricted to single-sample edits, while batch editing remains largely unexplored despite its importance for real-world medical applications. To address these challenges, we first construct MedVersa, \hk{an enhanced benchmark with broader coverage of medical subjects, designed to evaluate both single and batch edits under strict locality constraints}. We then propose MedREK, a retrieval-based editing framework that integrates a shared query-key module for precise matching with an attention-based prompt encoder for informative guidance. Experimental results on various medical benchmarks demonstrate that MedREK achieves superior performance across core metrics and provides the first validated solution for batch editing in medical LLMs. Our code and dataset are available at https://github.com/mylittleriver/MedREK.