It is widely acknowledged that large language models (LLMs) encode a vast reservoir of knowledge after being trained on massive data. Recent studies reveal knowledge conflicts in LLM generation, wherein outdated or incorrect parametric knowledge (i.e., encoded knowledge) contradicts new knowledge provided in the context. To mitigate such knowledge conflicts, we propose a novel framework, IRCAN (Identifying and Reweighting Context-Aware Neurons), to capitalize on neurons that are crucial for processing contextual cues. Specifically, IRCAN first identifies neurons that significantly contribute to context processing, utilizing a context-aware attribution score derived from integrated gradients. Subsequently, the identified context-aware neurons are strengthened via reweighting. In doing so, we steer LLMs to generate context-sensitive outputs with respect to the new knowledge provided in the context. Extensive experiments conducted across a variety of models and tasks demonstrate that IRCAN not only achieves remarkable improvements in handling knowledge conflicts but also offers a scalable, plug-and-play solution that can be integrated seamlessly with existing models. Our code is released at https://github.com/danshi777/IRCAN.
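The identify-then-reweight pipeline can be illustrated with a minimal sketch. The snippet below is not IRCAN's actual implementation (which attributes FFN neurons inside an LLM); it is a toy example, assuming a simple linear readout over a vector of "neuron activations", that shows the two steps named in the abstract: (1) an integrated-gradients attribution score per neuron, approximated by a Riemann sum over the straight-line path from a baseline, and (2) amplifying the top-scoring neurons by a hypothetical scaling factor `beta`. The function names and the top-k/`beta` choices are illustrative assumptions, not values from the paper.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline=None, steps=50):
    """Riemann (midpoint) approximation of integrated gradients:
    (x - baseline) * average gradient along the interpolation path."""
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of each sub-interval
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy "neuron activations" h and a linear readout f(h) = w . h,
# whose gradient is simply the constant vector w.
w = np.array([0.5, -1.0, 2.0])
f = lambda h: float(w @ h)
grad_f = lambda h: w

h = np.array([1.0, 2.0, 3.0])
scores = integrated_gradients(grad_f, h)
# For a linear readout, IG is exact: scores == w * h, and the scores
# sum to f(h) - f(baseline) (the IG completeness property).

# Reweighting step: strengthen the top-k "context-aware" neurons
# by a factor beta (both hypothetical hyperparameters here).
k, beta = 1, 2.0
top = np.argsort(-np.abs(scores))[:k]
h_reweighted = h.copy()
h_reweighted[top] *= beta
```

In IRCAN itself, the attribution is computed with respect to the contextual input so that the selected neurons are those most responsible for processing the provided context, and the reweighting is applied to model weights, making the intervention plug-and-play.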