Understanding the internal mechanisms by which multi-modal large language models (LLMs) interpret different modalities and integrate cross-modal representations is becoming increasingly critical to continued progress in both academia and industry. In this paper, we propose a novel method for identifying the key neurons that explain how multi-modal LLMs bridge visual and textual concepts for captioning. Our method improves on prior work in both efficiency and range of application by removing the need for costly gradient computation. Based on the identified neurons, we further design a multi-modal knowledge editing method that can help mitigate sensitive words or hallucinations. We provide a theoretical assumption as the rationale for our design. For empirical evaluation, we conduct extensive quantitative and qualitative experiments. The results not only validate the effectiveness of our methods but also highlight three key properties of multi-modal neurons: sensitivity, specificity, and causal effect, which we hope will inform future research.