Large Language Models (LLMs) possess vast amounts of knowledge within their parameters, prompting research into methods for locating and editing this knowledge. Previous work has largely focused on locating entity-related (often single-token) facts in smaller models. However, several key questions remain unanswered: (1) How can we effectively locate query-relevant neurons in decoder-only LLMs, such as Llama and Mistral? (2) How can we address the challenge of long-form (or free-form) text generation? (3) Are there localized knowledge regions in LLMs? In this study, we introduce Query-Relevant Neuron Cluster Attribution (QRNCA), a novel architecture-agnostic framework for identifying query-relevant neurons in LLMs. QRNCA enables the examination of long-form answers beyond triplet facts by employing multi-choice question answering as a proxy task. To evaluate the effectiveness of the detected neurons, we build two multi-choice QA datasets spanning diverse domains and languages. Empirical evaluations show that our method significantly outperforms baseline methods. Furthermore, an analysis of neuron distributions reveals visible localized regions, particularly within different domains. Finally, we demonstrate potential applications of the detected neurons in knowledge editing and neuron-based prediction.
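To make the idea of query-relevant neuron attribution concrete, here is a minimal, hypothetical sketch: given the activations feeding into the logit of a question's gold answer option, each neuron is scored by zero-ablation (how much the gold-option logit drops when that neuron is silenced), and the top-scoring neurons are taken as query-relevant. The toy linear readout, the ablation-based scoring, and all values below are illustrative assumptions, not the paper's actual QRNCA method.

```python
# Toy sketch of query-relevant neuron attribution via a multi-choice
# QA proxy task. The linear "readout" model is a hypothetical stand-in
# for a real LLM's mapping from neuron activations to option logits.

def option_logit(activations, weights):
    """Toy readout: logit of the gold answer option as a weighted sum
    of neuron activations."""
    return sum(a * w for a, w in zip(activations, weights))

def neuron_attributions(activations, weights):
    """Zero-ablation attribution: silence each neuron in turn and
    measure the drop in the gold option's logit."""
    base = option_logit(activations, weights)
    scores = []
    for i in range(len(activations)):
        ablated = list(activations)
        ablated[i] = 0.0  # silence neuron i
        scores.append(base - option_logit(ablated, weights))
    return scores

def top_k_neurons(scores, k):
    """Indices of the k highest-scoring (most query-relevant) neurons."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical activations/weights for one (question, gold option) pair.
acts = [0.9, 0.1, 0.5, 0.0]
w = [2.0, 1.0, -1.0, 3.0]
scores = neuron_attributions(acts, w)  # [1.8, 0.1, -0.5, 0.0]
print(top_k_neurons(scores, 2))        # -> [0, 1]
```

In practice the attribution would run over many questions per query topic and cluster the recurring high-scoring neurons, but the ablate-and-score loop above captures the core attribution step.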