We presented a neuron-level analysis of legal-domain reasoning in LLMs, comparing it with other applied domain tasks across seven open-weight models. Using neuron attribution scores to rank and suppress influential neurons, we confirmed that suppressing the identified neurons collapses accuracy on the target task, whereas suppressing the same number of random neurons does not. We further found a small subset of neurons that is influential across all seven tasks; once these are removed, suppressing the remaining neurons degrades only the task from which they were identified, revealing genuinely task-specific neurons in every model studied. Within the legal domain, the three benchmarks exhibit relatively high neuron overlap and tend to be affected jointly, suggesting the existence of legal-component neurons that span jurisdictions. The distribution of identified neurons in our experiments suggests that the reported concentration of influential neurons in middle MLP layers may depend on input format and content, rather than being a universal phenomenon.
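The suppression comparison summarized above (top-attributed neurons versus an equal number of random neurons) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration: the synthetic task, the function names, and in particular the activation-times-gradient score, which stands in for whatever attribution score is actually used. It is not the experimental pipeline, only a small instance of the rank-then-suppress idea.

```python
import numpy as np

def make_toy_task(n_samples=400, n_neurons=64, n_informative=4, seed=0):
    """Hypothetical toy setup: a few hidden neurons carry the label signal."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_samples)
    acts = rng.normal(0.0, 1.0, size=(n_samples, n_neurons))
    # The first n_informative neurons shift with the label (+2 or -2).
    acts[:, :n_informative] += 2.0 * (2 * labels[:, None] - 1)
    weights = np.zeros(n_neurons)
    weights[:n_informative] = 1.0
    weights[n_informative:] = rng.normal(0.0, 0.05, size=n_neurons - n_informative)
    return acts, weights, labels

def accuracy(acts, weights, labels, suppressed=()):
    """Accuracy of a linear readout after zeroing the suppressed neurons."""
    mask = np.ones(acts.shape[1])
    mask[np.asarray(suppressed, dtype=int)] = 0.0
    logits = (acts * mask) @ weights
    return float(((logits > 0).astype(int) == labels).mean())

def attribution_scores(acts, weights, labels):
    """Assumed score: mean activation x gradient toward the correct label."""
    signs = 2 * labels - 1
    # For a linear readout, the gradient of the logit w.r.t. a neuron is its weight.
    return np.abs((acts * weights * signs[:, None]).mean(axis=0))

def compare_suppression(k=4, seed=0):
    """Suppress the top-k attributed neurons vs. k random neurons."""
    acts, weights, labels = make_toy_task(seed=seed)
    scores = attribution_scores(acts, weights, labels)
    top_k = np.argsort(scores)[::-1][:k]
    rng = np.random.default_rng(seed + 1)
    random_k = rng.choice(acts.shape[1], size=k, replace=False)
    return {
        "baseline": accuracy(acts, weights, labels),
        "top_k_suppressed": accuracy(acts, weights, labels, top_k),
        "random_k_suppressed": accuracy(acts, weights, labels, random_k),
        "top_k": sorted(int(i) for i in top_k),
    }

if __name__ == "__main__":
    for name, value in compare_suppression().items():
        print(name, value)
```

In this toy setting, zeroing the top-ranked neurons should push accuracy toward chance, while zeroing the same number of random neurons should leave it largely intact, mirroring the qualitative pattern described in the text. Real models would require hooking MLP activations and evaluating on the actual benchmarks.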