Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain

October 15, 2025

Jingmin An, Yilong Song, Ruolin Yang, Nai Ding, Lingxi Lu, Yuxuan Wang, Wei Wang, Chu Zhuang, Qian Wang, Fang Fang

Large Language Models (LLMs) demonstrate human-level or even superior language abilities, effectively modeling syntactic structures, yet the specific computational modules responsible remain unclear. A key question is whether LLM behavioral capabilities stem from mechanisms akin to those in the human brain. To address these questions, we introduce the Hierarchical Frequency Tagging Probe (HFTP), a tool that utilizes frequency-domain analysis to identify neuron-wise components of LLMs (e.g., individual Multilayer Perceptron (MLP) neurons) and cortical regions (via intracranial recordings) encoding syntactic structures. Our results show that models such as GPT-2, Gemma, Gemma 2, Llama 2, Llama 3.1, and GLM-4 process syntax in analogous layers, while the human brain relies on distinct cortical regions for different syntactic levels. Representational similarity analysis reveals a stronger alignment between LLM representations and the left hemisphere of the brain (dominant in language processing). Notably, upgraded models exhibit divergent trends: Gemma 2 shows greater brain similarity than Gemma, while Llama 3.1 shows less alignment with the brain compared to Llama 2. These findings offer new insights into the interpretability of LLM behavioral improvements, raising questions about whether these advancements are driven by human-like or non-human-like mechanisms, and establish HFTP as a valuable tool bridging computational linguistics and cognitive neuroscience. This project is available at https://github.com/LilTiger/HFTP.
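The abstract does not spell out how the frequency-domain analysis is computed, so the following is only a minimal sketch of the general frequency-tagging idea, not the authors' implementation (which is in the linked repository). It assumes a hypothetical setup in which every sentence has exactly four words, so a unit that tracks sentence boundaries should show a spectral peak at 1/4 cycles per token. The activations are synthetic, and the function names `tagged_spectrum` and `peak_snr`, as well as the neighbor-bin signal-to-noise measure, are illustrative choices rather than anything stated in the source.

```python
import numpy as np


def tagged_spectrum(activations):
    """Amplitude spectrum of a per-token activation series.

    Frequencies are in cycles per token (one sample per token).
    """
    x = np.asarray(activations, dtype=float)
    x = x - x.mean()  # remove the DC offset
    amps = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    return freqs, amps


def peak_snr(freqs, amps, target, n_neighbors=4):
    """Amplitude at the bin closest to `target`, divided by the mean
    amplitude of up to `n_neighbors` bins on each side (DC excluded)."""
    idx = int(np.argmin(np.abs(freqs - target)))
    lo = max(idx - n_neighbors, 1)
    hi = min(idx + n_neighbors + 1, len(amps))
    neighbors = np.concatenate([amps[lo:idx], amps[idx + 1:hi]])
    return amps[idx] / neighbors.mean()


# Synthetic data (assumption): 50 sentences of 4 words each.
rng = np.random.default_rng(0)
n_sentences, words_per_sentence = 50, 4
n_tokens = n_sentences * words_per_sentence
position = np.arange(n_tokens) % words_per_sentence

# Hypothetical unit that responds on the last word of every sentence, plus noise.
sentence_unit = (position == words_per_sentence - 1).astype(float)
sentence_unit += 0.3 * rng.standard_normal(n_tokens)

# Hypothetical unit with no sentence-rate structure at all.
noise_unit = rng.standard_normal(n_tokens)

sentence_rate = 1.0 / words_per_sentence  # 0.25 cycles per token

snr = {}
for name, unit in [("sentence_unit", sentence_unit), ("noise_unit", noise_unit)]:
    freqs, amps = tagged_spectrum(unit)
    snr[name] = peak_snr(freqs, amps, sentence_rate)
    print(f"{name}: SNR at sentence rate = {snr[name]:.2f}")
```

In this toy setting, the unit that tracks sentence boundaries stands out from its neighboring frequency bins at the sentence rate, while the unstructured unit does not. In principle, the same kind of spectral test could be run on a recorded neural signal, given the presentation rate of the words.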

