Representations from large language models are highly effective at predicting BOLD fMRI responses to language stimuli. However, these representations are largely opaque: it is unclear what features of the language stimulus drive the response in each brain area. We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain from predictive models and then testing those explanations in follow-up experiments using LLM-generated stimuli. This approach is successful at explaining selectivity in both individual voxels and cortical regions of interest (ROIs), including newly identified microROIs in prefrontal cortex. We show that explanatory accuracy is closely related to the predictive power and stability of the underlying predictive models. Finally, we show that GCT can dissect fine-grained differences between brain areas with similar functional selectivity. These results demonstrate that LLMs can be used to bridge the widening gap between data-driven models and formal scientific theories.