Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales log-linearly with model size across models from 125M to 30B parameters, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar log-linear behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield highly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.
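To make the notion of log-linear scaling concrete, the following is a minimal sketch of how such a trend could be fit, assuming encoding performance is summarized as one mean test-set correlation per model. The parameter counts, the scores, and the helper name `fit_log_linear` are hypothetical and chosen only for illustration; they are not results or code from this work.

```python
import numpy as np


def fit_log_linear(sizes, scores):
    """Least-squares fit of scores ~ intercept + slope * log10(sizes)."""
    x = np.log10(np.asarray(sizes, dtype=float))
    y = np.asarray(scores, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


# Hypothetical parameter counts and synthetic scores (NOT data from the paper).
# The scores are generated from an exact log-linear rule so the fit is checkable.
param_counts = [125e6, 1.3e9, 13e9, 30e9]
true_slope, true_intercept = 0.01, 0.02
mean_corr = [true_intercept + true_slope * np.log10(n) for n in param_counts]

slope, intercept = fit_log_linear(param_counts, mean_corr)

# Relative gain predicted by the fitted line between the smallest and largest model.
pred_small = intercept + slope * np.log10(param_counts[0])
pred_large = intercept + slope * np.log10(param_counts[-1])
relative_gain = pred_large / pred_small - 1.0
print(f"slope per decade: {slope:.4f}, relative gain: {relative_gain:.1%}")
```

Under this form, each tenfold increase in model size adds a constant increment (the slope) to the correlation, which is what a straight line on a log-scaled size axis corresponds to. The same fit could be applied with fMRI training set size in place of parameter count.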