Scaling laws for language encoding models in fMRI

Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar logarithmic behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding.
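
The logarithmic scaling described above can be illustrated with a minimal sketch that fits encoding performance as a linear function of log10(parameter count). This is not the authors' analysis code. The function name `fit_log_scaling`, the intermediate model sizes, and the coefficients used to generate the scores are all assumptions chosen for illustration, and the scores are synthetic placeholders rather than results from the paper.

```python
import numpy as np


def fit_log_scaling(param_counts, scores):
    """Least-squares fit of scores ~ a + b * log10(param_counts).

    Returns (a, b): the intercept and the gain in score per tenfold
    increase in parameter count.
    """
    x = np.log10(np.asarray(param_counts, dtype=float))
    y = np.asarray(scores, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    b, a = np.polyfit(x, y, deg=1)
    return a, b


# Model sizes spanning the 125M to 30B range named in the abstract.
# The two intermediate sizes are illustrative assumptions.
sizes = np.array([125e6, 1.3e9, 6.7e9, 30e9])

# SYNTHETIC placeholder scores generated from an assumed log-linear law.
# These are NOT measured correlations from the study.
a_true, b_true = 0.05, 0.01
synthetic_scores = a_true + b_true * np.log10(sizes)

a_hat, b_hat = fit_log_scaling(sizes, synthetic_scores)
print(f"intercept={a_hat:.4f}, gain per 10x parameters={b_hat:.4f}")
```

With real data, `scores` would hold the mean held-out test correlation for each model size. The same fit could be applied to the fMRI training set size in place of the parameter count.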
