Neural decoding, the process of understanding how brain activity corresponds to different stimuli, has been a primary objective in the cognitive sciences. Over the past three decades, advances in functional Magnetic Resonance Imaging (fMRI) and machine learning have greatly improved our ability to map visual stimuli to brain activity, especially in the visual cortex. Concurrently, research has expanded into decoding more complex processes such as language and memory across the whole brain, using techniques that handle greater variability and improve signal accuracy. We argue that "seeing" involves more than mapping visual stimuli onto the visual cortex: it engages the entire brain, as various emotions and cognitive states can emerge from observing different scenes. In this paper, we develop algorithms that enhance our understanding of visual processes by incorporating whole-brain activation maps recorded while individuals are exposed to visual stimuli. We use large-scale fMRI encoders and image generative models pre-trained on large public datasets, which are then fine-tuned through image-fMRI contrastive learning. Our models can hence decode visual experience across the entire cerebral cortex, surpassing the traditional confines of the visual cortex. We first compare our method with state-of-the-art approaches to decoding visual processing and show a 43% improvement in predictive semantic accuracy. A network ablation analysis suggests that beyond the visual cortex, the default mode network contributes most to decoding stimuli, in line with the proposed role of this network in sense-making and semantic processing. Additionally, we implement zero-shot imagination decoding on an additional validation dataset, achieving a p-value of 0.0206 for matching the reconstructed images to ground-truth text stimuli, which substantiates the model's capability to capture semantic meaning across varied scenarios.
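The image-fMRI contrastive fine-tuning mentioned above can be illustrated with a minimal sketch of a symmetric CLIP-style InfoNCE objective: paired fMRI and image embeddings in a batch are pulled together while all other pairings act as negatives. This is an assumed formulation for illustration only, not the paper's exact loss; the `temperature` value and function names are hypothetical.

```python
import numpy as np

def image_fmri_contrastive_loss(fmri_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired fMRI/image embeddings.

    Row i of `fmri_emb` is assumed to correspond to row i of `img_emb`
    (a matched stimulus pair); every other row in the batch is a negative.
    """
    # L2-normalize so dot products become cosine similarities.
    f = fmri_emb / np.linalg.norm(fmri_emb, axis=1, keepdims=True)
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = f @ v.T / temperature        # (batch, batch) similarity matrix
    labels = np.arange(len(logits))       # matched pairs sit on the diagonal

    def cross_entropy(scores, targets):
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    # Average the fMRI->image and image->fMRI retrieval directions.
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))

# Toy usage: aligned embeddings should score a lower loss than mismatched ones.
rng = np.random.default_rng(0)
fmri = rng.normal(size=(4, 8))
aligned_loss = image_fmri_contrastive_loss(fmri, fmri)
mismatched_loss = image_fmri_contrastive_loss(fmri, rng.normal(size=(4, 8)))
```

In practice both encoders would be deep networks trained end-to-end on this loss; the numpy version above only demonstrates the objective itself.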