The brain transforms visual inputs into high-dimensional cortical representations that support diverse cognitive and behavioral goals. Characterizing how this information is organized and routed across the human brain is essential for understanding how we process complex visual scenes. Here, we applied representational similarity analysis to 7T fMRI data collected during natural scene viewing. We quantified representational geometry shared across individuals and compared it to hierarchical features from vision and language neural networks. This analysis revealed two distinct processing routes: a ventromedial pathway specialized for scene layout and environmental context, and a lateral occipitotemporal pathway selective for animate content. Vision models aligned with shared structure in both routes, whereas language models corresponded primarily with the lateral pathway. These findings refine classical visual-stream models by characterizing scene perception as a distributed cortical network with separable representational routes for context and animate content.
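The core analysis described above, representational similarity analysis, compares the geometry of brain activity patterns with the geometry of model features. A minimal sketch is below; it uses randomly generated placeholder data in place of real fMRI voxel patterns and network activations, and the variable names (`brain`, `model`) are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(activations):
    """Representational dissimilarity matrix: condensed vector of
    pairwise correlation distances between stimulus patterns
    (rows = stimuli, columns = voxels or model features)."""
    return pdist(activations, metric="correlation")

# Placeholder data standing in for real measurements:
rng = np.random.default_rng(0)
n_stimuli, n_voxels, n_features = 50, 200, 512
brain = rng.standard_normal((n_stimuli, n_voxels))    # ROI voxel patterns
model = rng.standard_normal((n_stimuli, n_features))  # DNN layer features

# RSA compares the two geometries via rank correlation of their RDMs;
# a higher rho means the model's representational structure better
# matches the brain region's.
rho, _ = spearmanr(rdm(brain), rdm(model))
```

Because the RDM abstracts away from the raw feature space, the same comparison works for any pair of representations, which is what allows both vision-model and language-model features to be scored against each cortical route.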