While significant advances in artificial intelligence (AI) have catalyzed progress across many domains, AI's potential for understanding visual perception remains underexplored. We propose an artificial neural network called VISION, an acronym for "Visual Interface System for Imaging Output of Neural activity." VISION mimics the human brain, and we show how it can support neuroscientific inquiry. Using visual and contextual inputs, this multimodal model predicts the brain's response to natural images as measured by functional magnetic resonance imaging (fMRI). VISION predicts human hemodynamic responses to visual inputs, expressed as fMRI voxel values, with an accuracy 45% higher than the state of the art. We further probe the trained networks to reveal representational biases in different visual areas, generate experimentally testable hypotheses, and formulate an interpretable metric that associates these hypotheses with cortical functions. Together, the model and the evaluation metric could reduce the cost and time required to design and carry out functional analyses of the visual cortex. Our work suggests that advances in computational models may deepen our fundamental understanding of the visual cortex and offer a viable path toward reliable brain-machine interfaces.