In this paper, we introduce Recon3DMind, a new task focused on reconstructing 3D visuals from functional magnetic resonance imaging (fMRI) signals, which we regard as a significant step forward for both cognitive neuroscience and computer vision. To support this task, we present the fMRI-Shape dataset, which uses 360-degree view videos of 3D objects for comprehensive fMRI signal capture. The dataset covers 55 categories of common everyday objects and is intended to support future research. We also propose MinD-3D, a novel three-stage framework that decodes and reconstructs the brain's 3D visual information from fMRI signals. The method first extracts and aggregates features from fMRI frames using a neuro-fusion encoder, then employs a feature bridge diffusion model to generate the corresponding visual features, and finally recovers the 3D object through a generative transformer decoder. Our experiments show that the extracted features are valid and highly correlated with the visual regions of interest (ROIs) in the fMRI signals. The method reconstructs 3D objects with high semantic relevance and spatial similarity, and the results deepen our understanding of the human brain's 3D visual processing capabilities. Project page: https://jianxgao.github.io/MinD-3D.
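To make the three-stage data flow described in the abstract concrete, the following is a minimal sketch of how the stages compose. It is not the MinD-3D implementation: every function name, the averaging step, the fixed-step refinement, and the clamping "decoder" are hypothetical placeholders chosen only to show the order of operations (encode fMRI frames, bridge to a visual feature, decode to a 3D representation). The real neuro-fusion encoder, feature bridge diffusion model, and generative transformer decoder are learned networks.

```python
from typing import List, Sequence

Vector = List[float]


def neuro_fusion_encode(frames: Sequence[Vector]) -> Vector:
    """Stage 1 placeholder: aggregate per-frame fMRI features by averaging.

    Assumption: each frame is already a fixed-length feature vector.
    """
    if not frames:
        raise ValueError("need at least one fMRI frame")
    return [sum(column) / len(frames) for column in zip(*frames)]


def feature_bridge(brain_feature: Vector, steps: int = 4) -> Vector:
    """Stage 2 placeholder: refine a zero vector toward the conditioning feature.

    This is NOT a diffusion model; it only mimics the idea of producing a
    visual feature through several conditioned refinement steps.
    """
    x = [0.0] * len(brain_feature)
    for _ in range(steps):
        # Move halfway toward the conditioning feature at every step.
        x = [xi + 0.5 * (ci - xi) for xi, ci in zip(x, brain_feature)]
    return x


def decode_shape_tokens(visual_feature: Vector, codebook_size: int = 8) -> List[int]:
    """Stage 3 placeholder: map each feature value to a discrete token index.

    A generative transformer decoder would predict 3D shape tokens here;
    this stand-in simply truncates and clamps values into the codebook range.
    """
    return [min(codebook_size - 1, max(0, int(v))) for v in visual_feature]


def reconstruct(frames: Sequence[Vector]) -> List[int]:
    """Chain the three placeholder stages in the order given in the abstract."""
    brain_feature = neuro_fusion_encode(frames)
    visual_feature = feature_bridge(brain_feature)
    return decode_shape_tokens(visual_feature)


if __name__ == "__main__":
    toy_frames = [[1.0, 2.0], [3.0, 4.0]]  # two toy "fMRI frames"
    print(reconstruct(toy_frames))  # prints [1, 2]
```

The point of the sketch is the interface between stages: each stage consumes only the output of the previous one, so any stage could in principle be swapped out without changing the others.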