In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images from fMRI signals. Previous studies have succeeded in recreating different aspects of the visual content, such as low-level properties (shape, texture, layout) or high-level features (object category, descriptive semantics of scenes), but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating highly complex images. Here, we investigate how to take advantage of this technology for brain decoding. We present a two-stage scene reconstruction framework called ``Brain-Diffuser''. In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion), conditioned on predicted multimodal (text and visual) features, to generate the final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling ``ROI-optimal'' scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g., brain-computer interface) and fundamental neuroscience.
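The data flow of the two-stage framework can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: the mapping from fMRI to each feature space is modeled as a linear ridge regression (the abstract only says the features are "predicted"), all data are synthetic, and the dimensions and names (`fit_ridge`, `predict_features`, `feature_dims`) are hypothetical. The pretrained VDVAE and Versatile Diffusion models are not invoked; the sketch stops at the predicted features that would be passed to them.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)

# Toy sizes (assumptions); real fMRI data and feature spaces are far larger.
n_train, n_test, n_voxels = 200, 10, 50
feature_dims = {"vdvae_latents": 16, "text_features": 8, "visual_features": 8}

# Synthetic stand-ins for fMRI responses to training and test images.
fmri_train = rng.standard_normal((n_train, n_voxels))
fmri_test = rng.standard_normal((n_test, n_voxels))

# Fit one regressor per target feature space. The targets here are synthetic;
# in a real pipeline they would be extracted from the training images and
# captions with the pretrained models.
weights = {}
for name, dim in feature_dims.items():
    true_map = rng.standard_normal((n_voxels, dim))
    noise = 0.1 * rng.standard_normal((n_train, dim))
    train_targets = fmri_train @ true_map + noise
    weights[name] = fit_ridge(fmri_train, train_targets, alpha=1.0)

def predict_features(fmri, weights):
    """Predict every feature space from fMRI patterns."""
    return {name: fmri @ W for name, W in weights.items()}

predicted = predict_features(fmri_test, weights)

# Stage 1: predicted latents would be decoded by a VDVAE into an initial image
# capturing low-level properties and layout.
stage1_input = predicted["vdvae_latents"]

# Stage 2: predicted text and visual features would condition the
# image-to-image pass of the latent diffusion model, starting from that image.
stage2_conditioning = (predicted["text_features"], predicted["visual_features"])
```

Keeping one regressor per feature space mirrors the separation in the text: the first stage consumes only the latent prediction, while the second stage consumes the multimodal features together with the first-stage image.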