In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images from fMRI signals. Previous studies have succeeded in recreating different aspects of the images, such as low-level properties (shape, texture, layout) or high-level features (object category, descriptive semantics of scenes), but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating highly complex images. Here, we investigate how to take advantage of this technology for brain decoding. We present a two-stage scene reconstruction framework called ``Brain-Diffuser''. In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion), conditioned on predicted multimodal (text and visual) features, to generate the final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling ``ROI-optimal'' scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g., brain-computer interfaces) and fundamental neuroscience.