We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high-dimensional multimodal latent space, such as CLIP image space, enabling image reconstruction with generative models that accept embeddings from this latent space. We comprehensively compare our approach with existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance on both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates, indicating that its brain embeddings retain fine-grained, image-specific information. This allows us to accurately retrieve images even from large-scale databases such as LAION-5B. Through ablations, we demonstrate that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters. Furthermore, we show that MindEye can better preserve low-level image features in its reconstructions by using img2img with outputs from a separate autoencoder. All code is available on GitHub.
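To make the retrieval idea concrete, here is a minimal NumPy sketch. It assumes a generic CLIP-style symmetric contrastive (InfoNCE) loss and plain cosine-similarity retrieval. It does not reproduce MindEye's actual training objective or code. The function names, the temperature of 0.07, the embedding width, and the synthetic "brain" embeddings (image embeddings plus Gaussian noise) are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def contrastive_loss(brain_emb, image_emb, temperature=0.07):
    # Generic symmetric InfoNCE (CLIP-style), an assumption for illustration:
    # row k of each batch is a matching pair; all other rows act as negatives.
    logits = l2_normalize(brain_emb) @ l2_normalize(image_emb).T / temperature
    labels = np.arange(logits.shape[0])

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()  # diagonal entries are the true pairs

    # Average the brain->image and image->brain directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def retrieve(brain_emb, candidate_emb):
    # For each brain embedding, return the index of the most similar candidate.
    sims = l2_normalize(brain_emb) @ l2_normalize(candidate_emb).T
    return sims.argmax(axis=1)

# Synthetic stand-ins (hypothetical): random "image embeddings" and noisy "brain embeddings".
rng = np.random.default_rng(0)
n, dim = 300, 768  # hypothetical number of samples and embedding width
image_emb = rng.normal(size=(n, dim))
brain_emb = image_emb + 0.5 * rng.normal(size=(n, dim))

acc = (retrieve(brain_emb, image_emb) == np.arange(n)).mean()
aligned_loss = contrastive_loss(brain_emb, image_emb)
shuffled_loss = contrastive_loss(brain_emb, image_emb[rng.permutation(n)])
print(f"top-1 retrieval accuracy: {acc:.2f}")
print(f"loss (aligned pairs): {aligned_loss:.3f}, loss (shuffled pairs): {shuffled_loss:.3f}")
```

In this sketch, retrieval is an exact nearest-neighbor search over a dense similarity matrix. Searching a database the size of LAION-5B would instead require an approximate nearest-neighbor index.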