The integration of deep learning and neuroscience has advanced rapidly, improving both the analysis of brain activity and the understanding of deep learning models from a neuroscientific perspective. The reconstruction of visual experience from human brain activity has particularly benefited: deep learning models trained on large numbers of natural images have greatly improved reconstruction quality, and approaches that combine the diverse information contained in visual experiences have proliferated in recent years. In this technical paper, we build on the simple and generic framework that we proposed previously (Takagi and Nishimoto, CVPR 2023) and examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction. Specifically, we combine our earlier work with three techniques: using text decoded from brain activity, nonlinear optimization for structural image reconstruction, and using depth information decoded from brain activity. We confirm that these techniques improve accuracy over the baseline. We also discuss what researchers should consider when performing visual reconstruction using deep generative models trained on large datasets. Our project webpage is at https://sites.google.com/view/stablediffusion-with-brain/. Code is also available at https://github.com/yu-takagi/StableDiffusionReconstruction.