UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity

Image reconstruction and captioning from brain activity evoked by visual stimuli allow researchers to further understand the connection between the human brain and the visual perception system. While deep generative models have recently been employed in this field, reconstructing realistic captions and images with both low-level details and high semantic fidelity is still a challenging problem. In this work, we propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity. For the first time, we unify image reconstruction and captioning from visual-evoked functional magnetic resonance imaging (fMRI) through a latent diffusion model termed Versatile Diffusion. Specifically, we transform fMRI voxels into text and image latent for low-level information and guide the backward diffusion process through fMRI-based image and text conditions derived from CLIP to generate realistic captions and images. UniBrain outperforms current methods both qualitatively and quantitatively in terms of image reconstruction and reports image captioning results for the first time on the Natural Scenes Dataset (NSD) dataset. Moreover, the ablation experiments and functional region-of-interest (ROI) analysis further exhibit the superiority of UniBrain and provide comprehensive insight for visual-evoked brain decoding.

翻译：视觉刺激诱发的脑活动进行图像重建与描述，使研究者能够进一步理解人脑与视觉感知系统之间的关联。尽管深度生成模型近期已被应用于该领域，但生成兼具低级细节与高级语义保真度的真实感描述和图像仍是极具挑战性的问题。本研究提出UniBrain：统一人类脑活动图像重建与描述的全合一扩散模型。我们首次通过名为Versatile Diffusion的潜在扩散模型，统一了基于视觉诱发功能磁共振成像（fMRI）的图像重建与描述。具体而言，我们将fMRI体素转化为文本和图像潜变量以获取低级信息，并通过基于fMRI的CLIP图像和文本条件引导反向扩散过程，生成真实描述与图像。UniBrain在图像重建方面，定性和定量结果均优于现有方法，并首次在自然场景数据集（NSD）上报告了图像描述结果。此外，消融实验和功能感兴趣区域（ROI）分析进一步展示了UniBrain的优越性，并为视觉诱发脑解码提供了全面见解。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

【CHI2020-微软】解释可解释性:理解数据科学家使用机器学习的可解释性工具，Interpreting Interpretability: Understanding Data Scientists’Use of Interpretability Tools for Machine Learning

专知会员服务

55+阅读 · 2020年3月8日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日