Seeing is believing; however, the underlying mechanism by which human visual perception is intertwined with cognition remains a mystery. Thanks to recent advances in both neuroscience and artificial intelligence, we are now able to record visually evoked brain activity and mimic visual perception through computational approaches. In this paper, we focus on visual stimuli reconstruction, that is, reconstructing the observed images from portably accessible brain signals, i.e., electroencephalography (EEG) data. Since EEG signals are dynamic time series and are notoriously noisy, processing them and extracting useful information requires dedicated effort. We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals. Specifically, we incorporate a novel multi-level perceptual information decoding scheme to draw multi-grained outputs from the given EEG data. A latent diffusion model then leverages the extracted information to reconstruct high-resolution visual stimuli images. Experimental results demonstrate the effectiveness of the image reconstruction and the superior quantitative performance of the proposed method.