Seeing is believing; however, the mechanism by which human visual perception is intertwined with cognition remains a mystery. Thanks to recent advances in both neuroscience and artificial intelligence, we are now able to record visually evoked brain activity and mimic visual perception through computational approaches. In this paper, we focus on visual stimuli reconstruction: reconstructing observed images from portably accessible brain signals, i.e., electroencephalography (EEG) data. Because EEG signals are dynamic time series and notoriously noisy, processing them and extracting useful information requires dedicated effort. We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals. Specifically, we incorporate a novel multi-level perceptual information decoding method to draw multi-grained outputs from the given EEG data. A latent diffusion model then leverages the extracted information to reconstruct high-resolution visual stimuli images. Experimental results demonstrate the effectiveness of the image reconstruction and the superior quantitative performance of the proposed method.