Reconstructing visual stimuli from functional Magnetic Resonance Imaging (fMRI) with Latent Diffusion Models (LDMs) provides a fine-grained retrieval of brain activity. However, reconstructing a cohesive alignment of details (structure, background, texture, color, and so on) remains a challenge. Moreover, LDMs generate different images even under identical conditions. To address these issues, we first interpret LDM-based methods from a neuroscientific perspective: they perform top-down creation driven by knowledge pre-trained on massive image collections, but lack detail-driven bottom-up perception, which results in unfaithful details. We propose NeuralDiffuser, which introduces primary visual feature guidance to supply detail cues in the form of gradients, extending LDM-based methods with a bottom-up process so that reconstructions are faithful in both semantics and details. We also develop a novel guidance strategy that keeps repeated reconstructions consistent rather than producing a variety of results. On the Natural Scenes Dataset (NSD), NeuralDiffuser achieves state-of-the-art performance, with more faithful details and more consistent results.