Reconstructing visual stimuli from functional magnetic resonance imaging (fMRI) with Latent Diffusion Models (LDMs) enables fine-grained decoding of brain activity. However, it remains challenging to produce reconstructions whose details (structure, background, texture, color, and so on) are coherently aligned with the original stimulus. Moreover, LDMs generate different images even under identical conditions. To address these issues, we first examine LDM-based methods from a neuroscientific perspective: they perform top-down creation based on knowledge pre-trained on massive image collections, but lack detail-driven bottom-up perception, which results in unfaithful details. We propose NeuralDiffuser, which introduces primary visual feature guidance that provides detail cues in the form of gradients, extending LDM-based methods with a bottom-up process to achieve faithful semantics and details. We also develop a novel guidance strategy that keeps repeated reconstructions consistent rather than varied. NeuralDiffuser achieves state-of-the-art performance on the Natural Scenes Dataset (NSD), producing more faithful details and more consistent results.
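To make the idea of "detail cues in the form of gradients" concrete, the following is a minimal toy sketch of gradient-based guidance, not the NeuralDiffuser implementation. Everything in it is an assumption made for illustration: the linear map `W` stands in for a primary visual feature extractor, `target_feat` stands in for features decoded from fMRI, and `identity_denoiser` is a placeholder for the LDM denoising step. The only point it shows is that nudging a latent along the negative gradient of a feature-matching loss pulls its features toward the target.

```python
import numpy as np


def feature_loss(z, W, target_feat):
    """Feature-matching loss 0.5 * ||W z - target_feat||^2."""
    residual = W @ z - target_feat
    return 0.5 * float(residual @ residual)


def guidance_gradient(z, W, target_feat):
    """Analytic gradient of feature_loss with respect to the latent z."""
    return W.T @ (W @ z - target_feat)


def guided_step(z, denoise_fn, W, target_feat, scale):
    """One toy update: apply the denoiser, then step down the guidance gradient."""
    z_denoised = denoise_fn(z)
    return z_denoised - scale * guidance_gradient(z_denoised, W, target_feat)


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)) / np.sqrt(8)  # hypothetical linear feature extractor
target_feat = rng.standard_normal(4)          # stand-in for fMRI-decoded features
z = rng.standard_normal(8)                    # initial noisy latent


def identity_denoiser(latent):
    """Placeholder for the LDM denoiser (assumption: does nothing here)."""
    return latent


losses = [feature_loss(z, W, target_feat)]
for _ in range(50):
    z = guided_step(z, identity_denoiser, W, target_feat, scale=0.1)
    losses.append(feature_loss(z, W, target_feat))

print(f"feature loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In a real pipeline the feature extractor would be a nonlinear network and the gradient would come from automatic differentiation rather than a closed form; the guidance scale and how it is scheduled across denoising steps are design choices this sketch does not attempt to reproduce.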