Research on decoding visual information from the brain, particularly with the non-invasive fMRI method, is progressing rapidly. The main challenges are limited data availability and the low signal-to-noise ratio of fMRI signals, which make fMRI-to-image retrieval a low-precision task. The state-of-the-art MindEye markedly improves fMRI-to-image retrieval performance by using a deep MLP whose parameter count is orders of magnitude higher, i.e., a 996M-parameter MLP backbone per subject, to align fMRI embeddings with the final hidden layer of CLIP's vision transformer. However, substantial individual variation exists among subjects, even under identical experimental setups, so a subject-specific model must be trained for each one. This large parameter count makes it difficult to deploy fMRI decoding on practical devices, especially since a separate model is required for every subject. To address this, we propose Lite-Mind, a lightweight, efficient, and versatile brain representation network based on the discrete Fourier transform (DFT), which efficiently aligns fMRI voxels with the fine-grained information in CLIP. Our experiments show that Lite-Mind achieves 94.3% fMRI-to-image retrieval accuracy on the NSD dataset for Subject 1 with 98.7% fewer parameters than MindEye. Lite-Mind can also be transferred to smaller brain datasets, and it establishes a new state of the art for zero-shot classification on the GOD dataset. The code is available at https://github.com/gongzix/Lite-Mind.
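The abstract states only that Lite-Mind is "based on the discrete Fourier transform" and is scored by fMRI-to-image retrieval accuracy. The following is a minimal NumPy sketch of those two ideas under stated assumptions, not the authors' implementation. It assumes that fMRI voxels are split into a sequence of tokens and filtered in the frequency domain by element-wise learnable complex weights (a generic global-filter design). It also assumes that retrieval is scored as top-1 accuracy over cosine similarities between paired brain and image embeddings. The names `dft_filter` and `top1_retrieval_accuracy`, the token layout, and the weight shapes are hypothetical.

```python
import numpy as np


def dft_filter(tokens, weights):
    """Filter a sequence of voxel tokens in the frequency domain (hypothetical layer).

    tokens:  (n_tokens, dim) real array, e.g. fMRI voxels grouped into tokens (assumed layout)
    weights: (n_tokens // 2 + 1, dim) complex array of learnable filter weights (assumed shape)
    """
    freq = np.fft.rfft(tokens, axis=0)                      # real-input DFT along the token axis
    freq = freq * weights                                   # element-wise global filter
    return np.fft.irfft(freq, n=tokens.shape[0], axis=0)    # inverse DFT back to token space


def top1_retrieval_accuracy(brain_emb, image_emb):
    """Fraction of brain embeddings whose most similar image embedding is the paired one.

    Row k of brain_emb is assumed to be paired with row k of image_emb.
    """
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    i = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sims = b @ i.T                                          # cosine-similarity matrix
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(b))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((16, 8))                   # 16 hypothetical voxel tokens, width 8
    identity = np.ones((16 // 2 + 1, 8), dtype=complex)     # all-ones filter leaves the input unchanged
    print(np.allclose(dft_filter(tokens, identity), tokens))

    emb = rng.standard_normal((5, 32))                      # stand-in embeddings, not real CLIP outputs
    print(top1_retrieval_accuracy(emb, emb))
```

With all-ones weights the filter reduces to an inverse DFT of a DFT, so it returns its input. Learning the weights instead lets each frequency component be rescaled and phase-shifted at a cost of roughly `n_tokens / 2 * dim` complex parameters. That is one plausible reason a DFT-based design can stay far smaller than a deep MLP, though the abstract does not give Lite-Mind's actual architecture.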