CLIP-MUSED: CLIP-Guided Multi-Subject Visual Neural Information Semantic Decoding

Decoding visual neural information is difficult to generalize from single-subject models to multiple subjects because of individual differences. Moreover, the limited data available from a single subject constrains model performance. Although prior multi-subject decoding methods have made significant progress, they still suffer from several limitations, including difficulty in extracting global neural response features, linear scaling of model parameters with the number of subjects, and inadequate characterization of the relationship between the neural responses of different subjects to various stimuli. To overcome these limitations, we propose a CLIP-guided Multi-sUbject visual neural information SEmantic Decoding (CLIP-MUSED) method. Our method uses a Transformer-based feature extractor to model global neural representations effectively. It also incorporates learnable subject-specific tokens that facilitate the aggregation of multi-subject data without a linear increase in parameters. Additionally, we employ representational similarity analysis (RSA) to guide token representation learning based on the topological relationship of visual stimuli in the representation space of CLIP, enabling full characterization of the relationship between the neural responses of different subjects under different stimuli. Finally, the token representations are used for multi-subject semantic decoding. Our proposed method outperforms single-subject decoding methods and achieves state-of-the-art performance among existing multi-subject methods on two fMRI datasets. Visualization results provide insights into the effectiveness of our proposed method. Code is available at https://github.com/CLIP-MUSED/CLIP-MUSED.
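To make the RSA guidance concrete, the following is a minimal sketch of how a similarity-matching objective of this kind could be computed. It is an illustration under stated assumptions, not the loss used in CLIP-MUSED: the abstract does not specify the similarity measure or the penalty, so cosine similarity and a mean squared difference are assumed here, and the names `cosine_similarity_matrix` and `rsa_alignment_loss` are hypothetical. Subject-specific tokens and the Transformer extractor are not modeled.

```python
import math


def cosine_similarity_matrix(rows):
    """Return the pairwise cosine-similarity matrix of a list of vectors.

    Assumes every vector is non-zero (a zero vector would divide by zero).
    """
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    return [
        [sum(a * b for a, b in zip(ri, rj)) / (ni * nj)
         for rj, nj in zip(rows, norms)]
        for ri, ni in zip(rows, norms)
    ]


def rsa_alignment_loss(token_reprs, clip_feats):
    """Mean squared difference between two stimulus-by-stimulus similarity matrices.

    token_reprs: one vector per stimulus, derived from neural responses.
    clip_feats:  one vector per stimulus, taken from a CLIP encoder (assumed given).
    The two sets may have different dimensionality; only the number of
    stimuli must match, because RSA compares similarity structure, not features.
    """
    if len(token_reprs) != len(clip_feats):
        raise ValueError("both inputs must describe the same stimuli")
    s_tok = cosine_similarity_matrix(token_reprs)
    s_clip = cosine_similarity_matrix(clip_feats)
    n = len(s_tok)
    total = sum(
        (s_tok[i][j] - s_clip[i][j]) ** 2
        for i in range(n) for j in range(n)
    )
    return total / (n * n)


# Toy data: three stimuli, CLIP features in 2-D, token representations in 3-D.
clip_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned_tokens = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
misaligned_tokens = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(rsa_alignment_loss(aligned_tokens, clip_feats))
print(rsa_alignment_loss(misaligned_tokens, clip_feats))
```

In this toy example the aligned tokens reproduce the pairwise similarity structure of the CLIP features, so their loss is close to zero, while the misaligned tokens place the first two stimuli on top of each other and incur a larger loss. In a trainable model the same quantity would be computed on differentiable tensors and minimized alongside the decoding objective.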
