Visual neural decoding aims to extract and interpret original visual experiences directly from human brain activity. Recent studies have demonstrated the feasibility of decoding visual semantic categories from electroencephalography (EEG) signals, and metric learning-based approaches have delivered promising results. However, methods that map EEG features directly into a pre-trained embedding space inevitably introduce a mapping bias, which produces a modality gap and semantic inconsistency that impair cross-modal alignment. To address these issues, this work constructs a Visual-EEG Joint Semantic Space to bridge the gap between visual images and neural signals. Building upon this space, we propose two novel approaches to improve semantic consistency between cross-modal representations and facilitate optimal alignment. First, we introduce a Visual-EEG Semantic Decoupling Network (VE-SDN) that explicitly disentangles semantic components from modality representations, thereby achieving purely semantic-level cross-modal alignment. Second, we introduce a Neural-Guided Intra-Class Consistency (NGIC) objective, an asymmetric representation alignment strategy designed to enhance the robustness of visual representations and further boost decoding performance. Extensive experiments on a large-scale Visual-EEG dataset validate the effectiveness of the proposed method. Compared to the strongest baseline, our approach yields relative Top-1/Top-5 accuracy improvements of 38.9%/17.9% in the intra-subject setting and 16.1%/11.3% in the inter-subject setting. The code is available at https://github.com/hzalanchen/Cross-Modal-EEG
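For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Top-k decoding accuracy and a relative improvement are commonly computed in embedding-based decoding. It is not taken from the authors' repository: the function names (`cosine`, `top_k_accuracy`, `relative_improvement`), the use of cosine similarity for ranking, and all toy vectors and accuracy values are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_accuracy(eeg_embeddings, class_embeddings, labels, k):
    """Fraction of EEG trials whose true class is among the k most
    similar class embeddings (ranking rule is an assumption)."""
    hits = 0
    for eeg, label in zip(eeg_embeddings, labels):
        # Rank class indices from most to least similar to this trial.
        ranked = sorted(
            range(len(class_embeddings)),
            key=lambda c: cosine(eeg, class_embeddings[c]),
            reverse=True,
        )
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def relative_improvement(baseline, proposed):
    """Relative gain of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# Toy data (hypothetical, not from the paper): three class embeddings
# and four EEG trial embeddings in a shared 2-D space.
class_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
eeg_embeddings = [[0.9, 0.1], [0.6, 0.8], [0.2, 0.9], [-0.8, 0.3]]
labels = [0, 0, 1, 2]

top1 = top_k_accuracy(eeg_embeddings, class_embeddings, labels, k=1)
top2 = top_k_accuracy(eeg_embeddings, class_embeddings, labels, k=2)
print(top1, top2)  # 0.75 1.0

# Hypothetical accuracies: a baseline of 0.50 against the toy 0.75.
print(relative_improvement(0.50, top1))  # 50.0
```

Note that a relative improvement is measured against the baseline's own accuracy, so a figure such as 38.9% describes a ratio of accuracies rather than a gain in percentage points.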