Despite significant progress in visual decoding from fMRI data, the high cost and low temporal resolution of fMRI limit its widespread applicability. To address these challenges, we introduce RealMind, a novel EEG-based visual decoding framework that leverages multi-modal models to interpret semantic information efficiently. By integrating semantic and geometric consistency learning, RealMind improves feature alignment and, in turn, decoding performance. Our framework achieves 56.73\% Top-5 accuracy on a 200-way retrieval task and a BLEU-1 score of 26.59\% on a 200-way visual captioning task, representing the first successful attempt at zero-shot visual captioning using EEG data. RealMind provides a robust, adaptable, and cost-effective alternative to fMRI-based methods, offering a scalable approach to EEG-based visual decoding in practical applications.
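To make the retrieval metric concrete, the following is a minimal sketch of how Top-k accuracy in an N-way retrieval task is commonly computed. It is an illustration under stated assumptions, not RealMind's evaluation code: the function names, the use of cosine similarity, and the synthetic embeddings are all hypothetical. For reference, chance level for Top-5 in a 200-way task is 5/200 = 2.5\%.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_retrieval_accuracy(queries, candidates, k=5):
    """Fraction of queries whose true match ranks among the k most similar candidates.

    Assumption: queries[i] (e.g. an EEG embedding) corresponds to
    candidates[i] (e.g. the embedding of the image that was shown).
    """
    hits = 0
    for i, query in enumerate(queries):
        sims = [cosine(query, cand) for cand in candidates]
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(candidates)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Synthetic stand-ins for real embeddings (hypothetical data, 200 classes).
random.seed(0)
images = [[random.gauss(0, 1) for _ in range(16)] for _ in range(200)]
eeg = [[x + random.gauss(0, 1) for x in row] for row in images]  # noisy copies

acc = top_k_retrieval_accuracy(eeg, images, k=5)
print(f"Top-5 accuracy on synthetic data: {acc:.2%}")
```

The printed value depends entirely on the synthetic noise level and says nothing about the reported 56.73\% figure; the sketch only shows what the metric measures.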