Existing approaches to modeling the association between visual stimuli and brain responses struggle with between-subject variability and model generalization. Inspired by recent progress in modeling speech-evoked brain responses, we propose a "match-vs-mismatch" deep learning model that classifies whether a video clip induces excitatory responses in recorded EEG signals and learns associations between the visual content and the corresponding neural recordings. Using an in-house experimental dataset, we demonstrate that the proposed model achieves the highest accuracy on unseen subjects among the compared baseline models. Furthermore, we analyze inter-subject noise using a subject-level silhouette score in the embedding space and show that the proposed model mitigates this noise, significantly reducing the silhouette score. Finally, an analysis of Grad-CAM activation scores shows that brain regions associated with language processing contribute most to the model's predictions, followed by regions associated with visual processing. These results could facilitate the development of video reconstruction from neural recordings and related applications.
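A subject-level silhouette score treats each subject ID as a cluster label over the learned embeddings. It uses the standard per-sample silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to embeddings from the same subject and b(i) is the smallest mean distance to any other subject's embeddings. A high mean score means the embeddings cluster by subject (strong inter-subject noise). A score near or below zero means the subjects are well mixed. The following is a minimal sketch under stated assumptions: it uses Euclidean distance and a plain mean over samples, and the function name `subject_silhouette` is hypothetical. The abstract does not specify the metric or aggregation the authors used.

```python
import numpy as np


def subject_silhouette(embeddings: np.ndarray, subject_ids: np.ndarray) -> float:
    """Mean silhouette score of embeddings, using subject IDs as cluster labels.

    Hypothetical helper; assumes Euclidean distance between embeddings.
    """
    x = np.asarray(embeddings, dtype=float)
    labels = np.asarray(subject_ids)
    unique = np.unique(labels)
    if len(unique) < 2:
        raise ValueError("need embeddings from at least two subjects")

    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            scores[i] = 0.0  # convention: silhouette of a singleton cluster is 0
            continue
        # a(i): mean distance to other samples from the same subject.
        a = dist[i, same].mean()
        # b(i): smallest mean distance to samples of any other subject.
        b = min(dist[i, labels == s].mean() for s in unique if s != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())
```

For example, embeddings that form one tight group per subject score close to 1, while embeddings where each subject's samples are interleaved with the other's score below 0. For larger datasets, `sklearn.metrics.silhouette_score(X, labels)` computes the same quantity and can serve as a cross-check.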