Relating speech to EEG is important but challenging. In this study, a deep convolutional network was employed to extract spatiotemporal features from EEG data, and self-supervised speech representations and contextual text embeddings were used as speech features. Contrastive learning was used to relate the EEG features to the speech features. The experimental results demonstrate the benefits of using both the self-supervised speech representation and the contextual text embedding. Through feature fusion and model ensembling, an accuracy of 60.29% was achieved, ranking second in Task 1 of the Auditory EEG Challenge (ICASSP 2024). The code implementing our work is available on GitHub: https://github.com/bobwangPKU/EEG-Stimulus-Match-Mismatch.
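The contrastive step described above can be illustrated with a minimal sketch. It is not the released implementation: it assumes an InfoNCE-style loss over cosine similarities, and the function names (`contrastive_loss`, `pick_match`), the `temperature` value, and the embedding shapes are hypothetical choices made for illustration. The EEG and speech embeddings are taken as given, standing in for the outputs of the convolutional EEG encoder and the speech feature extractors.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def contrastive_loss(eeg_emb, speech_emb, temperature=0.1):
    """InfoNCE-style loss (assumed form): row i of eeg_emb is paired with
    row i of speech_emb; every other row in the batch acts as a negative."""
    eeg = l2_normalize(eeg_emb)
    speech = l2_normalize(speech_emb)
    logits = eeg @ speech.T / temperature            # shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matched pair (the diagonal) as the target class.
    return float(-np.mean(np.diag(log_probs)))


def pick_match(eeg_emb, candidate_embs):
    """Match-mismatch decision: index of the candidate speech embedding
    with the highest cosine similarity to a single EEG embedding."""
    sims = l2_normalize(candidate_embs) @ l2_normalize(eeg_emb)
    return int(np.argmax(sims))
```

At inference time, a sketch like `pick_match` would be applied to one EEG segment embedding and the embeddings of its candidate speech segments. Fusing several speech features or ensembling models could then amount to combining the similarity scores before the `argmax`, although the exact fusion scheme used in this work is not specified here.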