Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces that improve the quality of life of individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or the right side of the listener. However, more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, we observe that, on our recently presented dataset with 15-class directional focus, models relying exclusively on EEG inputs exhibit significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. Integrating audio spatial spectra with EEG features effectively improves the decoding accuracy. We employ the CNN, LSM-CNN, and EEG-Deformer models to decode the directional focus from listeners' EEG signals, aided by auxiliary audio spatial spectra. The proposed Sp-Aux-Deformer model achieves notable 15-class decoding accuracies of 57.48% and 61.83% in the leave-one-subject-out and leave-one-trial-out scenarios, respectively.
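The audio-assisted decoding idea can be sketched as a simple feature-fusion pipeline: flatten the EEG features, concatenate them with an audio spatial spectrum over candidate azimuths, and pass the result to a classifier over the direction classes. This is a minimal illustrative sketch only; the shapes, the function names, and the linear classification head are assumptions for exposition and do not reflect the actual Sp-Aux-Deformer architecture.

```python
import numpy as np

def fuse_features(eeg_feat: np.ndarray, spatial_spectrum: np.ndarray) -> np.ndarray:
    """Concatenate flattened EEG features with an audio spatial spectrum.

    Both inputs are flattened and joined into one feature vector; in a real
    model, fusion would happen inside a learned network rather than by
    simple concatenation.
    """
    return np.concatenate([eeg_feat.ravel(), spatial_spectrum.ravel()])

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the direction classes."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
eeg_feat = rng.standard_normal((64, 128))    # hypothetical: 64 channels x 128 time samples
spatial_spectrum = rng.standard_normal(181)  # hypothetical: spectrum over azimuths -90..90 deg

x = fuse_features(eeg_feat, spatial_spectrum)
W = rng.standard_normal((15, x.size)) * 0.01  # untrained linear head over 15 direction classes
probs = softmax(W @ x)

print(x.size)       # 64*128 + 181 = 8373 fused features
print(probs.shape)  # (15,) class probabilities
```

In practice, the concatenation and linear head above would be replaced by a trained network (e.g., a Deformer-style encoder over the EEG with the spatial spectrum as an auxiliary input), but the sketch shows where the audio spatial information enters the decoding path.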