Predicting EEG Responses to Attended Speech via Deep Neural Networks for Speech

Attending to the speech stream of interest in multi-talker environments can be a challenging task, particularly for listeners with hearing impairment. Research suggests that neural responses assessed with electroencephalography (EEG) are modulated by the listener's auditory attention, revealing selective neural tracking (NT) of the attended speech. NT methods mostly rely on hand-engineered acoustic and linguistic speech features to predict the neural response. Only recently have deep neural network (DNN) models without specific linguistic information been used to extract speech features for NT, demonstrating that speech features in hierarchical DNN layers can predict neural responses throughout the auditory pathway. In this study, we go one step further and investigate the suitability of similar DNN models for speech to predict neural responses to competing speech observed in EEG. We recorded EEG data using a 64-channel acquisition system from 17 listeners with normal hearing who were instructed to attend to one of two competing talkers. Our data revealed that EEG responses are significantly better predicted by DNN-extracted speech features than by hand-engineered acoustic features. Furthermore, analysis of hierarchical DNN layers showed that early layers yielded the highest prediction accuracies. Moreover, we found a significant increase in auditory attention classification accuracy when using DNN-extracted speech features rather than hand-engineered acoustic features. These findings open a new avenue for the development of new NT measures to evaluate and further advance hearing technology.
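The abstract does not state how speech features are mapped to EEG or how attention is classified. As an illustration only, the sketch below assumes a linear forward model (ridge regression on time-lagged features), which is a common choice in NT work, and classifies attention by comparing prediction correlations for the two talkers. All function names, array sizes, the regularization value, and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np

def lagged_design(features, n_lags):
    """Stack time-lagged copies of features (time x dims) into a design matrix."""
    n_times, n_dims = features.shape
    X = np.zeros((n_times, n_dims * n_lags))
    for lag in range(n_lags):
        # Row t of this block holds the features from `lag` samples earlier.
        X[lag:, lag * n_dims:(lag + 1) * n_dims] = features[:n_times - lag]
    return X

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def channel_correlations(Y_true, Y_pred):
    """Pearson correlation between measured and predicted EEG, per channel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    norms = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / norms

def decode_attention(eeg, feats_a, feats_b, W, n_lags):
    """Return 'A' or 'B', whichever talker's features predict the EEG better."""
    r_a = channel_correlations(eeg, lagged_design(feats_a, n_lags) @ W).mean()
    r_b = channel_correlations(eeg, lagged_design(feats_b, n_lags) @ W).mean()
    return "A" if r_a > r_b else "B"

# Synthetic stand-in data (assumption): random "speech features" for two
# talkers, and EEG generated from talker A's features plus noise.
rng = np.random.default_rng(0)
n_times, n_dims, n_lags, n_channels = 2000, 4, 8, 6
feats_a = rng.standard_normal((n_times, n_dims))
feats_b = rng.standard_normal((n_times, n_dims))
W_true = rng.standard_normal((n_dims * n_lags, n_channels))
eeg = lagged_design(feats_a, n_lags) @ W_true
eeg += rng.standard_normal(eeg.shape)

# Fit on the first half, evaluate on the held-out second half.
half = n_times // 2
W = fit_ridge(lagged_design(feats_a[:half], n_lags), eeg[:half], alpha=1.0)
test_pred = lagged_design(feats_a[half:], n_lags) @ W
mean_r = channel_correlations(eeg[half:], test_pred).mean()
decoded = decode_attention(eeg[half:], feats_a[half:], feats_b[half:], W, n_lags)
print(f"mean held-out correlation: {mean_r:.2f}, decoded talker: {decoded}")
```

In this framing, comparing feature sets (for example, an acoustic envelope versus activations from a given DNN layer) amounts to swapping the feature matrix passed to `lagged_design` and comparing the held-out correlations.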
