In a multi-speaker "cocktail party" scenario, a listener can selectively attend to a speaker of interest. Studies of the human auditory attention network demonstrate cortical entrainment to speech envelopes, resulting in electroencephalography (EEG) measurements that are highly correlated with those envelopes. Current approaches to EEG-based auditory attention detection (AAD) using artificial neural networks (ANNs) are not practical for edge-computing platforms: they rely on longer decision windows and several EEG channels, and they incur higher power consumption and larger memory footprints. Nor can ANNs accurately model the brain's top-down attention network, since the cortical organization is complex and layered. In this paper, we propose a hybrid convolutional neural network-spiking neural network (CNN-SNN) corticomorphic architecture, inspired by the auditory cortex, that uses EEG data together with multi-speaker speech envelopes to decode auditory attention. It operates with a latency as low as 1 second, using only 8 EEG electrodes strategically placed close to the auditory cortex, and reaches an accuracy of 91.03%, significantly higher than the state of the art. In addition, compared to a traditional CNN reference model, our model uses ~15% fewer parameters at a lower bit precision, resulting in a ~57% reduction in memory footprint. The results show great promise for edge computing in brain-embedded devices such as smart hearing aids.
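
To make the memory-footprint figures concrete, the short sketch below works through the arithmetic under a simple assumption that is not stated in the abstract: weight memory scales as parameter count times bits per parameter. The helper names, the 1,000,000-parameter reference size, and the 32-bit and 16-bit widths are hypothetical, chosen only for illustration. Under that assumption, ~15% fewer parameters combined with a ~57% memory reduction implies an average bit width of roughly half that of the reference model.

```python
def memory_bytes(n_params: int, bits_per_param: float) -> float:
    """Weight memory in bytes, assuming memory = parameters x bits / 8."""
    return n_params * bits_per_param / 8


def implied_bit_ratio(param_reduction: float, memory_reduction: float) -> float:
    """Average bit-width ratio (new / reference) implied by the two reductions."""
    return (1.0 - memory_reduction) / (1.0 - param_reduction)


# Figures quoted in the abstract: ~15% fewer parameters, ~57% less memory.
ratio = implied_bit_ratio(0.15, 0.57)

# Hypothetical sizes and precisions (assumptions, not from the paper).
ref_params, ref_bits = 1_000_000, 32
new_params, new_bits = int(ref_params * 0.85), 16

ref_mem = memory_bytes(ref_params, ref_bits)
new_mem = memory_bytes(new_params, new_bits)
reduction = 1.0 - new_mem / ref_mem

print(f"implied bit-width ratio: {ratio:.3f}")
print(f"memory reduction with assumed 32-bit to 16-bit weights: {reduction:.1%}")
```

With these assumed values the reduction comes out close to the reported ~57%, which is consistent with, but does not confirm, a halving of weight precision. The actual bit widths and layer-wise breakdown would have to be taken from the body of the paper.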