Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches rely primarily on traditional convolutional neural networks designed for processing Euclidean data such as images. This makes it difficult for them to handle EEG signals, which have non-Euclidean characteristics. To address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD that does not require speech stimuli as input. Specifically, to represent the non-Euclidean properties of EEG signals effectively, dynamical graph convolutional networks are applied to model the graph structure of EEG signals; these networks also extract crucial features related to auditory spatial attention. In addition, to further improve detection performance, we integrate self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer. These strategies use the features and classification results of the deepest network layers to guide the learning of the shallower layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. With a 1-second time window, DGSD achieves 90.0\% accuracy on KUL and 79.6\% accuracy on DTU. Compared with competitive baselines, the experimental results indicate that DGSD not only outperforms the best reproducible baseline but also uses approximately 100 times fewer trainable parameters.
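The abstract gives no implementation details, so the following NumPy sketch is only a rough illustration of the two ingredients it names, not the DGSD implementation. It shows a generic dynamic graph convolution over EEG channels (a learnable channel-by-channel adjacency A, normalized as D^{-1/2}(A + I)D^{-1/2}) and a generic self-distillation loss in which the deepest layer's features and temperature-softened predictions supervise the shallower layers. Everything concrete here is an assumption: the function names (`normalize_adjacency`, `graph_conv`, `self_distillation_loss`), the adjacency shared across layers, the shared linear head with mean pooling, the loss weights `alpha` and `beta`, the temperature, and the toy sizes (64 channels, 5 features per channel, 2 attention classes).

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a learnable channel adjacency A.

    ReLU keeps edge weights non-negative, so every node degree is >= 1 after
    adding self-loops (an assumption borrowed from common dynamic-GCN designs).
    """
    a = np.maximum(adj, 0.0) + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, adj, weight):
    """One graph convolution layer: ReLU(A_hat @ X @ W).

    x:      (channels, in_features) features per EEG electrode
    adj:    (channels, channels) adjacency learned during training ("dynamic")
    weight: (in_features, out_features) layer weights
    """
    return np.maximum(normalize_adjacency(adj) @ x @ weight, 0.0)


def softmax(z, temperature=1.0):
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def self_distillation_loss(features, logits, label, temperature=3.0, alpha=0.3, beta=0.03):
    """Generic self-distillation loss for one sample (hypothetical weights).

    features[i], logits[i] come from layer i; the last entry is the deepest
    layer and serves as the teacher for every shallower layer. All feature
    maps are assumed to share one shape. In an autodiff framework the teacher
    terms would be detached from the gradient.
    """
    eps = 1e-12
    teacher_feat = features[-1]
    teacher_soft = softmax(logits[-1], temperature)
    loss = 0.0
    for i, (feat, logit) in enumerate(zip(features, logits)):
        # Cross-entropy between each layer's prediction and the true label.
        loss += -np.log(softmax(logit)[label] + eps)
        if i == len(logits) - 1:
            continue  # the deepest layer does not distill into itself
        student_soft = softmax(logit, temperature)
        # Hierarchical (prediction-level) distillation: KL(teacher || student).
        kl = np.sum(teacher_soft * (np.log(teacher_soft + eps) - np.log(student_soft + eps)))
        loss += alpha * temperature ** 2 * kl
        # Feature distillation: pull shallow features toward the deepest features.
        loss += beta * np.mean((feat - teacher_feat) ** 2)
    return float(loss)


# Toy usage with random data (sizes are assumptions, not the paper's settings).
rng = np.random.default_rng(0)
channels, in_feats, hidden, n_classes = 64, 5, 16, 2  # e.g. left/right spatial attention
x = rng.normal(size=(channels, in_feats))
adj = rng.normal(size=(channels, channels))
w1 = rng.normal(size=(in_feats, hidden))
w2 = rng.normal(size=(hidden, hidden))
h1 = graph_conv(x, adj, w1)                         # shallow layer
h2 = graph_conv(h1, adj, w2)                        # deepest layer (teacher)
head = rng.normal(size=(hidden, n_classes))         # shared linear head (assumption)
logits = [h.mean(axis=0) @ head for h in (h1, h2)]  # mean-pool over channels
loss = self_distillation_loss([h1, h2], logits, label=0)
print(f"self-distillation loss: {loss:.4f}")
```

Sharing one head keeps the shallow and deep feature maps the same shape, which lets the feature-distillation term be a plain mean squared error; a model whose layers have different widths would need projection layers instead.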