DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection

At a cocktail party, humans exhibit an impressive ability to direct their attention. Auditory attention detection (AAD) seeks to identify the attended speaker by analyzing brain signals, such as EEG signals. However, current AAD algorithms overlook the spatial distribution information within EEG signals and cannot capture long-range latent dependencies, which limits their ability to decode brain activity. To address these issues, this paper proposes a dual attention refinement network with spatiotemporal construction for AAD, named DARNet. It consists of three modules: a spatiotemporal construction module, a dual attention refinement module, and a feature fusion & classifier module. Specifically, the spatiotemporal construction module is designed to build more expressive spatiotemporal feature representations by capturing the spatial distribution characteristics of EEG signals. The dual attention refinement module is designed to extract temporal patterns at different levels in EEG signals and to strengthen the model's ability to capture long-range latent dependencies. The feature fusion & classifier module aggregates the temporal patterns and dependencies from the different levels and produces the final classification results. The experimental results show that, compared with state-of-the-art models, DARNet improves average classification accuracy on the DTU dataset by 5.9% for 0.1 s, 4.6% for 1 s, and 3.9% for 2 s time windows. While maintaining excellent classification performance, DARNet also requires significantly fewer parameters: compared with the state-of-the-art models, it reduces the parameter count by 91%. Code is available at: https://github.com/fchest/DARNet.git.
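To make the three-stage data flow concrete, here is a minimal NumPy sketch of a forward pass shaped like the pipeline described above. It is not the authors' implementation; the actual model is in the linked repository. Everything concrete here is an assumption made for illustration: the hypothetical helpers `init_params` and `forward`, a linear channel-mixing layer with `tanh` standing in for spatiotemporal construction, two single-head self-attention levels (the second on a time-halved sequence) standing in for dual attention refinement, and mean-pooling plus concatenation standing in for feature fusion. The usage also assumes 64 EEG channels, a 128 Hz sampling rate, and a binary (left vs. right) attended-speaker decision.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over time steps.

    x: (T, d) sequence of temporal tokens. Every time step attends to every
    other, which is how attention can relate distant parts of the window.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, T)
    return softmax(scores, axis=-1) @ v


def halve_time(x):
    """Average adjacent time steps to obtain a coarser temporal level."""
    t = (x.shape[0] // 2) * 2  # drop a trailing odd sample, if any
    return x[:t].reshape(t // 2, 2, x.shape[1]).mean(axis=1)


def init_params(n_channels, d, n_classes=2, seed=0):
    """Random weights for the hypothetical layers used in `forward`."""
    rng = np.random.default_rng(seed)
    p = {"w_spatial": rng.normal(0, 1 / np.sqrt(n_channels), (n_channels, d))}
    for level in (1, 2):
        for name in ("q", "k", "v"):
            p[f"w{name}{level}"] = rng.normal(0, 1 / np.sqrt(d), (d, d))
    p["w_out"] = rng.normal(0, 1 / np.sqrt(2 * d), (2 * d, n_classes))
    p["b_out"] = np.zeros(n_classes)
    return p


def forward(eeg, p):
    """eeg: (C, T) window of C channels and T samples -> class probabilities."""
    # 1) Spatiotemporal construction (assumed form): mix all channels at each
    #    time step so that every temporal token encodes the spatial distribution.
    h0 = np.tanh(eeg.T @ p["w_spatial"])  # (T, d)
    # 2) Dual attention refinement (assumed form): two self-attention levels,
    #    the second on a temporally coarsened sequence, each with a residual.
    h1 = h0 + self_attention(h0, p["wq1"], p["wk1"], p["wv1"])  # (T, d)
    h2c = halve_time(h1)  # (T // 2, d)
    h2 = h2c + self_attention(h2c, p["wq2"], p["wk2"], p["wv2"])
    # 3) Feature fusion & classifier: pool each level over time, concatenate,
    #    and map to class logits (e.g. left vs. right attended speaker).
    fused = np.concatenate([h1.mean(axis=0), h2.mean(axis=0)])  # (2d,)
    return softmax(fused @ p["w_out"] + p["b_out"])


if __name__ == "__main__":
    # Hypothetical 1 s window: 64 channels at an assumed 128 Hz sampling rate.
    params = init_params(n_channels=64, d=16)
    window = np.random.default_rng(1).standard_normal((64, 128))
    probs = forward(window, params)
    print(probs.shape)  # (2,)
```

Because the spatial mixing is applied at every time step before attention, the same code handles short and long windows alike (for example, about 13 samples for 0.1 s at the assumed 128 Hz). The weights here are random, so the output probabilities illustrate shapes only and carry no decoding meaning.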

翻译：在鸡尾酒会场景中，人类展现出令人瞩目的注意力定向能力。听觉注意检测（AAD）方法旨在通过分析脑信号（如脑电图信号）来识别受试者关注的说话者。然而，现有的AAD算法忽视了脑电图信号内部的空间分布信息，且缺乏捕获长程潜在依赖关系的能力，从而限制了模型解码大脑活动的能力。为解决这些问题，本文提出了一种用于AAD的时空构建双重注意力精化网络，命名为DARNet。该网络由时空构建模块、双重注意力精化模块以及特征融合与分类器模块构成。具体而言，时空构建模块旨在通过捕获脑电图信号的空间分布特征，构建更具表现力的时空特征表示。双重注意力精化模块旨在提取脑电图信号中不同层次的时间模式，并增强模型捕获长程潜在依赖关系的能力。特征融合与分类器模块旨在聚合来自不同层次的时间模式与依赖关系，并获取最终的分类结果。实验结果表明，在DTU数据集上，与现有最优模型相比，DARNet在0.1秒、1秒和2秒时间窗下的平均分类准确率分别提升了5.9%、4.6%和3.9%。在保持优异分类性能的同时，DARNet显著减少了所需参数量，与现有最优模型相比，参数量减少了91%。代码公开于：https://github.com/fchest/DARNet.git。