Auditory spatial attention detection (ASAD) determines the direction of a listener's attention to a speaker by analyzing the listener's electroencephalographic (EEG) signals. This study aimed to further improve ASAD performance with short decision windows (i.e., <1 s), in contrast to the longer decision windows of 1 to 5 s used in previous studies. To this end, an end-to-end temporal attention network (TAnet) is introduced. TAnet employs a multi-head attention (MHA) mechanism, which can more effectively capture the interactions among time steps in the collected EEG signals and efficiently assign corresponding weights to those time steps. Experiments on the KUL dataset demonstrated that TAnet outperformed the CNN-based method and recent ASAD methods, with decoding accuracies of 92.4% (decision window 0.1 s), 94.9% (0.25 s), 95.1% (0.3 s), 95.4% (0.4 s), and 95.5% (0.5 s). As a new ASAD model for short decision windows, TAnet can potentially facilitate the design of EEG-controlled intelligent hearing aids and sound recognition systems.
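To make the idea of weighting EEG time steps concrete, the following is a minimal NumPy sketch of generic multi-head self-attention applied along the time axis of one decision window. It is not the TAnet architecture itself, whose layers are not specified here. The function name `temporal_multi_head_attention`, the random (untrained) projection matrices, the choice of 4 heads, and the window shape of 13 time steps by 64 channels (roughly 0.1 s at an assumed 128 Hz sampling rate) are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_multi_head_attention(eeg, num_heads, rng):
    """Generic multi-head self-attention over the time steps of one window.

    eeg: array of shape (time_steps, channels).
    Returns (output, weights) with shapes
    (time_steps, channels) and (num_heads, time_steps, time_steps).
    The projections are random stand-ins for learned parameters.
    """
    n_steps, n_channels = eeg.shape
    if n_channels % num_heads != 0:
        raise ValueError("channels must be divisible by num_heads")
    d_head = n_channels // num_heads

    # Hypothetical, untrained projections for queries, keys, values, output.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((n_channels, n_channels)) / np.sqrt(n_channels)
        for _ in range(4)
    )

    def split_heads(x):
        # (time_steps, channels) -> (num_heads, time_steps, d_head)
        return x.reshape(n_steps, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(eeg @ w_q)
    k = split_heads(eeg @ w_k)
    v = split_heads(eeg @ w_v)

    # Scaled dot-product scores between every pair of time steps, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)  # each row sums to 1

    # Weighted sum of values, then merge the heads back together.
    context = weights @ v
    merged = context.transpose(1, 0, 2).reshape(n_steps, n_channels)
    return merged @ w_o, weights


# Illustrative window: 13 time steps x 64 channels of synthetic data.
rng = np.random.default_rng(0)
window = rng.standard_normal((13, 64))
out, attn = temporal_multi_head_attention(window, num_heads=4, rng=rng)
```

Each row of `attn[h]` is the weight distribution that one time step places on all time steps in the window for head `h`. In a trained model the projections would be learned, and the output would feed a classifier that predicts the attended direction.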