Auditory Attention Decoding (AAD) algorithms play a crucial role in isolating desired sound sources in challenging acoustic environments directly from brain activity. Although recent AAD research has shown promise using shallow representations such as the auditory envelope and the spectrogram, deep Self-Supervised (SS) representations have received only limited exploration at a larger scale. In this study, we comprehensively investigate the performance of linear decoders across 12 deep and 2 shallow representations, applied to EEG data from multiple studies spanning 57 subjects and multiple languages. Our experimental results consistently show that deep features outperform shallow ones for AAD in decoding background speakers, regardless of the dataset and analysis window. This result suggests that unattended signals may be encoded nonlinearly in the brain, an encoding that deep nonlinear features are able to reveal. Additionally, we analyze how the choice of SS representation layer and the window size affect AAD performance. These findings underscore the potential for enhancing EEG-based AAD systems by integrating deep feature representations.
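As a rough illustration of what a linear decoder over a feature representation can look like, the sketch below implements a common backward (stimulus-reconstruction) model: ridge regression maps time-lagged EEG to a stimulus feature matrix, and attention is assigned to whichever speaker's features correlate more strongly with the reconstruction. This is a minimal sketch under stated assumptions, not the configuration used in this study: the function names (`lag_matrix`, `fit_decoder`, `decode_attention`), the lag range, and the regularization value are hypothetical, and `feat` stands for any representation resampled to the EEG rate (an envelope with D=1, or one layer of an SS model with D dimensions).

```python
import numpy as np

# Hypothetical helpers for illustration only; not the study's implementation.

def lag_matrix(eeg, max_lag):
    """Stack time-lagged EEG copies (T x C) into a T x C*(max_lag+1) design matrix.

    Column block k holds eeg[t + k], since the neural response follows the
    stimulus; samples past the end of the recording are zero-padded.
    """
    T, C = eeg.shape
    X = np.zeros((T, C * (max_lag + 1)))
    for k in range(max_lag + 1):
        X[: T - k, k * C:(k + 1) * C] = eeg[k:]
    return X

def fit_decoder(eeg, feat, max_lag=16, alpha=1e3):
    """Ridge backward model: W = (X^T X + alpha*I)^-1 X^T S, with feat S of shape (T, D)."""
    X = lag_matrix(eeg, max_lag)
    XtX = X.T @ X
    return np.linalg.solve(XtX + alpha * np.eye(XtX.shape[0]), X.T @ feat)

def mean_corr(a, b):
    """Pearson correlation per feature dimension, averaged over dimensions."""
    za = (a - a.mean(axis=0)) / a.std(axis=0)
    zb = (b - b.mean(axis=0)) / b.std(axis=0)
    return float(np.mean(np.mean(za * zb, axis=0)))

def decode_attention(W, eeg, feat_a, feat_b, max_lag=16):
    """Return 0 if speaker A is judged attended, 1 if speaker B is."""
    recon = lag_matrix(eeg, max_lag) @ W
    return 0 if mean_corr(recon, feat_a) >= mean_corr(recon, feat_b) else 1

if __name__ == "__main__":
    # Synthetic demo: EEG is a noisy linear mix of the attended speaker's features only.
    rng = np.random.default_rng(0)
    T, C, D = 2000, 32, 8
    s_att, s_unatt = rng.standard_normal((T, D)), rng.standard_normal((T, D))
    eeg = s_att @ rng.standard_normal((D, C)) + 0.5 * rng.standard_normal((T, C))
    W = fit_decoder(eeg[:1000], s_att[:1000], max_lag=4, alpha=100.0)
    print(decode_attention(W, eeg[1000:], s_att[1000:], s_unatt[1000:], max_lag=4))
```

Because the decoder itself is linear, any nonlinearity lives entirely in the feature extractor: comparing shallow and deep representations in this setup only changes what is passed as `feat`.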