Everyday communication is dynamic and multisensory, often involving shifting attention, overlapping speech, and visual cues. Yet most neural attention-tracking studies remain limited to highly controlled laboratory settings, using clean, often audio-only stimuli and requiring sustained attention to a single talker. This work addresses that gap by introducing a novel dataset from 24 normal-hearing participants. We used a mobile electroencephalography (EEG) system (44 scalp electrodes and 20 cEEGrid electrodes) in an audiovisual (AV) paradigm with three conditions: sustained attention to a single talker in a two-talker environment, attention switching between two talkers, and unscripted two-talker conversations with a competing single talker. Analyses included temporal response function (TRF) modeling, optimal-lag analysis, selective-attention classification with decision windows ranging from 1.1 s to 35 s, and comparisons of TRFs for attention to AV conversations versus side audio-only talkers. Key findings show significant differences in the attention-related P2 peak between attended and ignored speech across conditions for scalp EEG. The absence of a significant performance difference between switching and sustained attention suggests robustness to attention switches. Optimal-lag analysis revealed a narrower peak for conversations than for single-talker AV stimuli, reflecting the additional complexity of multi-talker processing. Selective-attention classification was consistently above chance (55-70% accuracy) for scalp EEG, while cEEGrid data yielded lower correlations, highlighting the need for further methodological improvements. These results demonstrate that mobile EEG can reliably track selective attention in dynamic, multisensory listening scenarios and provide guidance for designing future AV paradigms and real-world attention-tracking applications.
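To make the TRF-based pipeline concrete, the sketch below illustrates one common formulation: a forward TRF estimated by ridge regression over time-lagged copies of a speech envelope, and selective-attention classification by comparing, within a decision window, how well each talker's envelope predicts the recorded EEG. This is a minimal illustration, not the paper's actual implementation; all function names, the lag range, and the regularization value are assumptions.

```python
import numpy as np

def lagged_matrix(x, lags):
    """Design matrix of time-lagged copies of a stimulus envelope (1-D array)."""
    n = len(x)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        # Positive lag: stimulus leads the neural response by `lag` samples.
        X[lag:, j] = x[:n - lag]
    return X

def fit_trf(envelope, eeg, lags, lam=1.0):
    """Ridge-regression TRF: w = (X'X + lam*I)^-1 X'y, fit per EEG channel."""
    X = lagged_matrix(envelope, lags)
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)  # shape: (n_lags, n_channels)

def classify_attention(eeg, env_a, env_b, trf, lags):
    """Within a decision window, pick the talker whose envelope, passed
    through the TRF, best correlates with the EEG (0 = talker A, 1 = B)."""
    scores = []
    for env in (env_a, env_b):
        pred = lagged_matrix(env, lags) @ trf
        r = np.mean([np.corrcoef(pred[:, c], eeg[:, c])[0, 1]
                     for c in range(eeg.shape[1])])
        scores.append(r)
    return 0 if scores[0] > scores[1] else 1
```

In practice the decision-window length (here, the span of the arrays passed in) trades accuracy against latency, which is why the study sweeps windows from 1.1 s to 35 s.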