CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding

Brain-computer interface (BCI) speech decoding has emerged as a promising tool for assisting individuals with speech impairments. In this context, the integration of electroencephalography (EEG) and electromyography (EMG) signals offers strong potential for enhancing decoding performance. Mandarin tone classification presents particular challenges, as tonal variations convey distinct meanings even when phonemes remain identical. In this study, we propose a novel cross-subject multimodal BCI decoding framework that fuses EEG and EMG signals to classify four Mandarin tones under both audible and silent speech conditions. Inspired by the cooperative mechanisms of neural and muscular systems in speech production, our neural decoding architecture combines spatial-temporal feature extraction branches with a cross-attention fusion mechanism, enabling informative interaction between modalities. We further incorporate domain-adversarial training to improve cross-subject generalization. We collected 4,800 EEG trials and 4,800 EMG trials from 10 participants using only twenty EEG and five EMG channels, demonstrating the feasibility of minimal-channel decoding. Despite employing lightweight modules, our model outperforms state-of-the-art baselines across all conditions, achieving average classification accuracies of 87.83% for audible speech and 88.08% for silent speech. In cross-subject evaluations, it still maintains strong performance with accuracies of 83.27% and 85.10% for audible and silent speech, respectively. We further conduct ablation studies to validate the effectiveness of each component. Our findings suggest that tone-level decoding with minimal EEG-EMG channels is feasible and potentially generalizable across subjects, contributing to the development of practical BCI applications.
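
As an illustration of the cross-attention fusion idea mentioned in the abstract, the following is a minimal sketch only, not the CAT-Net implementation. The abstract does not specify the fusion layer, so everything here is an assumption: the function name `cross_attention`, the use of single-head scaled dot-product attention without learned projections, the choice of EEG features as queries and EMG features as keys and values, and the toy feature values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (hypothetical sketch).

    Each query row (e.g. one EEG feature token) attends over all key rows
    (e.g. EMG feature tokens) and returns a weighted sum of the value rows.
    No learned projection matrices are applied; a trained model would
    normally learn them.
    """
    dim = len(keys[0])
    fused, weights = [], []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value rows, one output channel at a time.
        fused.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return fused, weights


# Toy inputs (assumed shapes): 3 EEG tokens and 2 EMG tokens, 4 features each.
eeg_tokens = [
    [0.2, 0.1, 0.0, 0.5],
    [0.9, 0.3, 0.4, 0.1],
    [0.0, 0.7, 0.2, 0.6],
]
emg_tokens = [
    [0.5, 0.5, 0.1, 0.0],
    [0.1, 0.2, 0.8, 0.3],
]

# EEG tokens query the EMG tokens; output has one fused row per EEG token.
fused, weights = cross_attention(eeg_tokens, emg_tokens, emg_tokens)
```

In this sketch each EEG token receives a mixture of EMG features weighted by similarity, which is one way the two modalities can exchange information. A symmetric variant would also let EMG tokens query EEG tokens and concatenate both outputs before classification.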
