Invasive brain-computer interfaces based on electrocorticography (ECoG) have shown promise for high-performance speech decoding in medical applications, but less invasive methods such as intracranial stereo-electroencephalography (sEEG) remain underexplored. With rapid advances in representation learning, leveraging abundant recordings to enhance speech decoding is increasingly attractive. However, popular methods often pre-train temporal models on brain-level tokens, overlooking that brain activity in different regions is highly desynchronized during tasks. Alternatively, they pre-train spatial-temporal models on channel-level tokens but are not evaluated on challenging tasks like speech decoding, which requires intricate processing in specific language-related areas. To address these issues, we collected a well-annotated Chinese word-reading sEEG dataset targeting language-related brain networks from 12 subjects. Using this benchmark, we developed the Du-IN model, which extracts contextual embeddings based on region-level tokens through discrete codex-guided mask modeling. Our model achieves state-of-the-art performance on the 61-word classification task, surpassing all baselines. Model comparisons and ablation studies reveal that our design choices, including (i) temporal modeling based on region-level tokens, using 1D depthwise convolution to fuse channels in the ventral sensorimotor cortex (vSMC) and superior temporal gyrus (STG), and (ii) self-supervision through discrete codex-guided mask modeling, contribute significantly to this performance. Overall, our approach, inspired by neuroscience findings and capitalizing on region-level representations from specific brain regions, is suitable for invasive brain modeling and represents a promising neuro-inspired AI approach for brain-computer interfaces.
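To make the two design choices named in the abstract more concrete, the following is a minimal, dependency-free sketch of (i) building one region-level token sequence from several sEEG channels with a depthwise 1D convolution followed by a channel-fusion step, and (ii) selecting tokens to mask for a mask-modeling objective. It is an illustration under stated assumptions, not the Du-IN implementation: the function names, the kernel sizes, the weighted-sum fusion, and the scalar tokens are all hypothetical, and the discrete-code prediction target is omitted.

```python
import random


def depthwise_conv1d(signal, kernels, stride=1):
    """Filter each channel along time with its own kernel (no cross-channel mixing).

    signal:  list of channels, each a list of samples
    kernels: one kernel (list of weights) per channel
    """
    out = []
    for channel, kernel in zip(signal, kernels):
        k = len(kernel)
        out.append([
            sum(channel[t + i] * kernel[i] for i in range(k))
            for t in range(0, len(channel) - k + 1, stride)
        ])
    return out


def fuse_channels(features, weights):
    """Assumed fusion step: weighted sum across channels at every time step,
    giving a single region-level sequence."""
    n_steps = len(features[0])
    return [
        sum(w * ch[t] for w, ch in zip(weights, features))
        for t in range(n_steps)
    ]


def mask_tokens(tokens, mask_ratio, rng, mask_value=0.0):
    """Replace a random subset of tokens with a placeholder value.

    Returns the masked sequence and the sorted indices that were masked;
    a mask-modeling objective would predict a target at those indices.
    """
    n_mask = int(len(tokens) * mask_ratio)
    masked_idx = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in masked_idx:
        masked[i] = mask_value
    return masked, masked_idx


# Toy input: 3 hypothetical channels from one region, 8 samples each.
toy_signal = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
]
toy_kernels = [[0.5, 0.5]] * 3           # one length-2 kernel per channel
per_channel = depthwise_conv1d(toy_signal, toy_kernels, stride=2)
region_tokens = fuse_channels(per_channel, [1 / 3] * 3)
masked_tokens, masked_idx = mask_tokens(region_tokens, 0.5, random.Random(0))
```

In this sketch the depthwise step keeps channels separate while filtering over time, and only the fusion step mixes channels, so the resulting tokens describe a region rather than a single electrode contact or the whole brain.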