Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals

Invasive brain-computer interfaces with electrocorticography (ECoG) have shown promise for high-performance speech decoding in medical applications, but less damaging methods like intracranial stereo-electroencephalography (sEEG) remain underexplored. With rapid advances in representation learning, leveraging abundant recordings to enhance speech decoding is increasingly attractive. However, popular methods often pre-train temporal models based on brain-level tokens, overlooking that brain activities in different regions are highly desynchronized during tasks. Alternatively, they pre-train spatial-temporal models based on channel-level tokens but fail to evaluate them on challenging tasks like speech decoding, which requires intricate processing in specific language-related areas. To address this issue, we collected a well-annotated Chinese word-reading sEEG dataset targeting language-related brain networks from 12 subjects. Using this benchmark, we developed the Du-IN model, which extracts contextual embeddings based on region-level tokens through discrete codex-guided mask modeling. Our model achieves state-of-the-art performance on the 61-word classification task, surpassing all baselines. Model comparisons and ablation studies reveal that our design choices, including (i) temporal modeling based on region-level tokens by utilizing 1D depthwise convolution to fuse channels in the ventral sensorimotor cortex (vSMC) and superior temporal gyrus (STG) and (ii) self-supervision through discrete codex-guided mask modeling, significantly contribute to this performance. Overall, our approach, inspired by neuroscience findings and capitalizing on region-level representations from specific brain regions, is suitable for invasive brain modeling and represents a promising neuro-inspired AI approach in brain-computer interfaces.
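To make the idea of discrete codex-guided mask modeling more concrete, the following is a minimal sketch of how masked-prediction targets could be built from a discrete codebook. It is not the authors' implementation: the function names `nearest_code` and `build_masked_targets`, the toy 2-D tokens, the zero-vector mask, and the `mask_ratio` value are all assumptions made for illustration. Each region-level token is assigned the index of its nearest codebook vector, a random subset of tokens is masked, and the model would then be trained to predict the discrete index at each masked position.

```python
import math
import random


def nearest_code(token, codebook):
    """Return the index of the codebook vector closest to `token`
    (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((t - c) ** 2 for t, c in zip(token, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def build_masked_targets(tokens, codebook, mask_ratio=0.5, seed=0):
    """Mask a random subset of tokens and return (masked_tokens, targets).

    `targets` maps each masked position to the discrete code index of the
    original (unmasked) token at that position.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(mask_ratio * len(tokens)))
    masked_positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked_set = set(masked_positions)
    dim = len(tokens[0])
    # Assumption: masked tokens are replaced by a zero vector here;
    # a learnable mask embedding is the more common choice in practice.
    masked_tokens = [
        [0.0] * dim if i in masked_set else list(tok)
        for i, tok in enumerate(tokens)
    ]
    targets = {i: nearest_code(tokens[i], codebook) for i in masked_positions}
    return masked_tokens, targets


# Hypothetical toy data: 3 codebook entries and 4 region-level tokens.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
tokens = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7], [0.2, 0.1]]
masked_tokens, targets = build_masked_targets(tokens, codebook, mask_ratio=0.5)
```

In this sketch the prediction target is a discrete index rather than the raw signal, which is the general pattern behind codebook-guided masked modeling; how the codebook itself is learned is outside the scope of the example.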

