Translating brain dynamics into natural language is pivotal for brain-computer interfaces (BCIs). With the rapid advancement of large language models such as ChatGPT, the need to bridge the gap between the brain and language becomes increasingly pressing. Current methods, however, require eye-tracking fixations or event markers to segment brain dynamics into word-level features, which can restrict the practical application of these systems. To tackle this issue, we introduce DeWave, a novel framework that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation. DeWave uses a quantized variational encoder to derive a discrete codex encoding and aligns it with pre-trained language models. This discrete codex representation offers two advantages: 1) it enables translation of raw waves without markers by introducing text-EEG contrastive alignment training, and 2) it alleviates the interference caused by individual differences in EEG waves through an invariant discrete codex, with or without markers. Our model surpasses the previous baseline (40.1 and 31.7) by 3.06% and 6.34%, respectively, achieving 41.35 BLEU-1 and 33.71 Rouge-F on the ZuCo dataset. This work is also the first to translate entire EEG signal periods without word-level order markers (e.g., eye fixations), scoring 20.5 BLEU-1 and 29.5 Rouge-1 on the ZuCo dataset.
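The two mechanisms the abstract relies on, mapping continuous EEG features to a discrete codex and aligning EEG with text contrastively, can be illustrated with a minimal pure-Python sketch. This is not DeWave's implementation: it assumes a generic vector-quantization step (nearest codebook entry by Euclidean distance, with codebook training, commitment loss, and the straight-through estimator omitted) and a generic CLIP-style symmetric InfoNCE loss as a stand-in for the paper's text-EEG contrastive alignment. The function names `quantize` and `info_nce` and the `temperature` value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical VQ step: replace each continuous feature vector with its
    nearest codebook entry. Returns the discrete codex indices and the
    quantized vectors."""
    indices, quantized = [], []
    for f in features:
        k = min(range(len(codebook)), key=lambda j: squared_distance(f, codebook[j]))
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(eeg: Sequence[Vector], text: Sequence[Vector],
             temperature: float = 0.07) -> float:
    """Generic symmetric InfoNCE (assumed stand-in for text-EEG alignment):
    the i-th EEG embedding should match the i-th text embedding and no other."""
    n = len(eeg)
    logits = [[cosine(eeg[i], text[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy(rows: List[List[float]]) -> float:
        # Mean of -log softmax(row)[i], computed with the log-sum-exp trick.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            total += m + math.log(sum(math.exp(v - m) for v in row)) - row[i]
        return total / n

    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    # Average the EEG-to-text and text-to-EEG directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    features = [[0.2, -0.1], [4.6, 5.3], [0.9, 1.2]]
    idx, _ = quantize(features, codebook)
    print(idx)  # [0, 2, 1]

    eeg = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(eeg, aligned) < info_nce(eeg, shuffled))  # True
```

The discrete indices are what make the representation "invariant": small subject-specific perturbations of a feature vector still snap to the same codebook entry. The contrastive loss is low only when each EEG segment is closest to its own sentence, which is the kind of signal that lets alignment be learned without word-level markers.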