DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation

The translation of brain dynamics into natural language is pivotal for brain-computer interfaces (BCIs), a field that has seen substantial growth in recent years. With the swift advancement of large language models, such as ChatGPT, the need to bridge the gap between the brain and language becomes increasingly pressing. Current methods, however, require eye-tracking fixations or event markers to segment brain dynamics into word-level features, which restricts the practical application of these systems. These event markers may not be readily available or may be difficult to acquire during real-time inference, and the sequence of eye fixations may not align with the order of spoken words. To tackle these issues, we introduce a novel framework, DeWave, that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation tasks. DeWave uses a quantized variational encoder to derive a discrete codex encoding and aligns it with pre-trained language models. This discrete codex representation brings two advantages: 1) it alleviates the order mismatch between eye fixations and spoken words by introducing text-EEG contrastive alignment training, and 2) it minimizes the interference caused by individual differences in EEG waves through an invariant discrete codex. Our model achieves 41.35 BLEU-1 and 33.71 Rouge-F on the ZuCo Dataset, surpassing the previous baseline (40.1 and 31.7) by 3.06% and 6.34%, respectively. Furthermore, this work is the first to facilitate the translation of entire EEG signal periods without needing word-level order markers (e.g., eye fixations), scoring 20.5 BLEU-1 and 29.5 Rouge-1 on the ZuCo Dataset. Code and the final paper will be made public soon.
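To make the idea of a discrete codex concrete, the following is a minimal sketch of a generic nearest-neighbour codebook lookup, the basic step in vector quantization. It is not DeWave's actual encoder, which the abstract does not specify; the toy codebook, the latent vectors, and the names `quantize`, `subject_a`, and `subject_b` are all hypothetical and chosen only for illustration. The sketch shows how two slightly different continuous inputs can map to the same discrete index sequence, which is the intuition behind an encoding that is less sensitive to individual differences.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    Returns the list of chosen indices and the corresponding
    codebook vectors (the quantized latents).
    """
    indices = []
    for z in latents:
        distances = [squared_distance(z, entry) for entry in codebook]
        indices.append(distances.index(min(distances)))
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Hypothetical toy codebook with four 2-dimensional entries.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical latents from two "subjects": similar, but not identical.
subject_a = [[0.9, 0.1], [0.1, 0.05], [0.95, 0.9]]
subject_b = [[1.1, -0.1], [-0.05, 0.1], [1.05, 1.1]]

codes_a, quantized_a = quantize(subject_a, codebook)
codes_b, quantized_b = quantize(subject_b, codebook)

print(codes_a)  # [1, 0, 3]
print(codes_b)  # [1, 0, 3]
```

In this toy case both inputs collapse to the same index sequence, so a downstream language model would receive identical tokens. A trained quantized encoder would learn the codebook from data rather than use fixed entries as assumed here.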
