Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings. Pre-training improves data-efficient generalisation by learning statistical priors across subjects, but these priors depend critically on context. While natural speech unfolds gradually over minutes, most methods pre-train with only a few seconds of context. We therefore propose MEG-XL, a model pre-trained with 2.5 minutes of MEG context per sample, 5-300x longer than prior work and equivalent to 191k tokens, capturing extended neural context. Fine-tuned on the task of decoding words from brain data, MEG-XL matches supervised performance with a fraction of the data (e.g. 1 hr vs. 50 hrs) and outperforms existing brain foundation models. We find that models pre-trained with longer contexts learn representations that transfer better to word decoding. Our results indicate that long-context pre-training exploits extended neural context that other methods unnecessarily discard. Code, model weights, and instructions are available at https://github.com/neural-processing-lab/MEG-XL.
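To make the scale of the context concrete, the sketch below works through how 2.5 minutes of multi-channel MEG can correspond to a sequence on the order of 191k tokens when each sensor's signal is split into short temporal patches. The sensor count and patching rate used here are illustrative assumptions, not the paper's exact configuration.

```python
# Back-of-envelope token count for long-context MEG pre-training.
# ASSUMPTIONS (for illustration only): 255 MEG sensors and 5 temporal
# patches per second per sensor; the actual MEG-XL tokenisation may differ.

context_seconds = 2.5 * 60        # 2.5 minutes of MEG per training sample
sensors = 255                     # assumed number of MEG channels
patches_per_second = 5            # assumed temporal patching rate per channel

tokens = int(context_seconds * sensors * patches_per_second)
print(f"{tokens:,} tokens per sample")   # 191,250 tokens, i.e. ~191k
```

Under these assumed values, a single 2.5-minute sample already yields roughly 191k tokens, which is why long-context pre-training on MEG requires sequence lengths far beyond the few seconds used in prior work.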