Non-invasive decoding of imagined speech remains challenging due to weak, distributed signals and limited labeled data. Our paper introduces an image-based approach that transforms magnetoencephalography (MEG) signals into time-frequency representations compatible with pretrained vision models. MEG data from 21 participants performing imagined speech tasks were projected into three spatial scalogram mixtures via a learnable sensor-space convolution, producing compact image-like inputs for ImageNet-pretrained vision architectures. These models outperformed both classical baselines and non-pretrained networks, achieving up to 90.4% balanced accuracy for imagery vs. silence, 81.0% for imagery vs. silent reading, and 60.6% for vowel decoding. Cross-subject evaluation confirmed that pretrained models capture shared neural representations, and temporal analyses localized discriminative information to imagery-locked intervals. These findings show that pretrained vision models applied to image-based MEG representations can effectively capture the structure of imagined speech in non-invasive neural signals.
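The pipeline the abstract describes can be sketched in a few lines of NumPy: a learnable sensor-space mixing matrix projects a (sensors × time) MEG epoch onto three virtual channels, and a Morlet continuous wavelet transform of each channel yields a 3-channel scalogram "image" suitable for a pretrained vision backbone. This is a minimal illustrative sketch, not the authors' implementation: the random mixing weights stand in for the learned convolution, and the sensor count, sampling rate, frequency grid, and wavelet parameters are assumed for demonstration.

```python
import numpy as np

def morlet_scalogram(x, sfreq, freqs, n_cycles=5.0):
    """|CWT| of a 1-D signal using complex Morlet wavelets (unit energy)."""
    out = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma = n_cycles / (2 * np.pi * f)                 # wavelet width (s)
        t = np.arange(-3.5 * sigma, 3.5 * sigma, 1 / sfreq)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))   # unit-energy norm
        out[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return out

def meg_to_image(epoch, mix_weights, sfreq, freqs):
    """Mix a (sensors, times) epoch into 3 virtual channels, then
    compute one scalogram per channel -> (3, n_freqs, n_times)."""
    virtual = mix_weights @ epoch                          # (3, times)
    return np.stack([morlet_scalogram(v, sfreq, freqs) for v in virtual])

# Illustrative shapes: 64 sensors, 250 Hz, a 2-second synthetic epoch.
rng = np.random.default_rng(0)
n_sensors, sfreq = 64, 250
epoch = rng.standard_normal((n_sensors, 2 * sfreq))
# Random stand-in for the *learned* sensor-space mixing weights.
W = rng.standard_normal((3, n_sensors)) / np.sqrt(n_sensors)
freqs = np.linspace(4, 40, 32)                             # assumed band
img = meg_to_image(epoch, W, sfreq, freqs)
print(img.shape)  # (3, 32, 500): a 3-channel image-like input
```

In the paper's setting the mixing weights would be trained jointly with the ImageNet-pretrained classifier, and the resulting (3, freqs, times) array would be resized and normalized like an RGB image before being fed to the vision model.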