In this study, we demonstrate the application of a hybrid Vision Transformer (ViT) model, pretrained on ImageNet, to an electroencephalogram (EEG) regression task. Although originally trained for image classification, this model, when fine-tuned on EEG data, performs notably better than other models, including a ViT of identical architecture trained without the ImageNet weights. This finding challenges the traditional understanding of model generalization, suggesting that Transformer models pretrained on seemingly unrelated image data can provide valuable priors for EEG regression tasks when paired with an appropriate fine-tuning pipeline. The success of this approach suggests that the features ViT models learn on visual tasks can be readily adapted for EEG predictive modeling. We recommend this methodology not only for neuroscience and related fields, but for any task where data collection is limited by practical, financial, or ethical constraints. Our results highlight the potential of pretrained models on tasks that are clearly distinct from their original purpose.