To be practical for real-world applications, models for brain-computer interfaces must be quick and easy to deploy on new subjects, effective on affordable scanning hardware, and small enough to run locally on accessible computing resources. To address these limitations of current approaches directly, we introduce ENIGMA, a multi-subject electroencephalography (EEG)-to-Image decoding model that reconstructs seen images from EEG recordings. ENIGMA achieves state-of-the-art (SOTA) performance on the research-grade THINGS-EEG2 and consumer-grade AllJoined-1.6M benchmarks, and it fine-tunes effectively on new subjects with as little as 15 minutes of data. It also has a simpler architecture than previous approaches and requires fewer than 1% of their trainable parameters. Our approach combines a subject-unified spatio-temporal backbone, a set of multi-subject latent alignment layers, and an MLP projector, which together map raw EEG signals into a rich visual latent space. We evaluate our approach with a broad suite of image reconstruction metrics that has been standardized in the adjacent field of fMRI-to-Image research, and we present the first EEG-to-Image study to conduct extensive behavioral evaluations of its reconstructions with human raters. Our simple and robust architecture delivers a significant performance boost on both research-grade and consumer-grade EEG hardware, substantially improves fine-tuning efficiency, and lowers inference cost. Finally, we provide extensive ablations that identify the architectural choices most responsible for our performance gains in both single- and multi-subject settings across multiple benchmark datasets. Collectively, our work is a substantial step toward practical brain-computer interface applications.
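The abstract names three components but not their internals. The pure-Python sketch below shows one way such a pipeline could be wired. It rests on several assumptions that come from this illustration, not from the source: the data flow is backbone, then per-subject alignment, then projector; the backbone is reduced to a spatial filter followed by temporal mean pooling as a stand-in for a real spatio-temporal CNN; each alignment layer is a square linear map initialised to identity; and all dimensions are toy values. `ToyEEGEncoder`, `add_subject`, and the other helpers are hypothetical names, and this is not ENIGMA's implementation.

```python
import math
import random

# Toy sizes chosen for illustration only; ENIGMA's real dimensions differ.
N_CHANNELS, N_TIMES = 4, 8   # EEG electrodes x time samples per trial
D_FEAT, D_LATENT = 6, 5      # backbone feature size, visual-latent size


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def n_params(m):
    return len(m) * len(m[0])


class ToyEEGEncoder:
    """Shared backbone -> per-subject alignment layer -> shared MLP projector."""

    def __init__(self, subjects, seed=0):
        rng = random.Random(seed)
        rand = lambda r, c: [[rng.gauss(0.0, 0.1) for _ in range(c)] for _ in range(r)]
        # Shared across all subjects (hypothetical layer shapes).
        self.spatial = rand(D_FEAT, N_CHANNELS)  # mixes electrodes at each time step
        self.mlp1 = rand(D_FEAT, D_FEAT)
        self.mlp2 = rand(D_LATENT, D_FEAT)
        # One linear alignment layer per subject, initialised to identity (assumption).
        self.align = {s: identity(D_FEAT) for s in subjects}

    def backbone(self, eeg):
        # eeg[c][t] has shape N_CHANNELS x N_TIMES. Apply the spatial filter at
        # each time step, a tanh nonlinearity, then mean pooling over time.
        feats = [matvec(self.spatial, [eeg[c][t] for c in range(N_CHANNELS)])
                 for t in range(N_TIMES)]
        return [sum(math.tanh(f[i]) for f in feats) / N_TIMES for i in range(D_FEAT)]

    def forward(self, eeg, subject):
        h = self.backbone(eeg)
        h = matvec(self.align[subject], h)               # subject-specific mapping
        h = [max(0.0, v) for v in matvec(self.mlp1, h)]  # ReLU hidden layer
        return matvec(self.mlp2, h)                      # predicted visual latent

    def add_subject(self, subject):
        """Register a new subject; only its alignment layer is new."""
        self.align[subject] = identity(D_FEAT)
        return n_params(self.align[subject])


enc = ToyEEGEncoder(["sub-01", "sub-02"])
trial = [[math.sin(c + 0.5 * t) for t in range(N_TIMES)] for c in range(N_CHANNELS)]
latent = enc.forward(trial, "sub-01")
print(len(latent))  # 5
shared = n_params(enc.spatial) + n_params(enc.mlp1) + n_params(enc.mlp2)
print(shared, enc.add_subject("sub-03"))  # 90 36
```

In this sketch, onboarding a subject adds only one small alignment matrix while the backbone and projector are reused. That is one plausible reading of why adapting to a new subject can be cheap, but the abstract does not state which layers ENIGMA actually fine-tunes.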
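The abstract refers to the image reconstruction metrics standardized in fMRI-to-Image research without listing them. One metric commonly included in that suite is two-way identification. Features are extracted from each reconstruction and each ground-truth image with a pretrained network. A reconstruction then counts as correct against a distractor when its features correlate more strongly with its own target than with the distractor's. The sketch below implements one common formulation, which compares each reconstruction against every other image and averages the outcomes. It assumes the feature vectors have already been computed. The function names are illustrative, and the exact protocol used in this work may differ.

```python
import math
from itertools import permutations


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def two_way_identification(recon_feats, true_feats):
    """Fraction of ordered pairs (i, j), i != j, where reconstruction i
    correlates more with ground truth i than with ground truth j."""
    wins = total = 0
    for i, j in permutations(range(len(true_feats)), 2):
        r = recon_feats[i]
        if pearson(r, true_feats[i]) > pearson(r, true_feats[j]):
            wins += 1
        total += 1
    return wins / total


# Toy feature vectors standing in for pretrained-network embeddings.
true = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
print(two_way_identification(true, true))  # 1.0
```

Chance level for this metric is 50%, so scores are usually read relative to that baseline rather than to zero.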