RAICL: Retrieval-Augmented In-Context Learning for Vision-Language-Model Based EEG Seizure Detection

Electroencephalogram (EEG) decoding is a critical component of medical diagnostics, rehabilitation engineering, and brain-computer interfaces. However, contemporary decoding methods remain heavily dependent on task-specific datasets to train specialized neural network architectures. Consequently, limited data availability impedes the development of generalizable, large-scale brain decoding models. In this work, we propose a paradigm shift from conventional signal-based decoding by leveraging large-scale vision-language models (VLMs) to analyze EEG waveform plots. By converting multivariate EEG signals into stacked waveform images and integrating neuroscience domain expertise into textual prompts, we demonstrate that foundation VLMs can effectively distinguish between different patterns of brain activity. To address the inherent non-stationarity of EEG signals, we introduce a Retrieval-Augmented In-Context Learning (RAICL) approach, which dynamically selects the most representative and relevant few-shot examples to condition the autoregressive outputs of the VLM. Experiments on EEG-based seizure detection indicate that state-of-the-art VLMs under RAICL achieve performance better than or comparable to that of traditional time-series-based approaches. These findings suggest a new direction in physiological signal processing that bridges the modalities of vision, language, and neural activity. Furthermore, the use of off-the-shelf VLMs, without the need for retraining or downstream architecture construction, offers a readily deployable solution for clinical applications.
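
The abstract does not say how RAICL scores or selects its few-shot examples, so the following is only a minimal sketch of one plausible retrieval step, not the authors' implementation. It assumes each labeled EEG segment in a support pool has already been summarized as a feature vector (how those features are computed is left open), ranks pool entries by cosine similarity to the query segment, and keeps the top matches per class so that the prompt stays class-balanced. The names `cosine`, `retrieve_few_shot`, and `k_per_class` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_few_shot(query, pool_feats, pool_labels, k_per_class=1):
    """Return pool indices of the k most similar examples for each class.

    Assumption: similarity is cosine similarity over precomputed features,
    and selection is class-balanced. The source abstract specifies neither.
    """
    sims = [cosine(query, feat) for feat in pool_feats]
    chosen = []
    for label in sorted(set(pool_labels)):
        # Indices of pool examples with this label, most similar first.
        idx = [i for i, lab in enumerate(pool_labels) if lab == label]
        idx.sort(key=lambda i: sims[i], reverse=True)
        chosen.extend(idx[:k_per_class])
    return chosen


# Toy support pool: label 0 = non-seizure, label 1 = seizure (illustrative only).
pool_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]]
pool_labels = [0, 0, 1, 1, 0]
query = [1.0, 0.2]

selected = retrieve_few_shot(query, pool_feats, pool_labels, k_per_class=1)
print(selected)  # [1, 3]
```

In a full pipeline, the waveform images and labels of the selected pool entries would be placed in the VLM prompt ahead of the query image; that step depends on the specific model API and is omitted here.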
