The Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interface (BCI) is an efficient technology for target retrieval using electroencephalography (EEG) signals. Improving the performance of traditional decoding methods requires a substantial amount of training data from each new subject, which increases the preparation time of BCI systems. Several studies incorporate data from existing subjects to reduce this dependence on new-subject data, but their optimization strategies based on adversarial learning over extensive data increase training time during the preparation procedure. Moreover, most previous methods focus only on single-view information from EEG signals and ignore information from other views that could further improve performance. To enhance decoding performance while reducing preparation time, we propose a Temporal-Spectral fusion transformer with Subject-specific Adapter (TSformer-SA). Specifically, a cross-view interaction module is proposed to facilitate information transfer and extract common representations across the two-view features extracted from EEG temporal signals and spectrogram images. An attention-based fusion module then fuses the features of the two views to obtain comprehensive discriminative features for classification. Furthermore, a multi-view consistency loss is proposed to maximize the feature similarity between the two views of the same EEG signal. Finally, we propose a subject-specific adapter that rapidly transfers the knowledge of a model trained on data from existing subjects to the decoding of data from new subjects. Experimental results show that TSformer-SA significantly outperforms comparison methods and achieves outstanding performance with limited training data from new subjects. This facilitates efficient decoding and rapid deployment of BCI systems in practical use.
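The abstract does not give the exact formulation of the multi-view consistency loss, but a common way to "maximize the feature similarity between the two views of the same EEG signal" is to penalize one minus the mean cosine similarity between paired view features. The sketch below is a minimal numpy illustration under that assumption; the function name and use of cosine similarity are illustrative, not taken from the paper.

```python
import numpy as np

def consistency_loss(feat_temporal, feat_spectral, eps=1e-8):
    """Hypothetical multi-view consistency loss: 1 - mean cosine
    similarity between paired feature vectors from the two views.
    Inputs are (batch, dim) arrays of per-trial features."""
    a = feat_temporal / (np.linalg.norm(feat_temporal, axis=1, keepdims=True) + eps)
    b = feat_spectral / (np.linalg.norm(feat_spectral, axis=1, keepdims=True) + eps)
    cos = np.sum(a * b, axis=1)          # per-sample cosine similarity
    return 1.0 - cos.mean()              # 0 when views agree, up to 2 when opposed

# Identical view features give a loss near 0; orthogonal ones give ~1.
x = np.random.randn(4, 16)
print(consistency_loss(x, x))            # ~0.0
```

Minimizing this term pulls the temporal and spectrogram representations of the same trial toward a shared direction, which is the stated goal of the consistency objective.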
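The subject-specific adapter is described only at a high level, but the usual pattern behind such modules is to freeze a backbone pretrained on existing subjects and fine-tune only a small bottleneck branch on the limited new-subject data. The numpy sketch below illustrates that pattern; the dimensions, zero initialization, and residual form are assumptions for illustration, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 8                          # feature dim, adapter bottleneck (assumed)

# Frozen backbone projection, standing in for the model pretrained
# on existing subjects; it is NOT updated for a new subject.
W_backbone = rng.standard_normal((d, d)) / np.sqrt(d)

# Lightweight trainable adapter: down-project, ReLU, up-project, residual.
# Zero-initializing W_up makes the adapter start as an identity mapping.
W_down = rng.standard_normal((r, d)) * 0.01
W_up = np.zeros((d, r))

def forward(h):
    z = W_backbone @ h                          # frozen path
    return z + W_up @ np.maximum(W_down @ z, 0.0)  # adapter residual

h = rng.standard_normal(d)
out = forward(h)
print(W_down.size + W_up.size, W_backbone.size)  # prints: 1024 4096
```

Only the adapter's roughly 2·d·r parameters are trained per new subject, versus d² for the backbone, which is what makes this kind of transfer fast with limited calibration data.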