EEG foundation models (EEG-FMs) have been evaluated predominantly on clean, in-distribution accuracy, leaving their robustness, interpretability, and representational quality largely unexamined. This study addresses these gaps by benchmarking six EEG-FMs against a baseline deep learning model across eight datasets. Beyond clean accuracy, we conduct three layers of analysis. (i) Robustness: we apply test-time perturbations including additive noise, random and region-based channel dropout, and region-specific noise injection. No single model dominates across all failure modes: the most noise-robust model is among the most fragile under channel dropout, and much of the dropout fragility disappears when channels are removed rather than zero-padded. (ii) Interpretability: we present the first application of Attention-Aware Layer-Wise Relevance Propagation (AttnLRP) to EEG-FMs and show that models broadly concentrate relevance on task-appropriate brain regions, consistent with known neurophysiology. However, attribution maps remain spatially stable under perturbation while predictions degrade, suggesting that the models attend to the correct brain regions but decode corrupted content. (iii) Expressiveness: with block-wise probing, we show that late blocks are repurposed during fine-tuning, while early blocks already hold task-related information. Furthermore, we demonstrate that the poor head-only performance previously attributed to low-quality pre-trained representations is largely explained by pooling, and that EEG-FMs possess sufficient representational capacity when their token-level embeddings are preserved. Together, these findings provide the first systematic assessment of robustness, interpretability, and expressiveness for EEG-FMs and highlight critical considerations for their development.
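To make the perturbation types in (i) concrete, the following is a minimal sketch of additive noise and of the two channel-dropout variants (zero-padded versus removed). It is an illustration under stated assumptions, not the study's implementation: the function names, the SNR-based noise parameterization, the white Gaussian noise model, and the `(channels, samples)` array layout are all hypothetical choices made here.

```python
import numpy as np


def add_gaussian_noise(x, snr_db, rng):
    """Add white Gaussian noise to x at a target SNR in dB.

    Assumption: noise level is parameterized by SNR relative to the
    mean power of the whole recording; the abstract does not specify this.
    """
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


def pick_dropped_channels(n_channels, drop_fraction, rng):
    """Choose a random subset of channel indices to drop (without replacement)."""
    n_drop = int(round(drop_fraction * n_channels))
    return np.sort(rng.choice(n_channels, size=n_drop, replace=False))


def dropout_zero_pad(x, dropped):
    """Zero-padded dropout: dropped channels stay in the array but are all zeros."""
    out = x.copy()
    out[dropped, :] = 0.0
    return out


def dropout_remove(x, dropped):
    """Removal dropout: dropped channels are deleted, so the channel axis shrinks."""
    return np.delete(x, dropped, axis=0)


# Synthetic stand-in for one EEG trial: 8 channels, 256 samples (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 256))

dropped = pick_dropped_channels(n_channels=8, drop_fraction=0.25, rng=rng)
noisy = add_gaussian_noise(x, snr_db=10.0, rng=rng)
zeroed = dropout_zero_pad(x, dropped)
removed = dropout_remove(x, dropped)

print(x.shape, zeroed.shape, removed.shape)
```

The two dropout variants differ only in what the model receives: zero-padding keeps the input shape fixed and presents flat channels, whereas removal shortens the channel axis and therefore requires a model that accepts a variable channel set.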