LEAD: An EEG Foundation Model for Alzheimer's Disease Detection

Electroencephalography (EEG) offers a non-invasive, widely accessible, and cost-effective approach to detecting Alzheimer's disease (AD). However, existing methods, whether based on handcrafted features or standard deep learning, face three major challenges: 1) the lack of large-scale EEG-based AD datasets for robust representation learning; 2) limited generalizability across subjects; and 3) difficulty in adapting to highly heterogeneous data. To address these challenges, we curate the world's largest EEG-AD corpus to date, comprising 2,238 subjects. Leveraging this resource, we propose LEAD, the first large-scale foundation model for EEG-based AD detection. Specifically, we design a gated temporal-spatial Transformer that can adapt to EEG recordings with arbitrary lengths, channel configurations, and sampling rates. In addition, we introduce a subject-regularized training strategy to enhance subject-level feature learning. We further employ medical contrastive learning to pre-train on 13 datasets (4 AD datasets and 9 non-AD neurological disorder datasets), then fine-tune and test the model on 5 other AD datasets that are not used in pre-training. LEAD achieves the best average ranking across all 20 evaluations on the 5 downstream datasets, substantially outperforming existing approaches, including state-of-the-art (SOTA) EEG foundation models. These results demonstrate the effectiveness and practical potential of the proposed method for real-world EEG-based AD detection. Source code: https://github.com/DL4mHealth/LEAD
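The abstract names medical contrastive learning and subject-level feature learning but does not give their formulas. As an illustration only, the sketch below implements a generic supervised-contrastive (InfoNCE-style) loss in which EEG segments from the same subject are treated as positives and every other segment in the batch as a negative. The function names, the cosine similarity, the temperature of 0.1, and the use of subject IDs as the positive criterion are assumptions made for this sketch; it is not LEAD's published objective, which is defined in the linked source code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (nu * nv)


def subject_contrastive_loss(embeddings, subject_ids, temperature=0.1):
    """Hypothetical subject-level supervised-contrastive loss (assumption, not LEAD's exact loss).

    For each anchor segment, segments sharing its subject ID are positives and
    all other segments in the batch appear in the softmax denominator.
    """
    n = len(embeddings)
    per_anchor = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        positives = [k for k in others if subject_ids[k] == subject_ids[i]]
        if not positives:
            continue  # anchors without a same-subject partner contribute nothing
        # Temperature-scaled similarity of the anchor to every other segment.
        scores = {k: cosine(embeddings[i], embeddings[k]) / temperature for k in others}
        # Numerically stable log-sum-exp over the denominator terms.
        m = max(scores.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        # Average negative log-probability assigned to the positives.
        per_anchor.append(-sum(scores[p] - log_denom for p in positives) / len(positives))
    if not per_anchor:
        raise ValueError("batch needs at least two segments from the same subject")
    return sum(per_anchor) / len(per_anchor)


# Usage on a toy batch: two subjects, two 2-D embeddings each.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = subject_contrastive_loss(batch, ["s1", "s1", "s2", "s2"])
mismatched = subject_contrastive_loss(batch, ["s1", "s2", "s1", "s2"])
```

When same-subject segments already lie close together in embedding space, the loss is near zero; relabeling the batch so that positives point at dissimilar segments raises it sharply. Minimizing such a loss therefore pulls segments from one subject together and pushes different subjects apart, which is one plausible way to encourage subject-level features.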
