Generating clinical reports that summarize abnormal patterns, diagnostic findings, and clinical interpretations from long-term EEG recordings remains labor-intensive. We curate a large-scale clinical EEG dataset comprising $9{,}922$ reports paired with approximately $11{,}000$ hours of EEG recordings from $9{,}048$ patients. Building on this dataset, we develop CELM, the first clinical EEG-to-Language foundation model capable of summarizing long-duration, variable-length EEG recordings and performing end-to-end clinical report generation at multiple scales, including recording description, background activity, epileptiform abnormalities, events/seizures, and impressions. Experimental results show that, with patient-history supervision, our method achieves $70\%$-$95\%$ average relative improvements on standard generation metrics (e.g., ROUGE-1 and METEOR), raising scores from $0.2$-$0.3$ to $0.4$-$0.6$. In the zero-shot setting without patient history, CELM attains generation scores of $0.43$-$0.52$, compared to baseline scores of $0.17$-$0.26$. CELM integrates pretrained EEG foundation models with language models to enable scalable multimodal learning. We release our model and benchmark construction pipeline at https://github.com/Jathurshan0330/CELM.