Multimodal language modeling is a recent breakthrough that leverages advances in large language models to pretrain capable multimodal models. Integrating natural language during pretraining has been shown to significantly improve learned representations, particularly in computer vision. However, the efficacy of multimodal language modeling for functional brain data, and specifically for advancing pathology detection, remains unexplored. This study pioneers EEG-language models trained on clinical reports and 15,000 EEGs. We extend methods for multimodal alignment to this novel domain and investigate which textual information in reports is useful for training EEG-language models. Our results indicate that models learn richer representations when exposed to a variety of report segments, including the patient's clinical history, the description of the EEG, and the physician's interpretation. Compared to models exposed to narrower clinical text, such models retrieve EEGs based on clinical reports (and vice versa) with substantially higher accuracy. However, this advantage is observed only with a contrastive learning approach. Particularly in regimes with few annotations, representations of EEG-language models can significantly improve pathology detection compared to those of EEG-only models, as demonstrated by both zero-shot classification and linear probes. In sum, these results highlight the potential of integrating brain activity data with clinical text, suggesting that EEG-language models represent significant progress for clinical applications.
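As an illustration of the contrastive alignment, report-to-EEG retrieval, and zero-shot classification mentioned above, the following is a minimal sketch and not the study's implementation. The abstract does not specify the loss, so a CLIP-style symmetric InfoNCE objective is assumed here. All function names, the temperature value of 0.07, and the toy embeddings are hypothetical, with plain arrays standing in for the outputs of an EEG encoder and a text encoder.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(z, axis):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def contrastive_loss(eeg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss (assumed, CLIP-style): row i of each array is a matched pair."""
    logits = l2_normalize(eeg_emb) @ l2_normalize(text_emb).T / temperature
    idx = np.arange(len(logits))
    loss_eeg_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_eeg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_eeg_to_text + loss_text_to_eeg)


def retrieval_top_k(eeg_emb, text_emb, k=1):
    """Fraction of reports whose matched EEG is among the k most similar EEGs."""
    sims = l2_normalize(text_emb) @ l2_normalize(eeg_emb).T  # rows: reports, cols: EEGs
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = (top_k == np.arange(len(sims))[:, None]).any(axis=1)
    return float(hits.mean())


def zero_shot_predict(eeg_emb, prompt_emb):
    """Assign each EEG to the class whose text-prompt embedding is most similar."""
    sims = l2_normalize(eeg_emb) @ l2_normalize(prompt_emb).T
    return sims.argmax(axis=1)


if __name__ == "__main__":
    # Toy data (hypothetical): paired embeddings that share a common signal.
    rng = np.random.default_rng(0)
    shared = rng.normal(size=(8, 16))
    eeg = shared + 0.01 * rng.normal(size=shared.shape)
    text = shared + 0.01 * rng.normal(size=shared.shape)
    print("loss, matched pairs:   ", contrastive_loss(eeg, text))
    print("loss, mismatched pairs:", contrastive_loss(eeg, np.roll(text, 1, axis=0)))
    print("top-1 retrieval:       ", retrieval_top_k(eeg, text, k=1))
```

In this sketch, zero-shot classification would amount to embedding one text prompt per class (for example, a "normal" and an "abnormal" description) and passing those embeddings as `prompt_emb`. A linear probe would instead fit a linear classifier on frozen EEG embeddings using the available labels.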