Self-supervised learning methods for medical images primarily rely on the imaging modality alone during pretraining. While such approaches deliver promising results, they do not leverage the associated patient or scan information collected in Electronic Health Records (EHR). Here, we propose incorporating EHR data during self-supervised pretraining with a Masked Siamese Network (MSN) to enhance the quality of chest X-ray representations. We investigate three types of EHR data: demographics, scan metadata, and inpatient stay information. We evaluate our approach on three publicly available chest X-ray datasets, MIMIC-CXR, CheXpert, and NIH-14, using two vision transformer (ViT) backbones, ViT-Tiny and ViT-Small. When assessing representation quality via linear evaluation, our proposed method demonstrates significant improvements over vanilla MSN and state-of-the-art self-supervised learning baselines. Our work highlights the potential of EHR-enhanced self-supervised pretraining for medical imaging. The code is publicly available at: https://github.com/nyuad-cai/CXR-EHR-MSN
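The abstract evaluates representation quality via linear evaluation, i.e., training only a linear classifier on top of frozen backbone features. The following is a minimal sketch of that protocol under stated assumptions: the random Gaussian features stand in for frozen ViT embeddings (384-dimensional, as in ViT-Small), and the binary labels stand in for a single disease-finding task; none of this reproduces the paper's actual pipeline or data.

```python
# Hedged sketch of linear evaluation (linear probing). The "frozen
# embeddings" here are synthetic stand-ins for features extracted by a
# pretrained ViT backbone; only the linear head below is trained.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Assumed dimensions: 384 matches ViT-Small's embedding size.
n_train, n_test, dim = 512, 128, 384
w_true = rng.normal(size=dim)  # hypothetical ground-truth direction


def make_split(n):
    """Synthetic stand-in for (frozen embedding, binary label) pairs."""
    x = rng.normal(size=(n, dim))
    y = (x @ w_true + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return x, y


x_train, y_train = make_split(n_train)
x_test, y_test = make_split(n_test)

# Linear evaluation: fit a single linear classifier on frozen features.
probe = LogisticRegression(max_iter=1000).fit(x_train, y_train)
auc = roc_auc_score(y_test, probe.decision_function(x_test))
print(f"linear-probe AUROC: {auc:.3f}")
```

In the real protocol, the backbone's weights are kept fixed after pretraining, so the probe's downstream performance reflects the quality of the learned representations rather than any further fine-tuning.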