Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

Medical datasets, and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, both in terms of diversity and scale, but still wish to deploy deep learning solutions. Combined with increasing medical dataset sizes and expensive annotation costs, the need for unsupervised methods that can pretrain multimodally and predict unimodally has grown. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show the generalizability of our approach to natural images using the DVM car advertisement dataset. We take advantage of the high interpretability of tabular data and, through attribution and ablation experiments, find that morphometric tabular features, which describe size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), which appends the ground truth label as a tabular feature during multimodal pretraining and outperforms all supervised contrastive baselines.
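
To make the two mechanisms named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that image and tabular embeddings have already been produced by two encoders (not shown), and that the image-tabular objective is a CLIP-style symmetric InfoNCE loss; the abstract does not specify the exact loss, so this is an assumption. The function names `append_label_as_feature`, `info_nce`, and `symmetric_contrastive_loss`, as well as the temperature value, are hypothetical and chosen only for illustration.

```python
import math


def append_label_as_feature(rows, labels):
    """LaaF idea from the abstract: add the ground-truth label as one
    extra tabular column. (Hypothetical helper; details are assumed.)"""
    return [list(row) + [float(y)] for row, y in zip(rows, labels)]


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of matching anchor i to target i among all targets."""
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, t_j)) / temperature for t_j in t]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive_loss(image_emb, tabular_emb, temperature=0.1):
    """Assumed objective: average of image-to-tabular and tabular-to-image losses."""
    return 0.5 * (
        info_nce(image_emb, tabular_emb, temperature)
        + info_nce(tabular_emb, image_emb, temperature)
    )


# Toy batch of two subjects with 2-D embeddings (made-up numbers).
image_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned_tab = [[1.0, 0.0], [0.0, 1.0]]     # each row matches its image
mismatched_tab = [[0.0, 1.0], [1.0, 0.0]]  # rows swapped between subjects

aligned_loss = symmetric_contrastive_loss(image_emb, aligned_tab)
mismatched_loss = symmetric_contrastive_loss(image_emb, mismatched_tab)
laaf_rows = append_label_as_feature([[63.0, 1.8], [47.0, 1.6]], [1, 0])
```

In this toy batch, the loss is lower when each tabular embedding is paired with its own image than when the pairs are swapped, which is the behavior the pretraining objective rewards. Under LaaF, the label column would be appended before the tabular encoder is applied, so label information can shape the shared embedding space.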