Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

Medical datasets, and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, both in terms of diversity and scale, but still wish to deploy deep learning solutions. Combined with increasing medical dataset sizes and expensive annotation costs, this has increased the need for unsupervised methods that can pretrain multimodally and predict unimodally. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show the generalizability of our approach to natural images using the DVM car advertisement dataset. We take advantage of the high interpretability of tabular data and, through attribution and ablation experiments, find that morphometric tabular features, which describe size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), which appends the ground truth label as a tabular feature during multimodal pretraining and outperforms all supervised contrastive baselines.
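The abstract names the ingredients of the framework but not their exact form. The following is a minimal NumPy sketch of how they could fit together, under stated assumptions: tabular views are built with a SCARF-style corruption (a random subset of features is replaced by values drawn from each feature's empirical marginal), LaaF is an extra label column appended to the table, and the two modalities are aligned with a symmetric InfoNCE loss over matching image/tabular pairs. All function names, the Bernoulli masking, the loss form and the temperature are illustrative assumptions rather than the authors' implementation, and the encoders themselves are omitted.

```python
import numpy as np


def scarf_corrupt(table, corruption_rate, rng):
    """SCARF-style view: replace a random subset of features in each row with
    values drawn from that feature's empirical marginal (same column, random row).
    Assumption: each cell is masked independently with probability corruption_rate."""
    n_rows, n_cols = table.shape
    mask = rng.random((n_rows, n_cols)) < corruption_rate
    # For every cell, pick a random donor row and read its value in the same column.
    donor_rows = rng.integers(0, n_rows, size=(n_rows, n_cols))
    sampled = table[donor_rows, np.arange(n_cols)]
    return np.where(mask, sampled, table)


def append_label_as_feature(table, labels):
    """LaaF sketch: append the ground truth label as one more tabular column."""
    label_column = labels.reshape(-1, 1).astype(table.dtype)
    return np.concatenate([table, label_column], axis=1)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def cross_modal_info_nce(image_emb, tabular_emb, temperature=0.1):
    """Symmetric InfoNCE loss (assumed form): row i of image_emb and row i of
    tabular_emb are a positive pair, every other pairing in the batch is a negative."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    tab = tabular_emb / np.linalg.norm(tabular_emb, axis=1, keepdims=True)
    logits = img @ tab.T / temperature  # cosine similarities, shape (batch, batch)
    diag = np.arange(logits.shape[0])
    loss_image_to_tab = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_tab_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_image_to_tab + loss_tab_to_image)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.normal(size=(8, 5))        # stand-in for clinical features
    labels = rng.integers(0, 2, size=8)    # stand-in for disease labels
    tabular_view = scarf_corrupt(append_label_as_feature(table, labels), 0.3, rng)
    image_emb = rng.normal(size=(8, 16))   # stand-in for image encoder output
    tabular_emb = rng.normal(size=(8, 16)) # stand-in for tabular encoder output
    print(tabular_view.shape, cross_modal_info_nce(image_emb, tabular_emb))
```

In this sketch only the pretraining objective sees both modalities; after pretraining, either encoder could be used on its own, which is what allows multimodal pretraining with unimodal prediction.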
