Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

Medical datasets, and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, both in terms of diversity and scale, but still wish to deploy deep learning solutions. Combined with increasing medical dataset sizes and expensive annotation costs, the necessity for unsupervised methods that can pretrain multimodally and predict unimodally has risen. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show the generalizability of our approach to natural images using the DVM car advertisement dataset. We take advantage of the high interpretability of tabular data and, through attribution and ablation experiments, find that morphometric tabular features, describing size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), by appending the ground truth label as a tabular feature during multimodal pretraining, outperforming all supervised contrastive baselines.
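
To make the ingredients named above concrete, the following NumPy sketch illustrates three pieces under stated assumptions: a SCARF-style corruption of tabular rows, a symmetric InfoNCE loss that contrasts paired image and tabular embeddings, and the LaaF idea of appending the label to the tabular input. It is a minimal illustration and not the implementation used in this work. The function names, the one-hot label encoding, the equal weighting of the two loss directions, and the temperature value are all assumptions made for the example, and the encoders that would produce the embeddings are omitted.

```python
import numpy as np


def scarf_corrupt(x, rate, rng):
    """SCARF-style corruption (sketch): replace a random subset of cells
    with values drawn from the same column's empirical marginal."""
    n, d = x.shape
    mask = rng.random((n, d)) < rate           # cells to corrupt
    donor_rows = rng.integers(0, n, size=(n, d))
    donors = x[donor_rows, np.arange(d)]       # donors[i, j] = x[donor_rows[i, j], j]
    return np.where(mask, donors, x)


def append_label_as_feature(x, y, num_classes):
    """LaaF (sketch): append the ground truth label to the tabular features.
    The one-hot encoding is an assumption of this example."""
    one_hot = np.eye(num_classes)[y]
    return np.concatenate([x, one_hot], axis=1)


def _l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def _log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_info_nce(z_img, z_tab, temperature=0.1):
    """Symmetric InfoNCE between paired embeddings (sketch).
    Row i of z_img and row i of z_tab come from the same subject and form
    the positive pair; all other rows in the batch act as negatives."""
    z_img, z_tab = _l2_normalize(z_img), _l2_normalize(z_tab)
    logits = z_img @ z_tab.T / temperature     # cosine similarities, scaled
    idx = np.arange(logits.shape[0])
    loss_img_to_tab = -_log_softmax(logits)[idx, idx].mean()
    loss_tab_to_img = -_log_softmax(logits.T)[idx, idx].mean()
    # Equal weighting of the two directions is an assumption of this sketch.
    return 0.5 * (loss_img_to_tab + loss_tab_to_img)
```

In a training loop, the corrupted (and, for LaaF, label-augmented) tabular rows would pass through a tabular encoder and the augmented images through an image encoder; the loss above would then be computed on the two resulting batches of embeddings. Because the label is only appended during pretraining, the image encoder can still be used on its own at prediction time.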
