Modern studies in radiograph representation learning (R$^2$L) rely either on self-supervision to encode invariant semantics or on associated radiology reports to incorporate medical expertise, while the complementarity between the two has received little attention. To explore this, we formulate self-completion and report-completion as two complementary objectives and present a unified framework based on masked record modeling (MRM). In practice, MRM reconstructs masked image patches and masked report tokens under a multi-task scheme to learn knowledge-enhanced semantic representations. Models pre-trained with MRM transfer well to various radiography tasks. In particular, MRM offers superior performance in label-efficient fine-tuning. For instance, MRM achieves 88.5% mean AUC on CheXpert using 1% labeled data, outperforming previous R$^2$L methods trained with 100% labels. On NIH ChestX-ray, MRM outperforms the best-performing counterpart by about 3% under small labeling ratios. In addition, MRM surpasses self- and report-supervised pre-training in identifying the pneumonia type and the pneumothorax area, sometimes by large margins.
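To make the multi-task objective concrete, the following is a minimal sketch of how a loss over masked image patches and masked report tokens could be combined. It is an illustration under stated assumptions, not the implementation behind the reported results: the function names, the mask ratios, the use of mean squared error for patches and cross-entropy for tokens, and the `text_weight` factor are all hypothetical choices made for this example.

```python
import math
import random


def random_mask(n, ratio, rng):
    """Pick round(n * ratio) distinct positions out of n to mask (at least one)."""
    k = max(1, round(n * ratio))
    return sorted(rng.sample(range(n), k))


def masked_mse(pred, target, masked):
    """Mean squared error computed over the masked image patches only.

    Each patch is a list of floats (flattened pixel values).
    """
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(pred[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


def masked_cross_entropy(logits, token_ids, masked):
    """Mean cross-entropy computed over the masked report tokens only."""
    total = 0.0
    for i in masked:
        m = max(logits[i])  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits[i]))
        total += log_z - logits[i][token_ids[i]]
    return total / len(masked)


def mrm_style_loss(patch_pred, patches, patch_mask,
                   token_logits, token_ids, token_mask, text_weight=1.0):
    """Sum of the two reconstruction terms; text_weight is an assumed knob."""
    image_term = masked_mse(patch_pred, patches, patch_mask)
    text_term = masked_cross_entropy(token_logits, token_ids, token_mask)
    return image_term + text_weight * text_term


if __name__ == "__main__":
    rng = random.Random(0)
    patches = [[rng.random() for _ in range(4)] for _ in range(8)]
    patch_pred = [[0.5] * 4 for _ in range(8)]        # stand-in for decoder output
    token_ids = [3, 1, 0, 2, 1, 3]
    token_logits = [[0.0] * 4 for _ in token_ids]     # stand-in for token predictions
    patch_mask = random_mask(len(patches), 0.75, rng)   # assumed image mask ratio
    token_mask = random_mask(len(token_ids), 0.5, rng)  # assumed report mask ratio
    print(mrm_style_loss(patch_pred, patches, patch_mask,
                         token_logits, token_ids, token_mask))
```

In this sketch both terms are evaluated only at masked positions, so unmasked patches and tokens serve as context for the reconstruction and do not contribute to the loss.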