Despite the success of multi-modal foundation models pre-trained on large-scale data in natural language understanding and visual recognition, their counterparts in the medical and clinical domains remain preliminary, because medical tasks require fine-grained recognition and place high demands on domain knowledge. Here, we propose a knowledge-enhanced vision-language pre-training approach for auto-diagnosis on chest X-ray images. The algorithm, named Knowledge-enhanced Auto Diagnosis~(KAD), first trains a knowledge encoder on an existing medical knowledge graph, i.e., it learns neural embeddings of the definitions of medical concepts and the relationships between them, and then leverages the pre-trained knowledge encoder to guide visual representation learning with paired chest X-rays and radiology reports. We experimentally validate KAD's effectiveness on three external X-ray datasets. The zero-shot performance of KAD is not only comparable to that of fully supervised models but also, for the first time, superior to the average of three expert radiologists on three (out of five) pathologies, with statistical significance. When few-shot annotations are available, KAD also surpasses all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.