In this paper, we consider the problem of disease diagnosis. Unlike the conventional learning paradigm, which treats labels independently, we propose a knowledge-enhanced framework that trains visual representations under the guidance of medical domain knowledge. In particular, we make the following contributions. First, to explicitly incorporate experts' knowledge, we propose to learn a neural representation of the medical knowledge graph via contrastive learning, implicitly establishing relations between different medical concepts. Second, while training the visual encoder, we keep the parameters of the knowledge encoder frozen and learn a set of prompt vectors for efficient adaptation. Third, we adopt a Transformer-based disease-query module for cross-modal fusion, which naturally yields explainable diagnosis results via cross-attention. To validate the effectiveness of the proposed framework, we conduct thorough experiments on three X-ray imaging datasets covering different anatomical structures. The results show that our model is able to exploit the implicit relations between diseases and findings, which benefits two problems commonly encountered in the medical domain, namely long-tailed and zero-shot recognition, where conventional methods either struggle or fail completely.
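To make the third contribution concrete, the following is a minimal sketch of how a disease-query module could fuse the two modalities with cross-attention. It is an illustration under stated assumptions, not the authors' implementation: it assumes one query embedding per disease (for example, produced by the frozen knowledge encoder), a grid of image patch features from the visual encoder, a single attention head, and randomly initialised weights. The names `disease_query_attention`, `matvec`, and `random_matrix` are hypothetical.

```python
import math
import random


def matvec(vec, mat):
    """Multiply a row vector (d_in) by a matrix (d_in x d_out)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def disease_query_attention(queries, patches, w_q, w_k, w_v, w_out):
    """Single-head cross-attention (assumed design, for illustration only).

    queries: one embedding per disease, shape (num_diseases, dim)
    patches: image patch features, shape (num_patches, dim)
    Returns one logit per disease and one attention map per disease.
    """
    keys = [matvec(p, w_k) for p in patches]
    values = [matvec(p, w_v) for p in patches]
    logits, attn_maps = [], []
    for query in queries:
        q = matvec(query, w_q)
        scale = math.sqrt(len(q))
        # Similarity between this disease query and every image patch.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        # Attention-weighted sum of patch values: the fused feature.
        fused = [sum(a * v[j] for a, v in zip(attn, values))
                 for j in range(len(values[0]))]
        logits.append(sum(f * w for f, w in zip(fused, w_out)))
        attn_maps.append(attn)  # per-patch weights, usable as a heat map
    return logits, attn_maps


def random_matrix(rows, cols, rng):
    """Random Gaussian matrix scaled by 1/sqrt(rows); stands in for learned values."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
dim, num_diseases, num_patches = 8, 3, 6
queries = random_matrix(num_diseases, dim, rng)   # placeholder disease embeddings
patches = random_matrix(num_patches, dim, rng)    # placeholder patch features
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
w_out = [rng.gauss(0.0, 1.0) for _ in range(dim)]

logits, attn_maps = disease_query_attention(queries, patches, w_q, w_k, w_v, w_out)
```

Each row of `attn_maps` sums to one over the image patches, so it can be reshaped to the patch grid and overlaid on the image to show which regions drove a given disease score. Because the queries are embeddings rather than fixed output units, a query for an unseen disease can in principle be scored the same way, which is the property zero-shot recognition relies on.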