Automatically locating pathologies in medical images aids the understanding of how diseases emerge and progress, and can significantly benefit clinical diagnostics. However, existing deep learning models rely heavily on expert annotations and generalize poorly in open clinical environments. In this study, we present AFLoc, a generalizable vision-language pre-training model for Annotation-Free pathology Localization. The core strength of AFLoc lies in contrastive learning built on a multi-level semantic structure that requires no image annotations: it comprehensively aligns multi-granularity medical concepts from reports with abundant image features, allowing it to adapt to the diverse expressions of both observed pathologies and emerging, previously unseen ones. To verify its generalization ability, we conducted extensive experimental validation on 4 distinct external datasets encompassing 11 types of chest pathologies. The results show that AFLoc surpasses 6 state-of-the-art methods and even outperforms the human benchmark in locating 5 different pathologies, underscoring its suitability for complex clinical environments.
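The abstract does not spell out the training objective, so the following is only a minimal sketch of the general idea of aligning report concepts with image features at several granularities. It assumes a symmetric InfoNCE-style contrastive loss averaged over levels; the function names, the temperature value of 0.07, and the way levels are passed in are all hypothetical and are not taken from AFLoc itself.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity_matrix(image_embs, text_embs):
    """Entry [i][j] is the cosine similarity of image i and text j."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: paired image/text embeddings share the same index."""
    sim = cosine_similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

def multi_level_loss(levels, temperature=0.07):
    """Average the loss over granularity levels (assumed, not from the paper).

    `levels` is a list of (image_embs, text_embs) pairs, one per granularity.
    """
    losses = [symmetric_contrastive_loss(i, t, temperature) for i, t in levels]
    return sum(losses) / len(losses)
```

With toy embeddings, correctly paired inputs such as `symmetric_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])` give a loss near zero, while swapping the two text rows gives a much larger loss, which is the behavior a contrastive alignment objective is meant to have.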