With the advance of deep learning, much progress has been made in building powerful artificial intelligence (AI) systems for automatic chest X-ray (CXR) analysis. Most existing AI models are trained as binary classifiers with the aim of distinguishing positive from negative cases. However, a large gap exists between this simple binary setting and complicated real-world medical scenarios. In this work, we reinvestigate the problem of automatic radiology diagnosis. We first observe that there is considerable diversity among cases within the positive class, which means that simply classifying them as positive loses many important details. This motivates us to build AI models that can communicate fine-grained knowledge from medical images like human experts. To this end, we first propose a new benchmark on fine-granularity learning from medical images. Specifically, we devise a division rule based on medical knowledge to divide positive cases into two subcategories, namely atypical positive and typical positive. Then, we propose a new metric termed AUC$^\text{FG}$ on the two subcategories to evaluate the ability to separate the two. With the proposed benchmark, we encourage the community to develop AI diagnosis systems that better learn fine granularity from medical images. Finally, we propose a simple risk-modulation approach to this problem that uses only coarse labels in training. Empirical results show that, despite its simplicity, the proposed method achieves superior performance and thus serves as a strong baseline.
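As a rough illustration of the metric (the precise definition appears in the paper body; all names here are hypothetical), AUC$^\text{FG}$ can be read as a standard ranking AUC computed only over positive cases, with typical positives treated as the target class and atypical positives as the reference class. A minimal rank-based sketch, assuming the model emits a scalar risk score per case:

```python
def pairwise_auc(target_scores, reference_scores):
    """Rank-based AUC via the Mann-Whitney U statistic:
    the fraction of (target, reference) pairs where the
    target case receives a strictly higher score (ties
    count as half a win)."""
    wins = 0.0
    for t in target_scores:
        for r in reference_scores:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5
    return wins / (len(target_scores) * len(reference_scores))


# Hypothetical model risk scores for the two positive subcategories.
typical_scores = [0.92, 0.85, 0.78]    # typical positive cases
atypical_scores = [0.55, 0.40, 0.61]   # atypical positive cases

# AUC^FG-style separability of typical vs. atypical positives.
auc_fg = pairwise_auc(typical_scores, atypical_scores)
```

An AUC$^\text{FG}$ near 1.0 would indicate that the model's scores cleanly separate the two subcategories, while 0.5 would indicate no separability; in practice one would use `sklearn.metrics.roc_auc_score` on the subset of positive cases rather than the quadratic pairwise loop above.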