In the field of chest X-ray (CXR) diagnosis, existing works often focus solely on determining where a radiologist looks, typically through tasks such as detection, segmentation, or classification. However, these approaches are often designed as black-box models that lack interpretability. In this paper, we introduce Interpretable Artificial Intelligence (I-AI), a novel, unified, and controllable interpretable pipeline for decoding the intense focus of radiologists in CXR diagnosis. Our I-AI addresses three key questions: where a radiologist looks, how long they focus on specific areas, and what findings they diagnose. By capturing the intensity of the radiologist's gaze, we provide a unified solution that offers insights into the cognitive process underlying radiological interpretation. Current methods rely on black-box machine learning models that can extract erroneous information from the entire input image during diagnosis; we tackle this issue by masking out irrelevant information. Our proposed I-AI leverages a vision-language model, allowing precise control over the interpretation process while ensuring that irrelevant features are excluded. To train our I-AI model, we use an eye-gaze dataset to extract anatomical gaze information and generate ground-truth heatmaps. Through extensive experimentation, we demonstrate the efficacy of our method. We show that the attention heatmaps, designed to mimic radiologists' focus, encode sufficient and relevant information to enable accurate classification using only a portion of the CXR image.
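The abstract does not say exactly how gaze data becomes a heatmap or how the mask is applied. The sketch below shows one common approach, offered only as an illustration and not as the I-AI method. It assumes fixations arrive as `(x, y, duration)` tuples in pixel coordinates. Each fixation adds a Gaussian weighted by its duration, so the heatmap encodes both where the radiologist looked and how long they dwelt there. Thresholding the normalized map then zeroes out the unattended pixels. The names `gaze_heatmap` and `mask_image`, and the `sigma` and `threshold` values, are hypothetical.

```python
import numpy as np

def gaze_heatmap(fixations, height, width, sigma=20.0):
    """Hypothetical sketch: sum one duration-weighted Gaussian per fixation.

    fixations: iterable of (x, y, duration); x is the column, y is the row
    (an assumed format, not taken from the paper).
    Returns a heatmap normalized so its maximum is 1.0.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float64)
    for x, y, duration in fixations:
        # Longer fixations contribute proportionally more intensity.
        heat += duration * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat

def mask_image(image, heatmap, threshold=0.5):
    """Keep pixels whose normalized gaze intensity is >= threshold; zero the rest."""
    return np.where(heatmap >= threshold, image, 0)

# Usage with a random stand-in for a 256x256 grayscale CXR.
image = np.random.default_rng(0).random((256, 256))
fixations = [(64, 80, 1.2), (180, 150, 0.4)]  # (x, y, seconds), illustrative only
hm = gaze_heatmap(fixations, 256, 256, sigma=15.0)
masked = mask_image(image, hm, threshold=0.3)
```

In this example the 0.4 s fixation peaks at about one third of the 1.2 s peak. It therefore survives a 0.3 threshold but would be dropped at 0.5. This shows how the threshold controls how much of the image the downstream classifier is allowed to see.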