Automated diagnosis prediction from medical images is a valuable resource to support clinical decision-making. However, such systems usually need to be trained on large amounts of annotated data, which is often scarce in the medical domain. Zero-shot methods address this challenge by adapting flexibly to new settings with different clinical findings without relying on labeled data. Furthermore, to be integrated into the clinical workflow, automated diagnosis methods should be transparent and explainable, which increases medical professionals' trust and facilitates verification of their correctness. In this work, we introduce Xplainer, a novel framework for explainable zero-shot diagnosis in the clinical setting. Xplainer adapts the classification-by-description approach of contrastive vision-language models to the multi-label medical diagnosis task. Specifically, instead of directly predicting a diagnosis, we prompt the model to classify whether descriptive observations that a radiologist would look for on an X-ray scan are present, and we use the descriptor probabilities to estimate the likelihood of a diagnosis. Our model is explainable by design, as the final diagnosis prediction is directly based on the predictions of the underlying descriptors. We evaluate Xplainer on two chest X-ray datasets, CheXpert and ChestX-ray14, and demonstrate its effectiveness in improving both the performance and the explainability of zero-shot diagnosis. Our results suggest that Xplainer provides a more detailed understanding of the decision-making process and can be a valuable tool for clinical diagnosis.
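The classification-by-description idea can be illustrated with a minimal sketch. It assumes that a contrastive vision-language model (not shown) has already produced, for each descriptor, the image's similarity to a positive prompt and to a negative prompt; that a two-way softmax over these similarities gives the descriptor probability; and that a diagnosis score is the mean of its descriptor probabilities. The prompt template, the mean aggregation rule, the function names (`build_prompts`, `descriptor_probability`, `diagnose`), and the similarity values are all hypothetical choices for illustration, not the exact Xplainer implementation.

```python
import math
from typing import Dict, Tuple


def build_prompts(descriptor: str, diagnosis: str) -> Tuple[str, str]:
    """Return a (positive, negative) prompt pair for one descriptor.

    Hypothetical template; the exact prompt wording used by Xplainer may differ.
    """
    positive = f"There is {descriptor} indicating {diagnosis}"
    negative = f"There is no {descriptor} indicating {diagnosis}"
    return positive, negative


def descriptor_probability(sim_pos: float, sim_neg: float, temperature: float = 1.0) -> float:
    """P(descriptor present) as a two-way softmax over image-prompt similarities."""
    # Shift by the larger similarity for numerical stability (assumes temperature > 0).
    m = max(sim_pos, sim_neg)
    e_pos = math.exp((sim_pos - m) / temperature)
    e_neg = math.exp((sim_neg - m) / temperature)
    return e_pos / (e_pos + e_neg)


def diagnose(
    similarities: Dict[str, Dict[str, Tuple[float, float]]],
    temperature: float = 1.0,
) -> Dict[str, Tuple[float, Dict[str, float]]]:
    """Score each diagnosis independently (multi-label) from its descriptors.

    `similarities` maps diagnosis -> descriptor -> (sim_pos, sim_neg).
    Returns diagnosis -> (score, per-descriptor probabilities).
    """
    results = {}
    for diagnosis, descriptors in similarities.items():
        probs = {
            name: descriptor_probability(s_pos, s_neg, temperature)
            for name, (s_pos, s_neg) in descriptors.items()
        }
        # Assumed aggregation rule: the mean of the descriptor probabilities.
        score = sum(probs.values()) / len(probs)
        results[diagnosis] = (score, probs)
    return results


if __name__ == "__main__":
    print(build_prompts("a visible pleural line", "pneumothorax"))
    # Made-up similarity values, for illustration only.
    toy = {
        "pneumothorax": {
            "a visible pleural line": (0.31, 0.27),
            "absent lung markings": (0.29, 0.30),
        },
        "cardiomegaly": {
            "an enlarged cardiac silhouette": (0.25, 0.28),
        },
    }
    for dx, (score, probs) in diagnose(toy, temperature=0.01).items():
        print(dx, round(score, 3), {k: round(v, 3) for k, v in probs.items()})
```

Because each diagnosis is scored on its own rather than through a softmax across diagnoses, several findings can be likely at once, as the multi-label setting requires. The per-descriptor probabilities returned next to each score are what makes the prediction inspectable: a clinician can see which observations drove it.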