Deep learning has advanced medical image classification, but interpretability challenges hinder its clinical adoption. This study enhances interpretability in chest X-ray (CXR) classification by combining concept bottleneck models (CBMs) with a multi-agent Retrieval-Augmented Generation (RAG) system for report generation. By modeling the relationships between visual features and clinical concepts, we create interpretable concept vectors that guide the multi-agent RAG system to generate radiology reports, enhancing clinical relevance, explainability, and transparency. Evaluation of the generated reports using an LLM-as-a-judge confirmed the interpretability and clinical utility of our model's outputs. On the COVID-QU dataset, our model achieved 81% classification accuracy and demonstrated robust report generation performance, with five key metrics ranging between 84% and 90%. This interpretable multi-agent framework bridges the gap between high-performance AI and the explainability required for reliable AI-driven CXR analysis in clinical settings.