Deep learning has advanced medical image classification, but interpretability challenges hinder its clinical adoption. This study enhances interpretability in chest X-ray (CXR) classification by combining concept bottleneck models (CBMs) with a multi-agent Retrieval-Augmented Generation (RAG) system for report generation. By modeling relationships between visual features and clinical concepts, we create interpretable concept vectors that guide a multi-agent RAG system to generate radiology reports, enhancing clinical relevance, explainability, and transparency. Evaluation of the generated reports using an LLM-as-a-judge confirmed the interpretability and clinical utility of our model's outputs. On the COVID-QU dataset, our model achieved 81% classification accuracy and demonstrated robust report-generation performance, with five key metrics ranging between 84% and 90%. This interpretable multi-agent framework bridges the gap between high-performance AI and the explainability required for reliable AI-driven CXR analysis in clinical settings. Our code is available at https://github.com/tifat58/IRR-with-CBM-RAG.git.
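To make the concept-bottleneck idea concrete, here is a minimal, dependency-free sketch of the two-stage prediction a CBM performs: visual features are first mapped to interpretable concept scores, and the class label is then predicted from those concepts alone. All names, weights, and the toy dimensionality are illustrative assumptions, not the paper's actual implementation, which additionally feeds the concept vector to a multi-agent RAG pipeline for report generation.

```python
import math

def sigmoid(x):
    """Squash a score into (0, 1) so it reads as a concept probability."""
    return 1.0 / (1.0 + math.exp(-x))

def concept_bottleneck_predict(features, w_concept, w_label):
    """Two-stage CBM forward pass (toy sketch, plain Python).

    features  -- extracted visual feature vector for one image
    w_concept -- one weight row per clinical concept (e.g. "consolidation")
    w_label   -- one weight row per class (e.g. COVID-19 / normal / pneumonia)
    """
    # Stage 1: visual features -> interpretable concept scores in (0, 1).
    concepts = [sigmoid(sum(w * f for w, f in zip(row, features)))
                for row in w_concept]
    # Stage 2: class logits are computed from the concepts ALONE, so every
    # prediction can be explained in terms of the concept activations.
    logits = [sum(w * c for w, c in zip(row, concepts)) for row in w_label]
    return concepts, logits
```

In the framework described above, a concept vector like this (rather than the raw features) is what conditions the downstream report-generation agents, which is what makes the explanation human-readable.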