CXR-LanIC: Language-Grounded Interpretable Classifier for Chest X-Ray Diagnosis

Deep learning models have achieved remarkable accuracy in chest X-ray diagnosis, yet their clinical adoption remains limited by the black-box nature of their predictions. Clinicians need transparent, verifiable explanations to trust automated diagnoses and to identify potential failure modes. We introduce CXR-LanIC (Language-Grounded Interpretable Classifier for Chest X-rays), a framework that addresses this interpretability challenge through task-aligned pattern discovery. Our approach trains transcoder-based sparse autoencoders on a BiomedCLIP diagnostic classifier to decompose medical image representations into interpretable visual patterns. By training an ensemble of 100 transcoders on multimodal embeddings from the MIMIC-CXR dataset, we discover approximately 5,000 monosemantic patterns spanning cardiac, pulmonary, pleural, structural, device, and artifact categories. Each pattern activates consistently across images that share a specific radiological feature. This enables transparent attribution: each prediction decomposes into 20-50 interpretable patterns, each backed by a verifiable activation gallery. CXR-LanIC achieves competitive diagnostic accuracy on five key findings and lays the foundation for natural language explanations through planned annotation with large multimodal models. Our key innovation is to extract interpretable features from a classifier trained on specific diagnostic objectives rather than from general-purpose embeddings, which ensures that the discovered patterns are directly relevant to clinical decision-making. These results demonstrate that medical AI systems can be both accurate and interpretable, supporting safer clinical deployment through transparent, clinically grounded explanations.
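The attribution idea in the abstract, where a prediction decomposes into a few dozen sparse patterns, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the top-k sparsity rule, the linear classifier head, and every name here (`topk_sparse_codes`, `pattern_contributions`, `W_enc`, `W_dec`, `w_cls`, and the toy dimensions) are assumptions made for illustration only.

```python
import numpy as np

def topk_sparse_codes(x, W_enc, b_enc, k):
    """Encode embeddings into non-negative codes, keeping at most k per row.

    Assumption: top-k sparsity; the abstract does not say how sparsity is enforced.
    """
    pre = np.maximum(x @ W_enc + b_enc, 0.0)          # ReLU pre-activations
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]    # column indices of the k largest per row
    rows = np.arange(pre.shape[0])[:, None]
    codes = np.zeros_like(pre)
    codes[rows, idx] = pre[rows, idx]                 # copy the kept activations, zero the rest
    return codes

def pattern_contributions(codes, W_dec, w_cls):
    """Per-pattern contribution to the logit of a hypothetical linear head.

    For a linear head, (codes @ W_dec) @ w_cls == sum_j codes[:, j] * (W_dec[j] @ w_cls),
    so the logit of the reconstructed representation splits exactly across patterns.
    """
    return codes * (W_dec @ w_cls)

# Toy dimensions, chosen only for the demo (hypothetical, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))              # 4 image embeddings of width 16
W_enc = rng.normal(size=(16, 64))         # encoder into 64 candidate patterns
b_enc = np.zeros(64)
W_dec = rng.normal(size=(64, 16))         # decoder back to the representation space
w_cls = rng.normal(size=16)               # weights of one diagnostic logit

codes = topk_sparse_codes(x, W_enc, b_enc, k=8)
contrib = pattern_contributions(codes, W_dec, w_cls)
top_patterns = np.argsort(-np.abs(contrib), axis=1)[:, :3]  # strongest patterns per image
```

Under these assumptions, each row of `contrib` sums to the logit computed from the reconstructed representation, which is what lets a prediction be read off as a short list of active patterns.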
