Detecting misleading patterns in automated diagnostic assistance systems, such as those powered by artificial intelligence, is critical to ensuring their reliability, particularly in healthcare. Current techniques for evaluating deep learning models cannot visualize confounding factors at a diagnostic level. Here, we propose a self-conditioned diffusion model termed DiffChest and train it on a dataset of 515,704 chest radiographs from 194,956 patients at multiple healthcare centers in the United States and Europe. DiffChest explains classifications at the level of the individual patient and visualizes the confounding factors that may mislead the model. When evaluating DiffChest's capability to identify treatment-related confounders, we found high inter-reader agreement, with Fleiss' kappa values of 0.8 or higher for most imaging findings. Confounders were captured accurately, with prevalence rates ranging from 11.1% to 100%. Furthermore, our pretraining process optimized the model to capture the most relevant information from the input radiographs. DiffChest achieved excellent diagnostic accuracy for 11 chest conditions, such as pleural effusion and cardiac insufficiency, and at least sufficient diagnostic accuracy for the remaining conditions. Our findings highlight the potential of diffusion-model-based pretraining for medical image classification, specifically in providing insights into confounding factors and model robustness.
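The abstract reports inter-reader agreement as Fleiss' kappa. As a reference for that statistic only, the sketch below implements the standard Fleiss formula in plain Python. It is not the authors' evaluation code. The function name `fleiss_kappa` and the input layout (a subjects-by-categories table of rater counts, with the same number of raters per subject) are our own assumptions.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Standard Fleiss' kappa (illustrative sketch, not the paper's code).

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters n >= 2.
    """
    if not counts:
        raise ValueError("need at least one subject")
    n_categories = len(counts[0])
    if any(len(row) != n_categories for row in counts):
        raise ValueError("every row must have the same number of categories")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per subject")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    n_subjects = len(counts)

    # p_j: share of all rater assignments that fall into category j.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # P_i: fraction of rater pairs that agree on subject i.
    p_i = [
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_subjects   # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

The statistic compares observed pairwise agreement with the agreement expected from the overall category frequencies. A value of 1 means perfect agreement, and 0 means agreement no better than chance.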