To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved either by designing classifiers that are robust enough to avoid failures in the first place, or by detecting the remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shift between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs across four biomedical tasks and a diverse range of distribution shifts. Because none of the benchmarked CSFs can reliably prevent silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. Using several examples, we demonstrate how this tool can help researchers gain insight into the requirements for the safe application of classification systems in the medical domain. The open-source benchmark and tool are available at: https://github.com/IML-DKFZ/sf-visuals.
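To make the notion of a CSF concrete, the following is a minimal sketch of one commonly used baseline, the maximum softmax response, together with a threshold rule that defers low-confidence cases to a human. It is an illustration under stated assumptions, not necessarily one of the CSFs benchmarked here: the function names, the example logits, and the threshold of 0.8 are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_confidence(logits):
    """Return (predicted class index, confidence) using the maximum softmax response."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs[pred]


def flag_for_review(batch_logits, threshold=0.8):
    """Indices of cases whose confidence falls below the (assumed) threshold."""
    return [
        i
        for i, logits in enumerate(batch_logits)
        if max_softmax_confidence(logits)[1] < threshold
    ]


def count_silent_failures(batch_logits, labels, threshold=0.8):
    """Count misclassified cases that were NOT flagged, i.e. wrong but confident."""
    count = 0
    for logits, label in zip(batch_logits, labels):
        pred, conf = max_softmax_confidence(logits)
        if pred != label and conf >= threshold:
            count += 1
    return count


# Hypothetical logits for three cases of a 3-class problem.
batch = [[4.0, 0.0, 0.0], [0.5, 0.4, 0.3], [0.0, 3.0, 0.0]]
labels = [1, 1, 1]

flagged = flag_for_review(batch)
silent = count_silent_failures(batch, labels)
```

In this toy batch, the second case is wrong but receives low confidence, so it is flagged and the failure is caught. The first case is wrong yet confident, so it passes the threshold unnoticed: this is the kind of silent failure a CSF is meant to prevent.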