To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved either by designing classifiers that are robust enough to avoid failures in the first place, or by detecting the remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shift between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs across four biomedical tasks and a diverse range of distribution shifts. Because none of the benchmarked CSFs reliably prevents silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. Using several examples, we demonstrate how this tool can help researchers gain insight into the requirements for the safe application of classification systems in the medical domain. The open-source benchmark and tool are available at: https://github.com/IML-DKFZ/sf-visuals.
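To make the role of a CSF concrete, the following minimal sketch uses the maximum softmax response as a confidence score and flags low-confidence predictions for review. It is an illustration only: the choice of maximum softmax response, the function names, and the threshold of 0.8 are assumptions made here and are not taken from the benchmark or from the SF-Visuals code base.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_confidence(logits: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted class, confidence), where the confidence is the
    maximum softmax response. This is one simple CSF, assumed for illustration."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs[predicted]


def flag_for_review(logits: Sequence[float], threshold: float = 0.8) -> bool:
    """Flag a prediction whose confidence is below a hypothetical threshold."""
    _, confidence = max_softmax_confidence(logits)
    return confidence < threshold


def is_silent_failure(
    logits: Sequence[float], true_label: int, threshold: float = 0.8
) -> bool:
    """A silent failure is a wrong prediction that the CSF did not flag."""
    predicted, _ = max_softmax_confidence(logits)
    return predicted != true_label and not flag_for_review(logits, threshold)
```

In this sketch, a confidently wrong prediction passes the threshold unflagged, which is exactly the case a CSF is meant to catch and, per the analysis above, often does not under distribution shift.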