In recent years, deep learning has seen increasing use in histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images, with a focus on the task of selective classification, where the model should abstain from classifying inputs on which it is uncertain. We conduct our experiments at the tile level under domain shift and label noise, as well as at the slide level. In our experiments, we compare Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, and Test-Time Data Augmentation, as well as ensembles of the latter approaches. We observe that ensembles of methods generally lead to better uncertainty estimates as well as increased robustness to domain shift and label noise, while, contrary to results from classical computer vision benchmarks, no systematic gain of the other methods can be shown. Across methods, rejecting the most uncertain samples reliably leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
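The selective classification setting described in the abstract can be illustrated with a minimal sketch. It is not taken from the published code framework: the function names `predictive_entropy` and `selective_accuracy`, the use of predictive entropy of the averaged softmax output as the uncertainty score, and the toy numbers are all assumptions made for illustration. The idea is to average the class probabilities of several ensemble members (or Monte-Carlo Dropout passes), rank samples by uncertainty, and report accuracy only on the most certain fraction.

```python
import numpy as np


def predictive_entropy(member_probs):
    """Average member predictions and score each sample by entropy.

    member_probs: array of shape (n_members, n_samples, n_classes)
    holding softmax outputs (hypothetical input layout).
    """
    mean_probs = member_probs.mean(axis=0)
    eps = 1e-12  # avoids log(0) for fully confident predictions
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
    return mean_probs, entropy


def selective_accuracy(mean_probs, uncertainty, labels, coverage):
    """Accuracy on the `coverage` fraction of least uncertain samples."""
    n_keep = max(1, int(round(coverage * len(labels))))
    # Keep the samples with the lowest uncertainty, reject the rest.
    keep = np.argsort(uncertainty, kind="stable")[:n_keep]
    preds = mean_probs[keep].argmax(axis=1)
    return float((preds == labels[keep]).mean())


if __name__ == "__main__":
    # Toy data (assumed): 2 members, 4 tiles, 2 classes.
    member_probs = np.array([
        [[0.95, 0.05], [0.05, 0.95], [0.60, 0.40], [0.35, 0.65]],
        [[0.85, 0.15], [0.15, 0.85], [0.50, 0.50], [0.45, 0.55]],
    ])
    labels = np.array([0, 1, 1, 1])
    mean_probs, entropy = predictive_entropy(member_probs)
    for coverage in (1.0, 0.75, 0.5):
        acc = selective_accuracy(mean_probs, entropy, labels, coverage)
        print(f"coverage={coverage:.2f} accuracy={acc:.2f}")
```

In this toy case the only misclassified tile is also the most uncertain one, so lowering the coverage removes it first. On real data the gain depends on how well the uncertainty score ranks errors, which is the property the evaluation compares across methods.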