Transfer learning boosts the performance of medical image analysis by enabling deep learning (DL) on small datasets through knowledge acquired from large ones. As the number of DL architectures explodes, exhaustively trying every candidate becomes infeasible, motivating cheaper ways of choosing among them. Transferability scoring methods have emerged as an enticing solution: they efficiently compute a score that correlates with an architecture's accuracy on any target dataset. However, because transferability scores have not been evaluated on medical datasets, their usefulness in this context remains uncertain, which prevents practitioners from benefiting from them. We fill that gap in this work by thoroughly evaluating seven transferability scores in three medical applications, including out-of-distribution scenarios. Despite promising results on general-purpose datasets, our results show that no transferability score can reliably and consistently estimate target performance in medical contexts, inviting further work in that direction.
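To make concrete what it means for a transferability score to "correlate with" accuracy, the sketch below ranks candidate architectures by score and by fine-tuned accuracy and compares the two orderings with Kendall's tau. This is only an illustration: the abstract does not say which correlation measure the authors use, the function name `kendall_tau` is ours, and every number in the example is made up rather than taken from the paper.

```python
from itertools import combinations


def kendall_tau(scores, accuracies):
    """Kendall's tau-a between two equal-length sequences.

    A pair of architectures is concordant when the score and the accuracy
    order them the same way, and discordant when they disagree. Tied pairs
    count as neither.
    """
    if len(scores) != len(accuracies):
        raise ValueError("scores and accuracies must have the same length")
    if len(scores) < 2:
        raise ValueError("need at least two architectures to compare")

    concordant = 0
    discordant = 0
    for (s1, a1), (s2, a2) in combinations(zip(scores, accuracies), 2):
        product = (s1 - s2) * (a1 - a2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for four candidate architectures (not from the paper).
hypothetical_scores = [0.2, 0.5, 0.9, 0.4]
hypothetical_accuracies = [0.71, 0.78, 0.74, 0.80]

tau = kendall_tau(hypothetical_scores, hypothetical_accuracies)
print(f"Kendall tau = {tau:.2f}")
```

A tau near 1 would mean the score ranks architectures almost exactly as fine-tuning does, while a tau near 0 means the score carries little ranking information, which is the kind of outcome the abstract reports for medical datasets.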