Numerous Deep Learning (DL) models have been developed for a large spectrum of medical image analysis applications, and they promise to reshape various facets of medical practice. Despite early advances in DL model validation and implementation, which encourage healthcare institutions to adopt these models, some fundamental questions remain: Are DL models capable of generalizing? What causes a drop in DL model performance? How can this performance drop be overcome? Medical data are dynamic and prone to domain shift: multiple factors, such as updates to medical equipment, new imaging workflows, and shifts in patient demographics or populations, can induce this drift over time. In this paper, we review recent developments in generalization methods for DL-based classification models. We also discuss future challenges, including the need for improved evaluation protocols and benchmarks, and envisioned future developments to achieve robust, generalized models for medical image classification.