Knowledge Distillation for Adaptive MRI Prostate Segmentation Based on Limit-Trained Multi-Teacher Models

Across numerous medical tasks, the performance of deep models has recently improved considerably. These models are often adept learners; yet their intricate architectures and high computational complexity make them challenging to deploy in clinical settings, particularly on resource-limited devices. To address this issue, Knowledge Distillation (KD) has been proposed as a model compression and acceleration technique. KD is an efficient learning strategy that transfers knowledge from a cumbersome model (the teacher) to a lightweight model (the student), yielding a compact model with few parameters while preserving the teacher's performance. In this work, we therefore develop a KD-based deep model for prostate MRI segmentation that combines feature-based distillation with Kullback-Leibler divergence, Lovász, and Dice losses. We further demonstrate its effectiveness under two compression procedures: 1) distilling knowledge into a student model from a single well-trained teacher, and 2) because most medical applications have small datasets, training multiple teachers, each on a small set of images, and distilling them into an adaptive student that stays as close to the teachers as possible while meeting the desired accuracy and fast inference time. Extensive experiments on a public multi-site prostate tumor dataset show that the proposed adaptive KD strategy improves the Dice similarity score by 9%, outperforming all tested well-established baseline models.
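The abstract names the loss terms but not how they are weighted or how several teachers are combined. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: binary (prostate vs. background) logits, a temperature-scaled per-pixel Bernoulli KL term, a plain MSE feature term on feature maps that already have matching shapes, and a simple average over teachers for the multi-teacher case. The function names, the weights `alpha` and `gamma`, and the temperature `T` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss, 1 - 2|P*G| / (|P| + |G|), on sigmoid probabilities."""
    p = sigmoid(logits).ravel()
    g = np.asarray(target, dtype=float).ravel()
    inter = np.sum(p * g)
    return float(1.0 - (2.0 * inter + eps) / (p.sum() + g.sum() + eps))


def lovasz_hinge(logits, target):
    """Binary Lovasz hinge loss, a convex surrogate for 1 - IoU."""
    s = np.asarray(logits, dtype=float).ravel()
    g = np.asarray(target, dtype=float).ravel()
    errors = 1.0 - s * (2.0 * g - 1.0)       # hinge errors; <= 0 means correct with margin
    order = np.argsort(-errors)              # sort errors in descending order
    errors_sorted, g_sorted = errors[order], g[order]
    gts = g_sorted.sum()
    intersection = gts - np.cumsum(g_sorted)
    union = gts + np.cumsum(1.0 - g_sorted)
    jaccard = 1.0 - intersection / union
    # Lovasz extension weights: successive differences of the Jaccard loss
    weights = np.concatenate(([jaccard[0]], np.diff(jaccard)))
    return float(np.dot(np.maximum(errors_sorted, 0.0), weights))


def binary_kl(student_logits, teacher_logits, T=2.0, eps=1e-7):
    """Per-pixel Bernoulli KL(teacher || student) at temperature T, scaled by T^2."""
    ps = np.clip(sigmoid(student_logits / T), eps, 1.0 - eps)
    pt = np.clip(sigmoid(teacher_logits / T), eps, 1.0 - eps)
    kl = pt * np.log(pt / ps) + (1.0 - pt) * np.log((1.0 - pt) / (1.0 - ps))
    return float(T * T * kl.mean())


def kd_loss(student_logits, student_feat, teacher_logits, teacher_feats, target,
            alpha=0.5, gamma=0.1, T=2.0):
    """Hypothetical combined objective; teachers are passed as lists (one or many)."""
    # Assumption: multiple teachers are aggregated by a plain average.
    t_logits = np.mean(teacher_logits, axis=0)
    t_feat = np.mean(teacher_feats, axis=0)
    seg = dice_loss(student_logits, target) + lovasz_hinge(student_logits, target)
    soft = binary_kl(student_logits, t_logits, T)
    # Assumption: feature maps already match in shape (no adapter layer).
    feat = float(np.mean((student_feat - t_feat) ** 2))
    return seg + alpha * soft + gamma * feat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = (rng.random((16, 16)) > 0.7).astype(float)
    student_logits = rng.normal(size=(16, 16))
    student_feat = rng.normal(size=(8, 4, 4))
    # Three toy "teachers" whose logits roughly agree with the mask.
    teacher_logits = [np.where(target > 0, 3.0, -3.0) + rng.normal(scale=0.5, size=(16, 16))
                      for _ in range(3)]
    teacher_feats = [rng.normal(size=(8, 4, 4)) for _ in range(3)]
    print(kd_loss(student_logits, student_feat, teacher_logits, teacher_feats, target))
```

Passing a one-element teacher list corresponds to the single-teacher procedure. The plain average is only a placeholder for whatever adaptive combination of teachers the authors use, which the abstract does not specify; in a real training loop these terms would be computed with an autodiff framework so the student can be optimized.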


