Deep learning underpins mainstream medical image segmentation methods. Nevertheless, current deep segmentation approaches cannot efficiently and effectively adapt and update trained models when new segmentation classes must be added incrementally (with or without new training datasets). In real clinical environments, it is preferable that segmentation models can be dynamically extended to segment new organs/tumors without (re-)accessing previous training datasets, because of obstacles related to patient privacy and data storage. This process can be viewed as a continual semantic segmentation (CSS) problem, which remains understudied for multi-organ segmentation. In this work, we propose a new architectural CSS learning framework to learn a single deep segmentation model for segmenting a total of 143 whole-body organs. Using the encoder/decoder network structure, we demonstrate that a continually trained and then frozen encoder, coupled with incrementally added decoders, can extract and preserve sufficiently representative image features for new classes to be subsequently and validly segmented. To keep the model complexity comparable to that of a single network, we progressively trim each decoder using neural architecture search and teacher-student knowledge distillation. To accommodate both healthy and pathological organs appearing in different datasets, a novel anomaly-aware and confidence learning module is proposed to merge the overlapping organ predictions originating from different decoders. Trained and validated on 3D CT scans of 2500+ patients from four datasets, our single network can segment a total of 143 whole-body organs with high accuracy, closely approaching the upper-bound performance obtained by training four separate segmentation models (i.e., one model per dataset/task).
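The architectural idea above (one shared encoder that is no longer updated, plus one decoder added per learning step, with overlapping organ predictions merged afterwards) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the class `ContinualSegmenter`, the toy encoder, the lambda decoders, and the organ names are all hypothetical, and the merge rule here (keep the most confident decoder) is a simple stand-in for the anomaly-aware and confidence learning module, whose details the abstract does not give.

```python
from typing import Callable, Dict, List, Tuple

Features = List[float]
# Assumption: a decoder maps shared features to {organ_name: confidence in [0, 1]}.
Decoder = Callable[[Features], Dict[str, float]]


class ContinualSegmenter:
    """Toy stand-in: one shared encoder plus one decoder per learning step."""

    def __init__(self, encoder: Callable[[List[float]], Features]) -> None:
        # The encoder is stored once and never modified afterwards ("frozen").
        self.encoder = encoder
        self.decoders: Dict[str, Decoder] = {}

    def add_decoder(self, step: str, decoder: Decoder) -> None:
        # New classes arrive as a new decoder; existing decoders are untouched.
        if step in self.decoders:
            raise ValueError(f"decoder for step {step!r} already exists")
        self.decoders[step] = decoder

    def predict(self, voxel: List[float]) -> Dict[str, Tuple[str, float]]:
        # Encode once, then run every decoder on the same features.
        feats = self.encoder(voxel)
        merged: Dict[str, Tuple[str, float]] = {}
        for step, decoder in self.decoders.items():
            for organ, conf in decoder(feats).items():
                # Overlapping organ: keep the most confident decoder's output
                # (a simplification, not the paper's merging module).
                if organ not in merged or conf > merged[organ][1]:
                    merged[organ] = (step, conf)
        return merged


def toy_encoder(voxel: List[float]) -> Features:
    # Hypothetical feature extractor: the mean intensity of the input.
    return [sum(voxel) / len(voxel)]


model = ContinualSegmenter(toy_encoder)
model.add_decoder("step1", lambda feats: {"liver": 0.9, "spleen": 0.6})
before = model.predict([0.2, 0.4])

# A later step adds new classes without touching the encoder or "step1".
model.add_decoder("step2", lambda feats: {"spleen": 0.8, "pancreas": 0.7})
after = model.predict([0.2, 0.4])
```

In this sketch, the prediction for an organ covered only by the first decoder is unchanged after the second decoder is added, which mirrors the motivation for freezing the encoder: earlier classes cannot be forgotten if nothing they depend on is updated.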