Whole abdominal organ segmentation is important for diagnosing abdominal lesions, planning radiotherapy, and follow-up. However, having oncologists delineate all abdominal organs in 3D volumes is time-consuming and very expensive. Deep learning-based medical image segmentation has shown the potential to reduce manual delineation effort, but it still requires a large-scale, finely annotated dataset for training, and large-scale datasets that cover the whole abdominal region with accurate and detailed annotations are lacking. In this work, we establish a new large-scale \textit{W}hole abdominal \textit{OR}gan \textit{D}ataset (\textit{WORD}) for algorithm research and clinical application development. The dataset contains 150 abdominal CT volumes (30495 slices). Each volume has 16 organs with fine pixel-level annotations and scribble-based sparse annotations, which may make it the largest dataset with whole abdominal organ annotation. We evaluate several state-of-the-art segmentation methods on this dataset. We also invited three experienced oncologists to revise the model predictions in order to measure the gap between deep learning methods and oncologists. We then investigate inference-efficient learning on WORD, since high-resolution images require large GPU memory and long inference times at test time. We further evaluate scribble-based annotation-efficient learning on this dataset, since pixel-wise manual annotation is time-consuming and expensive. This work provides a new benchmark for the abdominal multi-organ segmentation task, and these experiments can serve as baselines for future research and clinical application development.