WORD: A large scale dataset, benchmark and clinical applicable study for abdominal organ segmentation from CT image

Source: arXiv. Accepted to Medical Image Analysis. Dataset: https://github.com/HiLab-git/WORD (the results and descriptions were corrected in this version).

Whole abdominal organ segmentation is important for diagnosing abdominal lesions, planning radiotherapy, and follow-up. However, having oncologists delineate all abdominal organs in 3D volumes is time-consuming and very expensive. Deep learning-based medical image segmentation has shown the potential to reduce manual delineation effort, but it still requires a large, finely annotated dataset for training, and large-scale datasets that cover the whole abdominal region with accurate and detailed annotations are lacking. In this work, we establish a new large-scale Whole abdominal ORgan Dataset (WORD) for algorithm research and clinical application development. The dataset contains 150 abdominal CT volumes (30495 slices). Each volume has 16 organs with fine pixel-level annotations and scribble-based sparse annotations, which may make it the largest dataset with whole abdominal organ annotation. We evaluate several state-of-the-art segmentation methods on this dataset, and we invited three experienced oncologists to revise the model predictions in order to measure the gap between deep learning methods and oncologists. We then investigate inference-efficient learning on WORD, because high-resolution images require large GPU memory and long inference times at test time. We further evaluate scribble-based annotation-efficient learning on this dataset, because pixel-wise manual annotation is time-consuming and expensive. This work provides a new benchmark for the abdominal multi-organ segmentation task, and these experiments can serve as baselines for future research and clinical application development.
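As a rough illustration of how a predicted multi-organ label map might be compared with a reference annotation, the sketch below computes a per-organ Dice overlap. The abstract does not name the evaluation metric, so the choice of Dice here is an assumption, as are the function name `dice_per_organ`, the label encoding (0 for background, 1 to 16 for organs), and the toy data. It is not the authors' evaluation code.

```python
from typing import Dict, Sequence

NUM_ORGANS = 16  # organ count from the abstract; label 0 is assumed to be background


def dice_per_organ(pred: Sequence[int], ref: Sequence[int],
                   num_organs: int = NUM_ORGANS) -> Dict[int, float]:
    """Return the Dice overlap for each organ label found in pred or ref.

    pred and ref are flattened label maps of equal length that use the
    hypothetical encoding 0 = background, 1..num_organs = organs.
    """
    if len(pred) != len(ref):
        raise ValueError("label maps must have the same number of voxels")

    pred_count = [0] * (num_organs + 1)
    ref_count = [0] * (num_organs + 1)
    overlap = [0] * (num_organs + 1)

    # Single pass over the voxels: count each label and the agreements.
    for p, r in zip(pred, ref):
        pred_count[p] += 1
        ref_count[r] += 1
        if p == r:
            overlap[p] += 1

    scores: Dict[int, float] = {}
    for label in range(1, num_organs + 1):
        total = pred_count[label] + ref_count[label]
        if total > 0:  # skip organs that are absent from both maps
            scores[label] = 2.0 * overlap[label] / total
    return scores


# Toy example: six voxels and two organ labels (made-up data).
reference = [0, 1, 1, 2, 2, 2]
predicted = [0, 1, 2, 2, 2, 0]
scores = dice_per_organ(predicted, reference)
```

In practice the label maps would be 3D arrays loaded from the CT annotation files and flattened before the comparison; the pure-Python loop is kept only so the sketch has no dependencies.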

