Annotating 8,000 Abdominal CT Volumes for Multi-Organ Segmentation in Three Weeks

Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating a single abdominal organ takes an estimated 30-60 minutes per CT volume, depending on the annotator's expertise and the organ's size, visibility, and complexity. As a result, publicly available datasets for multi-organ segmentation are often limited in both size and organ diversity. This paper proposes a systematic and efficient method to expedite the annotation process for organ segmentation. We have created what is, by a wide margin, the largest multi-organ dataset to date, with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and inferior vena cava (IVC) annotated in 8,448 CT volumes, equating to 3.2 million slices. Conventional annotation methods would take an experienced annotator up to 1,600 weeks (roughly 30.8 years) to complete this task. In contrast, our annotation method accomplished it in three weeks (based on an 8-hour workday, five days a week) while maintaining similar or even better annotation quality. This achievement is attributed to three unique properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance that directs annotators to correct the most salient errors. Furthermore, we summarize a taxonomy of common errors made by AI algorithms and by annotators. This allows for continuous refinement of both the AI models and the annotations, and significantly reduces the annotation cost of creating large-scale datasets for a wider variety of medical imaging tasks.
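The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not a calculation taken from the paper. It assumes a 52-week year and treats the eight listed structures as eight annotation targets (kidneys counted once). Both are assumptions made here for illustration.

```python
# Back-of-the-envelope check of the numbers quoted in the abstract.
# Assumptions (not stated in the source): 52 weeks per year, and the
# eight listed structures count as eight separate annotation targets.

VOLUMES = 8_448            # CT volumes in the dataset
SLICES = 3_200_000         # total slices quoted
MANUAL_WEEKS = 1_600       # quoted upper bound for conventional annotation
ASSISTED_WEEKS = 3         # quoted time for the proposed method
HOURS_PER_WEEK = 8 * 5     # 8-hour workday, five days a week
ORGANS = 8                 # assumption: kidneys counted as one target

manual_hours = MANUAL_WEEKS * HOURS_PER_WEEK        # total manual effort in hours
hours_per_volume = manual_hours / VOLUMES           # manual hours per CT volume
minutes_per_organ = hours_per_volume * 60 / ORGANS  # implied minutes per organ
manual_years = MANUAL_WEEKS / 52                    # weeks converted to years
slices_per_volume = SLICES / VOLUMES                # average slices per volume
speedup = MANUAL_WEEKS / ASSISTED_WEEKS             # ratio of the two quoted durations

print(f"manual effort:      {manual_hours:,} hours ({manual_years:.1f} years)")
print(f"per volume:         {hours_per_volume:.2f} hours")
print(f"per organ:          {minutes_per_organ:.1f} minutes")
print(f"slices per volume:  {slices_per_volume:.0f}")
print(f"quoted-time ratio:  {speedup:.0f}x")
```

Under these assumptions the implied manual cost is a little under an hour per organ per volume, which sits inside the 30-60 minute range quoted above, and 1,600 weeks does work out to roughly 30.8 years.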

