Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating a single abdominal organ takes an estimated 30-60 minutes per CT volume, depending on the annotator's expertise and on the organ's size, visibility, and complexity. As a result, publicly available datasets for multi-organ segmentation are often limited in size and organ diversity. This paper proposes a systematic and efficient method to expedite the annotation process for organ segmentation. We have created by far the largest multi-organ dataset, with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and inferior vena cava (IVC) annotated in 8,448 CT volumes, equating to 3.2 million slices. With conventional annotation methods, this task would take an experienced annotator up to 1,600 weeks (roughly 30.8 years). In contrast, our annotation method accomplished it in three weeks (based on an 8-hour workday, five days a week) while maintaining similar or even better annotation quality. This achievement is attributed to three properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance that directs annotators to correct the most salient errors. Furthermore, we summarize a taxonomy of common errors made by AI algorithms and by annotators. This taxonomy allows for continuous refinement of both the AI models and the annotations, and significantly reduces the annotation cost of creating large-scale datasets for a wider variety of medical imaging tasks.
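To make properties (1)-(3) more concrete, the sketch below shows one way that disagreement among several pre-trained models could be turned into a consensus label and a per-case attention signal. It assumes each model outputs a per-voxel foreground probability (flattened into a list). The names `ensemble_uncertainty`, `consensus_mask`, and `rank_cases`, the use of cross-model variance as the error signal, and the 0.05 threshold are all hypothetical illustrations, not the paper's actual error-detection method.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: probs[m][v] is model m's foreground probability at voxel v.


def ensemble_uncertainty(probs: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return the per-voxel mean probability and the variance across models."""
    n_models = len(probs)
    n_vox = len(probs[0])
    mean = [sum(p[v] for p in probs) / n_models for v in range(n_vox)]
    var = [sum((p[v] - mean[v]) ** 2 for p in probs) / n_models for v in range(n_vox)]
    return mean, var


def consensus_mask(mean: List[float], cutoff: float = 0.5) -> List[int]:
    """Average several models' outputs to reduce the bias of any single model."""
    return [1 if m > cutoff else 0 for m in mean]


def rank_cases(cases: Dict[str, List[List[float]]], threshold: float = 0.05) -> List[str]:
    """Order cases by the fraction of voxels where the models disagree (variance above threshold).

    Cases at the front of the list would be shown to annotators first.
    """
    scores = {}
    for case_id, probs in cases.items():
        _, var = ensemble_uncertainty(probs)
        scores[case_id] = sum(1 for x in var if x > threshold) / len(var)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    cases = {
        "ct_001": [[0.9, 0.1, 0.8], [0.9, 0.2, 0.7], [0.8, 0.1, 0.9]],  # models agree
        "ct_002": [[0.9, 0.1, 0.8], [0.1, 0.9, 0.2], [0.5, 0.5, 0.5]],  # models disagree
    }
    print(rank_cases(cases))
    mean, _ = ensemble_uncertainty(cases["ct_001"])
    print(consensus_mask(mean))
```

In this toy example, `ct_002` is ranked ahead of `ct_001` because the three models disagree on every voxel. A real pipeline would operate on 3D volumes and would likely use richer error cues than variance alone.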