Med-Query: Steerable Parsing of 9-DoF Medical Anatomies with Query Embedding

Automatic parsing of human anatomies at the instance level from 3D computed tomography (CT) scans is a prerequisite for many clinical applications. Pathologies, broken structures, and a limited field-of-view (FOV) can all make anatomy parsing algorithms vulnerable. In this work, we explore how to apply the successful detection-then-segmentation paradigm to 3D medical data, and propose a steerable, robust, and efficient computing framework for detection, identification, and segmentation of anatomies in CT scans. Considering the complicated shapes, sizes, and orientations of anatomies, and without loss of generality, we present a nine degrees-of-freedom (9-DoF) pose estimation solution in full 3D space using a novel single-stage, non-hierarchical forward representation. The whole framework runs in a steerable manner: any anatomy of interest can be retrieved directly, which further boosts inference efficiency. We have validated the proposed method on three medical image parsing tasks: ribs, spine, and abdominal organs. For rib parsing, CT scans have been annotated at the rib instance level for quantitative evaluation, and similarly for spine vertebrae and abdominal organs. Extensive experiments on 9-DoF box detection and rib instance segmentation demonstrate the effectiveness of our framework (an identification rate of 97.0% and a segmentation Dice score of 90.9%) with high efficiency, comparing favorably with several strong baselines (e.g., CenterNet, FCOS, and nnU-Net). For spine identification and segmentation, our method achieves a new state-of-the-art result on the public CTSpine1K dataset. Finally, we report highly competitive results in multi-organ segmentation in the FLARE22 competition. Our annotations, code, and models will be made publicly available at: https://github.com/alibaba-damo-academy/Med_Query.

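To make the 9-DoF box concrete, the sketch below treats a box as three center coordinates, three side lengths, and three rotation angles, and computes its eight corners. It is only an illustration under stated assumptions: the Euler-angle convention (rotations about x, y, z composed as Rz·Ry·Rx), the parameter ordering, and the function names `rotation_matrix` and `box_corners` are hypothetical and are not taken from the Med-Query code.

```python
import numpy as np


def rotation_matrix(alpha, beta, gamma):
    """Rotation matrix from Euler angles in radians.

    Assumed convention (not from the paper): rotate about x by alpha,
    then about y by beta, then about z by gamma, i.e. R = Rz @ Ry @ Rx.
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    rz = np.array([[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def box_corners(box):
    """Return the (8, 3) corner coordinates of a 9-DoF box.

    box = (x, y, z, w, h, d, alpha, beta, gamma):
    3 DoF for the center, 3 for the side lengths, 3 for the orientation.
    """
    box = np.asarray(box, dtype=float)
    center, size, angles = box[:3], box[3:6], box[6:9]
    # All sign combinations (-1/+1) along the three local box axes.
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=float,
    )
    local = 0.5 * signs * size  # corners in the box's own frame
    # Rotate into the scan frame, then translate to the box center.
    return local @ rotation_matrix(*angles).T + center
```

With all three angles set to zero, the result reduces to an ordinary axis-aligned bounding box; the rotation terms are what let a single box fit an obliquely oriented structure such as a rib more tightly.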