Med-Query: Steerable Parsing of 9-DoF Medical Anatomies with Query Embedding

Automatic parsing of human anatomies at the instance level from 3D computed tomography (CT) is a prerequisite for many clinical applications. The presence of pathologies, broken structures, or a limited field of view (FOV) can all make anatomy parsing algorithms vulnerable. In this work, we explore how to leverage and implement the successful detection-then-segmentation paradigm for 3D medical data, and propose a steerable, robust, and efficient computing framework for the detection, identification, and segmentation of anatomies in CT scans. Considering the complicated shapes, sizes, and orientations of anatomies, without loss of generality, we present a nine-degrees-of-freedom (9-DoF) pose estimation solution in full 3D space using a novel single-stage, non-hierarchical representation. Our whole framework is executed in a steerable manner: any anatomy of interest can be directly retrieved, which further boosts inference efficiency. We have validated our method on three medical image parsing tasks: ribs, spine, and abdominal organs. For rib parsing, CT scans have been annotated at the rib instance level for quantitative evaluation, and spine vertebrae and abdominal organs were annotated similarly. Extensive experiments on 9-DoF box detection and rib instance segmentation demonstrate the high efficiency and effectiveness of our framework (with an identification rate of 97.0% and a segmentation Dice score of 90.9%), comparing favorably against several strong baselines (e.g., CenterNet, FCOS, and nnU-Net). For spine parsing and abdominal multi-organ segmentation, our method achieves results on par with state-of-the-art methods on the public CTSpine1K dataset and in the FLARE22 competition, respectively. Our annotations, code, and models are available at: https://github.com/alibaba-damo-academy/Med_Query.
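The abstract does not spell out how a 9-DoF box is parameterized. The usual reading is 3 translation + 3 scale + 3 rotation parameters. The following is a minimal sketch under that assumption, using Euler angles in ZYX order (radians). It shows how such a box maps to its eight world-space corners and how a point can be tested for membership by rotating it into the box frame. The names `Box9DoF`, `rotation_from_euler`, and `matvec`, and the Euler convention, are hypothetical illustrations, not the Med-Query implementation, which may use a different rotation representation.

```python
import math
from dataclasses import dataclass
from itertools import product

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


def rotation_from_euler(rx: float, ry: float, rz: float) -> Mat3:
    """Rotation matrix R = Rz(rz) @ Ry(ry) @ Rx(rx), row-major (assumed ZYX convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return (
        (cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx),
        (sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx),
        (-sy, cy * sx, cy * cx),
    )


def matvec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


@dataclass(frozen=True)
class Box9DoF:
    """Hypothetical oriented box: 3 translation + 3 scale + 3 rotation parameters."""
    center: Vec3  # box center in world coordinates (e.g. mm)
    size: Vec3    # full side lengths along the box's own local axes
    angles: Vec3  # Euler angles (rx, ry, rz) in radians (assumed convention)

    def rotation(self) -> Mat3:
        return rotation_from_euler(*self.angles)

    def corners(self) -> list[Vec3]:
        """World coordinates of the 8 corners: center + R @ (+/- half extents)."""
        rot = self.rotation()
        half = [s / 2 for s in self.size]
        result = []
        for signs in product((-1, 1), repeat=3):
            local = tuple(sg * h for sg, h in zip(signs, half))
            offset = matvec(rot, local)
            result.append(tuple(c + o for c, o in zip(self.center, offset)))
        return result

    def contains(self, p: Vec3) -> bool:
        """True if point p lies inside the box (maps p into the box frame via R^T)."""
        rot = self.rotation()
        d = [pi - ci for pi, ci in zip(p, self.center)]
        local = [sum(rot[i][j] * d[i] for i in range(3)) for j in range(3)]
        return all(abs(x) <= s / 2 for x, s in zip(local, self.size))
```

For instance, a 4 × 2 × 2 box rotated by 90° about z contains the point (0, 1.5, 0) but not (1.5, 0, 0), because its long axis now points along y. Under this assumed parameterization, a detector would regress nine numbers per anatomy. The choice of rotation encoding (Euler angles, quaternions, or a continuous 6D form) affects how well those angles can be learned.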
