Reliable identification of anatomical body regions is a prerequisite for many automated medical imaging workflows, yet existing solutions remain heavily dependent on unreliable DICOM metadata. Most current approaches rely on supervised learning, which limits their applicability in many real-world scenarios. In this work, we investigate whether body region detection in volumetric CT and MR images can be achieved in a fully zero-shot manner by exploiting the knowledge embedded in large pre-trained foundation models. We propose and systematically evaluate three training-free pipelines: (1) a segmentation-driven rule-based system leveraging pre-trained multi-organ segmentation models, (2) a Multimodal Large Language Model (MLLM) guided by radiologist-defined rules, and (3) a segmentation-aware MLLM that combines visual input with explicit anatomical evidence. All methods are evaluated on 887 heterogeneous CT and MR scans with manually verified anatomical region labels. The segmentation-driven rule-based approach achieves the strongest and most consistent performance, with weighted F1-scores of 0.947 (CT) and 0.914 (MR), demonstrating robustness across modalities and under atypical scan coverage. The MLLM performs competitively in visually distinctive regions, while the segmentation-aware MLLM exposes fundamental limitations of this hybrid design.