Existing promptable segmentation methods in the medical imaging field primarily rely on either textual or visual prompts to segment relevant objects, yet they often fall short when addressing anomalies in medical images, such as tumors, which vary greatly in shape, size, and appearance. Recognizing the complexity of medical scenarios and the limitations of textual or visual prompts alone, we propose a novel dual-prompt scheme that leverages the complementary strengths of visual and textual prompts for segmenting various organs and tumors. Specifically, we introduce CAT, an innovative model that Coordinates Anatomical prompts derived from 3D cropped images with Textual prompts enriched by medical domain knowledge. The model architecture adopts a general query-based design, in which prompt queries facilitate segmentation queries for mask prediction. To synergize the two types of prompts within a unified framework, we implement a ShareRefiner, which refines both segmentation and prompt queries while disentangling the two prompt types. Trained on a consortium of 10 public CT datasets, CAT demonstrates superior performance across multiple segmentation tasks. Further validation on a specialized in-house dataset reveals a remarkable capacity to segment tumors across multiple cancer stages. These results confirm that coordinating multimodal prompts is a promising avenue for addressing complex scenarios in the medical domain.
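The query-based design described above can be illustrated with a minimal sketch. All dimensions, the single-head unprojected attention, and the shared-refinement lambda are hypothetical simplifications, not the paper's actual implementation: two prompt sets (anatomical and textual) and the segmentation queries are refined against image features by one shared module, then the segmentation queries attend to the prompt queries before producing per-voxel mask logits.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Single-head attention without learned projections, for illustration only.
    attn = softmax(queries @ keys_values.T / np.sqrt(queries.shape[-1]))
    return queries + attn @ keys_values  # residual update

# Hypothetical dimensions (not taken from the paper)
d = 32       # embedding dimension
n_seg = 8    # number of segmentation queries
rng = np.random.default_rng(0)

# Two prompt types: anatomical (from a 3D cropped image) and textual
anat_prompts = rng.normal(size=(4, d))
text_prompts = rng.normal(size=(4, d))
seg_queries  = rng.normal(size=(n_seg, d))
image_feats  = rng.normal(size=(100, d))  # flattened 3D feature map

# "ShareRefiner" stand-in: one shared module refines every query set
# against the image features, keeping the two prompt streams separate.
refine = lambda q: cross_attention(q, image_feats)
seg_queries  = refine(seg_queries)
anat_prompts = refine(anat_prompts)
text_prompts = refine(text_prompts)

# Prompt queries facilitate segmentation queries for mask prediction.
prompts = np.concatenate([anat_prompts, text_prompts], axis=0)
seg_queries = cross_attention(seg_queries, prompts)

# Mask logits: dot product of segmentation queries with per-voxel features.
mask_logits = seg_queries @ image_feats.T  # shape (n_seg, 100)
print(mask_logits.shape)
```

The shared refinement step mirrors the idea that one module handles both prompt types while keeping them disentangled; in the real model each of these steps would be a learned transformer layer rather than raw dot-product attention.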