In this paper, we present PRISM, a Promptable and Robust Interactive Segmentation Model for precise segmentation of 3D medical images. PRISM accepts various visual inputs: points, boxes, and scribbles as sparse prompts, and masks as dense prompts. PRISM is designed around four principles to achieve robustness: (1) Iterative learning. The model uses visual prompts from previous iterations to progressively improve its segmentation. (2) Confidence learning. PRISM employs multiple segmentation heads per input image, each generating a continuous map and a confidence score, to optimize predictions. (3) Corrective learning. After each segmentation iteration, PRISM applies a shallow corrective refinement network to reassign mislabeled voxels. (4) Hybrid design. PRISM integrates hybrid encoders to better capture both local and global information. We validate PRISM comprehensively on four public datasets for tumor segmentation in the colon, pancreas, liver, and kidney, which highlight the challenges that anatomical variations and ambiguous boundaries pose for accurate tumor identification. Compared to state-of-the-art methods, both with and without prompt engineering, PRISM significantly improves performance, achieving results that are close to human levels. The code is publicly available at https://github.com/MedICL-VU/PRISM.
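To make the confidence-learning and iterative-prompting ideas more concrete, the following is a minimal NumPy sketch, not the PRISM implementation. The abstract does not state how the outputs of the multiple heads are combined or how prompts are drawn between iterations, so two assumptions are made here: the map from the head with the highest confidence score is kept, and the next point prompt is sampled uniformly from the voxels where the current prediction disagrees with a reference mask. The function names `select_most_confident` and `sample_corrective_point`, the array layouts, and the 0.5 threshold are all hypothetical.

```python
import numpy as np


def select_most_confident(maps, scores):
    """Keep the candidate map whose head reported the highest confidence.

    Assumed layout (hypothetical): maps has shape (heads, D, H, W) and
    scores has shape (heads,), one confidence score per head.
    """
    maps = np.asarray(maps)
    scores = np.asarray(scores)
    best = int(np.argmax(scores))
    return maps[best], best


def sample_corrective_point(pred_mask, ref_mask, rng):
    """Sample one point prompt from the current error region.

    Returns ((z, y, x), label), where label is 1 for a positive click
    (missed foreground) and 0 for a negative click (false positive),
    or None when prediction and reference already agree everywhere.
    """
    coords = np.argwhere(pred_mask != ref_mask)
    if len(coords) == 0:
        return None
    z, y, x = (int(v) for v in coords[rng.integers(len(coords))])
    return (z, y, x), int(ref_mask[z, y, x])


# Toy data: a 4x4x4 volume with a cubic "tumor" and three random candidate maps.
rng = np.random.default_rng(0)
ref = np.zeros((4, 4, 4), dtype=bool)
ref[1:3, 1:3, 1:3] = True
candidate_maps = rng.random((3, 4, 4, 4))
candidate_scores = np.array([0.2, 0.9, 0.5])

best_map, best_idx = select_most_confident(candidate_maps, candidate_scores)
pred = best_map > 0.5  # assumed threshold for binarizing the continuous map
click = sample_corrective_point(pred, ref, rng)
```

In an interactive loop, a click like this would be fed back to the model as a sparse prompt for the next iteration, alongside the previous prediction as a dense prompt.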