The Segment Anything Model has revolutionized image segmentation with its zero-shot capabilities, yet its reliance on manual prompts hinders fully automated deployment. While integrating object detectors as prompt generators offers a pathway to automation, existing pipelines suffer from two fundamental limitations: objective mismatch, where detectors optimized for geometric localization fail to produce the prompting context that SAM actually requires, and alignment overfitting in standard joint training, where the detector simply memorizes specific prompt adjustments for training samples rather than learning a generalizable policy. To bridge this gap, we introduce BLO-Inst, a unified framework that aligns detection and segmentation objectives through bi-level optimization. We formulate this alignment as a nested optimization problem over disjoint data splits. At the lower level, SAM is fine-tuned to maximize segmentation fidelity given the current detection proposals on a subset ($D_1$). At the upper level, the detector is updated to generate bounding boxes that explicitly minimize the validation loss of the fine-tuned SAM on a separate subset ($D_2$). This effectively transforms the detector into a segmentation-aware prompt generator, optimizing bounding boxes not only for localization accuracy but also for downstream mask quality. Extensive experiments demonstrate that BLO-Inst achieves superior performance, outperforming standard baselines on tasks in both general and biomedical domains.
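The nested optimization described above can be sketched on a toy problem. The following is only an illustrative first-order approximation under hypothetical scalar stand-ins (a scalar "detector" weight `theta` producing prompts and a scalar "segmenter" weight `w` consuming them), not the paper's actual detector or SAM: the lower level fits `w` on $D_1$ given the current prompts, and the upper level updates `theta` to reduce the fitted model's loss on the disjoint split $D_2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inputs x with target y = 2*x, split into disjoint D1 / D2
x = rng.normal(size=40)
y = 2.0 * x
x1, y1 = x[:20], y[:20]   # D1: lower-level (fine-tuning) split
x2, y2 = x[20:], y[20:]   # D2: upper-level (validation) split

theta = 0.5               # hypothetical "detector" parameter: prompt = theta * x
lr_low, lr_up = 0.1, 0.05

def lower_level(theta, steps=20):
    """Fit the scalar 'segmenter' weight w on D1 given the current prompts."""
    w = 1.0
    for _ in range(steps):
        p = theta * x1                         # prompts from the detector
        grad_w = np.mean(2 * (w * p - y1) * p)  # d/dw of mean squared error
        w -= lr_low * grad_w
    return w

losses = []
for _ in range(50):
    w_star = lower_level(theta)
    # Upper level: first-order hypergradient on D2, treating w_star as fixed
    # (a common approximation; the full hypergradient would differentiate
    # through the lower-level fine-tuning as well)
    p2 = theta * x2
    grad_theta = np.mean(2 * (w_star * p2 - y2) * w_star * x2)
    theta -= lr_up * grad_theta
    losses.append(np.mean((w_star * theta * x2 - y2) ** 2))

print(losses[0], losses[-1])  # validation loss on D2 shrinks over outer steps
```

Because the two levels are trained on disjoint splits, the upper-level update is driven by held-out segmentation loss rather than by memorized per-sample prompt corrections, which is the mechanism the abstract credits for avoiding alignment overfitting.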