Leveraging large-scale pre-training, vision foundation models showcase notable performance benefits, and recent segmentation algorithms for natural scenes have advanced significantly. However, existing models still struggle to automatically segment personalized instances in dense and crowded scenarios, where severe occlusions, scale variations, and background clutter make it challenging to accurately delineate densely packed instances of the target object. To address this, we propose PerSense, an end-to-end, training-free, and model-agnostic one-shot framework for Personalized instance Segmentation in dense images. We develop a new baseline capable of automatically generating instance-level point prompts by introducing a novel Instance Detection Module (IDM) that leverages density maps, which encapsulate the spatial distribution of objects in an image. To mitigate false positives among the generated point prompts, we design a Point Prompt Selection Module (PPSM). Together, IDM and PPSM transform density maps into precise, personalized point prompts for instance-level segmentation and integrate seamlessly into our model-agnostic framework. We also introduce a feedback mechanism that enables PerSense to improve the accuracy of density maps by automating the exemplar selection process for density map generation. To promote algorithmic advances and effective tools for this relatively underexplored task, we introduce PerSense-D, a diverse dataset dedicated to personalized instance segmentation in dense images. Our extensive experiments establish the superiority of PerSense in dense scenarios: it achieves an mIoU of 71.61% on PerSense-D, outperforming recent SOTA models by significant margins of +47.16%, +42.27%, +8.83%, and +5.69%. Additionally, our qualitative findings demonstrate the adaptability of our framework to images captured in the wild.
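The core idea described above, converting a density map into instance-level point prompts and then rejecting weak candidates, can be illustrated with a minimal sketch. Note that the function name, the 3x3 local-maximum rule, and the fixed threshold below are illustrative assumptions for exposition only; they are not the actual IDM or PPSM logic.

```python
import numpy as np

def density_to_point_prompts(density, peak_thresh=0.5):
    """Extract candidate instance-level point prompts from a 2D density map.

    A pixel becomes a candidate prompt if it is a strict local maximum over
    its 3x3 neighborhood (a crude stand-in for instance detection) and its
    density exceeds `peak_thresh` (a crude stand-in for filtering out
    false-positive prompts).
    """
    h, w = density.shape
    # Pad with -inf so border pixels compare correctly against "outside".
    padded = np.pad(density, 1, mode="constant", constant_values=-np.inf)
    # Stack the 8 shifted neighbor views of every pixel.
    neighbors = np.stack([
        padded[r:r + h, c:c + w]
        for r in range(3) for c in range(3)
        if not (r == 1 and c == 1)
    ])
    is_peak = (density > neighbors.max(axis=0)) & (density > peak_thresh)
    ys, xs = np.nonzero(is_peak)
    return list(zip(ys.tolist(), xs.tolist()))

# Toy density map: two strong blobs and one weak spurious bump.
d = np.zeros((8, 8))
d[2, 2] = 1.0
d[5, 6] = 0.9
d[6, 1] = 0.2  # below threshold -> rejected as a false positive
print(density_to_point_prompts(d))  # -> [(2, 2), (5, 6)]
```

Each returned (row, col) pair would then serve as a point prompt for a promptable segmentation model, one prompt per detected instance.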