Amodal segmentation aims to predict the complete geometric shape of an object, including its occluded regions. Existing methods focus primarily on amodal segmentation within the training domain and often generalize poorly to novel object categories and unseen contexts. This paper introduces Amodal SAM, a unified framework that builds on SAM (Segment Anything Model) for both amodal image segmentation and amodal video segmentation. Amodal SAM preserves the strong generalization ability of SAM while extending the model to the amodal segmentation task. The framework contributes three components: (1) a lightweight Spatial Completion Adapter that enables reconstruction of occluded regions, (2) a Target-Aware Occlusion Synthesis (TAOS) pipeline that addresses the scarcity of amodal annotations by generating diverse synthetic training data, and (3) novel learning objectives that enforce regional consistency and impose topological regularization. Extensive experiments demonstrate that Amodal SAM achieves state-of-the-art performance on standard benchmarks while generalizing robustly to novel scenarios. We anticipate that this research will advance the field toward practical amodal segmentation systems that operate effectively in unconstrained real-world environments.
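The abstract does not describe how TAOS works internally, so the following is only a minimal sketch of the general idea behind occlusion synthesis, not the paper's pipeline. It assumes a fully visible object mask is available and pastes a synthetic occluder in front of it, which yields a (visible mask, amodal mask) training pair without manual amodal annotation. The names `rect_mask` and `synthesize_occlusion`, and the choice to represent masks as sets of pixel coordinates, are hypothetical and chosen for illustration.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def rect_mask(top: int, left: int, height: int, width: int) -> Set[Pixel]:
    """Return the set of pixels covered by an axis-aligned rectangle."""
    return {
        (r, c)
        for r in range(top, top + height)
        for c in range(left, left + width)
    }


def synthesize_occlusion(
    amodal: Set[Pixel], occluder: Set[Pixel], shift: Pixel
) -> Tuple[Set[Pixel], Set[Pixel]]:
    """Place `occluder`, translated by `shift`, in front of the target.

    Returns (visible, occluded): the target pixels that remain visible and
    the target pixels hidden by the occluder. Hypothetical helper for
    illustration only; it is not the paper's TAOS implementation.
    """
    dr, dc = shift
    placed = {(r + dr, c + dc) for r, c in occluder}
    occluded = amodal & placed   # target pixels covered by the occluder
    visible = amodal - placed    # target pixels still seen by the camera
    return visible, occluded


# Target: a 4x4 square covering rows 2-5 and columns 2-5 (16 pixels).
target = rect_mask(2, 2, 4, 4)
# Occluder: a 4x2 bar, moved so that it covers the target's right half.
occluder = rect_mask(0, 0, 4, 2)
visible, occluded = synthesize_occlusion(target, occluder, shift=(2, 4))

# Fraction of the target hidden by the synthetic occluder.
occlusion_rate = len(occluded) / len(target)
```

In a training setup of this kind, `visible` would play the role of the model input or prompt and `target` the amodal ground truth; varying the occluder shape and `shift` is one simple way to control the diversity and severity of the synthetic occlusions.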