Recently, open-vocabulary learning has emerged to segment arbitrary categories specified by text descriptions, extending segmentation systems to more general-purpose application scenarios. However, existing methods focus on designing specialized architectures or parameters for specific segmentation tasks. These customized design paradigms fragment the segmentation tasks from one another and hinder the unification of segmentation models. In this paper, we therefore propose FreeSeg, a generic framework for Unified, Universal and Open-Vocabulary Image Segmentation. FreeSeg optimizes an all-in-one network via one-shot training and employs the same architecture and parameters to handle diverse segmentation tasks seamlessly at inference. Additionally, adaptive prompt learning helps the unified model capture task-aware and category-sensitive concepts, improving model robustness in multi-task and varied scenarios. Extensive experimental results demonstrate that FreeSeg establishes new state-of-the-art results in performance and generalization on three segmentation tasks, outperforming the best task-specific architectures by a large margin on unseen classes on COCO: 5.5% mIoU on semantic segmentation, 17.6% mAP on instance segmentation, and 20.1% PQ on panoptic segmentation.