This paper aims to achieve universal segmentation at arbitrary semantic levels. Despite significant progress in recent years, specialist segmentation approaches remain limited to specific tasks and data distributions. Retraining a new model to adapt to new scenarios or settings incurs substantial computation and time costs, which raises the demand for a versatile, universal segmentation model that can cater to various granularities. Although some attempts have been made to unify different segmentation tasks or to generalize to various scenarios, limitations in the definition of their paradigms and input-output spaces make it difficult for them to accurately understand content at arbitrary granularity. To this end, we present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level under the guidance of language instructions. To train UniLSeg, we reorganize a group of tasks from their original, diverse distributions into a unified data format, in which images paired with texts describing the segmentation targets serve as input and the corresponding masks serve as output. Combined with an automatic annotation engine that makes use of large amounts of unlabeled data, UniLSeg achieves excellent performance on various tasks and settings, surpassing both specialist and unified segmentation models.