Multi-human parsing is an image segmentation task necessitating both instance-level and fine-grained category-level information. However, prior research has typically processed these two types of information through separate branches and distinct output formats, leading to inefficient and redundant frameworks. This paper introduces UniParser, which integrates instance-level and category-level representations in three key aspects: 1) we propose a unified correlation representation learning approach, allowing our network to learn instance and category features within the cosine space; 2) we unify the output form of each module as pixel-level segmentation results while supervising instance and category features using a homogeneous label accompanied by an auxiliary loss; and 3) we design a joint optimization procedure to fuse instance and category representations. By virtue of unifying instance-level and category-level outputs, UniParser circumvents manually designed post-processing techniques and surpasses state-of-the-art methods, achieving 49.3% AP on MHPv2.0 and 60.4% AP on CIHP. We will release our source code, pretrained models, and online demos to facilitate future studies.
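To make the notion of learning features "within the cosine space" concrete, the sketch below illustrates one common way such a correlation can be computed: L2-normalizing both pixel features and learned kernels so that their inner product is a cosine similarity in [-1, 1], which can then be read as pixel-level segmentation logits. The function name, shapes, and normalization details here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cosine_correlation(features, kernels, eps=1e-8):
    """Correlate pixel features with instance/category kernels in cosine space.

    features: (C, H, W) feature map from the backbone (hypothetical shape).
    kernels:  (K, C) learned instance or category kernels (hypothetical shape).
    Returns:  (K, H, W) cosine-similarity maps, one per kernel, values in [-1, 1].
    """
    C, H, W = features.shape
    f = features.reshape(C, -1)                                  # (C, H*W)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + eps)     # unit pixel features
    k = kernels / (np.linalg.norm(kernels, axis=1, keepdims=True) + eps)  # unit kernels
    return (k @ f).reshape(-1, H, W)                             # cosine similarities

# Toy usage with random features and 3 kernels over an 8-channel, 4x4 map.
rng = np.random.default_rng(0)
maps = cosine_correlation(rng.standard_normal((8, 4, 4)),
                          rng.standard_normal((3, 8)))
print(maps.shape)  # (3, 4, 4)
```

Because both operands are unit-normalized, the response maps are bounded regardless of feature magnitude, which is what allows instance and category branches to share one homogeneous pixel-level output format.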