Human-centric perception (e.g., pedestrian detection, segmentation, pose estimation, and attribute analysis) is a long-standing problem in computer vision. This paper introduces HQNet, a unified and versatile framework for single-stage multi-person multi-task human-centric perception (HCP). Our approach centers on learning a unified human query representation, denoted as Human Query, which captures intricate instance-level features of individual persons and disentangles complex multi-person scenarios. Although individual HCP tasks have been well studied, single-stage multi-task learning of HCP tasks remains underexplored in the literature due to the absence of a comprehensive benchmark dataset. To address this gap, we propose the COCO-UniHuman benchmark dataset to enable model development and comprehensive evaluation. Experimental results demonstrate the proposed method's state-of-the-art performance among multi-task HCP models and its competitive performance compared to task-specific HCP models. Moreover, our experiments underscore Human Query's adaptability to new HCP tasks, demonstrating its robust generalization capability. Code and data will be made publicly available.