DirectMHP: Direct 2D Multi-Person Head Pose Estimation with Full-range Angles

Existing head pose estimation (HPE) methods mainly focus on a single person with a pre-detected frontal head, which limits their application in real, complex multi-person scenarios. We argue that these single-person HPE methods are fragile and inefficient for Multi-Person Head Pose Estimation (MPHPE), since they rely on a separately trained face detector that cannot generalize well to full viewpoints, especially for heads whose face area is invisible. In this paper, we focus on the full-range MPHPE problem and propose a direct, end-to-end, simple baseline named DirectMHP. Because no existing dataset is applicable to full-range MPHPE, we first construct two benchmarks by extracting ground-truth labels for head detection and head orientation from the public datasets AGORA and CMU Panoptic. Both are challenging because they contain many truncated, occluded, tiny, and unevenly illuminated human heads. We then design a novel end-to-end trainable one-stage network architecture that addresses MPHPE by jointly regressing the locations and orientations of multiple heads. Specifically, we regard pose as an auxiliary attribute of the head and append it after the traditional object prediction. This flexible design accepts an arbitrary pose representation, such as Euler angles. We jointly optimize the two tasks by sharing features and applying appropriate multiple losses. In this way, our method can implicitly benefit from more of the surrounding context to improve HPE accuracy while maintaining head detection performance. We present comprehensive comparisons with state-of-the-art single-person HPE methods on public benchmarks, as well as superior baseline results on our constructed MPHPE datasets. Datasets and code are released at https://github.com/hnuzhy/DirectMHP.
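To make the idea of pose as an auxiliary head attribute concrete, the following is a minimal sketch and not the released implementation. It assumes a hypothetical per-head output vector of the form (cx, cy, w, h, objectness, yaw, pitch, roll), with each Euler angle normalized to [0, 1]. The angle ranges, the mean-squared-error pose loss, and the `pose_weight` factor are illustrative assumptions that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Assumed full-range spans in degrees (illustrative, not stated in the abstract).
YAW_SPAN = 360.0    # yaw decoded into [-180, 180]
PITCH_SPAN = 180.0  # pitch decoded into [-90, 90]
ROLL_SPAN = 180.0   # roll decoded into [-90, 90]


@dataclass
class HeadPrediction:
    box: Tuple[float, float, float, float]  # (cx, cy, w, h)
    score: float                            # head objectness
    yaw: float                              # degrees
    pitch: float                            # degrees
    roll: float                             # degrees


def decode_prediction(vec: Sequence[float]) -> HeadPrediction:
    """Split a hypothetical 8-value output into detection and pose parts."""
    if len(vec) != 8:
        raise ValueError("expected 8 values: cx, cy, w, h, obj, yaw, pitch, roll")
    cx, cy, w, h, obj, n_yaw, n_pitch, n_roll = vec
    return HeadPrediction(
        box=(cx, cy, w, h),
        score=obj,
        # Map each normalized angle from [0, 1] to a range centred on zero.
        yaw=(n_yaw - 0.5) * YAW_SPAN,
        pitch=(n_pitch - 0.5) * PITCH_SPAN,
        roll=(n_roll - 0.5) * ROLL_SPAN,
    )


def pose_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over the normalized pose values (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(det_loss: float, pred_pose: Sequence[float],
               target_pose: Sequence[float], pose_weight: float = 0.1) -> float:
    """Weighted sum of a detection loss and the pose loss (weight is hypothetical)."""
    return det_loss + pose_weight * pose_mse(pred_pose, target_pose)


head = decode_prediction([0.5, 0.5, 0.2, 0.2, 0.9, 0.75, 0.5, 0.25])
print(head.yaw, head.pitch, head.roll)
```

Because the pose values simply extend the detection vector, a different representation could be substituted by changing only the number of appended values and the decoding step, which is the flexibility described above.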
