Head pose estimation (HPE) is a problem of interest in computer vision because it improves the performance of face processing tasks in semi-frontal and profile settings. Recent applications require the analysis of faces over the full 360° rotation range. Traditional approaches designed for the semi-frontal and profile cases do not carry over directly to the full-rotation case. In this paper we analyze the methodology for short- and wide-range HPE and discuss which representations and metrics are adequate for each case. We show that the popular Euler angle representation is a good choice for short-range HPE, but not for extreme rotations. As a metric, however, Euler angles are not valid in any setting because of their gimbal lock problem. We also revisit the current cross-data set evaluation methodology and note that the lack of alignment between the reference systems of the training and test data sets negatively biases the results of all articles in the literature. We introduce a procedure to quantify this misalignment and a new methodology for cross-data set HPE that establishes a new, more accurate state of the art (SOTA) for the 300W-LP|Biwi benchmark. We also propose a generalization of the geodesic angular distance metric that enables the construction of a loss that controls the contribution of each training sample to the optimization of the model. Finally, we introduce a wide-range HPE benchmark based on the CMU Panoptic data set.
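For readers unfamiliar with the geodesic angular distance mentioned above, the following is a minimal sketch of the standard (non-generalized) metric between two rotation matrices, i.e. the angle of the relative rotation. It is not the generalization proposed in the paper, whose form is not given here. The helper names `rot_y` and `geodesic_distance_deg` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rot_y(deg):
    """Rotation matrix for a rotation of `deg` degrees about the y axis (illustrative helper)."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def geodesic_distance_deg(r_a, r_b):
    """Angle, in degrees, of the relative rotation r_a^T r_b (standard geodesic distance)."""
    # For a rotation by angle theta, trace(R) = 1 + 2 * cos(theta).
    cos_theta = (np.trace(r_a.T @ r_b) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] due to rounding.
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


# Two nearby poses in the short-range regime.
d_small = geodesic_distance_deg(rot_y(10.0), rot_y(40.0))

# Two poses on either side of the +/-180 degree boundary.
d_wrap = geodesic_distance_deg(rot_y(170.0), rot_y(-170.0))
```

In the second case a naive difference of the two yaw values gives 340°, whereas the relative rotation between the poses is only 20°. This is one simple way to see why a metric defined on the rotation itself behaves better over the full rotation range than one defined on angle values.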