We introduce a novel approach to 3D whole-body pose estimation that addresses the variance in scale and deformability across body parts arising from extending the 17 major joints of the human body to fine-grained keypoints on the face and hands. In addition to tackling the challenge of exploiting motion in unevenly sampled data, we combine stable diffusion with a hierarchical part representation that predicts the relative locations of fine-grained keypoints within each part (e.g., the face) with respect to the part's local reference frame. On the H3WB dataset, our method greatly outperforms the current state of the art, which fails to exploit temporal information. We also show considerable improvements over other spatiotemporal 3D human pose estimation approaches that do not account for body-part specificities. Code is available at https://github.com/valeoai/PAFUSE.
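The core idea of the hierarchical part representation, predicting fine-grained keypoints relative to each part's local reference frame, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the part index layout, the choice of centroid as reference point, and the scale normalization are all assumptions.

```python
import numpy as np

# Assumed whole-body layout: (J, 3) array of 3D joint positions,
# split into coarse body joints and fine-grained part keypoints.
# The index ranges below are hypothetical.
PARTS = {
    "body": slice(0, 17),     # 17 major body joints
    "face": slice(17, 85),    # fine-grained face keypoints (assumed layout)
    "hands": slice(85, 127),  # fine-grained hand keypoints (assumed layout)
}

def to_part_local(pose):
    """Decompose a whole-body pose into per-part local coordinates.

    Each part's keypoints are centered on the part centroid and
    normalized by the part's spatial extent, so a predictor only
    has to model relative locations within the local frame rather
    than absolute positions at very different scales.
    """
    local = {}
    for name, idx in PARTS.items():
        kps = pose[idx]                          # (K, 3) part keypoints
        center = kps.mean(axis=0)                # local reference point
        scale = np.linalg.norm(kps - center, axis=1).max() + 1e-8
        local[name] = {"center": center, "scale": scale,
                       "rel": (kps - center) / scale}
    return local

def from_part_local(local):
    """Invert the decomposition back to global coordinates."""
    return np.concatenate(
        [d["rel"] * d["scale"] + d["center"] for d in local.values()]
    )
```

The decomposition is lossless by construction: reassembling the per-part local coordinates with their stored center and scale recovers the original whole-body pose.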