The rapid development of multi-view 3D human pose estimation (HPE) is attributed to the maturation of monocular 2D HPE and the geometry of 3D reconstruction. However, two challenges remain: 2D detection outliers in occluded views, caused by neglecting view consistency, and implausible 3D poses, caused by a lack of pose coherence. To address these, we introduce a Multi-View Fusion module that refines the 2D results by establishing correlations across views. We then propose Holistic Triangulation, which infers the whole pose as a single entity and injects an anatomy prior to maintain pose coherence and improve plausibility. The anatomy prior is extracted by PCA on skeletal structure features, which can factor out global context and joint-by-joint relationships from abstract to concrete. Because the solution is closed-form, the whole framework is trained end-to-end. Our method outperforms the state of the art in both precision and plausibility, the latter assessed by a new metric.
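For context, the classical baseline that a holistic approach contrasts with triangulates each joint independently from its 2D detections. The following is a minimal sketch of that standard per-joint, confidence-weighted algebraic (DLT) triangulation, not the Holistic Triangulation proposed here. The function names `project` and `triangulate_joint`, the per-view `weights`, and the toy cameras are assumptions introduced for illustration only.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (3,) with a 3x4 projection matrix P to pixel coordinates (2,)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_joint(projections, points_2d, weights=None):
    """Weighted DLT triangulation of ONE joint from V calibrated views.

    projections: (V, 3, 4) camera projection matrices
    points_2d:   (V, 2) detected 2D joint positions, one per view
    weights:     (V,) optional per-view confidences (hypothetical; defaults to 1)
    Returns the 3D joint position (3,).
    """
    projections = np.asarray(projections, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    if weights is None:
        weights = np.ones(projections.shape[0])
    weights = np.asarray(weights, dtype=float)[:, None]

    # Each view gives two linear constraints on the homogeneous point X:
    #   (u * P[2] - P[0]) . X = 0   and   (v * P[2] - P[1]) . X = 0
    rows_u = points_2d[:, 0:1] * projections[:, 2, :] - projections[:, 0, :]
    rows_v = points_2d[:, 1:2] * projections[:, 2, :] - projections[:, 1, :]
    A = np.concatenate([weights * rows_u, weights * rows_v], axis=0)

    # Closed-form homogeneous least squares: the right singular vector
    # associated with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


if __name__ == "__main__":
    # Three toy cameras with identity intrinsics, shifted along x and y.
    cams = np.stack([
        np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.0]])]),
        np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])]),
        np.hstack([np.eye(3), np.array([[0.0], [-1.0], [0.0]])]),
    ])
    joint = np.array([0.2, -0.1, 4.0])
    detections = np.stack([project(P, joint) for P in cams])
    print(triangulate_joint(cams, detections))
```

Because every joint is solved in isolation, an outlier detection in an occluded view corrupts only that joint, and nothing ties the result to a plausible skeleton. This is the gap that solving for the whole pose with an anatomy prior is meant to close.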