3D human pose estimation (HPE) is characterized by intricate local and global dependencies among joints. Conventional supervised losses are limited in capturing these correlations because they treat each joint independently. Previous studies have attempted to promote structural consistency through manually designed priors or rule-based constraints; however, these approaches typically require manual specification and are often non-differentiable, limiting their use as end-to-end training objectives. We propose SEAL-pose, a data-driven framework in which a learnable loss-net trains a pose-net by evaluating structural plausibility. Rather than relying on hand-crafted priors, our joint-graph-based design enables the loss-net to learn complex structural dependencies directly from data. Extensive experiments on three 3D HPE benchmarks with eight backbones show that SEAL-pose reduces per-joint errors and improves pose plausibility compared with the corresponding backbones across all settings. Beyond improving each backbone, SEAL-pose also outperforms models with explicit structural constraints, despite not enforcing any such constraints. Finally, we analyze the relationship between the loss-net and structural consistency, and evaluate SEAL-pose in cross-dataset and in-the-wild settings.
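To make the limitation of per-joint losses concrete, the following minimal sketch compares two hypothetical predictions under mean per-joint position error (MPJPE), a common per-joint objective in 3D HPE. The three-joint chain, the offsets, and the helper names `mpjpe` and `bone_lengths` are illustrative assumptions and are not taken from SEAL-pose itself.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint position error: Euclidean distance averaged over joints."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())

def bone_lengths(pose, bones):
    """Length of each bone, given (parent, child) joint index pairs."""
    parents, children = zip(*bones)
    return np.linalg.norm(pose[list(children)] - pose[list(parents)], axis=-1)

# Toy skeleton (assumption): a 3-joint chain 0 -> 1 -> 2 laid out along the x axis.
bones = [(0, 1), (1, 2)]
target = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [2.0, 0.0, 0.0]])

# Prediction A: every joint shifted by the same 0.1 offset, so bone lengths are preserved.
pred_rigid = target + np.array([0.1, 0.0, 0.0])

# Prediction B: every joint also displaced by 0.1, but in alternating directions,
# so one bone shrinks and the other stretches.
pred_distorted = target + np.array([[0.1, 0.0, 0.0],
                                    [-0.1, 0.0, 0.0],
                                    [0.1, 0.0, 0.0]])

for name, pred in [("rigid", pred_rigid), ("distorted", pred_distorted)]:
    print(name, "MPJPE:", round(mpjpe(pred, target), 3),
          "bone lengths:", np.round(bone_lengths(pred, bones), 3))
```

Both predictions receive the same MPJPE because each joint is off by the same distance, yet only the first preserves the skeleton's bone lengths. A loss that scores joints independently cannot separate the two cases, which is the kind of structural signal a learned loss-net is meant to supply.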