Multi-person pose estimation (MPPE) is a challenging yet crucial task in computer vision. Most existing methods model interactions in isolation, either between instances or between joints, which is inadequate for scenarios that require localizing instances and joints concurrently. This paper introduces a novel CNN-based single-stage method, the Dual-path Hierarchical Relation Network (DHRNet), which extracts instance-to-joint and joint-to-instance interactions concurrently. Specifically, we design a dual-path interaction modeling module (DIM) that arranges cross-instance and cross-joint interaction modeling modules in two complementary orders, enriching interaction information by combining the strengths of the different correlation modeling branches. Notably, DHRNet excels at joint localization by leveraging information from other instances and joints. Extensive evaluations on the challenging COCO, CrowdPose, and OCHuman datasets demonstrate DHRNet's state-of-the-art performance. The code will be released at https://github.com/YHDang/dhrnet-multi-pose-estimation.
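The idea of applying cross-instance and cross-joint interaction modeling in two complementary orders can be illustrated with a minimal sketch. This is not the DHRNet implementation: the abstract describes a CNN-based design, whereas the sketch below substitutes plain dot-product self-attention as a stand-in for each interaction module, assumes features are already arranged as an array of shape (instances, joints, channels), and fuses the two paths by simple addition. All function names (`self_attend`, `cross_joint`, `cross_instance`, `dual_path`) are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attend(x):
    # Stand-in interaction module (an assumption, not the paper's CNN block):
    # residual dot-product self-attention over the second-to-last axis.
    # x has shape (..., T, C), where T is the number of tokens being related.
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return x + softmax(scores, axis=-1) @ x


def cross_joint(x):
    # x: (N, K, C). Relates the K joints within each of the N instances.
    return self_attend(x)


def cross_instance(x):
    # x: (N, K, C). Relates the N instances separately for each joint type,
    # by moving the instance axis into the token position and back.
    return np.swapaxes(self_attend(np.swapaxes(x, 0, 1)), 0, 1)


def dual_path(x):
    # Two complementary orders of the same pair of modules.
    path_a = cross_joint(cross_instance(x))  # instances first, then joints
    path_b = cross_instance(cross_joint(x))  # joints first, then instances
    # Fusion by addition is an assumption made for this sketch.
    return path_a + path_b


rng = np.random.default_rng(0)
features = rng.normal(size=(3, 17, 8))  # 3 people, 17 joints, 8 channels
fused = dual_path(features)
print(fused.shape)
```

Because the two modules do not commute in general, the two paths produce different features, which is what makes combining them potentially informative.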