Human pose estimation aims to locate human body parts and build a human body representation (e.g., a body skeleton) from input data such as images and videos. It has drawn increasing attention over the past decade and has been used in a wide range of applications, including human-computer interaction, motion analysis, augmented reality, and virtual reality. Although recently developed deep learning-based solutions have achieved high performance in human pose estimation, challenges remain due to insufficient training data, depth ambiguities, and occlusion. The goal of this survey is to provide a comprehensive review of recent deep learning-based solutions for both 2D and 3D pose estimation through a systematic analysis and comparison of these solutions based on their input data and inference procedures. The survey covers more than 250 research papers published since 2014. It also reviews 2D and 3D human pose estimation datasets and evaluation metrics, and it summarizes and discusses quantitative performance comparisons of the reviewed methods on popular datasets. Finally, it discusses open challenges, applications, and future research directions. A regularly updated project page is available at \url{https://github.com/zczcwh/DL-HPE}.