Estimating human pose and shape from monocular images is a long-standing problem in computer vision. Since the release of statistical body models, 3D human mesh recovery has drawn broader attention. With the shared goal of obtaining well-aligned and physically plausible meshes, two paradigms have been developed to overcome challenges in the 2D-to-3D lifting process: i) an optimization-based paradigm, which exploits different data terms and regularization terms as optimization objectives; and ii) a regression-based paradigm, which uses deep learning techniques to solve the problem in an end-to-end fashion. Meanwhile, continuous efforts have been devoted to improving the quality of 3D mesh labels for a wide range of datasets. Although remarkable progress has been achieved in the past decade, the task remains challenging due to flexible body motions, diverse appearances, complex environments, and insufficient in-the-wild annotations. To the best of our knowledge, this is the first survey that focuses on the task of monocular 3D human mesh recovery. We start by introducing body models and then elaborate on recovery frameworks and training objectives, providing in-depth analyses of their strengths and weaknesses. We also summarize datasets, evaluation metrics, and benchmark results. Finally, we discuss open issues and future directions, hoping to motivate researchers and facilitate their research in this area. A regularly updated project page can be found at https://github.com/tinatiansjz/hmr-survey.