Estimating human pose and shape from monocular images is a long-standing problem in computer vision. Since the release of statistical body models, 3D human mesh recovery has drawn increasing attention. With the shared goal of obtaining well-aligned and physically plausible meshes, two paradigms have been developed to overcome the challenges of 2D-to-3D lifting: i) an optimization-based paradigm, in which various data terms and regularization terms serve as optimization objectives; and ii) a regression-based paradigm, in which deep learning techniques are used to solve the problem end to end. Meanwhile, continuous effort has been devoted to improving the quality of 3D mesh labels for a wide range of datasets. Although remarkable progress has been made over the past decade, the task remains challenging due to flexible body motions, diverse appearances, complex environments, and insufficient in-the-wild annotations. To the best of our knowledge, this is the first survey that focuses on monocular 3D human mesh recovery. We start by introducing body models and then elaborate on recovery frameworks and training objectives, providing in-depth analyses of their strengths and weaknesses. We also summarize datasets, evaluation metrics, and benchmark results. Open issues and future directions are discussed at the end, in the hope of motivating researchers and facilitating their research in this area. A regularly updated project page can be found at https://github.com/tinatiansjz/hmr-survey.