This is a technical report for the GigaCrowd challenge. Reconstructing 3D crowds from monocular images is challenging due to mutual occlusions, severe depth ambiguity, and complex spatial distributions. Since no large-scale 3D crowd dataset is available for training a robust model, current multi-person mesh recovery methods can hardly achieve satisfactory performance in crowded scenes. In this report, we exploit crowd features and propose a crowd-constrained optimization that improves a common single-person method on crowd images. To avoid the effect of large scale variations, we first detect human bounding boxes and 2D poses in the original images with off-the-shelf detectors. Then, we train a single-person mesh recovery network on existing in-the-wild image datasets. To promote a more reasonable spatial distribution, we further propose a crowd constraint to refine the single-person network parameters. With this optimization, we obtain accurate body poses and shapes with reasonable absolute positions from a large-scale crowd image using a single-person backbone. The code will be publicly available at~\url{https://github.com/boycehbz/CrowdRec}.
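The abstract leaves implicit how a single-person network that only sees a detected crop can yield absolute positions in the full image. A common approach in crop-based mesh recovery, assumed here and not necessarily the conversion used in CrowdRec, maps the network's weak-perspective crop camera $(s, t_x, t_y)$ to a full-image translation with $t_z = 2f/(s\,b)$ and $t_x' = t_x + 2(c_x - W/2)/(s\,b)$ (likewise for $y$), where $b$ is the side of a square crop centred at $(c_x, c_y)$, $W \times H$ is the image size, and $f$ is a known focal length. The function names `crop_cam_to_full` and `project_point`, the square crop, and the centred principal point are illustrative assumptions. The sketch only gives initial absolute positions; the crowd constraint described in the abstract would then refine them and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class FullImageTranslation:
    """Root translation of a person in the full-image camera frame."""
    tx: float
    ty: float
    tz: float


def crop_cam_to_full(s, tx, ty, center, box_size, img_w, img_h, focal):
    """Hypothetical helper: convert a weak-perspective crop camera [s, tx, ty]
    into a full-image perspective translation.

    Assumes a square crop of side `box_size` pixels centred at `center`
    = (cx, cy) in the original image, normalized crop coordinates in [-1, 1],
    and a pinhole camera with focal length `focal` (pixels) whose principal
    point is the image centre.
    """
    cx, cy = center
    # Twice the body's pixels-per-unit scale in the full image.
    denom = s * box_size
    if denom <= 0:
        raise ValueError("scale and box size must be positive")
    tz = 2.0 * focal / denom  # smaller boxes -> larger depth
    tx_full = tx + 2.0 * (cx - img_w / 2.0) / denom
    ty_full = ty + 2.0 * (cy - img_h / 2.0) / denom
    return FullImageTranslation(tx_full, ty_full, tz)


def project_point(x, y, z, t, img_w, img_h, focal):
    """Perspective-project a body-frame point after applying translation t."""
    X, Y, Z = x + t.tx, y + t.ty, z + t.tz
    return focal * X / Z + img_w / 2.0, focal * Y / Z + img_h / 2.0


if __name__ == "__main__":
    W, H, f = 4000, 3000, 5000.0
    # Same crop camera, different detected box sizes and centres.
    near = crop_cam_to_full(0.9, 0.0, 0.1, (1200.0, 1800.0), 400.0, W, H, f)
    far = crop_cam_to_full(0.9, 0.0, 0.1, (2600.0, 900.0), 100.0, W, H, f)
    print(near)
    print(far)
```

Under this model, the body root (offset by the crop camera's $t_x, t_y$) projects back onto the crop centre, so each person's depth ordering in the crowd follows directly from detected box sizes before any crowd-level refinement.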