Reconstructing 3D human shapes from 2D images has received increasing attention because it underpins many high-level 3D applications. Compared with natural images, freehand sketches can depict a wide variety of shapes far more flexibly, which makes them a promising and valuable input for 3D human reconstruction. The task is highly challenging, however. Sketches are sparse and abstract, so they add arbitrariness, inaccuracy, and a lack of image detail to the already severely ill-posed problem of 2D-to-3D reconstruction. Although current methods reconstruct 3D human bodies from a single-view image with great success, they do not work well on freehand sketches. In this paper, we propose SketchBodyNet, a novel sketch-driven multi-faceted decoder network, to address this task. Specifically, the network consists of a backbone and three separate attention decoder branches; each decoder applies a multi-head self-attention module to obtain enhanced features, followed by a multi-layer perceptron. The three decoders predict the camera, shape, and pose parameters, respectively, which are then passed to the SMPL model to reconstruct the corresponding 3D human mesh. During training, existing 3D meshes are projected via the camera parameters into 2D synthetic sketches with joints, which are combined with the freehand sketches to optimize the model. To verify our method, we collect a large-scale dataset of about 26k freehand sketches and their corresponding 3D meshes, covering various human body poses from 14 different angles. Extensive experimental results demonstrate that SketchBodyNet achieves superior performance in reconstructing 3D human meshes from freehand sketches.
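The backbone-plus-three-decoders layout described in the abstract can be sketched roughly as follows in PyTorch. This is an illustrative sketch, not the authors' implementation: the class names `AttentionDecoder` and `ThreeBranchSketchNet`, the tiny convolutional backbone, the residual connection with layer normalization, the mean pooling, and all layer sizes are assumptions. The output sizes follow the common SMPL convention of 10 shape coefficients and 72 pose values (24 joints in axis-angle form); the 3-value camera is likewise an assumption, since the abstract does not state the parameterization.

```python
import torch
import torch.nn as nn


class AttentionDecoder(nn.Module):
    """One decoder branch: multi-head self-attention, then an MLP head."""

    def __init__(self, feat_dim: int, out_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, out_dim),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, feat_dim)
        attended, _ = self.attn(tokens, tokens, tokens)
        # Residual connection and layer norm are assumptions, not from the paper.
        enhanced = self.norm(tokens + attended)
        pooled = enhanced.mean(dim=1)  # average over tokens (assumption)
        return self.mlp(pooled)


class ThreeBranchSketchNet(nn.Module):
    """Shared backbone feeding separate camera, shape, and pose decoders."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Stand-in backbone (assumption); the abstract does not name one.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, feat_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.camera_decoder = AttentionDecoder(feat_dim, 3)   # assumed 3 values
        self.shape_decoder = AttentionDecoder(feat_dim, 10)   # SMPL betas
        self.pose_decoder = AttentionDecoder(feat_dim, 72)    # 24 joints x 3

    def forward(self, sketch: torch.Tensor) -> dict:
        # sketch: (batch, 1, height, width), a single-channel sketch image
        feature_map = self.backbone(sketch)                 # (B, C, h, w)
        tokens = feature_map.flatten(2).transpose(1, 2)     # (B, h*w, C)
        return {
            "camera": self.camera_decoder(tokens),
            "shape": self.shape_decoder(tokens),
            "pose": self.pose_decoder(tokens),
        }


model = ThreeBranchSketchNet()
outputs = model(torch.zeros(2, 1, 64, 64))
print({name: tuple(value.shape) for name, value in outputs.items()})
```

Each branch has its own attention and MLP weights, so the three parameter groups can specialize while still sharing the backbone features. Turning the predicted shape and pose into a mesh would require an SMPL implementation, which is left out here.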
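The training step in which 3D meshes are projected into 2D through the camera parameters can be illustrated with a small projection routine. The abstract does not specify the camera model, so the weak-perspective form below (a scale `s` and a 2D translation `tx`, `ty`, as commonly used in SMPL-based regression work) is an assumption, and the function name `project_weak_perspective` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_weak_perspective(
    joints_3d: Sequence[Point3D],
    camera: Tuple[float, float, float],
) -> List[Point2D]:
    """Project 3D joints to 2D with an assumed weak-perspective camera.

    camera = (s, tx, ty): depth is dropped, x and y are scaled by s,
    then shifted by (tx, ty). This camera model is an assumption.
    """
    s, tx, ty = camera
    return [(s * x + tx, s * y + ty) for x, y, _z in joints_3d]


# Two made-up joints, purely for illustration.
joints = [(0.0, 1.0, 5.0), (0.5, -0.5, 2.0)]
print(project_weak_perspective(joints, camera=(2.0, 0.5, -1.0)))
# → [(0.5, 1.0), (1.5, -2.0)]
```

Under this assumed model, the projected joints could be compared against the joints of a synthetic 2D sketch to supervise the camera branch; the exact loss used in the paper is not stated in the abstract.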