In 3D face reconstruction, orthogonal projection is widely used as a substitute for perspective projection because it simplifies the fitting process. This approximation works well when the face is sufficiently far from the camera. However, when the face is very close to the camera or moves along the camera axis, such methods suffer from inaccurate reconstruction and unstable temporal fitting because of the distortion introduced by perspective projection. In this paper, we address the problem of single-image 3D face reconstruction under perspective projection. Specifically, we propose a deep neural network, Perspective Network (PerspNet), that simultaneously reconstructs the 3D face shape in canonical space and learns the correspondence between 2D pixels and 3D points, from which the 6DoF (6 Degrees of Freedom) face pose that represents the perspective projection can be estimated. In addition, we contribute a large dataset, ARKitFace, to enable the training and evaluation of 3D face reconstruction methods in perspective-projection scenarios. It contains 902,724 2D facial images with ground-truth 3D face meshes and annotated 6DoF pose parameters. Experimental results show that our approach outperforms current state-of-the-art methods by a significant margin. The code and data are available at https://github.com/cbsropenproject/6dof_face.
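To make the motivation concrete, the following minimal sketch compares a pinhole (perspective) projection with a scaled orthographic (weak-perspective) one for a toy face placed near and far from the camera. It is an illustration only and is not taken from the PerspNet code: the five landmark coordinates, the focal length, the principal point, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def perspective_project(points, focal, center):
    """Pinhole projection: each point is divided by its own depth Z."""
    z = points[:, 2:3]
    return focal * points[:, :2] / z + center

def weak_perspective_project(points, focal, center):
    """Scaled orthographic projection: all points share the mean depth."""
    mean_z = points[:, 2].mean()
    return focal * points[:, :2] / mean_z + center

def projection_gap(face, distance, focal=1000.0, center=(320.0, 240.0)):
    """Largest 2D gap between the two models, relative to the projected face size."""
    center = np.asarray(center, dtype=float)
    # Place the face in front of a camera looking down the +Z axis.
    moved = face + np.array([0.0, 0.0, distance])
    persp = perspective_project(moved, focal, center)
    weak = weak_perspective_project(moved, focal, center)
    gap = np.linalg.norm(persp - weak, axis=1).max()
    size = (persp.max(axis=0) - persp.min(axis=0)).max()
    return gap / size

# Hypothetical landmarks in metres: the nose tip sits in front of the
# cheek, forehead, and chin points, so the face has depth relief.
face = np.array([
    [0.00, 0.00, -0.05],   # nose tip (closest to the camera)
    [-0.07, 0.00, 0.03],   # left cheek
    [0.07, 0.00, 0.03],    # right cheek
    [0.00, 0.09, 0.02],    # forehead
    [0.00, -0.09, 0.02],   # chin
])

near_gap = projection_gap(face, distance=0.3)  # face 30 cm from the camera
far_gap = projection_gap(face, distance=3.0)   # face 3 m from the camera
print(f"relative gap at 0.3 m: {near_gap:.4f}")
print(f"relative gap at 3.0 m: {far_gap:.4f}")
```

Under these assumptions, the relative gap between the two projection models is several times larger at 0.3 m than at 3 m, which matches the claim that the orthogonal approximation degrades as the face approaches the camera. Normalizing by the projected face size keeps the comparison independent of the chosen focal length.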