Accurate 3D face reconstruction from 2D images is an enabling technology with applications in healthcare, security, and the creative industries. However, current state-of-the-art methods rely either on supervised training with very limited 3D data or on self-supervised training with 2D image data alone. To bridge this gap, we present a method for generating SynthFace, a large-scale synthetic dataset of 250K photorealistic images paired with their corresponding shape parameters and depth maps. Our synthesis method conditions Stable Diffusion on depth maps rendered from face shapes sampled from the FLAME 3D Morphable Model (3DMM), allowing us to generate a diverse set of shape-consistent facial images designed to be balanced in race and gender. We further propose ControlFace, a deep neural network trained on SynthFace, which achieves competitive performance on the NoW benchmark without requiring 3D supervision or manual 3D asset creation. The complete SynthFace dataset will be made publicly available upon publication.
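The abstract does not spell out how the conditioning depth maps are produced. As a rough, non-authoritative illustration, the sketch below assumes a linear shape model of the form V = T + Σ_k β_k S_k (the form of FLAME's identity shape component, ignoring its expression, pose, and skinning terms), a standard-normal prior on the coefficients β, and a crude orthographic point-splat renderer in place of proper mesh rasterisation. The function names, the toy template and basis, and the [0.2, 1.0] depth encoding are all hypothetical; the Stable Diffusion conditioning step is not shown.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sample_shape_params(num_params: int, seed: int, scale: float = 1.0) -> List[float]:
    """Draw shape coefficients beta_k ~ N(0, scale^2).

    Assumption: a standard-normal prior over PCA coefficients; the actual
    sampling distribution used for SynthFace is not stated in the abstract.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, scale) for _ in range(num_params)]


def linear_3dmm(template: Sequence[Vec3],
                basis: Sequence[Sequence[Vec3]],
                beta: Sequence[float]) -> List[Vec3]:
    """Linear shape model V_i = T_i + sum_k beta_k * S_k[i].

    basis[k][i] is the 3D offset of vertex i for shape component k.
    """
    verts: List[Vec3] = []
    for i, (tx, ty, tz) in enumerate(template):
        dx = sum(b * basis[k][i][0] for k, b in enumerate(beta))
        dy = sum(b * basis[k][i][1] for k, b in enumerate(beta))
        dz = sum(b * basis[k][i][2] for k, b in enumerate(beta))
        verts.append((tx + dx, ty + dy, tz + dz))
    return verts


def render_depth(verts: Sequence[Vec3], size: int) -> List[List[float]]:
    """Hypothetical orthographic point-splat depth map on a size x size grid.

    x and y in [-1, 1] map to columns and rows (row 0 is the top). The camera
    looks down -z, so a larger z is nearer. Nearer points are brighter:
    foreground depths map to [0.2, 1.0] and empty pixels stay 0.
    """
    zbuf: List[List[Optional[float]]] = [[None] * size for _ in range(size)]
    for x, y, z in verts:
        col = math.floor((x + 1) / 2 * (size - 1) + 0.5)
        row = math.floor((1 - (y + 1) / 2) * (size - 1) + 0.5)
        if 0 <= row < size and 0 <= col < size:
            current = zbuf[row][col]
            # Z-buffer test: keep only the nearest point per pixel.
            if current is None or z > current:
                zbuf[row][col] = z
    depths = [z for r in zbuf for z in r if z is not None]
    if not depths:
        return [[0.0] * size for _ in range(size)]
    z_min, z_max = min(depths), max(depths)
    span = (z_max - z_min) or 1.0
    return [[0.0 if z is None else 0.2 + 0.8 * (z - z_min) / span for z in r]
            for r in zbuf]


if __name__ == "__main__":
    # Toy 3-vertex "face" with one zero-valued component, so V equals T.
    template = [(0.0, 0.0, 0.5), (-0.5, -0.5, 0.0), (0.5, 0.5, -0.5)]
    basis = [[(0.0, 0.0, 0.0)] * 3]
    beta = sample_shape_params(1, seed=0)
    for row in render_depth(linear_3dmm(template, basis, beta), size=5):
        print(" ".join(f"{v:.1f}" for v in row))
```

Sampling many β vectors and rendering each one is what makes every generated image come with known shape parameters and a depth map; a real pipeline would rasterise the full FLAME mesh rather than splat a handful of vertices.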