Accurate 3D face shape estimation is an enabling technology with applications in healthcare, security, and creative industries, yet current state-of-the-art methods rely either on self-supervised training with 2D image data or on supervised training with very limited 3D data. To bridge this gap, we present a novel approach that uses a conditioned Stable Diffusion model for face image generation, leveraging the abundance of 2D facial information to inform 3D space. By conditioning Stable Diffusion on depth maps sampled from a 3D Morphable Model (3DMM) of the human face, we generate diverse and shape-consistent images, which form the basis of SynthFace, a large-scale synthesised dataset of 250K photorealistic images and corresponding 3DMM parameters. We further propose ControlFace, a deep neural network trained on SynthFace that achieves competitive performance on the NoW benchmark without requiring 3D supervision or manual 3D asset creation.