Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from a single image is ill-posed: recovering geometry is a one-to-many mapping problem, and reflectance and lighting are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but acquiring large datasets in this fashion is costly. Moreover, training solely on this type of data leads to poor generalization to in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects. The project website and the dataset are available at the following link: https://ubisoft-laforge.github.io/character/mosar/