Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from a single image is ill-posed: recovering geometry is a one-to-many mapping problem, and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but acquiring large datasets in this fashion is costly. Moreover, training solely on this type of data leads to poor generalization to in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects. The project website and the dataset are available at the following link: https://ubisoftlaforge.github.io/character/mosar