Existing methods capture high-quality facial albedo under controlled lighting, which raises capture cost and limits usability. We propose WildCap, a novel method for high-quality facial albedo capture from a smartphone video recorded in the wild. To disentangle albedo from the complex lighting effects of in-the-wild captures, we propose a hybrid inverse rendering framework. We first apply a data-driven method, SwitchLight, to convert the captured images into more constrained conditions, and then apply model-based inverse rendering. However, the network predictions contain unavoidable local artifacts, such as baked-in shadows, which are non-physical and therefore hinder accurate inverse rendering of lighting and material. To address this, we propose a texel grid lighting model that explains these non-physical effects as clean albedo illuminated by local physical lighting. During optimization, we jointly sample a diffusion prior for the albedo map and optimize the lighting, which resolves the scale ambiguity between local lights and albedo. The remaining reflectance maps are then predicted from the albedo. Under the same capture setup, our method achieves significantly better results than prior art, substantially narrowing the quality gap between in-the-wild and controlled-lighting captures.
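The scale ambiguity between local lights and albedo mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's texel grid lighting model: it assumes a purely diffuse render in UV space where each coarse grid cell holds a single scalar light intensity, and the names `render`, `light_grid`, and `scale` are hypothetical. It only shows why a prior on the albedo is needed once lighting is allowed to vary locally.

```python
import numpy as np

def render(albedo, light_grid):
    """Diffuse render in UV space: each texel's albedo times the light of its grid cell.

    Assumption: one scalar light intensity per coarse grid cell (a simplification
    made for illustration only).
    """
    h, w = albedo.shape[:2]
    gh, gw = light_grid.shape[:2]
    # Nearest-neighbour upsampling of the coarse light grid to texel resolution.
    shading = np.repeat(np.repeat(light_grid, h // gh, axis=0), w // gw, axis=1)
    return albedo * shading

rng = np.random.default_rng(0)
albedo = rng.uniform(0.2, 0.8, size=(8, 8, 3))      # 8x8 RGB albedo map
light_grid = rng.uniform(0.5, 1.5, size=(2, 2, 1))  # 2x2 grid of local lights

# Rescale each cell's albedo by an arbitrary factor and divide its light by the same factor.
scale = rng.uniform(0.5, 2.0, size=(2, 2, 1))
scale_texels = np.repeat(np.repeat(scale, 4, axis=0), 4, axis=1)

image_a = render(albedo, light_grid)
image_b = render(albedo * scale_texels, light_grid / scale)

# Both albedo/light pairs explain the observed image equally well.
same_image = np.allclose(image_a, image_b)
```

Because the two renders are indistinguishable, an image-space loss alone cannot pick the correct per-cell albedo scale; in this toy setting, some additional constraint on the albedo map (the abstract uses a diffusion prior) is what would break the tie.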