DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows

We introduce a novel approach to single-view face relighting in the wild, addressing challenges such as global illumination and cast shadows. A common scheme in recent methods involves intrinsically decomposing an input image into 3D shape, albedo, and lighting, then recomposing it with the target lighting. However, estimating these components is error-prone and requires many training examples with ground-truth lighting to generalize well. Our work bypasses the need for accurate intrinsic estimation and can be trained solely on 2D images without any light stage data, relit pairs, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We propose a novel conditioning technique that simplifies modeling the complex interaction between light and geometry. It uses a rendered shading reference along with a shadow map, inferred using a simple and effective technique, to spatially modulate the DDIM. Moreover, we propose a single-shot relighting framework that requires just one network pass, given pre-processed data, and even outperforms the teacher model across all metrics. Our method realistically relights in-the-wild images with temporally consistent cast shadows under varying lighting conditions. We achieve state-of-the-art performance on the standard benchmark Multi-PIE and rank highest in user studies. Please visit our page: https://diffusion-face-relighting-pp.github.io
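For readers unfamiliar with DDIM decoding, the following is a minimal sketch of one deterministic DDIM update (the standard eta = 0 form), not the authors' implementation. The names `ddim_step`, `x_t`, `eps_pred`, `alpha_bar_t`, and `alpha_bar_prev` are illustrative assumptions. In the method described above, the noise prediction would come from a network conditioned on the light, shape, and identity encodings and spatially modulated by the shading reference and shadow map; that network is not modeled here.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) on a flat list of values.

    x_t            : noisy sample at the current timestep
    eps_pred       : noise predicted for x_t (hypothetical; in practice
                     the output of a conditional network)
    alpha_bar_t    : cumulative noise-schedule product at the current step
    alpha_bar_prev : cumulative product at the step being moved to
    """
    sqrt_a_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_a_t = math.sqrt(1.0 - alpha_bar_t)
    sqrt_a_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_a_prev = math.sqrt(1.0 - alpha_bar_prev)

    x_prev = []
    for x, e in zip(x_t, eps_pred):
        # Estimate the clean sample implied by the predicted noise.
        x0_pred = (x - sqrt_one_minus_a_t * e) / sqrt_a_t
        # Re-noise the estimate to the target noise level, reusing the same noise.
        x_prev.append(sqrt_a_prev * x0_pred + sqrt_one_minus_a_prev * e)
    return x_prev
```

Because the update has no random term, the same starting noise and the same conditioning always decode to the same image, which is the property that allows a fixed encoding to be decoded under a different light condition.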
