Learning to Relight Portrait Images via a Virtual Light Stage and Synthetic-to-Real Adaptation

From arXiv. To appear in ACM Transactions on Graphics (SIGGRAPH Asia 2022). 21 pages, 25 figures, 7 tables. Project page: https://research.nvidia.com/labs/dir/lumos/

Given a portrait image of a person and an environment map of the target lighting, portrait relighting aims to re-illuminate the person in the image as if the person appeared in an environment with the target lighting. To achieve high-quality results, recent methods rely on deep learning. An effective approach is to supervise the training of deep neural networks with a high-fidelity dataset of desired input-output pairs, captured with a light stage. However, acquiring such data requires an expensive special capture rig and time-consuming effort, limiting access to only a few well-resourced laboratories. To address this limitation, we propose a new approach that can perform on par with the state-of-the-art (SOTA) relighting methods without requiring a light stage. Our approach is based on the realization that a successful relighting of a portrait image depends on two conditions. First, the method needs to mimic the behaviors of physically-based relighting. Second, the output has to be photorealistic. To meet the first condition, we propose to train the relighting network with training data generated by a virtual light stage that performs physically-based rendering on various 3D synthetic humans under different environment maps. To meet the second condition, we develop a novel synthetic-to-real approach to bring photorealism to the relighting network output. In addition to achieving SOTA results, our approach offers several advantages over the prior methods, including controllable glares on glasses and more temporally consistent results for relighting videos.
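As a rough illustration of what relighting under an environment map means in its simplest physically-based form, the sketch below shades a purely diffuse (Lambertian) surface by integrating a latitude-longitude environment map over the hemisphere around each pixel's normal. It is not the paper's virtual light stage or relighting network: the function names, the lat-long layout with +y as the up axis, and the shadow-free, diffuse-only model are all assumptions made for illustration.

```python
import numpy as np

def envmap_directions(height, width):
    """Unit direction and solid angle for each texel of a lat-long map.

    Assumed layout (hypothetical): rows span the polar angle from +y,
    columns span the azimuth around the y axis.
    """
    theta = (np.arange(height) + 0.5) / height * np.pi
    phi = (np.arange(width) + 0.5) / width * 2.0 * np.pi
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack(
        [np.sin(theta) * np.cos(phi), np.cos(theta), np.sin(theta) * np.sin(phi)],
        axis=-1,
    )
    # Texels near the poles cover less solid angle, hence the sin(theta) factor.
    solid_angle = np.sin(theta) * (np.pi / height) * (2.0 * np.pi / width)
    return dirs.reshape(-1, 3), solid_angle.reshape(-1)

def relight_diffuse(albedo, normals, envmap):
    """Diffuse relighting sketch (no shadows, no specular term).

    albedo:  (h, w, 3) reflectance in [0, 1]
    normals: (h, w, 3) unit surface normals
    envmap:  (H, W, 3) incoming radiance in lat-long layout
    """
    dirs, d_omega = envmap_directions(*envmap.shape[:2])
    weighted_radiance = envmap.reshape(-1, 3) * d_omega[:, None]
    # Clamped cosine between every pixel normal and every light direction.
    cosine = np.clip(normals.reshape(-1, 3) @ dirs.T, 0.0, None)
    irradiance = cosine @ weighted_radiance
    # Lambertian BRDF is albedo / pi.
    return (albedo.reshape(-1, 3) / np.pi * irradiance).reshape(albedo.shape)
```

Under a uniform white environment map the irradiance at every pixel is approximately pi, so the output is close to the albedo itself. That makes a convenient sanity check before swapping in a real HDR environment map.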
