In this paper, we develop a personalized video relighting algorithm that produces high-quality, temporally consistent relit videos under any pose, expression, and lighting condition in real time. Existing relighting algorithms typically rely either on publicly available synthetic data, which yields poor relighting results, or on light stage data, which is difficult to obtain. We show that by simply capturing video of a user watching YouTube videos on a monitor, we can train a personalized algorithm capable of high-quality relighting under any condition. Our key contribution is a novel neural relighting architecture that effectively separates the intrinsic appearance features (the geometry and reflectance of the face) from the source lighting and then combines them with the target lighting to generate a relit image. This architecture enables smoothing of the intrinsic appearance features, leading to temporally stable video relighting. Both qualitative and quantitative evaluations show that our architecture improves portrait image relighting quality and temporal consistency over state-of-the-art approaches on both the casually captured 'Light Stage at Your Desk' (LSYD) dataset and the light-stage-captured 'One Light At a Time' (OLAT) dataset.
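To illustrate why separating intrinsic appearance features from lighting can help temporal stability, the following is a minimal sketch and not the architecture itself. It assumes per-frame intrinsic features are plain vectors, uses an exponential moving average as the smoothing step (the abstract does not say which smoothing is used), and stands in a linear map for the decoder. The names `smooth_intrinsic_features`, `temporal_jitter`, `relight`, and the parameter `alpha` are all hypothetical.

```python
import random


def smooth_intrinsic_features(frames, alpha=0.8):
    """Exponentially smooth per-frame feature vectors over time.

    Assumption: an exponential moving average stands in for whatever
    smoothing the actual method applies to intrinsic features.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    smoothed = []
    for feat in frames:
        if not smoothed:
            # The first frame has no history, so it is kept unchanged.
            smoothed.append(list(feat))
        else:
            prev = smoothed[-1]
            smoothed.append(
                [alpha * p + (1.0 - alpha) * x for p, x in zip(prev, feat)]
            )
    return smoothed


def temporal_jitter(frames):
    """Mean absolute frame-to-frame change, a crude stability measure."""
    diffs = [
        abs(a - b)
        for prev, cur in zip(frames, frames[1:])
        for a, b in zip(prev, cur)
    ]
    return sum(diffs) / len(diffs)


def relight(intrinsic, target_lighting, weights):
    """Hypothetical decoder stand-in: a linear map over the concatenation
    of intrinsic features and a target-lighting code."""
    combined = list(intrinsic) + list(target_lighting)
    return [sum(w * c for w, c in zip(row, combined)) for row in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    # Noisy per-frame estimates of a feature vector that should be constant.
    raw = [[1.0 + rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(50)]
    stable = smooth_intrinsic_features(raw, alpha=0.8)
    print("jitter before smoothing:", temporal_jitter(raw))
    print("jitter after smoothing: ", temporal_jitter(stable))
    # Combine one smoothed frame with a made-up 4-dimensional lighting code.
    weights = [[0.1] * 12 for _ in range(3)]
    print("relit output:", relight(stable[0], [1.0, 0.0, 0.0, 0.0], weights))
```

Because the target lighting enters only at the combination step, smoothing the intrinsic features in this sketch does not blur changes in the target lighting, which is the property the separation is meant to provide.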