In this paper, we develop a personalized video relighting algorithm that produces high-quality, temporally consistent relit videos in real time under any pose, expression, and lighting condition. Existing relighting algorithms typically rely either on publicly available synthetic data, which yields poor relighting results, or on actual light stage data, which is difficult to acquire. We show that simply by capturing recordings of a user watching YouTube videos on a monitor, we can train a personalized algorithm capable of high-quality relighting under any condition. Our key contribution is a novel image-based neural relighting architecture that effectively separates the intrinsic appearance features (the geometry and reflectance of the face) from the source lighting, then combines them with the target lighting to generate a relit image. This neural architecture enables smoothing of the intrinsic appearance features, leading to temporally stable video relighting. Both qualitative and quantitative evaluations show that our architecture improves portrait relighting quality and temporal consistency over state-of-the-art approaches on both the casually captured `Light Stage at Your Desk' (LSYD) dataset and the light-stage-captured `One Light At a Time' (OLAT) dataset.
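The decomposition the abstract describes, separating intrinsic appearance from source lighting, recombining with a target lighting, and temporally smoothing the intrinsic features, can be sketched at a very high level as follows. This is a minimal illustrative sketch, not the paper's networks: `split_intrinsic_and_lighting`, `relight`, `smooth_intrinsics`, the feature layout, and the smoothing weight `alpha` are all assumed stand-ins.

```python
# Hypothetical sketch of the pipeline described in the abstract.
# Real versions of these functions would be learned neural networks;
# here each "frame" is just a flat feature vector so the flow is runnable.

def split_intrinsic_and_lighting(frame):
    """Stand-in 'encoder': pretend the first half of the vector is the
    intrinsic appearance (geometry/reflectance) and the second half is
    the source lighting."""
    mid = len(frame) // 2
    return frame[:mid], frame[mid:]

def relight(intrinsic, target_lighting):
    """Stand-in 'decoder': recombine intrinsic features with the target
    lighting to form the relit frame (here, simple concatenation)."""
    return intrinsic + target_lighting

def smooth_intrinsics(prev, cur, alpha=0.8):
    """Exponential moving average over intrinsic features across frames,
    mirroring the temporal smoothing the abstract attributes to the
    architecture (alpha is an assumed smoothing weight)."""
    if prev is None:
        return cur
    return [alpha * p + (1 - alpha) * c for p, c in zip(prev, cur)]

# Toy two-frame "video" under one target lighting.
video = [[1.0, 2.0, 0.5, 0.1], [1.2, 2.1, 0.4, 0.2]]
target_lighting = [0.9, 0.9]

prev = None
relit_video = []
for frame in video:
    intrinsic, _source_lighting = split_intrinsic_and_lighting(frame)
    prev = smooth_intrinsics(prev, intrinsic)  # stabilizes across frames
    relit_video.append(relight(prev, target_lighting))
```

The key design point the sketch illustrates is that smoothing is applied to the intrinsic features rather than to the output pixels, so the relit video stays temporally stable without blurring the target lighting.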