Relighting human performances is a fundamental task in post-production and content creation. We present BodyReLux, a subject-specific, video diffusion-based framework for relighting full-body human performances in a temporally consistent way. Our model is trained on a hybrid dataset of pixel-aligned video relighting pairs covering diverse combinations of lighting conditions, performances, and viewpoints. To acquire such a dataset, we combine traditional static One-Light-at-a-Time (OLAT) capture with a novel dynamic performance capture in which two smoothly varying lighting sequences are rapidly interleaved. Because the lighting operates above the human flicker-fusion threshold, the interleaving does not appear to strobe. We train our video relighting model starting from a pretrained text-to-video model to fully leverage its generative priors for producing high-quality videos. To achieve accurate lighting control, we introduce a new lighting conditioning method that represents each light source as a token. We further condition on sequences of lighting using masked attention to support dynamic lighting control. Together with a carefully designed data augmentation pipeline, these components yield photorealistic, robust, and temporally consistent video relighting of subject-specific human performances.
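The per-light tokens and masked attention mentioned above can be illustrated with a small NumPy sketch. The abstract does not specify how tokens are built or how the mask is structured, so everything here is an assumption made for illustration: each light is described by a 3-D direction and an RGB intensity, a linear map `W` turns it into a token, and the mask lets the video tokens of frame `t` attend only to the light tokens of frame `t`. The function names `light_tokens`, `frame_mask`, and `masked_cross_attention` are hypothetical and are not taken from BodyReLux.

```python
import numpy as np

def light_tokens(directions, colors, W):
    """Embed each light as one token (assumed parameterization).

    directions, colors: arrays of shape (T, L, 3) for T frames and L lights.
    W: hypothetical linear embedding of shape (6, D).
    Returns tokens of shape (T, L, D).
    """
    feats = np.concatenate([directions, colors], axis=-1)  # (T, L, 6)
    return feats @ W

def frame_mask(num_frames, tokens_per_frame, lights_per_frame):
    """Boolean mask of shape (T*P, T*L).

    Entry (i, j) is True when video token i and light token j belong
    to the same frame, so lighting at frame t only affects frame t.
    """
    q_frame = np.repeat(np.arange(num_frames), tokens_per_frame)
    k_frame = np.repeat(np.arange(num_frames), lights_per_frame)
    return q_frame[:, None] == k_frame[None, :]

def masked_cross_attention(q, k, v, mask):
    """Scaled dot-product cross-attention with disallowed pairs masked out."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)              # block other frames
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                              # exp(-inf) == 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy usage: 3 frames, 4 video tokens per frame, 2 lights per frame.
rng = np.random.default_rng(0)
T, P, L, D = 3, 4, 2, 8
W = rng.normal(size=(6, D))
lights = light_tokens(rng.normal(size=(T, L, 3)), rng.uniform(size=(T, L, 3)), W)
kv = lights.reshape(T * L, D)
video = rng.normal(size=(T * P, D))
out, attn = masked_cross_attention(video, kv, kv, frame_mask(T, P, L))
```

In this sketch the lighting can change from frame to frame simply by supplying different light tokens per frame, which is one plausible reading of "conditioning on sequences of lighting"; the actual model may use a different mask layout or token parameterization.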