Volumetric video relighting is essential for bringing captured performances into virtual worlds, but current approaches struggle to deliver temporally stable, production-ready results. Diffusion-based intrinsic decomposition methods show promise for single frames, yet suffer from stochastic noise and instability when extended to sequences, while video diffusion models remain constrained by memory and scale. We propose a hybrid relighting framework that combines diffusion-derived material priors with temporal regularization and physically motivated rendering. Our method aggregates multiple stochastic estimates of per-frame material properties into temporally consistent shading components, using optical-flow-guided regularization. For indirect effects such as shadows and reflections, we extract a mesh proxy from Gaussian Opacity Fields and render it within a standard graphics pipeline. Experiments on real and synthetic captures show that this hybrid strategy achieves substantially more stable relighting across sequences than diffusion-only baselines, while scaling beyond the clip lengths feasible for video diffusion. These results indicate that hybrid approaches, which balance learned priors with physically grounded constraints, are a practical step toward production-ready volumetric video relighting.
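The temporal aggregation idea in the abstract can be illustrated with a minimal sketch: average several stochastic per-frame material estimates, then regularize across time by blending each frame with the previous frame's result warped by optical flow. All names (`warp`, `aggregate_sequence`, the blending weight `lam`) and the nearest-neighbor warping are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def warp(prev, flow):
    # Backward-warp the previous frame's estimate using an (H, W, 2) flow
    # field from frame t to frame t-1 (nearest-neighbor sampling for brevity;
    # a real pipeline would use bilinear interpolation and occlusion masks).
    h, w = prev.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return prev[src_y, src_x]

def aggregate_sequence(samples, flows, lam=0.5):
    """Aggregate stochastic diffusion estimates into a stable sequence.

    samples: list over frames of (K, H, W, C) arrays, K stochastic
             estimates of a material property (e.g. albedo) per frame.
    flows:   list of (H, W, 2) flow fields to the previous frame
             (flows[0] is unused).
    lam:     temporal blending weight toward the warped previous result.
    """
    out = []
    for t, est_k in enumerate(samples):
        # Average the K stochastic samples to suppress per-frame noise.
        est = est_k.mean(axis=0)
        if t > 0:
            # Flow-guided temporal regularization: pull the estimate
            # toward the previous frame's result, warped into this frame.
            est = (1.0 - lam) * est + lam * warp(out[-1], flows[t])
        out.append(est)
    return out
```

With `lam = 0` this degenerates to independent per-frame averaging; larger values trade responsiveness for temporal stability, which is the basic tension the hybrid framework is addressing.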