Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics. Classic physically-based rendering (PBR) accurately simulates light transport, but relies on precise scene representations--explicit 3D geometry, high-quality material properties, and lighting conditions--that are often impractical to obtain in real-world scenarios. Therefore, we introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering within a holistic framework. Leveraging powerful video diffusion model priors, the inverse rendering model accurately estimates G-buffers from real-world videos, providing an interface for image editing tasks and training data for the rendering model. Conversely, our rendering model generates photorealistic images from G-buffers without explicit light transport simulation. Experiments demonstrate that DiffusionRenderer effectively approximates both inverse and forward rendering, consistently outperforming the state of the art. Our model enables practical applications from a single video input, including relighting, material editing, and realistic object insertion.