Understanding and modeling lighting effects are fundamental tasks in computer vision and graphics. Classic physically-based rendering (PBR) accurately simulates light transport, but relies on precise scene representations--explicit 3D geometry, high-quality material properties, and lighting conditions--that are often impractical to obtain in real-world scenarios. Therefore, we introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering within a holistic framework. Leveraging powerful video diffusion model priors, the inverse rendering model accurately estimates G-buffers from real-world videos, providing an interface for image editing tasks and training data for the rendering model. Conversely, our rendering model generates photorealistic images from G-buffers without explicit light transport simulation. Experiments demonstrate that DiffusionRenderer effectively approximates both inverse and forward rendering, consistently outperforming the state of the art. Our model enables practical applications from a single video input, including relighting, material editing, and realistic object insertion.