We introduce TRON, a rendering framework that combines 3D Gaussian ray tracing with neural rendering to enable realistic and controllable rendering of real-world 3D scenes under novel lighting, dynamic object motion, object insertion, and material editing. Prior approaches that rely solely on physically based rendering (PBR) of Gaussian representations struggle to achieve realistic relighting due to imperfections in reconstructed geometry, material estimates, and light transport estimation. At the same time, neural rendering methods often lack an explicit scene representation, limiting their ability to support interactive editing with fine-grained manipulation. TRON bridges these two paradigms. We use intrinsic decomposition priors from a learned inverse rendering model to regularize the material properties of a Gaussian field, and repurpose a ray tracer to provide radiometric guidance rather than final pixels. By treating the ray tracer's output as a structured 3D scaffold, we empower a lightweight neural renderer to bridge the domain gap between shading-model-constrained estimates and photorealistic output. Our key insight is that the combination of explicit 3D knowledge with robust material priors provides speed and controllability, while neural rendering enables the synthesis of photorealistic images. To support real-world scenarios, we train our neural renderer with a multi-stage strategy consisting of large-scale pretraining and targeted fine-tuning on a newly constructed dataset of 2.1M rendered synthetic and real-world frames from 3D reconstructions. TRON outperforms Gaussian-based relighting methods in realism, and prior neural renderers in editability and speed. To the best of our knowledge, TRON is the first method to enable practical interactive applications in captured 3D environments, offering realistic appearance under dynamic geometric, lighting, and material conditions.