F-RNG: Feed-Forward Relightable Neural Gaussians

Capturing relightable 3D assets from real-world objects is a widely researched problem. Several per-scene optimization methods based on 3D Gaussian splatting (3DGS) support relighting; however, they usually require dense input views, and because they overfit to a single scene, they do not generalize across scenes. In contrast, generalized feed-forward models can reconstruct Gaussians directly from sparse input views, but the resulting assets have baked-in illumination and cannot easily be relit. In this paper, we present F-RNG, a feed-forward framework that directly generates relightable 3DGS assets from sparse-view inputs. Training such a model from scratch can require massive data and computing resources, and generating relightable assets in a feed-forward manner at acceptable cost is especially challenging. We therefore build F-RNG on an existing large reconstruction model (LRM), from which we extract relightable representations, while also utilizing priors from an intrinsic decomposition model (IDM). Specifically, we first introduce latent-interpolated fine-grained geometry synthesis to enhance the LRM's geometry representation. Second, we propose prior-guided relightable appearance distillation, which extracts relightable neural representations by incorporating IDM priors. Finally, a universal neural renderer enables flexible, high-fidelity relighting. F-RNG requires neither re-training nor fine-tuning of the underlying LRM, and can therefore automatically benefit from better LRMs and IDMs in the future. Using only small networks that can be trained with affordable data and computational resources, F-RNG avoids repeated inference of large models under different lighting conditions. Compared with the state-of-the-art LRM-based relighting method, F-RNG achieves ~25x faster relighting and superior quality (~+2.0 dB).
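
As a reading aid for the reported quality gain, the following minimal sketch converts a decibel improvement into the equivalent change in mean squared error. It assumes that the ~+2.0 dB figure refers to PSNR, which the abstract does not state explicitly; the function names are hypothetical and are not part of F-RNG.

```python
import math


def psnr(mse: float, peak: float = 1.0) -> float:
    """Standard PSNR definition in dB for a signal with the given peak value."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_from_psnr_gain(gain_db: float) -> float:
    """Ratio new_mse / old_mse implied by a PSNR gain of gain_db decibels.

    Assumption: the gain is measured as PSNR with the same peak value
    before and after, so the peak term cancels.
    """
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    ratio = mse_ratio_from_psnr_gain(2.0)
    print(f"MSE ratio for +2.0 dB: {ratio:.3f}")
```

Under that assumption, a +2.0 dB gain corresponds to an MSE ratio of 10^(-0.2), roughly 0.63, that is, about 37% lower mean squared error than the baseline.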
