F-RNG: Feed-Forward Relightable Neural Gaussians

Capturing relightable 3D assets from real-world objects is a widely studied problem. Several per-scene optimization methods based on 3D Gaussian splatting (3DGS) support relighting; however, they usually require dense input views, and because they overfit to a single scene, they do not generalize across scenes. In contrast, generalizable feed-forward models can reconstruct Gaussians directly from sparse input views, but the resulting assets have baked-in illumination and cannot easily be relit. In this paper, we present F-RNG, a feed-forward framework that directly generates relightable 3DGS assets from sparse-view inputs. Training such a model from scratch can require massive data and computing resources, and generating relightable assets in a feed-forward manner at an acceptable cost is especially challenging. We therefore build F-RNG on top of an existing large reconstruction model (LRM), from which we extract relightable representations, while also using priors from an intrinsic decomposition model (IDM). First, we introduce latent-interpolated fine-grained geometry synthesis to enhance the LRM's geometry representation. Second, we propose prior-guided relightable appearance distillation, which extracts relightable neural representations by incorporating IDM priors. Finally, a universal neural renderer enables flexible and high-fidelity relighting. F-RNG requires neither retraining nor fine-tuning of the underlying LRM, so it can automatically benefit from better LRMs and IDMs in the future. Using only small networks that can be trained with affordable data and computational resources, F-RNG avoids repeated large-model inference under different lighting conditions. Compared with the state-of-the-art LRM-based relighting method, F-RNG achieves roughly 25x faster relighting and higher quality (about +2.0 dB).
