Reconstructing 3D assets from images has long required separate pipelines for geometry reconstruction, material estimation, and illumination recovery, each with its own limitations and computational overhead. We present ReLi3D, the first unified end-to-end pipeline that simultaneously reconstructs complete 3D geometry, spatially-varying physically-based materials, and environment illumination from sparse multi-view images in under one second. Our key insight is that multi-view constraints can dramatically improve the disentanglement of material and illumination, a problem that remains fundamentally ill-posed for single-image methods. Our approach fuses the multi-view input through a transformer cross-conditioning architecture and then applies a novel unified two-path prediction strategy. The first path predicts the object's structure and appearance, while the second path predicts the environment illumination from the image background or from reflections on the object. Combined with a differentiable Monte Carlo renderer that uses multiple importance sampling, this yields a training pipeline designed to disentangle illumination from materials. In addition, our mixed-domain training protocol, which combines synthetic PBR datasets with real-world RGB captures, produces results that generalize in geometry, material accuracy, and illumination quality. By unifying previously separate reconstruction tasks into a single feed-forward pass, we enable near-instantaneous generation of complete, relightable 3D assets. Project Page: https://reli3d.jdihlmann.com/
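As a rough illustration of the multiple importance sampling idea mentioned in the abstract, the sketch below combines two sampling strategies with the balance heuristic on a one-dimensional toy integral. It is not the ReLi3D renderer: the integrand `f`, the two densities, and the function name `mis_estimate` are hypothetical stand-ins chosen only so that the exact answer (1.0) is known.

```python
import math
import random


def f(x):
    # Toy integrand standing in for a lighting * BRDF * cosine product.
    # Its exact integral over [0, 1] is 1.
    return 3.0 * x * x


def pdf_uniform(x):
    # Strategy A: uniform density on [0, 1].
    return 1.0


def pdf_linear(x):
    # Strategy B: density p(x) = 2x on [0, 1].
    return 2.0 * x


def mis_estimate(n, rng):
    # Draw n samples from each strategy and combine them with the balance
    # heuristic. With equal sample counts the weight for strategy i is
    # p_i(x) / (p_a(x) + p_b(x)), so each weighted contribution
    # w_i(x) * f(x) / p_i(x) simplifies to f(x) / (p_a(x) + p_b(x)).
    total = 0.0
    for _ in range(n):
        x_a = rng.random()             # sample from the uniform density
        x_b = math.sqrt(rng.random())  # inverse-CDF sample from p(x) = 2x
        total += f(x_a) / (pdf_uniform(x_a) + pdf_linear(x_a))
        total += f(x_b) / (pdf_uniform(x_b) + pdf_linear(x_b))
    return total / n


if __name__ == "__main__":
    print(mis_estimate(20000, random.Random(0)))
```

In a renderer the two strategies would typically be sampling the environment light and sampling the material's reflectance lobe; the weighting scheme is the same, only the densities and the integrand change.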