We show how to build a model that allows realistic, free-viewpoint renderings of a scene under novel lighting conditions from video. Our method, UrbanIR (Urban Scene Inverse Rendering), computes an inverse graphics representation from the video. UrbanIR jointly infers shape, albedo, visibility, and sun and sky illumination from a single video of an unbounded outdoor scene with unknown lighting. UrbanIR uses videos from cameras mounted on cars, in contrast to the many views of the same points available in typical NeRF-style estimation. As a result, standard methods produce poor geometry estimates (for example, of roofs) and numerous "floaters". Errors in inverse graphics inference can result in strong rendering artifacts. UrbanIR uses novel losses to control these and other sources of error, including a novel loss that yields very good estimates of shadow volumes in the original scene. The resulting representations facilitate controllable editing, delivering photorealistic free-viewpoint renderings of relit scenes and inserted objects. Qualitative evaluation demonstrates strong improvements over the state of the art.