We show how to build a model that produces realistic, free-viewpoint renderings of a scene under novel lighting conditions from video. Our method, UrbanIR (Urban Scene Inverse Rendering), computes an inverse graphics representation from the video: it jointly infers shape, albedo, visibility, and sun and sky illumination from a single video of an unbounded outdoor scene captured under unknown lighting. UrbanIR uses videos from cameras mounted on cars, which see each scene point from relatively few viewpoints, in contrast to the many views of the same points available in typical NeRF-style estimation. As a result, standard methods produce poor geometry estimates (for example, of roofs) and numerous "floaters". Errors in inverse graphics inference can cause strong rendering artifacts, so UrbanIR uses novel losses to control these and other sources of error, including a novel loss that yields accurate estimates of shadow volumes in the original scene. The resulting representations support controllable editing, delivering photorealistic free-viewpoint renderings of relit scenes and inserted objects. Qualitative evaluation demonstrates strong improvements over the state of the art.
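To make concrete how the inferred factors (shape, albedo, visibility, sun and sky illumination) can combine into a pixel color, here is a minimal sketch assuming a simple Lambertian sun-plus-ambient-sky model. This shading model and the names `shade_point`, `sun_visibility`, and `sky_rgb` are illustrative assumptions, not UrbanIR's actual formulation: the surface normal stands in for shape, `sun_visibility` in [0, 1] stands in for the estimated shadowing, and the sky is reduced to a constant ambient color.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def shade_point(
    albedo: Vec3,
    normal: Vec3,
    sun_dir: Vec3,
    sun_rgb: Vec3,
    sun_visibility: float,
    sky_rgb: Vec3,
) -> Vec3:
    """Hypothetical Lambertian sun + ambient-sky shading (illustrative only).

    sun_dir points from the surface toward the sun.
    sun_visibility: 0.0 = fully in shadow, 1.0 = fully lit by the sun.
    Per channel: color = albedo * (visibility * sun * max(0, n . l) + sky).
    """
    n = normalize(normal)
    l = normalize(sun_dir)
    # Cosine foreshortening; surfaces facing away from the sun get no direct light.
    cos_theta = max(0.0, n[0] * l[0] + n[1] * l[1] + n[2] * l[2])
    return tuple(
        a * (sun_visibility * s * cos_theta + k)
        for a, s, k in zip(albedo, sun_rgb, sky_rgb)
    )


# Usage: a gray ground point lit by an overhead sun, then the same point in shadow.
grey = (0.5, 0.5, 0.5)
up = (0.0, 0.0, 1.0)
lit = shade_point(grey, up, (0.0, 0.0, 1.0), (1.0, 1.0, 1.0), 1.0, (0.2, 0.2, 0.2))
shadowed = shade_point(grey, up, (0.0, 0.0, 1.0), (1.0, 1.0, 1.0), 0.0, (0.2, 0.2, 0.2))
```

Under this assumed model, relighting means keeping albedo and normals fixed while changing `sun_dir` and `sun_rgb`; `sun_visibility` must then be recomputed for the new sun direction (for example, from the recovered geometry), which is why accurate shadow estimates matter for convincing relit renderings.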