In this paper, we propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. While multi-view images have been widely used for object-level inverse rendering, scene-level inverse rendering has primarily been studied with single-view images, owing to the lack of a dataset containing high-dynamic-range multi-view images with ground-truth geometry, material, and spatially-varying lighting. To improve the quality of scene-level inverse rendering, a novel framework called Multi-view Attention Inverse Rendering (MAIR) was recently introduced. MAIR performs scene-level multi-view inverse rendering by expanding the OpenRooms dataset, designing efficient pipelines to handle multi-view images, and splitting spatially-varying lighting. Although MAIR showed impressive results, its lighting representation is fixed to spherical Gaussians, which limits its ability to render images realistically; consequently, MAIR cannot be directly used in applications such as material editing. Moreover, its multi-view aggregation networks struggle to extract rich features because they rely only on the mean and variance of multi-view features. In this paper, we propose MAIR++, an extended version of MAIR. MAIR++ addresses these limitations by introducing an implicit lighting representation that accurately captures the lighting conditions of an image while facilitating realistic rendering. Furthermore, we design a directional attention-based multi-view aggregation network to infer more intricate relationships between views. Experimental results show that MAIR++ not only achieves better performance than MAIR and single-view-based methods, but also shows robust performance on unseen real-world scenes.
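As context for the two limitations noted above, the sketches below illustrate, in PyTorch, what is meant by a spherical Gaussian lighting representation and by mean/variance versus attention-based multi-view aggregation. All names, shapes, and hyperparameters are illustrative assumptions for exposition, not the actual implementation of MAIR or MAIR++.

A spherical Gaussian mixture represents incident radiance as a sum of K smooth lobes, G_k(v) = a_k · exp(λ_k(v·ξ_k − 1)). Because each lobe is smooth on the sphere, a small mixture cannot reproduce high-frequency lighting, which is the limitation that an implicit lighting representation targets:

```python
import torch

def eval_sg_mixture(view_dirs, lobe_axes, sharpness, amplitudes):
    """Evaluate a K-lobe spherical Gaussian mixture (illustrative shapes).

    view_dirs:  (N, 3) unit query directions
    lobe_axes:  (K, 3) unit lobe axes xi_k
    sharpness:  (K,)   lobe sharpness lambda_k
    amplitudes: (K, 3) RGB amplitudes a_k
    Returns (N, 3) RGB radiance per query direction.
    """
    cos = view_dirs @ lobe_axes.T                 # (N, K) dot products v . xi_k
    lobes = torch.exp(sharpness * (cos - 1.0))    # (N, K) per-lobe response
    return lobes @ amplitudes                     # (N, 3) mixture radiance
```

The second sketch contrasts the two aggregation strategies: mean/variance pooling summarizes per-view features symmetrically and discards which view contributed what, whereas a direction-aware attention layer (a hypothetical stand-in for the paper's directional attention network) lets each view be weighted by its geometric relationship to the others:

```python
import torch
import torch.nn as nn

class MeanVarAggregation(nn.Module):
    # MAIR-style summary: pool per-view features into their mean and
    # variance across views; view identity and geometry are discarded.
    def forward(self, feats):                     # feats: (N, V, C)
        mean = feats.mean(dim=1)                  # (N, C)
        var = feats.var(dim=1, unbiased=False)    # (N, C)
        return torch.cat([mean, var], dim=-1)     # (N, 2C)

class DirectionalAttention(nn.Module):
    # Hypothetical direction-aware attention: per-view tokens are offset by
    # an embedding of each view's viewing direction, so attention can weight
    # views by geometry instead of treating them interchangeably.
    def __init__(self, channels, heads=4):
        super().__init__()
        self.dir_embed = nn.Linear(3, channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, feats, dirs):               # feats: (N, V, C), dirs: (N, V, 3)
        tokens = feats + self.dir_embed(dirs)     # direction-aware view tokens
        fused, _ = self.attn(tokens, tokens, tokens)
        return fused.mean(dim=1)                  # (N, C) fused feature

# Shape check: V=5 views, C=64 channels, unit viewing directions per view.
N, V, C = 2048, 5, 64
feats = torch.randn(N, V, C)
dirs = torch.nn.functional.normalize(torch.randn(N, V, 3), dim=-1)
print(MeanVarAggregation()(feats).shape)          # torch.Size([2048, 128])
print(DirectionalAttention(C)(feats, dirs).shape) # torch.Size([2048, 64])
```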