We propose a physically motivated deep learning framework to solve a general version of the challenging indoor lighting estimation problem. Given a single LDR image with a depth map, our method predicts spatially consistent lighting at any given image position. In particular, when the input is an LDR video sequence, our framework not only progressively refines the lighting prediction as it sees more regions, but also preserves temporal consistency by keeping the refinement smooth. Our framework combines four components: a tailored 3D encoder-decoder that reconstructs a spherical Gaussian lighting volume (SGLV), which enables spatially consistent lighting prediction through volume ray tracing; a hybrid blending network that produces detailed environment maps; an in-network Monte Carlo rendering layer that enhances photorealism for virtual object insertion; and recurrent neural networks (RNNs) that achieve temporally consistent lighting prediction when the input is a video sequence. For training, we significantly enhance the public OpenRooms dataset of photorealistic synthetic indoor scenes with around 360K HDR environment maps of much higher resolution and 38K video sequences, all rendered with GPU-based path tracing. Experiments show that our framework achieves higher-quality lighting prediction than state-of-the-art single-image or video-based methods, enabling photorealistic AR applications such as object insertion.
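To make the idea of querying a spherical Gaussian lighting volume by volume ray tracing more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common spherical Gaussian form, amplitude · exp(sharpness · (v · axis − 1)), and standard front-to-back alpha compositing of samples along a ray; the abstract specifies neither. The names `sg_eval` and `march_ray`, the per-sample tuple layout, and the single-channel amplitude are hypothetical choices made for illustration.

```python
import math


def sg_eval(direction, axis, sharpness, amplitude):
    """Evaluate one spherical Gaussian lobe in a unit direction.

    Assumed form: amplitude * exp(sharpness * (dot(direction, axis) - 1)).
    """
    cos_angle = sum(d * a for d, a in zip(direction, axis))
    return amplitude * math.exp(sharpness * (cos_angle - 1.0))


def march_ray(samples, view_dir):
    """Composite radiance front to back along a single ray.

    samples: list of (alpha, axis, sharpness, amplitude) tuples ordered from
        near to far; in a full system these would be read from the volume.
    view_dir: unit vector in which each sample's lobe is evaluated
        (from the sample back toward the ray origin).
    """
    radiance = 0.0
    transmittance = 1.0  # fraction of light not yet blocked by nearer samples
    for alpha, axis, sharpness, amplitude in samples:
        radiance += transmittance * alpha * sg_eval(view_dir, axis, sharpness, amplitude)
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early exit once the ray is effectively opaque
            break
    return radiance


if __name__ == "__main__":
    up = (0.0, 0.0, 1.0)
    # A half-transparent sample in front of a fully opaque, brighter one.
    ray_samples = [(0.5, up, 4.0, 1.0), (1.0, up, 4.0, 3.0)]
    print(march_ray(ray_samples, up))
```

Repeating such a query over a set of directions around a 3D point would yield an environment map for that point, which is one way to read the abstract's claim that a single volume supports spatially consistent predictions at arbitrary positions.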