We present a simple yet effective technique to estimate lighting from a single input image. Current techniques rely heavily on HDR panorama datasets to train neural networks that regress a full environment map from an input with a limited field of view. However, these approaches often struggle in real-world, uncontrolled settings because their datasets are limited in diversity and size. To address this problem, we leverage diffusion models trained on billions of standard images to render a chrome ball into the input image. Despite its simplicity, this task remains challenging: diffusion models often insert incorrect or inconsistent objects and cannot readily generate images in HDR format. We uncover a surprising relationship between the appearance of the chrome ball and the initial diffusion noise map, which we exploit to consistently generate high-quality chrome balls. We further fine-tune an LDR diffusion model (Stable Diffusion XL) with LoRA, enabling it to perform exposure bracketing for HDR light estimation. Our method produces convincing light estimates across diverse settings and demonstrates superior generalization to in-the-wild scenarios.
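Exposure bracketing generally means capturing the same content at several exposure values (EVs) and fusing the LDR results into one HDR estimate. The sketch below shows a generic per-pixel merge of such brackets to illustrate that final step. It is not the authors' procedure: the display gamma of 2.4, the clipping threshold, the triangle ("hat") weighting, and the helper names `linearize`, `hat_weight`, and `merge_brackets` are all hypothetical choices made for illustration. Images are flattened to lists of values in [0, 1] to keep the example dependency-free.

```python
from typing import Sequence


def linearize(value: float, gamma: float = 2.4) -> float:
    """Undo an assumed display gamma to recover linear intensity in [0, 1]."""
    return value ** gamma


def hat_weight(value: float) -> float:
    """Triangle weight: trust mid-tones most, near-black and near-white least."""
    return 1.0 - abs(2.0 * value - 1.0)


def merge_brackets(
    brackets: Sequence[Sequence[float]],
    evs: Sequence[float],
    gamma: float = 2.4,
    clip: float = 0.99,
) -> list[float]:
    """Fuse LDR exposures (one flattened image per EV) into linear HDR radiance.

    Hypothetical sketch: each unclipped exposure gives the estimate
    linear / 2**ev, and the estimates are averaged with hat weights.
    """
    if len(brackets) != len(evs) or not brackets:
        raise ValueError("need one EV per bracket and at least one bracket")
    n_pixels = len(brackets[0])
    # The darkest exposure (lowest EV) is the least likely to be saturated.
    darkest = min(range(len(evs)), key=lambda i: evs[i])

    hdr: list[float] = []
    for p in range(n_pixels):
        num, den = 0.0, 0.0
        for image, ev in zip(brackets, evs):
            v = image[p]
            if v >= clip:
                continue  # saturated pixel: only a lower bound on radiance
            w = hat_weight(v)
            num += w * linearize(v, gamma) / (2.0 ** ev)
            den += w
        if den > 0.0:
            hdr.append(num / den)
        else:
            # Every exposure is clipped or black: fall back to the darkest one.
            v = brackets[darkest][p]
            hdr.append(linearize(v, gamma) / (2.0 ** evs[darkest]))
    return hdr
```

As a usage example, synthesizing brackets at EV 0, -2, and -4 from known radiances and merging them recovers those radiances, including values far above 1.0 that are clipped in the brighter exposures. A lower EV lets a bright light source stay below saturation, which is why a bracket of darker renderings carries the HDR information that a single LDR image lacks.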