We present a simple yet effective technique for estimating lighting from a single input image. Current techniques rely heavily on HDR panorama datasets to train neural networks that regress a limited field-of-view input to a full environment map. However, these approaches often struggle in real-world, uncontrolled settings because their datasets are limited in diversity and size. To address this problem, we leverage diffusion models trained on billions of standard images to render a chrome ball into the input image. Despite its simplicity, this task remains challenging: diffusion models often insert incorrect or inconsistent objects and cannot readily generate images in HDR format. Our research uncovers a surprising relationship between the appearance of chrome balls and the initial diffusion noise map, which we exploit to consistently generate high-quality chrome balls. We further fine-tune an LDR diffusion model (Stable Diffusion XL) with LoRA, enabling it to perform exposure bracketing for HDR light estimation. Our method produces convincing light estimates across diverse settings and demonstrates superior generalization to in-the-wild scenarios.
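To illustrate the exposure-bracketing idea, the following is a minimal sketch of how several LDR chrome-ball images generated at different exposure values could be merged into one linear HDR image. The abstract does not specify the merge rule, so everything here is an assumption made for illustration: the function name `merge_exposure_brackets`, the display gamma of 2.4, the saturation threshold of 0.9, and the simple rule of averaging all non-clipped exposures.

```python
import numpy as np

def merge_exposure_brackets(ldr_stack, evs, gamma=2.4, sat=0.9):
    """Merge LDR images (values in [0, 1]) taken at exposure values `evs`
    into a single linear HDR image.

    Hypothetical helper: gamma, sat, and the averaging rule are assumptions,
    not values taken from the abstract.
    """
    ldr = np.clip(np.asarray(ldr_stack, dtype=np.float64), 0.0, 1.0)
    evs = np.asarray(evs, dtype=np.float64)

    # Exposure scale 2**EV per image, shaped to broadcast over pixel axes.
    scale = (2.0 ** evs).reshape((-1,) + (1,) * (ldr.ndim - 1))

    # Undo the assumed display gamma, then undo the exposure scaling.
    linear = ldr ** gamma / scale

    # Trust only pixels that are not close to clipping in the LDR image.
    weights = (ldr < sat).astype(np.float64)

    # If a pixel is clipped in every exposure, fall back to the darkest one.
    darkest = int(np.argmin(evs))
    all_clipped = weights.sum(axis=0) == 0
    weights[darkest][all_clipped] = 1.0

    # Average the linear radiance estimates from the trusted exposures.
    return (weights * linear).sum(axis=0) / weights.sum(axis=0)
```

Calling `merge_exposure_brackets([img_ev0, img_ev_low], [0.0, -5.0])` with two same-sized float arrays returns an array of the same shape. Bright light sources that clip in the brighter exposure are recovered from the darker one, which is the reason for bracketing.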