CleAR: Robust Context-Guided Generative Lighting Estimation for Mobile Augmented Reality

High-quality environment lighting is the foundation of immersive user experiences in mobile augmented reality (AR) applications. However, achieving visually coherent environment lighting estimation for mobile AR is challenging because of key limitations in AR device sensing capabilities, including the limited field of view (FoV) and pixel dynamic range of device cameras. Recent advances in generative AI, which can generate high-quality images from different types of prompts, including text and images, present a potential solution for high-quality lighting estimation. Still, to use generative image diffusion models effectively, we must address two of their key limitations: generation hallucination and slow inference. To do so, in this work we design and implement CleAR, a generative lighting estimation system that produces high-quality and diverse environment maps in the format of 360$^\circ$ images. Specifically, we design a two-step generation pipeline guided by AR environment context data to ensure that the results follow the visual context and color appearance of the physical environment. To improve estimation robustness under different lighting conditions, we design a real-time refinement component that adjusts lighting estimation results on AR devices. To train and test our generative models, we curate a large-scale environment lighting estimation dataset with diverse lighting conditions. Through quantitative evaluation and a user study, we show that CleAR outperforms state-of-the-art lighting estimation methods in both estimation accuracy and robustness. Moreover, CleAR supports real-time refinement of lighting estimation results, ensuring robust and timely environment lighting updates for AR applications. Our end-to-end generative estimation takes as little as 3.2 seconds, making it 110x faster than state-of-the-art methods.
