Omnidirectional image super-resolution (ODISR) aims to upscale low-resolution (LR) omnidirectional images (ODIs) to high-resolution (HR) ones, meeting the growing demand for detailed visual content across a $180^{\circ}\times360^{\circ}$ viewport. Existing ODISR methods rely on simplified degradation assumptions (e.g., bicubic downsampling) and therefore fail to model and exploit real-world degradation information. Recent latent-based diffusion approaches that use condition guidance suffer from slow inference because they require hundreds of update steps and make frequent use of the VAE. To tackle these challenges, we propose \textbf{RealOSR}, a diffusion-based framework tailored for real-world ODISR that performs efficient latent-based condition guidance within a one-step denoising paradigm. Central to this guidance is the proposed \textbf{Latent Gradient Alignment Routing (LaGAR)}, a lightweight module that enables effective pixel-latent space interaction and simulates gradient descent directly in the latent space, thereby leveraging the semantic richness and multi-scale features captured by the denoising UNet. Compared to the recent diffusion-based ODISR method OmniSSR, RealOSR achieves significant improvements in visual quality and over \textbf{200$\times$} inference acceleration. Our code and models will be released upon acceptance.
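As a point of reference for the phrase "gradient descent in the latent space", a generic data-fidelity step used in condition-guided restoration can be written as below. This is an illustrative sketch only and not the LaGAR formulation: the linear degradation operator $A$, the observed LR image $y$, the VAE decoder $\mathcal{D}$, the latent $z$, the pixel-space estimate $x$, and the step size $\eta$ are all notation assumed here.

\[
x \leftarrow x - \eta \nabla_x \tfrac{1}{2}\|y - A x\|_2^2 = x + \eta A^{\top}(y - A x),
\qquad
z \leftarrow z - \eta \nabla_z \tfrac{1}{2}\|y - A\,\mathcal{D}(z)\|_2^2 .
\]

Under these assumptions, the latent form requires evaluating and differentiating through the decoder at every step, which illustrates why guidance repeated over hundreds of steps is costly and why a lightweight module that approximates this update within the latent space is attractive.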