Recent illumination estimation methods have focused on enhancing the resolution and improving the quality and diversity of the generated textures. However, few have explored tailoring the neural network architecture to the equirectangular panorama (ERP) format utilised in image-based lighting. Consequently, the resulting high dynamic range images (HDRIs) usually exhibit a seam at the side borders and textures or objects that are warped at the poles. To address this shortcoming, we propose a novel architecture, 360U-Former, based on a U-Net-style Vision Transformer that leverages PanoSWIN, a shifted-window attention mechanism adapted to the ERP format. To the best of our knowledge, this is the first purely Vision Transformer model used in the field of illumination estimation. We train 360U-Former as a GAN to generate an HDRI from a limited-field-of-view low dynamic range image (LDRI). We evaluate our method using current illumination estimation evaluation protocols and datasets, demonstrating that our approach outperforms existing state-of-the-art methods without the artefacts typically associated with the ERP format.
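As an illustrative aside (not part of the paper's method): the side-border seam arises because an ERP image is horizontally continuous, so the leftmost and rightmost columns are physically adjacent, yet standard zero padding treats them as unrelated borders. A minimal sketch of ERP-aware padding, using wrap-around padding horizontally and edge replication at the poles:

```python
import numpy as np

def pad_erp(feature_map, pad=1):
    """Pad an ERP feature map (H, W) so that local operations see
    wrap-around context at the left/right borders (the panorama is
    horizontally continuous) and replicated context at the poles."""
    # Wrap horizontally: the column left of column 0 is the last column.
    wrapped = np.pad(feature_map, ((0, 0), (pad, pad)), mode="wrap")
    # Replicate vertically at top/bottom (the poles).
    return np.pad(wrapped, ((pad, pad), (0, 0)), mode="edge")

erp = np.arange(12).reshape(3, 4)
padded = pad_erp(erp)
# The left padding column equals the original rightmost column,
# so there is no artificial discontinuity at the side border.
assert (padded[1:-1, 0] == erp[:, -1]).all()
```

This hypothetical `pad_erp` helper only illustrates the continuity property; 360U-Former instead addresses it architecturally through PanoSWIN-style shifted-window attention rather than padding alone.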