Comparing images captured by disparate sensors is a common challenge in remote sensing. This requires image translation -- converting imagery from one sensor domain to another while preserving the original content. Denoising Diffusion Implicit Models (DDIMs) are a promising state-of-the-art solution for such domain translation, given their proven performance on multiple image-to-image translation tasks in computer vision. However, these models struggle to reproduce the radiometric features of large-scale multi-patch imagery, resulting in inconsistencies across the full image. This renders downstream tasks such as Heterogeneous Change Detection impractical. To overcome these limitations, we propose a method that leverages denoising diffusion for effective multi-sensor optical image translation over large areas. Our approach super-resolves large-scale low-spatial-resolution images into high-resolution equivalents from disparate optical sensors, ensuring uniformity across hundreds of patches. Our contributions lie in new forward and reverse diffusion processes that address the challenges of large-scale image translation. Extensive experiments using paired Sentinel-2 (10 m) and Planet Dove (3 m) images demonstrate that our approach provides precise domain adaptation, preserving image content while improving radiometric accuracy and feature representation. A thorough image quality assessment and comparisons with the standard DDIM framework and five other leading methods are presented. We reach a mean Learned Perceptual Image Patch Similarity (mLPIPS) of 0.1884 and a Fr\'echet Inception Distance (FID) of 45.64, substantially outperforming all compared methods, including DDIM, ShuffleMixer, and SwinIR. The usefulness of our approach is further demonstrated in two Heterogeneous Change Detection tasks.