Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning

Capturing high dynamic range (HDR) images and videos is attractive because it reveals details in both dark and bright regions. Since mainstream displays support only low dynamic range (LDR) content, a tone mapping algorithm is required to compress the dynamic range of HDR images and videos. Although image tone mapping has been widely explored, video tone mapping lags behind, especially for deep-learning-based methods, due to the lack of HDR-LDR video pairs. In this work, we propose a unified framework (IVTMNet) for unsupervised image and video tone mapping. To improve unsupervised training, we propose domain-based and instance-based contrastive learning losses. Instead of using a universal feature extractor, such as VGG, to extract features for similarity measurement, we propose a novel latent code, which is an aggregation of the brightness and contrast of extracted features, to measure the similarity of different pairs. In total, we construct two negative pairs and three positive pairs to constrain the latent codes of the tone-mapped results. For the network structure, we propose a spatial-feature-enhanced (SFE) module to enable information exchange and transformation across nonlocal regions. For video tone mapping, we propose a temporal-feature-replaced (TFR) module to efficiently utilize temporal correlation and improve the temporal consistency of video tone-mapped results. We construct a large-scale unpaired HDR-LDR video dataset to facilitate unsupervised training for video tone mapping. Experimental results demonstrate that our method outperforms state-of-the-art image and video tone mapping methods. Our code and dataset are available at https://github.com/cao-cong/UnCLTMO.
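To make the latent-code idea concrete, the following is a minimal sketch of one plausible reading of it, not the released implementation. It assumes that "brightness" can be approximated by the per-channel mean of a feature map and "contrast" by the per-channel standard deviation, and it pairs the resulting code with a generic InfoNCE-style contrastive loss. The function names (`latent_code`, `cosine`, `contrastive_loss`), the cosine similarity, and the `temperature` value are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
from statistics import fmean, pstdev


def latent_code(features):
    """Aggregate a feature map into a compact code.

    `features` is a list of channels, each a flat list of activations.
    Assumption: per-channel mean stands in for brightness and per-channel
    standard deviation stands in for contrast.
    """
    brightness = [fmean(channel) for channel in features]
    contrast = [pstdev(channel) for channel in features]
    return brightness + contrast


def cosine(a, b):
    """Cosine similarity between two equal-length codes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b + 1e-8)


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumed form, not the paper's).

    The loss is low when the anchor code is close to the positive codes
    and far from the negative codes.
    """
    pos = sum(math.exp(cosine(anchor, p) / temperature) for p in positives)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy usage: two channels of four activations each.
toy_features = [[1, 1, 1, 1], [0, 2, 0, 2]]
code = latent_code(toy_features)  # two means followed by two deviations
```

In this sketch, `positives` and `negatives` are plain lists of codes, so a configuration with three positive pairs and two negative pairs would simply pass lists of length three and two. Which images form those pairs is defined by the method itself and is not assumed here.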
