A Perceptually Optimized and Self-Calibrated Tone Mapping Operator

With the increasing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are in high practical demand. In this paper, we develop a two-stage, neural network-based TMO that is self-calibrated and perceptually optimized. In Stage one, motivated by the physiology of the early stages of the human visual system, we first decompose an HDR image into a normalized Laplacian pyramid. We then use two lightweight deep neural networks (DNNs) that take this normalized representation as input and estimate the Laplacian pyramid of the corresponding low dynamic range (LDR) image. We optimize the tone mapping network by minimizing the normalized Laplacian pyramid distance (NLPD), a perceptual metric that aligns with human judgments of tone-mapped image quality. In Stage two, the input HDR image is self-calibrated to compute the final LDR image. We feed the same HDR image, rescaled to different maximum luminances, to the learned tone mapping network and generate a pseudo-multi-exposure image stack with varying detail visibility and color saturation. We then train another lightweight DNN to fuse this LDR image stack into the desired LDR image by maximizing a variant of the structural similarity index for multi-exposure image fusion (MEF-SSIM), which has been shown to be perceptually relevant to fused image quality. The proposed self-calibration mechanism through MEF enables our TMO to accept uncalibrated HDR images while remaining physiology-driven. Extensive experiments show that our method consistently produces images with better visual quality than competing TMOs. Additionally, since our method builds upon three lightweight DNNs, it is among the fastest local TMOs.
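The normalized Laplacian pyramid and the NLPD metric used to train Stage one can be illustrated with a small NumPy sketch. It is a simplified stand-in, not the paper's implementation: the 5-tap binomial kernel, the reuse of that kernel as the local normalization filter, the constant `sigma`, and the pooling exponents `alpha = 2` and `beta = 0.6` are illustrative assumptions, and both images are assumed to be single-channel arrays on a common intensity scale. All function names (`blur`, `upsample`, `laplacian_pyramid`, `normalized_laplacian_pyramid`, `nlpd`) are hypothetical.

```python
import numpy as np

# 5-tap binomial kernel, a common Laplacian-pyramid choice (assumption, not from the paper).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img):
    """Separable low-pass filter with reflect padding; output keeps the input shape."""
    r = len(KERNEL) // 2
    padded = np.pad(img, r, mode="reflect")
    h, w = img.shape
    rows = sum(k * padded[:, i:i + w] for i, k in enumerate(KERNEL))  # horizontal pass
    return sum(k * rows[i:i + h, :] for i, k in enumerate(KERNEL))    # vertical pass


def upsample(small, shape):
    """Zero-interleave up to `shape`, then low-pass; the factor 4 restores the mean."""
    up = np.zeros(shape)
    up[::2, ::2] = small
    return 4.0 * blur(up)


def laplacian_pyramid(img, levels):
    """Band-pass levels (finest first) followed by a low-pass residual."""
    pyramid, current = [], img
    for _ in range(levels - 1):
        down = blur(current)[::2, ::2]
        pyramid.append(current - upsample(down, current.shape))
        current = down
    pyramid.append(current)
    return pyramid


def normalized_laplacian_pyramid(img, levels=4, sigma=0.1):
    """Divide each level by sigma plus a local average of its own magnitude.
    The blur kernel is an assumed stand-in for the fitted filters used in NLPD."""
    return [band / (sigma + blur(np.abs(band))) for band in laplacian_pyramid(img, levels)]


def nlpd(img_a, img_b, levels=4, alpha=2.0, beta=0.6, sigma=0.1):
    """Simplified NLPD: alpha-norm of the difference per level, beta-pooled over levels."""
    za = normalized_laplacian_pyramid(img_a, levels, sigma)
    zb = normalized_laplacian_pyramid(img_b, levels, sigma)
    per_level = np.array([np.mean(np.abs(a - b) ** alpha) ** (1.0 / alpha)
                          for a, b in zip(za, zb)])
    return float(np.mean(per_level ** beta) ** (1.0 / beta))


rng = np.random.default_rng(0)
reference = rng.random((64, 64))
distorted = np.clip(reference + 0.05 * rng.standard_normal((64, 64)), 0.0, 1.0)
print(nlpd(reference, reference))  # 0.0
print(nlpd(reference, distorted))
```

Dividing each band by a local estimate of its own amplitude is a crude form of the gain control found in early vision: a given error counts for more in low-activity regions than in highly textured ones, which plain mean squared error does not capture.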
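The self-calibration step of Stage two can be sketched in the same way. Because the trained networks are not available here, the sketch substitutes two plainly named stand-ins: Reinhard's global operator L / (1 + L) in place of the learned tone mapping network, and a per-pixel average with Mertens-style well-exposedness weights in place of the fusion DNN trained with MEF-SSIM. The candidate peak values passed as `max_luminances` and all function names are hypothetical; only the overall structure (remove the unknown absolute scale, rescale to several peak luminances, tone-map each copy, fuse the resulting stack) follows the abstract.

```python
import numpy as np


def stand_in_tmo(luminance):
    """Placeholder for the learned tone mapping network (NOT the paper's model):
    Reinhard's global operator, which maps [0, inf) to [0, 1)."""
    return luminance / (1.0 + luminance)


def pseudo_exposure_stack(hdr, max_luminances):
    """Rescale one uncalibrated HDR luminance map so that its peak equals each
    candidate maximum luminance, then tone-map every rescaled copy."""
    normalized = hdr / hdr.max()  # peak becomes 1.0; the absolute scale is discarded
    return np.stack([stand_in_tmo(normalized * m) for m in max_luminances])


def fuse(stack, sigma=0.2):
    """Placeholder for the learned fusion network (NOT MEF-SSIM optimization):
    per-pixel weighted average that favors well-exposed values near 0.5."""
    weights = np.exp(-((stack - 0.5) ** 2) / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=0, keepdims=True)  # weights sum to 1 at every pixel
    return (weights * stack).sum(axis=0)


rng = np.random.default_rng(0)
hdr = rng.lognormal(mean=0.0, sigma=2.0, size=(32, 32))  # synthetic HDR luminance
max_luminances = [1.0, 10.0, 100.0, 1000.0]              # hypothetical peak values
stack = pseudo_exposure_stack(hdr, max_luminances)       # dark to bright renderings
ldr = fuse(stack)
print(stack.shape, ldr.shape)  # (4, 32, 32) (32, 32)
```

Because the stack is generated from the image itself, the input's absolute luminance calibration does not matter: multiplying `hdr` by any positive constant leaves `stack` and `ldr` unchanged up to floating-point rounding, since the first step divides by the peak.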

翻译：随着高动态范围（HDR）摄影的普及和易用性提升，用于动态范围压缩的色调映射算子（TMO）具有实际应用需求。本文开发了一种基于两阶段神经网络的TMO，该算子具有自校准和感知优化特性。第一阶段受人类视觉系统早期阶段生理特性启发，我们首先将HDR图像分解为归一化拉普拉斯金字塔。随后使用两个轻量级深度神经网络（DNN），以归一化表示为输入，估计对应LDR图像的拉普拉斯金字塔。通过最小化归一化拉普拉斯金字塔距离（NLPD）——一种与人类对色调映射图像质量判断一致的感知度量——来优化色调映射网络。第二阶段对输入HDR图像进行自校准以生成最终LDR图像。我们将同一HDR图像以不同最大亮度重新缩放后输入已学习的色调映射网络，生成具有不同细节可见度和色彩饱和度的伪多曝光图像堆栈。然后训练另一个轻量级DNN，通过最大化多曝光图像融合的结构相似性指数变体（MEF-SSIM）来融合LDR图像堆栈，该指标已被证明与融合图像质量具有感知相关性。所提出的基于MEF的自校准机制使我们的TMO能够接受未标定HDR图像，同时保持生理驱动特性。大量实验表明，我们的方法能够持续生成视觉质量更优的图像。此外，由于该方法基于三个轻量级DNN构建，因此它属于最快的局部TMO之一。