A Perceptually Optimized and Self-Calibrated Tone Mapping Operator

With the growing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are in high practical demand. In this paper, we develop a two-stage neural network-based TMO that is self-calibrated and perceptually optimized. In Stage one, motivated by the physiology of the early stages of the human visual system, we first decompose an HDR image into a normalized Laplacian pyramid. We then use two lightweight deep neural networks (DNNs) that take the normalized representation as input and estimate the Laplacian pyramid of the corresponding low dynamic range (LDR) image. We optimize the tone mapping network by minimizing the normalized Laplacian pyramid distance (NLPD), a perceptual metric that aligns with human judgments of tone-mapped image quality. In Stage two, the input HDR image is self-calibrated to compute the final LDR image. We feed the same HDR image, rescaled to different maximum luminances, to the learned tone mapping network and generate a pseudo-multi-exposure image stack with varying detail visibility and color saturation. We then train another lightweight DNN to fuse the LDR image stack into the desired LDR image by maximizing a variant of the structural similarity index for multi-exposure image fusion (MEF-SSIM), which has been shown to be perceptually relevant to fused image quality. The proposed self-calibration mechanism through MEF enables our TMO to accept uncalibrated HDR images while remaining physiology-driven. Extensive experiments show that our method produces images with consistently better visual quality. Additionally, since our method builds upon three lightweight DNNs, it is among the fastest local TMOs.
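
The rescaling step of Stage two can be illustrated with a minimal sketch. It assumes a single-channel luminance map stored as nested lists; the names `rescale_to_peak`, `placeholder_tone_map`, and `pseudo_exposure_stack` are hypothetical, the peak values are illustrative rather than taken from the paper, and the global curve L / (1 + L) is only a stand-in for the learned tone mapping network. The learned fusion step is not shown.

```python
def rescale_to_peak(luminance, peak):
    """Linearly scale a luminance map (list of rows) so its maximum equals `peak`."""
    current_max = max(max(row) for row in luminance)
    if current_max <= 0:
        raise ValueError("luminance map needs at least one positive value")
    scale = peak / current_max
    return [[value * scale for value in row] for row in luminance]


def placeholder_tone_map(luminance):
    """Stand-in for the learned tone mapping network (NOT the paper's model).

    Uses the simple global curve L / (1 + L), which maps [0, inf) into [0, 1).
    """
    return [[value / (1.0 + value) for value in row] for row in luminance]


def pseudo_exposure_stack(luminance, peaks, tone_map=placeholder_tone_map):
    """Tone-map the same image once per assumed peak luminance."""
    return [tone_map(rescale_to_peak(luminance, peak)) for peak in peaks]


if __name__ == "__main__":
    hdr = [[0.02, 0.5], [3.0, 40.0]]  # toy uncalibrated luminance values
    peaks = [1.0, 10.0, 100.0]        # illustrative peaks, not from the paper
    for peak, ldr in zip(peaks, pseudo_exposure_stack(hdr, peaks)):
        print(peak, [[round(v, 3) for v in row] for row in ldr])
```

Each assumed peak yields a differently exposed LDR rendering of the same scene; in the method described above, a trained fusion network would then merge this stack into the final image.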
