Full-reference image quality metrics (FR-IQMs) aim to measure the visual differences between a reference image and a distorted image, with the goal of accurately predicting human judgments. However, existing FR-IQMs, including traditional ones such as PSNR and SSIM and even perceptual ones such as HDR-VDP, LPIPS, and DISTS, still fall short of capturing the complexities and nuances of human perception. In this work, rather than devising a novel IQM, we seek to improve how well existing FR-IQMs agree with human perception. We achieve this by considering visual masking, an important characteristic of the human visual system that changes its sensitivity to distortions as a function of local image content. Specifically, for a given FR-IQM, we propose to predict a visual mask that modulates the reference and distorted images so that visual errors are penalized according to their visibility. Since ground-truth visual masks are difficult to obtain, we demonstrate how they can be derived in a self-supervised manner, based solely on the mean opinion scores (MOS) collected from an FR-IQM dataset. Our approach results in enhanced FR-IQMs that are more in line with human judgments, both visually and quantitatively.
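As an illustration only, the masking idea can be sketched as a wrapper around an existing metric: both images are multiplied by a per-pixel mask before the metric is applied. Everything named below is an assumption made for this sketch. In particular, `toy_mask` is a hand-written contrast heuristic standing in for the learned mask predictor, MSE stands in for the FR-IQM, and the self-supervised training on MOS is omitted.

```python
import numpy as np


def mse(a, b):
    """Plain mean squared error, used here as a stand-in FR-IQM."""
    return float(np.mean((a - b) ** 2))


def toy_mask(reference, window=3, strength=10.0):
    """Hypothetical stand-in for a learned mask (not the paper's model).

    Assigns lower weight where local contrast is high, mimicking the idea
    that busy image content hides distortions.
    """
    pad = window // 2
    padded = np.pad(reference, pad, mode="reflect")
    # One (window x window) patch per pixel of the original image.
    patches = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    local_std = patches.std(axis=(-1, -2))
    return 1.0 / (1.0 + strength * local_std)


def masked_metric(reference, distorted, mask, metric=mse):
    """Apply the same mask to both images, then evaluate the base metric."""
    return metric(mask * reference, mask * distorted)


# Reference image: flat gray on the left, checkerboard texture on the right.
ref = np.full((16, 16), 0.5)
rows, cols = np.indices((16, 8))
ref[:, 8:] = (rows + cols) % 2

# The same additive error placed in the flat region and in the textured one.
dist_flat = ref.copy()
dist_flat[4:12, 2:6] += 0.1
dist_textured = ref.copy()
dist_textured[4:12, 10:14] += 0.1

mask = toy_mask(ref)
plain_flat = mse(ref, dist_flat)
plain_textured = mse(ref, dist_textured)
masked_flat = masked_metric(ref, dist_flat, mask)
masked_textured = masked_metric(ref, dist_textured, mask)
```

In this toy setup the unmasked MSE cannot tell the two distortions apart, while the masked variant assigns the smaller error to the distortion placed in the textured region, where it would be harder to see.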