Invariance of deep image quality metrics to affine transformations

Deep architectures are the current state of the art in predicting subjective image quality. These models are usually evaluated by how well they correlate with human opinion on databases covering a range of distortions that may appear in digital media. However, these databases overlook affine transformations, which may better represent the changes that images actually undergo in natural conditions. Humans can be particularly invariant to these natural transformations, as opposed to the digital ones. In this work, we evaluate state-of-the-art deep image quality metrics by assessing their invariance to affine transformations, specifically rotation, translation, scaling, and changes in spectral illumination. Here, invariance of a metric means that certain distances should be neglected (considered to be zero) if their values are below a threshold, which we call the invisibility threshold of the metric. We propose a methodology to assign such invisibility thresholds to any perceptual metric. The methodology involves transforming distances to a space common to all metrics and measuring thresholds psychophysically in this common space. This makes the analyzed metrics directly comparable with actual human thresholds. We find that none of the state-of-the-art metrics shows human-like results under this strong test based on invisibility thresholds. This means that tuning models exclusively to predict the visibility of generic distortions may disregard other properties of human vision, such as invariances or invisibility thresholds.
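
The notion of an invisibility threshold can be illustrated with a minimal sketch. It is not the methodology of this work: the metric `rmse` is a toy stand-in for a deep quality metric, the helper names `thresholded_distance` and `translate` are hypothetical, and the threshold value is an arbitrary placeholder rather than a psychophysically measured one. The sketch only shows the definition itself, namely that a distance below the threshold is treated as zero.

```python
import numpy as np


def rmse(a, b):
    """Toy stand-in for a perceptual metric (assumption, not a deep metric)."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def thresholded_distance(distance, threshold):
    """Treat any distance below the invisibility threshold as zero."""
    return 0.0 if distance < threshold else distance


def translate(image, shift):
    """Horizontal circular shift by `shift` pixels, a simple translation."""
    return np.roll(image, shift, axis=1)


rng = np.random.default_rng(0)
image = rng.random((32, 32))   # synthetic test image with values in [0, 1)
threshold = 0.05               # placeholder value, not a measured threshold

# Distance between an image and itself stays at zero.
d_same = thresholded_distance(rmse(image, image), threshold)

# Distance after a 3-pixel translation, thresholded the same way.
d_shift = thresholded_distance(rmse(image, translate(image, 3)), threshold)
```

For a metric to behave in a human-like way under this test, `d_shift` would have to be zero for every translation that observers cannot see, and nonzero for those they can.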
