Human visual perception naturally evaluates image quality across multiple scales, a hierarchical process that existing blind image quality assessment (BIQA) algorithms struggle to replicate. This limitation stems from a fundamental oversight: current multi-scale approaches fail to account for the fact that quality perception varies dramatically between scales; content that appears degraded when viewed closely may look acceptable from a distance. This inconsistency not only creates misleading ``visual illusions'' during feature fusion but also introduces substantial redundant information that dilutes quality-critical features and leads to imprecise assessments. Our CSFIQA framework advances multi-scale BIQA through two key innovations: (1) a selective focus attention mechanism that mimics human visual attention by filtering out redundant cross-scale information that would otherwise mask subtle quality indicators, and (2) a scale contrastive learning strategy that explicitly learns to distinguish quality variations both across and within scales. By incorporating an adaptive noise sample matching mechanism, CSFIQA effectively identifies perceptual quality discrepancies in the same content viewed at different scales. Experiments on seven datasets demonstrate substantial improvements over state-of-the-art methods, including an SRCC gain of up to 8.8% on challenging real-world distortions, confirming CSFIQA's superior alignment with human quality perception.
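The scale contrastive learning strategy described above can be illustrated with an InfoNCE-style objective in which features of the same image extracted at two scales form positive pairs and all cross-image pairs act as negatives. This is a minimal sketch under assumed conventions (the function name, feature shapes, and temperature are illustrative), not the paper's exact loss or its adaptive noise sample matching mechanism:

```python
import numpy as np

def cross_scale_contrastive_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style cross-scale contrastive loss (illustrative sketch).

    feats_a, feats_b: (N, D) feature arrays for the same N images at two
    different scales; row i of each array describes the same image, so
    (feats_a[i], feats_b[i]) is the positive pair and all other rows of
    feats_b serve as negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                  # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; minimize their negative log-likelihood.
    return -np.mean(np.diag(log_prob))
```

When the two scale views of each image agree, the diagonal similarities dominate and the loss is small; when same-content views disagree across scales, the loss grows, which is the pressure that teaches the encoder to recognize the same content despite scale-dependent quality appearance.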