No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a pristine reference image. Learning an NR-IQA model faces a fundamental bottleneck: it requires a large number of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted images by leveraging explicitly structured relational supervision. Unlike prior methods that impose rigid, binary similarity constraints, SHAMISA introduces implicit structural associations: soft, controllable relations that are both distortion-aware and content-sensitive, inferred from synthetic metadata and intrinsic feature structure. A key innovation is our compositional distortion engine, which generates an uncountable family of degradations from continuous parameter spaces, grouped so that only one distortion factor varies at a time. This enables fine-grained control over representational similarity during training: images with shared distortion patterns are pulled together in the embedding space, while severity variations produce structured, predictable shifts. We integrate these insights via dual-source relation graphs, which encode both known degradation profiles and emergent structural affinities and guide learning throughout training. A convolutional encoder is trained under this supervision and then frozen for inference, with quality prediction performed by a linear regressor on its features. Extensive experiments on synthetic, authentic, and cross-dataset NR-IQA benchmarks demonstrate that SHAMISA achieves strong overall performance with improved cross-dataset generalization and robustness, all without human quality annotations or contrastive losses.
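To make the idea of soft, metadata-derived relations more concrete, the following is a minimal sketch and not the authors' formulation. It assumes a hypothetical grouping scheme in which each group fixes a distortion type and varies only a scalar severity in [0, 1], and a hypothetical exponential kernel (`bandwidth` is an illustrative parameter) that turns severity gaps into relation weights. The function names `sample_group` and `soft_relations` are invented for illustration.

```python
import numpy as np


def sample_group(rng, n_types=3, group_size=4):
    """Sample metadata for one group: the distortion type is fixed and
    only the severity varies (assumed scalar severity in [0, 1])."""
    dist_type = int(rng.integers(n_types))
    severity = np.sort(rng.uniform(0.0, 1.0, size=group_size))
    return [(dist_type, float(s)) for s in severity]


def soft_relations(meta, bandwidth=0.25):
    """Build a soft relation matrix from synthetic metadata.

    Entries lie in [0, 1]: pairs with different distortion types get 0,
    and pairs with the same type get a weight that decays with the
    severity gap (hypothetical exponential kernel).
    """
    types = np.array([t for t, _ in meta])
    sev = np.array([s for _, s in meta])
    same_type = (types[:, None] == types[None, :]).astype(float)
    closeness = np.exp(-np.abs(sev[:, None] - sev[None, :]) / bandwidth)
    return same_type * closeness


rng = np.random.default_rng(0)
meta = sample_group(rng) + sample_group(rng)  # two groups, eight images
W = soft_relations(meta)                      # 8 x 8 soft relation graph
```

In contrast to a binary positive/negative mask, a matrix like `W` could weight a non-contrastive alignment loss so that images with the same distortion pattern and similar severity are pulled together more strongly than those with a large severity gap. How SHAMISA actually combines metadata-based and feature-based relations is not specified in this abstract.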