Image quality assessment (IQA) plays a critical role in selecting high-quality images and in guiding compression and enhancement methods across a range of applications. Blind IQA, which assesses the quality of in-the-wild images containing complex authentic distortions without reference images, is particularly challenging. Existing methods are limited to modeling a uniform distribution with local patches and suffer from the gap between low-level and high-level vision caused by widely adopted pre-trained classification networks. In this paper, we propose a novel IQA method, diffusion priors-based IQA (DP-IQA), which leverages the prior knowledge of a pre-trained diffusion model, and its strong ability to bridge semantic gaps, for perceiving the visual quality of images. Specifically, we use pre-trained Stable Diffusion as the backbone, extract multi-level features from the denoising U-Net during the upsampling process at a specified timestep, and decode them to estimate the image quality score. Text and image adapters are adopted to mitigate the domain gap for downstream tasks and to correct the information loss caused by the variational autoencoder bottleneck. Finally, we distill the knowledge of this model into a CNN-based student model, significantly reducing the number of parameters to improve applicability; surprisingly, the student model performs similarly to or even better than the teacher model. Experimental results demonstrate that DP-IQA achieves state-of-the-art results on various in-the-wild datasets with better generalization capability, which shows the advantage of our method in global modeling and in utilizing the hierarchical feature clues of diffusion for evaluating image quality.
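The distillation step described above can be illustrated with a minimal sketch. The abstract does not state the training objective, so the loss below is an assumption: a common output-level formulation for regression, in which the student is fit both to the human quality labels and to the teacher's predicted scores. The names `distillation_loss`, `mse`, `mos_labels`, and `alpha` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally long score sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(
    student_scores: Sequence[float],
    teacher_scores: Sequence[float],
    mos_labels: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Assumed objective: weighted sum of a label term and a teacher term.

    alpha weights the fit to the ground-truth quality labels;
    (1 - alpha) weights the fit to the teacher's predicted scores.
    This weighting scheme is an illustrative assumption, not the paper's.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    label_term = mse(student_scores, mos_labels)
    teacher_term = mse(student_scores, teacher_scores)
    return alpha * label_term + (1.0 - alpha) * teacher_term
```

In such a setup, the teacher's scores would be computed once with the diffusion-based model held fixed, and only the CNN student would be updated. Setting `alpha = 1.0` reduces the objective to plain supervised regression on the labels.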