Image quality assessment (IQA) plays a critical role in selecting high-quality images and in guiding compression and enhancement methods across a range of applications. Blind IQA, which assesses the quality of in-the-wild images containing complex authentic distortions without reference images, is more challenging. Existing methods are limited to modeling a uniform distribution over local patches and suffer from the gap between low-level and high-level vision caused by the widely adopted pre-trained classification networks. In this paper, we propose a novel IQA method called diffusion priors-based IQA (DP-IQA), which leverages the prior knowledge of a pre-trained diffusion model and its strong ability to bridge semantic gaps in perceiving the visual quality of images. Specifically, we use pre-trained Stable Diffusion as the backbone, extract multi-level features from the denoising U-Net during the upsampling process at a specified timestep, and decode them to estimate the image quality score. Text and image adapters are adopted to mitigate the domain gap for downstream tasks and to correct the information loss caused by the variational autoencoder bottleneck. Finally, we distill the knowledge of this model into a CNN-based student model, significantly reducing the number of parameters to improve applicability; surprisingly, the student model performs comparably to or even better than the teacher model. Experimental results demonstrate that DP-IQA achieves state-of-the-art results on various in-the-wild datasets with better generalization capability, which shows the superiority of our method in global modeling and in utilizing the hierarchical feature clues of diffusion for evaluating image quality.
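The abstract does not state the distillation objective, so the following is only a minimal sketch of one common choice, not necessarily the loss used for DP-IQA. It assumes score-level distillation, in which the student's predicted quality scores are pulled toward both the human labels and the teacher's predictions. The function names `mse` and `distillation_loss` and the weight `alpha` are hypothetical and introduced purely for illustration.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length lists of scores."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(
    student: Sequence[float],
    teacher: Sequence[float],
    labels: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Assumed objective (not taken from the paper):

    alpha * MSE(student, labels) + (1 - alpha) * MSE(student, teacher)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Supervised term: match the human quality labels.
    label_term = mse(student, labels)
    # Distillation term: match the teacher's predicted scores.
    teacher_term = mse(student, teacher)
    return alpha * label_term + (1.0 - alpha) * teacher_term


if __name__ == "__main__":
    # Toy scores for a batch of two images (made-up values).
    student_scores = [1.0, 0.0]
    teacher_scores = [0.0, 0.0]
    label_scores = [1.0, 0.0]
    print(distillation_loss(student_scores, teacher_scores, label_scores))
```

Under this assumption, `alpha` trades off fidelity to the labels against fidelity to the teacher; a feature-level variant would compare intermediate feature maps instead of final scores.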