AIGC image quality assessment (AIGCIQA), which aims to assess the quality of AI-generated images from the perspective of human perception, has recently emerged as a new topic in computer vision. In conventional image quality assessment tasks, images are derived from originals distorted by noise, blur, or compression; in AIGCIQA tasks, images are typically produced by generative models from text prompts. Considerable effort has gone into advancing AIGCIQA in recent years. However, most existing AIGCIQA methods regress predicted scores directly from individual generated images and overlook the information contained in the text prompts used to generate them. This oversight partially limits their performance. To address this issue, we propose a text and image encoder-based regression (TIER) framework. Specifically, we take the generated images and their corresponding text prompts as inputs, using a text encoder to extract features from the prompts and an image encoder to extract features from the images. To demonstrate the effectiveness of TIER, we conduct extensive experiments on several mainstream AIGCIQA databases, including AGIQA-1K, AGIQA-3K, and AIGCIQA2023. The experimental results indicate that TIER outperforms the baseline in most cases.
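The two-encoder regression idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which encoders are used or how the two feature sets are combined, so `encode_text` and `encode_image` are hypothetical toy stand-ins for pretrained encoders, the features are fused by simple concatenation, and `predict_quality` uses a plain linear head with hand-supplied weights rather than a trained regression network.

```python
from typing import List, Sequence


def encode_text(prompt: str, dim: int = 4) -> List[float]:
    """Toy stand-in for a text encoder (hypothetical): hashed character
    counts, normalised so the entries sum to 1."""
    vec = [0.0] * dim
    for ch in prompt.lower():
        vec[ord(ch) % dim] += 1.0
    total = sum(vec) or 1.0  # avoid division by zero for an empty prompt
    return [v / total for v in vec]


def encode_image(pixels: Sequence[Sequence[float]], dim: int = 4) -> List[float]:
    """Toy stand-in for an image encoder (hypothetical): mean intensity of
    `dim` horizontal bands of a grayscale image given as rows of floats."""
    rows = list(pixels)
    band = max(1, len(rows) // dim)
    feats = []
    for i in range(dim):
        chunk = rows[i * band:(i + 1) * band]
        values = [p for row in chunk for p in row]
        feats.append(sum(values) / len(values) if values else 0.0)
    return feats


def predict_quality(
    prompt: str,
    pixels: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Concatenate text and image features (assumed fusion) and apply a
    linear regression head to obtain a scalar quality score."""
    fused = encode_text(prompt) + encode_image(pixels)
    if len(weights) != len(fused):
        raise ValueError("weights must match the fused feature length")
    return bias + sum(w * f for w, f in zip(weights, fused))


# Usage: a 4x2 grayscale "image" and its prompt, scored with unit weights.
image = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.25, 0.75]]
score = predict_quality("ab", image, weights=[1.0] * 8)
```

In a realistic setting the toy encoders would be replaced by pretrained text and image models, and the weights would be learned by regressing against human opinion scores; the sketch only shows how prompt information can enter the score alongside image features.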