Recently, AIGC image quality assessment (AIGCIQA), which aims to assess the quality of AI-generated images (AIGIs) from a human perception perspective, has emerged as a new topic in computer vision. Unlike conventional image quality assessment tasks, where images are derived from originals degraded by distortions such as noise, blur, and compression, images in AIGCIQA tasks are typically generated by generative models from text prompts. Considerable effort has gone into advancing AIGCIQA in recent years. However, most existing AIGCIQA methods regress predicted scores directly from individual generated images, overlooking the information contained in the text prompts of these images. This oversight partially limits the performance of these methods. To address this issue, we propose a text-image encoder-based regression (TIER) framework. Specifically, we take the generated images and their corresponding text prompts as inputs, using a text encoder and an image encoder to extract features from the text prompts and the generated images, respectively. To demonstrate the effectiveness of the proposed TIER method, we conduct extensive experiments on several mainstream AIGCIQA databases, including AGIQA-1K, AGIQA-3K, and AIGCIQA2023. The experimental results indicate that TIER outperforms the baseline in most cases.
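The data flow described above (encode the prompt, encode the image, regress a score from both) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the TIER implementation: `encode_text` is a hashed bag-of-words, `encode_image` is an intensity histogram, the two feature vectors are fused by concatenation (an assumption; the abstract does not state the fusion step), and `RegressionHead` is an untrained linear layer. In practice the encoders would be pretrained networks and the head would be trained against human opinion scores.

```python
import hashlib
import random


def encode_text(prompt, dim=8):
    """Stand-in text encoder (assumption): hashed bag-of-words, normalized by token count."""
    vec = [0.0] * dim
    tokens = prompt.lower().split()
    for tok in tokens:
        # md5 gives a hash that is stable across runs, unlike the built-in hash()
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def encode_image(pixels, dim=8):
    """Stand-in image encoder (assumption): normalized histogram of grayscale values in [0, 255]."""
    flat = [p for row in pixels for p in row]
    hist = [0.0] * dim
    for p in flat:
        idx = min(int(p) * dim // 256, dim - 1)
        hist[idx] += 1.0
    n = max(len(flat), 1)
    return [h / n for h in hist]


class RegressionHead:
    """Hypothetical linear head mapping fused features to a scalar quality score (untrained)."""

    def __init__(self, in_dim, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
        self.b = 0.0

    def __call__(self, feats):
        return sum(w * f for w, f in zip(self.w, feats)) + self.b


def predict_quality(prompt, pixels, head):
    """Encode both modalities, fuse by concatenation (assumption), and regress a score."""
    text_feat = encode_text(prompt)
    image_feat = encode_image(pixels)
    fused = text_feat + image_feat  # list concatenation of the two feature vectors
    return head(fused)


if __name__ == "__main__":
    head = RegressionHead(in_dim=16)
    toy_image = [[0, 64, 128, 255], [10, 20, 30, 40]]
    print(predict_quality("a cat on a sofa", toy_image, head))
```

The point of the sketch is the contrast with image-only regression: the head sees features from both the prompt and the image, so the same image paired with a different prompt can receive a different score.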