Text-to-image generation and text-guided image manipulation have received considerable attention in the field of image generation. However, the mainstream evaluation methods for these tasks mainly focus on the overall alignment between the input text and the generated images, and have difficulty assessing whether all the information in the input text is accurately reflected in the generated images. This paper proposes new evaluation metrics that assess the alignment between the input text and the generated images for every individual object. First, ChatGPT is used to produce questions about the generated images according to the input text. We then use Visual Question Answering (VQA) to measure the relevance of the generated images to the input text, which enables a more fine-grained evaluation of alignment than existing methods. In addition, we use No-Reference Image Quality Assessment (NR-IQA) to evaluate not only the text-image alignment but also the quality of the generated images. Experimental results show that the proposed approach is a superior metric that can simultaneously assess fine-grained text-image alignment and image quality, while allowing the ratio between the two to be adjusted.
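The combination of a per-question VQA alignment score with an NR-IQA quality score under an adjustable ratio can be sketched as follows. This is a minimal illustration only: the function names, the averaging of per-question correctness, and the single linear weight `alpha` are assumptions for exposition, not the paper's exact formulation.

```python
def combined_score(vqa_correctness, iqa_score, alpha=0.5):
    """Blend object-level text-image alignment with image quality.

    vqa_correctness: list of per-question scores in [0, 1], one per
        ChatGPT-generated question about the image (1.0 = the VQA
        answer matches the input text).
    iqa_score: NR-IQA quality score for the image, normalized to [0, 1].
    alpha: adjustable ratio between alignment and quality.
    """
    # Fine-grained alignment: fraction of questions answered consistently
    # with the input text.
    alignment = sum(vqa_correctness) / len(vqa_correctness)
    # Linear interpolation lets the evaluator tune the ratio between
    # alignment and quality, as the abstract describes.
    return alpha * alignment + (1 - alpha) * iqa_score
```

For example, an image that satisfies two of four object-level questions but has high perceptual quality scores higher when `alpha` is shifted toward quality, and lower when shifted toward alignment.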