X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models

This paper introduces a novel explainable image quality evaluation approach called X-IQE, which leverages visual large language models (LLMs) to evaluate text-to-image generation methods by generating textual explanations. X-IQE utilizes a hierarchical Chain of Thought (CoT) to enable MiniGPT-4 to produce self-consistent, unbiased texts that are highly correlated with human evaluation. It offers several advantages, including the ability to distinguish between real and generated images, evaluate text-image alignment, and assess image aesthetics without requiring model training or fine-tuning. X-IQE is more cost-effective and efficient compared to human evaluation, while significantly enhancing the transparency and explainability of deep image quality evaluation models. We validate the effectiveness of our method as a benchmark using images generated by prevalent diffusion models. X-IQE demonstrates similar performance to state-of-the-art (SOTA) evaluation methods on COCO Caption, while overcoming the limitations of previous evaluation models on DrawBench, particularly in handling ambiguous generation prompts and text recognition in generated images. Project website: https://github.com/Schuture/Benchmarking-Awesome-Diffusion-Models

翻译：本文提出了一种名为X-IQE的新型可解释图像质量评估方法，该方法利用视觉大语言模型（LLMs）通过生成文本解释来评估文本生成图像方法。X-IQE采用分层思维链（CoT）机制，使MiniGPT-4能够生成与人类评估高度相关的自洽、无偏文本。该方法具有多项优势，包括无需模型训练或微调即可区分真实图像与生成图像、评估文本-图像对齐程度以及评估图像美学质量。与人工评估相比，X-IQE更具成本效益和效率，同时显著提升了深度图像质量评估模型的透明度和可解释性。我们采用主流扩散模型生成的图像验证了该方法作为基准的有效性。在COCO Caption数据集上，X-IQE展现了与最先进（SOTA）评估方法相当的性能，同时克服了以往评估模型在DrawBench上的局限，特别是在处理模糊生成提示和生成图像中文本识别方面。项目网站：https://github.com/Schuture/Benchmarking-Awesome-Diffusion-Models

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

知识图嵌入和可解释人工智能 Knowledge Graph Embeddings and Explainable AI

专知会员服务

136+阅读 · 2020年5月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日