WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

Source: arXiv. Accepted to ICML 2026. We have also released an updated version of the benchmark, WISE_Verified; please refer to https://github.com/PKU-YuanGroup/WISE for the latest version.

Text-to-Image (T2I) models are capable of generating high-quality artistic creations and visual content. However, existing research and evaluation standards focus predominantly on image realism and shallow text-image alignment, and lack a comprehensive assessment of complex semantic understanding and world knowledge integration in text-to-image generation. To address this gap, we propose \textbf{WISE}, the first benchmark specifically designed for \textbf{W}orld Knowledge-\textbf{I}nformed \textbf{S}emantic \textbf{E}valuation. WISE moves beyond simple word-pixel mapping by challenging models with 1,000 meticulously crafted prompts across 25 subdomains in cultural common sense, spatio-temporal reasoning, and natural science. To overcome the limitations of the traditional CLIP metric, we introduce \textbf{WiScore}, a novel quantitative metric for assessing knowledge-image alignment. Comprehensive testing of 20 models (10 dedicated T2I models and 10 unified multimodal models) on these 1,000 structured prompts reveals significant limitations in their ability to integrate and apply world knowledge during image generation, highlighting critical pathways for enhancing knowledge incorporation and application in next-generation T2I models. Code and data are available at \href{https://github.com/PKU-YuanGroup/WISE}{PKU-YuanGroup/WISE}.

