SelfEval: Leveraging the discriminative nature of generative models for evaluation

In this work, we show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities in a completely automated manner. Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts, making the generative model directly applicable to discriminative tasks. Using SelfEval, we repurpose standard datasets created for evaluating multimodal text-image discriminative models to evaluate generative models in a fine-grained manner, assessing their performance on attribute binding, color recognition, counting, shape recognition, and spatial understanding. To the best of our knowledge, SelfEval is the first automated metric to show a high degree of agreement with gold-standard human evaluations of text faithfulness across multiple models and benchmarks. Moreover, SelfEval enables us to evaluate generative models on challenging tasks such as the Winoground image-score, where they demonstrate performance competitive with discriminative models. We also show severe drawbacks of standard automated metrics such as CLIP-score for measuring text faithfulness on benchmarks such as DrawBench, and how SelfEval sidesteps these issues. We hope SelfEval enables easy and reliable automated evaluation for diffusion models.
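The core idea, using per-caption likelihoods of a real image to make a discriminative decision, can be illustrated with a small sketch. This is not the paper's implementation: the function names `pick_caption` and `accuracy` are hypothetical, and the log-likelihood values are made-up numbers standing in for whatever a generative model would produce for log p(image | caption). The sketch only shows the final step of choosing the caption with the highest likelihood and scoring that choice against ground truth.

```python
import numpy as np


def pick_caption(log_likelihoods):
    """Return the index of the caption with the highest log p(image | caption).

    `log_likelihoods` holds one (hypothetical) score per candidate caption.
    """
    scores = np.asarray(log_likelihoods, dtype=float)
    return int(np.argmax(scores))


def accuracy(score_matrix, true_indices):
    """Fraction of images whose highest-scoring caption is the correct one.

    score_matrix[i][j] is an assumed log p(image_i | caption_j);
    true_indices[i] is the index of the correct caption for image_i.
    """
    scores = np.asarray(score_matrix, dtype=float)
    predictions = scores.argmax(axis=1)
    return float((predictions == np.asarray(true_indices)).mean())


# Made-up scores for two images, each with three candidate captions.
toy_scores = [
    [-10.2, -12.5, -11.0],
    [-9.1, -8.7, -9.9],
]
toy_labels = [0, 1]

print(pick_caption(toy_scores[0]))
print(accuracy(toy_scores, toy_labels))
```

In practice, the scores would come from the generative model itself rather than from a fixed table; how those likelihoods are estimated is the substance of the method and is not reproduced here.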

