Text-to-image generation has recently emerged as a viable alternative to text-to-image retrieval, due to the visually impressive results of generative diffusion models. Although query performance prediction is an active research topic in information retrieval, to the best of our knowledge, no prior study has analyzed the difficulty of queries (prompts) in text-to-image generation based on human judgments. To this end, we introduce the first dataset of prompts that are manually annotated in terms of image generation performance. To determine the difficulty of the same prompts in image retrieval, we also collect manual annotations that represent retrieval performance. We thus propose the first benchmark for joint text-to-image prompt and query performance prediction, comprising 10K queries. Our benchmark enables: (i) the comparative assessment of the difficulty of prompts/queries in image generation and image retrieval, and (ii) the evaluation of prompt/query performance predictors addressing both generation and retrieval. We present results with several pre-generation/retrieval and post-generation/retrieval performance predictors, thus providing competitive baselines for future research. Our benchmark and code are publicly available under the CC BY 4.0 license at https://github.com/Eduard6421/PQPP.