We address the task of advertisement image generation and introduce three evaluation metrics to assess Creativity, prompt Alignment, and Persuasiveness (CAP) in generated advertisement images. Despite recent advances in Text-to-Image (T2I) generation, and the ability of T2I models to produce high-quality images from explicit descriptions, evaluating these models remains challenging. Existing evaluation methods focus largely on alignment with explicit, detailed descriptions, while evaluating alignment with visually implicit prompts remains an open problem. Additionally, creativity and persuasiveness are essential qualities that make advertisement images effective, yet they are seldom measured. To address this, we propose three novel metrics for evaluating the creativity, alignment, and persuasiveness of generated images. Our findings reveal that current T2I models struggle with creativity, persuasiveness, and alignment when the input text conveys an implicit message. We further introduce a simple yet effective approach that enhances the ability of T2I models to produce images that are better aligned, more creative, and more persuasive.