Aligning text-to-image generation with user intent remains challenging, as users often provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively poses visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluations on IDEA-Bench and DesignBench show that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without additional workload. Our work contributes a principled approach to prompting that offers general users an effective and efficient complement to the prevailing prompt-based interaction paradigm for text-to-image models.