Text-to-image generation systems have emerged as powerful tools for artistic creation, making it easy to turn textual prompts into visual art. However, their output quality depends heavily on the quality of the user-provided prompt, which poses a challenge for users unfamiliar with prompt crafting. This paper addresses this challenge by leveraging user reformulation data from interaction logs to develop an automatic prompt reformulation model. Our in-depth analysis of these logs reveals that prompt reformulation depends heavily on the individual user's capability, resulting in significant variance in the quality of reformulation pairs. To use this data effectively for training, we introduce the Capability-aware Prompt Reformulation (CAPR) framework. CAPR integrates user capability into the reformulation process through two key components: the Conditional Reformulation Model (CRM) and Configurable Capability Features (CCF). CRM reformulates prompts according to a specified user capability, as represented by CCF; CCF, in turn, offers the flexibility to tune and guide CRM's behavior. This enables CAPR to learn diverse reformulation strategies across user capability levels during training and to simulate high-capability user reformulation during inference. Extensive experiments on standard text-to-image generation benchmarks show that CAPR outperforms existing baselines and remains robust on unseen systems. Furthermore, comprehensive analyses validate the effectiveness of its individual components. CAPR can facilitate user-friendly interaction with text-to-image systems and make advanced artistic creation achievable for a broader range of users.
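The conditioning idea described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a capability feature is serialized as a control prefix on the model input; the single feature `quality_bucket`, its 0 to 9 range, the prefix format, and all function names are hypothetical and are not taken from the paper. During training, each logged pair is tagged with the capability observed for that user; at inference, the feature is set to its highest value so that the model imitates high-capability reformulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityFeatures:
    """Hypothetical stand-in for CCF: one coarse capability bucket (0 = low, 9 = high)."""
    quality_bucket: int

    def as_prefix(self) -> str:
        # Serialize the feature as a control token placed before the prompt (assumed format).
        return f"<quality={self.quality_bucket}>"


def build_training_example(original: str, reformulated: str,
                           features: CapabilityFeatures) -> tuple[str, str]:
    """Turn one logged reformulation pair into a (source, target) pair.

    The source carries the capability observed for this pair, so a
    sequence-to-sequence model can learn a different strategy per bucket.
    """
    source = f"{features.as_prefix()} {original}"
    return source, reformulated


def build_inference_input(original: str, max_bucket: int = 9) -> str:
    """At inference, request the highest capability bucket regardless of the user."""
    return f"{CapabilityFeatures(max_bucket).as_prefix()} {original}"


# Example: a low-capability pair seen in the logs, and the inference-time input.
train_pair = build_training_example("a cat", "a cat, oil painting", CapabilityFeatures(3))
inference_input = build_inference_input("a cat")
```

The point of this design is that low-quality pairs need not be discarded: they remain usable as training signal because the model is told which capability level each pair reflects.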