TIPO (Text-to-Image Prompt Optimization) introduces an efficient approach for automatic prompt refinement in text-to-image (T2I) generation. Starting from simple user prompts, TIPO leverages a lightweight pre-trained model to expand these prompts into richer and more detailed versions. Conceptually, TIPO samples refined prompts from a targeted sub-distribution within the broader semantic space, preserving the original intent while significantly improving visual quality, coherence, and detail. Unlike resource-intensive methods based on large language models (LLMs) or reinforcement learning (RL), TIPO offers strong computational efficiency and scalability, opening new possibilities for effective automated prompt engineering in T2I tasks. Extensive experiments across multiple domains demonstrate that TIPO achieves stronger text alignment, reduced visual artifacts, and consistently higher human preference rates, while maintaining competitive aesthetic quality. These results highlight the effectiveness of distribution-aligned prompt engineering and point toward broader opportunities for scalable, automated refinement in text-to-image generation.