Recent text-to-image generation models have shown promising results in generating high-fidelity, photo-realistic images. In parallel, the problem of data scarcity has driven growing interest in employing AIGC technology for high-quality data expansion. However, this paradigm requires carefully designed prompt engineering, and low-cost data expansion and labeling remain under-explored. Inspired by the strong task-guidance capability of LLMs, we propose a new paradigm of annotated data expansion named ChatGenImage. Its core idea is to leverage the complementary strengths of diverse models to establish an effective and user-friendly pipeline for interactive data augmentation. In this work, we extensively study how LLMs communicate with AIGC models to achieve more controllable image generation, and we make the first attempt to combine them for automatic data augmentation across a variety of downstream tasks. Finally, we present results obtained from our ChatGenImage framework and demonstrate the potential of our synthetic data for systematic vision adaptation. Our code is available at https://github.com/Yuqifan1117/Labal-Anything-Pipeline.