Diffusion models have recently been employed to generate high-quality images, reducing the need for manual data collection and improving model generalization in tasks such as object detection, instance segmentation, and image perception. However, synthetic-data frameworks are usually designed with meticulous human effort for each task, owing to varying requirements on image layout, content, and annotation format, which restricts the application of synthetic data to more general scenarios. In this paper, we propose AnySynth, a unified framework that integrates adaptable, comprehensive, and highly controllable components capable of generating an arbitrary type of synthetic data under diverse requirements. Specifically, a Task-Specific Layout Generation Module is first introduced to produce reasonable layouts for different tasks by leveraging the generation ability of large language models and the layout priors of real-world images. A Uni-Controlled Image Generation Module is then developed to create high-quality, controllable synthetic images conditioned on the generated layouts. In addition, user-specified reference images and style images can be incorporated into the generation process according to task requirements. Finally, a Task-Oriented Annotation Module provides precise and detailed annotations for the generated images across different tasks. We have validated our framework's performance across various tasks, including Few-shot Object Detection, Cross-domain Object Detection, Zero-shot Composed Image Retrieval, and Multi-modal Image Perception and Grounding. The task-specific data synthesized by our framework significantly improves model performance on these tasks, demonstrating the generality and effectiveness of our framework.