Diffusion models have exhibited remarkable prowess in visual generalization. Building on this success, we introduce an instruction-based object addition pipeline, named Add-SD, which automatically inserts objects into realistic scenes at rational sizes and positions. Unlike layout-conditioned methods, Add-SD is conditioned solely on simple text prompts rather than labor-intensive references such as bounding boxes. Our work contributes in three aspects: proposing a dataset of instruction-paired images; fine-tuning a diffusion model for rational generation; and generating synthetic data to boost downstream tasks. The first aspect involves creating RemovalDataset, a collection of original-edited image pairs with textual instructions, where an object has been removed from the original image while strong pixel consistency is maintained in the background. These pairs are then used to fine-tune the Stable Diffusion (SD) model. Subsequently, the pretrained Add-SD model can insert a desired object into an image at a plausible location and scale. Additionally, we generate synthetic instances for downstream task datasets at scale, particularly for tail classes, to alleviate the long-tailed problem. Downstream tasks benefit from the enriched dataset's enhanced diversity and rationality. Experiments on the LVIS validation set demonstrate that Add-SD yields an improvement of 4.3 mAP on rare classes over the baseline. Code and models are available at https://github.com/ylingfeng/Add-SD.
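The abstract describes training object *addition* as the inverse of object *removal*: each RemovalDataset pair maps an object-removed image back to the original under an "add" instruction. A minimal sketch of how such instruction triples might be assembled is shown below; the file-naming scheme, instruction template, and `make_training_triples` helper are illustrative assumptions, not the paper's actual data format.

```python
def make_training_triples(image_id, object_names):
    """Pair each object-removed image (source) with the original image
    (target) under an 'add' instruction, so a diffusion model can learn
    insertion as the inverse of removal.

    Hypothetical layout: '<id>.png' is the original image and
    '<id>_minus_<object>.png' is the inpainted, object-removed version.
    """
    triples = []
    for name in object_names:
        triples.append({
            "source": f"{image_id}_minus_{name}.png",  # object removed
            "target": f"{image_id}.png",               # original scene
            "instruction": f"add a {name} to the image",
        })
    return triples

# Example: one image annotated with two objects yields two triples.
triples = make_training_triples("000123", ["dog", "frisbee"])
for t in triples:
    print(t["instruction"], "->", t["source"])
```

In an InstructPix2Pix-style fine-tuning setup, the `source` image and `instruction` would condition the model while `target` supplies the denoising objective; because the removal step preserves the background pixels, the model is pushed to change only the inserted object.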