Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts segmentation performance, especially for rare object categories. Although more diverse, higher-quality object instances yield larger gains from Copy-Paste, previous works take object instances either from human-annotated instance segmentation datasets or from renderings of 3D object models, and both approaches are too expensive to scale up to good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text-to-image models (e.g., StableDiffusion). We demonstrate for the first time that using a text-to-image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable. To this end, we design a data acquisition and processing framework, dubbed "X-Paste", upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves gains of +2.6 box AP and +2.1 mask AP on all classes, and even larger gains of +6.8 box AP and +6.5 mask AP on long-tail classes. Our code and models are available at https://github.com/yoctta/XPaste.
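As a rough illustration of the basic Copy-Paste operation described above (pasting a masked object instance at a random location on a background image), a minimal NumPy sketch might look as follows. The function names `paste_instance` and `random_paste` are hypothetical and are not taken from the X-Paste codebase; the sketch also assumes the instance already fits inside the background and omits steps a full pipeline would need, such as rescaling the instance, blending its edges, and updating the masks of background objects that the pasted instance occludes.

```python
import numpy as np


def paste_instance(background, instance, mask, top, left):
    """Paste a masked instance onto a copy of `background` at (top, left).

    background: (H, W, 3) uint8 image
    instance:   (h, w, 3) uint8 crop containing the object
    mask:       (h, w) boolean array, True where the object is
    Returns the composited image and the full-size (H, W) mask of the
    pasted object, which can serve as its new segmentation label.
    """
    H, W = background.shape[:2]
    h, w = mask.shape
    if not (0 <= top and top + h <= H and 0 <= left and left + w <= W):
        raise ValueError("instance must fit inside the background")

    out = background.copy()
    # Basic slicing returns a view, so writing to `region` updates `out`.
    region = out[top:top + h, left:left + w]
    region[mask] = instance[mask]  # overwrite only the object pixels

    full_mask = np.zeros((H, W), dtype=bool)
    full_mask[top:top + h, left:left + w] = mask
    return out, full_mask


def random_paste(background, instance, mask, rng):
    """Paste the instance at a uniformly random location where it fits."""
    H, W = background.shape[:2]
    h, w = mask.shape
    top = int(rng.integers(0, H - h + 1))    # upper bound is exclusive
    left = int(rng.integers(0, W - w + 1))
    return paste_instance(background, instance, mask, top, left)
```

Calling `random_paste(bg, inst, mask, np.random.default_rng(0))` returns a new image together with the pasted object's mask, leaving the original background array unchanged.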