Many deep learning tasks require annotations that are too time-consuming for human operators to produce, resulting in small datasets. This is especially true for dense regression problems such as crowd counting, which requires the location of every person in the image to be annotated. Techniques such as data augmentation and simulation-based synthetic data generation can help in such cases. In this paper, we introduce PromptMix, a method for artificially boosting the size of existing datasets that can be used to improve the performance of lightweight networks. First, synthetic images are generated in an end-to-end, data-driven manner: text prompts are extracted from existing datasets via an image captioning deep network and subsequently fed to text-to-image diffusion models. The generated images are then annotated using one or more high-performing deep networks and mixed with the real dataset to train the lightweight network. Through extensive experiments on five datasets and two tasks, we show that PromptMix can significantly increase the performance of lightweight networks, by up to 26%.
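The pipeline described above (caption, generate, annotate, mix) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `promptmix_sketch`, the parameters `caption_fn`, `generate_fn`, `annotate_fns`, and `images_per_prompt` are hypothetical, the models are passed in as plain callables, and averaging the outputs of several annotator networks is an assumption, since the abstract does not say how multiple annotators are combined.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")  # any image representation (array, tensor, file path, ...)


def promptmix_sketch(
    real_data: Sequence[Tuple[Image, float]],
    caption_fn: Callable[[Image], str],               # hypothetical: image captioning network
    generate_fn: Callable[[str], Image],              # hypothetical: text-to-image diffusion model
    annotate_fns: Sequence[Callable[[Image], float]],  # hypothetical: high-performing annotator networks
    images_per_prompt: int = 1,
    seed: int = 0,
) -> List[Tuple[Image, float]]:
    """Return real samples mixed with pseudo-labelled synthetic samples."""
    if not annotate_fns:
        raise ValueError("at least one annotator network is required")

    # Step 1: extract one text prompt per real image.
    prompts = [caption_fn(image) for image, _ in real_data]

    # Step 2: generate synthetic images from the prompts.
    # Step 3: annotate each synthetic image with the annotator networks.
    synthetic: List[Tuple[Image, float]] = []
    for prompt in prompts:
        for _ in range(images_per_prompt):
            image = generate_fn(prompt)
            predictions = [annotate(image) for annotate in annotate_fns]
            # Assumption: scalar labels (e.g. a crowd count) combined by averaging.
            pseudo_label = sum(predictions) / len(predictions)
            synthetic.append((image, pseudo_label))

    # Step 4: mix real and synthetic samples into one training set.
    mixed = list(real_data) + synthetic
    random.Random(seed).shuffle(mixed)
    return mixed
```

In practice the callables would wrap real models, and the label type would depend on the task (for example a density map rather than a scalar for crowd counting); the mixed list would then be used to train the lightweight network.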