In this paper, we present an effective data augmentation framework that leverages a Large Language Model (LLM) and a Diffusion Model (DM) to tackle the challenges inherent in data-scarce scenarios. Recently, DMs have opened up the possibility of generating synthetic images to complement a small set of training images. However, increasing the diversity of synthetic images also raises the risk of generating samples outside the target distribution. Our approach addresses this issue by embedding novel semantic information into text prompts via an LLM and utilizing real images as visual prompts, thus generating semantically rich images. To ensure that the generated images remain within the target distribution, we dynamically adjust the guidance weight based on each image's CLIPScore to control the diversity. Experimental results show that our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution. Consequently, our approach proves more effective in the few-shot setting on several benchmarks. Our code is available at https://github.com/kkyuhun94/dalda.
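The core control mechanism described above can be illustrated with a minimal sketch: map each generated image's CLIPScore (its alignment with the target-class text) to a classifier-free guidance weight, so poorly aligned samples receive stronger guidance toward the target distribution and well-aligned samples are allowed more diversity. The function name, thresholds, and the linear mapping below are illustrative assumptions, not the paper's exact schedule.

```python
def adaptive_guidance_weight(clip_score: float,
                             w_min: float = 2.0, w_max: float = 10.0,
                             s_low: float = 0.20, s_high: float = 0.35) -> float:
    """Map a CLIPScore to a guidance weight (illustrative sketch).

    Low alignment (score <= s_low)  -> w_max: pull the sample strongly
    back toward the target distribution.
    High alignment (score >= s_high) -> w_min: relax guidance to allow
    more diversity. Values in between are interpolated linearly.
    All thresholds and the [w_min, w_max] range are assumed, not taken
    from the paper.
    """
    # Clamp the score into the interpolation range.
    s = min(max(clip_score, s_low), s_high)
    # Fraction of the way from s_low to s_high.
    t = (s - s_low) / (s_high - s_low)
    # Higher alignment -> lower guidance weight.
    return w_max - t * (w_max - w_min)
```

In practice, the returned value would be passed as the guidance scale of the diffusion sampler for the next generation step, so the diversity/fidelity trade-off is tuned per image rather than fixed globally.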