Camouflaged vision perception is an important vision task with numerous practical applications. Because collecting and labeling data is expensive, the community faces a major bottleneck: its datasets cover only a small number of object species. Existing camouflaged image generation methods, however, require the background to be specified manually, and so cannot extend the diversity of camouflaged samples at low cost. In this paper, we propose Latent Background Knowledge Retrieval-Augmented Diffusion (LAKE-RED) for camouflaged image generation. To our knowledge, our main contributions are: (1) We propose, for the first time, a camouflaged generation paradigm that requires no background input. (2) LAKE-RED is the first knowledge retrieval-augmented method with interpretability for camouflaged generation; it explicitly separates knowledge retrieval from reasoning enhancement to alleviate task-specific challenges. Moreover, the method is not restricted to specific foreground targets or backgrounds, offering the potential to extend camouflaged vision perception to more diverse domains. (3) Experimental results demonstrate that our method outperforms existing approaches, generating more realistic camouflaged images.