Camouflaged image generation (CIG) has recently emerged as an efficient way to acquire high-quality training data for camouflaged object detection (COD). However, images produced by existing CIG methods still fall well short of real camouflaged imagery: they are either insufficiently camouflaged because of weak foreground-background visual similarity, or they exhibit cluttered backgrounds that are semantically inconsistent with the foreground targets. To address these limitations, we propose RealCamo, a novel out-painting-based framework for controllable, realistic camouflaged image generation. RealCamo explicitly introduces additional layout controls to regulate global image structure, thereby improving semantic coherence between foreground objects and generated backgrounds. Moreover, we construct a multimodal textual-visual condition by combining a unified fine-grained textual task description with texture-oriented background retrieval; the two jointly guide the generation process to enhance visual fidelity and realism. To quantitatively assess camouflage quality, we further introduce a background-foreground distribution divergence metric that measures how effectively the foreground is camouflaged in generated images. Extensive experiments and visualizations demonstrate the effectiveness of the proposed framework.