The rapid progress of deep generative models (DGMs) has significantly advanced research in computer vision, providing a cost-effective alternative to acquiring large quantities of expensive imagery. However, existing methods predominantly focus on synthesizing remote sensing (RS) images that align with real images only at the level of global layout, which limits their applicability to RS image object detection (RSIOD) research. To address this limitation, we propose a multi-class and multi-scale object (MMO) image generator based on DGMs, termed MMO-IG, designed to generate RS images with supervised object labels from global and local aspects simultaneously. Specifically, from the local view, MMO-IG encodes various RS instances using an iso-spacing instance map (ISIM). During generation, it decodes each instance region, identified by its iso-spacing value in the ISIM and covering both background and foreground instances, to produce RS images through the denoising process of diffusion models. Considering the complex interdependencies among MMOs, we construct a spatial-cross dependency knowledge graph (SCDKG). This ensures a realistic and reliable multidirectional distribution among MMOs for region embedding, thereby reducing the discrepancy between source and target domains. In addition, we propose a structured object distribution instruction (SODI) that, together with the SCDKG-based ISIM, guides the generation of synthesized RS image content from a global aspect. Extensive experimental results demonstrate that MMO-IG exhibits superior generation capabilities for RS images with dense MMO-supervised labels, and that RS detectors pre-trained with MMO-IG show excellent performance on real-world datasets.
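To make the idea of an iso-spacing instance map more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "iso-spacing" means each foreground instance receives an equally spaced integer value in a single-channel map, with the background fixed at 0. The function names `build_isim` and `decode_instance`, the axis-aligned box format, and the 8-bit value range are all hypothetical choices made for illustration.

```python
import numpy as np


def build_isim(height, width, boxes, max_value=255):
    """Rasterise instances into one map with equally spaced values.

    Assumption (not from the paper): background pixels are 0 and the
    k-th instance (1-based) is filled with k * step, where
    step = max_value // number_of_instances.

    boxes: list of (x0, y0, x1, y1) axis-aligned boxes, end-exclusive.
    Returns the map and the list of values assigned to each instance.
    """
    isim = np.zeros((height, width), dtype=np.uint8)
    if not boxes:
        return isim, []
    step = max_value // len(boxes)
    if step == 0:
        raise ValueError("too many instances for the chosen value range")
    values = []
    for k, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        value = k * step
        # Later boxes overwrite earlier ones wherever they overlap.
        isim[y0:y1, x0:x1] = value
        values.append(value)
    return isim, values


def decode_instance(isim, value):
    """Recover the boolean region mask of the instance encoded as `value`."""
    return isim == value
```

Because every instance has a distinct, evenly separated value, a region can be recovered from the map with a single equality test, which is the property this sketch is meant to illustrate. How MMO-IG actually assigns values and feeds the map to the diffusion model is not specified in the abstract.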