Remote sensing vision tasks require extensive labeled data across multiple, interconnected domains. However, current generative data augmentation frameworks are task-isolated, i.e., each vision task requires training an independent generative model, and they ignore the modeling of geographic information and spatial constraints. To address these issues, we propose \textbf{TerraGen}, a unified layout-to-image generation framework that enables flexible, spatially controllable synthesis of remote sensing imagery for various high-level vision tasks, e.g., detection, segmentation, and extraction. Specifically, TerraGen introduces a geographic-spatial layout encoder that unifies bounding-box and segmentation-mask inputs, combined with a multi-scale injection scheme and a mask-weighted loss that explicitly encode spatial constraints, from global structures to fine details. In addition, we construct the first large-scale multi-task remote sensing layout generation dataset, containing 45k images, and establish a standardized evaluation protocol for this task. Experimental results show that TerraGen achieves the best image generation quality across diverse tasks. Moreover, TerraGen can serve as a universal data-augmentation generator, significantly enhancing downstream task performance and demonstrating robust cross-task generalization in both full-data and few-shot scenarios.
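The abstract does not give the exact formulation of the mask-weighted loss. A minimal sketch of one plausible form, assuming a standard diffusion-style noise-prediction objective in which pixels covered by the layout mask are upweighted (the function name, signature, and `fg_weight` parameter are illustrative assumptions, not the paper's definitions):

```python
import torch

def mask_weighted_loss(pred_noise: torch.Tensor,
                       target_noise: torch.Tensor,
                       mask: torch.Tensor,
                       fg_weight: float = 2.0) -> torch.Tensor:
    """Hypothetical mask-weighted MSE: upweights errors inside layout regions.

    pred_noise, target_noise: (B, C, H, W) model output and diffusion target.
    mask: (B, 1, H, W) binary layout mask (1 = object region, 0 = background).
    """
    # Per-pixel weight: 1.0 on background, fg_weight inside the mask.
    weights = 1.0 + (fg_weight - 1.0) * mask
    per_pixel = (pred_noise - target_noise) ** 2
    # Broadcasting expands the single-channel weights over all C channels.
    return (weights * per_pixel).mean()
```

With `fg_weight > 1`, gradient magnitude concentrates on the spatially constrained regions, which is one way an objective could favor fine-grained fidelity inside the layout while still supervising the background.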