Semantic segmentation of high-resolution remote-sensing imagery is critical for urban mapping and land-cover monitoring, yet training data typically exhibits severe long-tailed pixel imbalance. In the LoveDA dataset, this challenge is compounded by an explicit Urban/Rural split with distinct appearance and inconsistent class-frequency statistics across domains. We present a prompt-controlled diffusion augmentation framework that synthesizes paired label--image samples with explicit control over both domain and semantic composition. Stage~A uses a domain-aware, masked, ratio-conditioned discrete diffusion model to generate layouts that satisfy user-specified class-ratio targets while respecting learned co-occurrence structure. Stage~B translates layouts into photorealistic, domain-consistent images using Stable Diffusion with ControlNet guidance. Mixing the resulting ratio- and domain-controlled synthetic pairs with real data yields consistent improvements across multiple segmentation backbones, with gains concentrated on minority classes and improved generalization in both the Urban and Rural domains, demonstrating controllable augmentation as a practical mechanism for mitigating long-tail bias in remote-sensing segmentation. Source code, pretrained models, and synthetic datasets are available at \href{https://github.com/Buddhi19/SyntheticGen.git}{GitHub}.
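The class-ratio targeting described for Stage~A can be illustrated with a minimal sketch: given a long-tailed per-class pixel histogram, derive softened ratio targets that shift probability mass toward minority classes. The function name, the temperature-based rebalancing rule, and the example counts are illustrative assumptions, not the paper's actual conditioning mechanism.

```python
import numpy as np

# Hypothetical LoveDA class names, for readability only.
LOVEDA_CLASSES = ["background", "building", "road", "water",
                  "barren", "forest", "agriculture"]

def target_class_ratios(pixel_counts, temperature=0.5):
    """Soften a long-tailed pixel distribution into class-ratio targets.

    temperature < 1 flattens the empirical distribution, giving minority
    classes a larger share; temperature = 1 reproduces the empirical ratios.
    (Illustrative rebalancing rule, assumed for this sketch.)
    """
    counts = np.asarray(pixel_counts, dtype=np.float64)
    ratios = counts / counts.sum()          # empirical pixel ratios
    softened = ratios ** temperature        # temper the long tail
    return softened / softened.sum()        # renormalize to sum to 1

# Example: a heavily imbalanced per-class pixel histogram.
counts = [5_000_000, 400_000, 300_000, 250_000, 80_000, 60_000, 40_000]
targets = target_class_ratios(counts)
# Minority classes (e.g. "agriculture") receive a larger share under the
# softened targets than under the raw empirical ratios.
```

A target vector of this kind could then serve as the user-specified class-ratio condition for the layout generator, alongside a domain flag selecting Urban or Rural statistics.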