Deep learning models benefit from greater data diversity and volume, which motivates synthetic data augmentation of existing datasets. However, existing evaluation metrics for synthetic data typically compute similarity between latent features. Such scores are difficult to interpret and do not always correlate with the data's contribution to downstream tasks. We propose a vision-language grounded framework for interpretable synthetic data augmentation and evaluation in remote sensing. Our approach combines generative models, semantic segmentation, and image captioning using vision and language models. Based on this framework, we introduce ARAS400k: A large-scale Remote sensing dataset Augmented with Synthetic data for segmentation and captioning. It contains 100k real and 300k synthetic images, each paired with a segmentation map and a description. ARAS400k enables automated evaluation of synthetic data by analyzing semantic composition, minimizing caption redundancy, and verifying cross-modal consistency between visual structures and language descriptions. Experimental results indicate that models trained exclusively on synthetic data reach competitive performance, while models trained on augmented data (a combination of real and synthetic images) consistently outperform real-data baselines. Overall, this work establishes a scalable benchmark for remote sensing tasks, specifically semantic segmentation and image captioning. The dataset is available at zenodo.org/records/18890661 and the code base at github.com/caglarmert/ARAS400k.
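The abstract names three automated checks but does not say how they are scored. The following is a rough sketch of one plausible way to compute each check. It assumes integer-labelled segmentation masks and whitespace-tokenized English captions. Semantic composition is scored with the Jensen-Shannon divergence between real and synthetic class-pixel distributions. Caption redundancy uses a distinct-n ratio. Cross-modal consistency is the Jaccard overlap between the classes named in a caption and the classes present in its mask. The function names, the class list, and thresholds such as `min_share` are hypothetical and are not taken from the ARAS400k code base.

```python
import numpy as np


def class_distribution(masks, num_classes):
    """Pooled pixel-frequency distribution over class IDs for a list of 2-D label masks."""
    counts = np.zeros(num_classes, dtype=np.float64)
    for m in masks:
        m = np.asarray(m, dtype=np.int64)
        if m.min() < 0 or m.max() >= num_classes:
            raise ValueError("label outside [0, num_classes)")
        counts += np.bincount(m.ravel(), minlength=num_classes)
    return counts / counts.sum()


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in bits: 0 for identical, about 1 for disjoint support."""
    p = (p + eps) / (p + eps).sum()  # smooth to avoid log(0)
    q = (q + eps) / (q + eps).sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distinct_n(captions, n=2):
    """Share of unique word n-grams across all captions; lower means more redundancy."""
    grams = []
    for c in captions:
        toks = c.lower().split()
        grams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(grams)) / max(len(grams), 1)


def caption_mask_agreement(caption, mask, class_names, min_share=0.05):
    """Jaccard overlap between classes named in the caption (naive exact word match,
    an assumption) and classes covering at least `min_share` of the mask's pixels."""
    words = set(caption.lower().replace(",", " ").replace(".", " ").split())
    mentioned = {i for i, name in enumerate(class_names) if name in words}
    mask = np.asarray(mask, dtype=np.int64)
    shares = np.bincount(mask.ravel(), minlength=len(class_names)) / mask.size
    present = {i for i, s in enumerate(shares) if s >= min_share}
    union = mentioned | present
    return len(mentioned & present) / len(union) if union else 1.0


if __name__ == "__main__":
    CLASS_NAMES = ["background", "water", "forest", "building"]  # hypothetical label set
    rng = np.random.default_rng(0)
    real = [rng.integers(0, 4, size=(64, 64)) for _ in range(5)]
    synth = [rng.integers(0, 4, size=(64, 64)) for _ in range(5)]
    print("composition JSD:", js_divergence(class_distribution(real, 4),
                                            class_distribution(synth, 4)))
    print("distinct-2:", distinct_n(["a lake next to a forest", "a lake next to a road"]))
```

Each score is a single scalar per dataset or per sample, which keeps it human-interpretable. Latent-feature metrics do not offer this. In practice, a real system would likely replace the exact word matching with synonym lists or a text encoder.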