Synthetic datasets, recognized for their cost-effectiveness, play a pivotal role in advancing computer vision tasks and techniques. In remote sensing image processing, however, creating synthetic datasets is challenging because it demands larger-scale and more diverse 3D models. Real remote sensing datasets are themselves difficult to build, owing to limited data acquisition and high annotation costs, which amplifies the need for high-quality synthetic alternatives. To address this, we present SyntheWorld, a synthetic dataset unparalleled in quality, diversity, and scale. It includes 40,000 images with submeter-level pixels and fine-grained land cover annotations of eight categories, and it also provides 40,000 bitemporal image pairs with building change annotations for the building change detection task. We conduct experiments on multiple benchmark remote sensing datasets to verify the effectiveness of SyntheWorld and to investigate the conditions under which our synthetic data yield advantages. We will release SyntheWorld to facilitate remote sensing image processing research.