Scene graph generation (SGG) in satellite imagery (SAI) promotes the understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variation in scale and aspect ratio, and rich relationships exist between objects (even spatially disjoint ones), which makes it attractive to conduct SGG holistically in large-size very-high-resolution (VHR) SAI. However, such SGG datasets are lacking. Due to the complexity of large-size SAI, mining <subject, relationship, object> triplets heavily relies on long-range contextual reasoning, so SGG models designed for small-size natural imagery are not directly applicable to large-size SAI. This paper constructs a large-scale dataset for SGG in large-size VHR SAI, named STAR (Scene graph generaTion in lArge-size satellite imageRy), with image sizes ranging from 512 x 768 to 27,860 x 31,096 pixels and encompassing over 210K objects and over 400K triplets. To realize SGG in large-size SAI, we propose a context-aware cascade cognition (CAC) framework that understands SAI through three stages: object detection (OBD), pair pruning, and relationship prediction. We also release a SAI-oriented SGG toolkit comprising about 30 OBD methods and 10 SGG methods, adapted to the challenging STAR dataset via our devised modules. The dataset and toolkit are available at: https://linlin-dev.github.io/project/STAR.