Scene graph generation (SGG) in satellite imagery (SAI) helps advance the intelligent understanding of geospatial scenarios from perception to cognition. In SAI, objects vary greatly in scale and aspect ratio, and rich relationships exist between objects (even between spatially disjoint objects), which makes it necessary to conduct SGG holistically in large-size very-high-resolution (VHR) SAI. However, the lack of SGG datasets built on large-size VHR SAI has constrained the advancement of SGG in SAI. Because large-size VHR SAI is complex, mining <subject, relationship, object> triplets from it relies heavily on long-range contextual reasoning. Consequently, SGG models designed for small-size natural imagery are not directly applicable to large-size VHR SAI. To address the scarcity of datasets, this paper constructs a large-scale dataset for SGG in large-size VHR SAI, named RSG, with image sizes ranging from 512 × 768 to 27,860 × 31,096 pixels and encompassing over 210,000 objects and more than 400,000 triplets. To realize SGG in large-size VHR SAI, we propose a context-aware cascade cognition (CAC) framework that understands SAI at three levels: object detection (OBD), pair pruning, and relationship prediction. As a fundamental prerequisite for SGG in large-size SAI, we propose a holistic multi-class object detection network (HOD-Net) that can flexibly integrate multi-scale contexts. Because large-size SAI contains a huge number of object pairs, of which only a minority carry meaningful relationships, we design a pair proposal generation (PPG) network via adversarial reconstruction to select high-value pairs. Furthermore, a relationship prediction network with context-aware messaging (RPCM) is proposed to predict the relationship types of these pairs.
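To give a rough sense of why a pair-pruning stage sits between detection and relationship prediction, the sketch below counts the ordered (subject, object) pairs among n detections and keeps only the top-scoring ones. It is a minimal illustration and not the PPG network described in the abstract: the function names, the box format, and the distance-based score are hypothetical placeholders chosen only to make the example runnable. The PPG network in the abstract scores pairs via adversarial reconstruction instead.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def candidate_pairs(num_objects: int) -> int:
    """Number of ordered (subject, object) pairs among num_objects detections."""
    return num_objects * (num_objects - 1)


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


def prune_pairs(
    boxes: Sequence[Box],
    score_fn: Callable[[Box, Box], float],
    keep: int,
) -> List[Tuple[int, int]]:
    """Score every ordered pair and return the indices of the `keep` best ones.

    score_fn is a stand-in (an assumption of this sketch) for whatever
    learned pair scorer a real system would use.
    """
    scored = [
        ((i, j), score_fn(boxes[i], boxes[j]))
        for i, j in permutations(range(len(boxes)), 2)
    ]
    # Python's sort is stable, so ties keep their enumeration order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in scored[:keep]]


if __name__ == "__main__":
    boxes = [
        (0, 0, 10, 10),
        (10, 0, 20, 10),
        (100, 100, 110, 110),
        (500, 500, 510, 510),
    ]
    print(candidate_pairs(len(boxes)))  # 12 ordered pairs for 4 objects
    print(candidate_pairs(1000))        # 999000 ordered pairs for 1000 objects
    # Naive placeholder score: closer pairs rank higher.
    print(prune_pairs(boxes, lambda a, b: -center_distance(a, b), keep=2))
```

The pair count grows quadratically with the number of detections, so running relationship prediction on every pair quickly becomes expensive in large scenes. The distance heuristic used here would also discard relationships between spatially disjoint objects, which the abstract explicitly says exist in SAI, so it should be read only as a placeholder for a learned scorer.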