Scene graph generation (SGG) in satellite imagery (SAI) helps advance the intelligent understanding of geospatial scenes from perception to cognition. In SAI, objects vary greatly in scale and aspect ratio, and rich relationships exist between objects (even between spatially disjoint ones). This makes it necessary to perform SGG holistically on large-size very-high-resolution (VHR) SAI. However, the lack of SGG datasets built on large-size VHR SAI has constrained the advancement of SGG in SAI. Because large-size VHR SAI are so complex, mining <subject, relationship, object> triplets from them relies heavily on long-range contextual reasoning. Consequently, SGG models designed for small-size natural imagery are not directly applicable to large-size VHR SAI. To address the scarcity of datasets, this paper constructs RSG, a large-scale dataset for SGG in large-size VHR SAI with image sizes ranging from 512 × 768 to 27,860 × 31,096 pixels, encompassing over 210,000 objects and more than 400,000 triplets. To realize SGG in large-size VHR SAI, we propose a context-aware cascade cognition (CAC) framework that understands SAI at three levels: object detection (OBD), pair pruning, and relationship prediction. As a fundamental prerequisite for SGG in large-size SAI, we propose a holistic multi-class object detection network (HOD-Net) that can flexibly integrate multi-scale contexts. Because large-size SAI contain a huge number of object pairs, of which only a minority have meaningful relationships, we design a pair proposal generation (PPG) network based on adversarial reconstruction to select high-value pairs. Furthermore, we propose a relationship prediction network with context-aware messaging (RPCM) to predict the relationship types of these pairs.
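The three-level cascade, and the reason pair pruning is needed, can be illustrated with a short sketch. With n detected objects there are n(n - 1) ordered <subject, object> candidates. For example, 1,000 objects already yield 999,000 pairs, so classifying every pair quickly becomes impractical in large-size scenes. The Python below is a minimal sketch under stated assumptions. `Detection`, `all_ordered_pairs`, `prune_pairs`, `predict_relationships`, `cascade`, the scorer and classifier callables, and labels such as `docked_at` are all hypothetical. The simple top-k scorer only stands in for the PPG network's adversarial reconstruction and does not implement it. Likewise, the classifier callable only stands in for RPCM's context-aware messaging.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """Stage 1 (OBD) output, hypothetical format: a class label and a pixel box."""
    label: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


Pair = Tuple[Detection, Detection]  # (subject, object)
Triplet = Tuple[str, str, str]      # (subject, relationship, object)


def all_ordered_pairs(dets: Sequence[Detection]) -> List[Pair]:
    """Every candidate <subject, object> pair: n * (n - 1) ordered pairs."""
    return list(permutations(dets, 2))


def prune_pairs(pairs: Sequence[Pair],
                score_fn: Callable[[Pair], float],
                top_k: int) -> List[Pair]:
    """Stage 2 stand-in (hypothetical scorer, not PPG): keep the top_k pairs."""
    return sorted(pairs, key=score_fn, reverse=True)[:top_k]


def predict_relationships(
        pairs: Sequence[Pair],
        classify_fn: Callable[[Detection, Detection], Optional[str]]) -> List[Triplet]:
    """Stage 3 stand-in (not RPCM): label each kept pair; None means no relationship."""
    triplets: List[Triplet] = []
    for subj, obj in pairs:
        rel = classify_fn(subj, obj)
        if rel is not None:
            triplets.append((subj.label, rel, obj.label))
    return triplets


def cascade(dets: Sequence[Detection],
            score_fn: Callable[[Pair], float],
            classify_fn: Callable[[Detection, Detection], Optional[str]],
            top_k: int) -> List[Triplet]:
    """Detect -> prune -> predict, each stage consuming the previous stage's output."""
    kept = prune_pairs(all_ordered_pairs(dets), score_fn, top_k)
    return predict_relationships(kept, classify_fn)


if __name__ == "__main__":
    # Toy detections; labels and boxes are made up for illustration only.
    dets = [
        Detection("ship", (0, 0, 10, 10)),
        Detection("dock", (10, 0, 30, 10)),
        Detection("storage_tank", (100, 100, 120, 120)),
    ]
    print(len(all_ordered_pairs(dets)))  # 3 * 2 = 6 candidate pairs

    # Hypothetical scorer: prefer pairs whose subject is a ship.
    score = lambda p: 1.0 if p[0].label == "ship" else 0.0
    # Hypothetical classifier: a single hand-written rule.
    classify = lambda s, o: "docked_at" if (s.label, o.label) == ("ship", "dock") else None
    print(cascade(dets, score, classify, top_k=2))  # [('ship', 'docked_at', 'dock')]
```

The toy scorer deliberately ignores the distance between boxes. Relationships can hold between spatially disjoint objects, so a purely proximity-based pruning rule could discard valid pairs.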