Recent scene graph generation (SGG) frameworks have focused on learning complex relationships among multiple objects in an image. Because message passing neural networks (MPNNs) model high-order interactions between objects and their neighboring objects, they have become the dominant representation learning modules for SGG. However, existing MPNN-based frameworks treat the scene graph as a homogeneous graph, which restricts the context-awareness of visual relations between objects. That is, they overlook the fact that relations tend to depend heavily on the objects they are associated with. In this paper, we propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context using message passing neural networks. We devise a novel message passing layer, called the relation-aware message passing neural network (RMP), that aggregates the contextual information of an image while taking into account the predicate type between objects. Our extensive evaluations demonstrate that HetSGG outperforms state-of-the-art methods, especially on tail predicate classes.
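The abstract does not give the RMP update rule. As a rough illustration of what "considering the predicate type" can mean in message passing, the sketch below implements a generic relation-type-aware layer in the style of R-GCN (Schlichtkrull et al., 2018), not the authors' RMP layer. Each incoming message is transformed by a weight matrix chosen by the edge's relation type, messages are mean-aggregated per type, and a self-loop term plus ReLU give the new node feature. The function names (`matvec`, `relation_aware_layer`), the toy node features, and the weight matrices are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x (a list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relation_aware_layer(h, edges, W_rel, W_self):
    """One relation-type-aware message passing step (R-GCN-style sketch, hypothetical).

    h:      dict node -> feature vector (list of floats)
    edges:  list of (src, dst, rel_type) tuples; messages flow src -> dst
    W_rel:  dict rel_type -> weight matrix applied to messages of that type
    W_self: weight matrix for the self-connection
    """
    # Group the incoming neighbors of each node by relation type.
    incoming = defaultdict(lambda: defaultdict(list))
    for src, dst, rel in edges:
        incoming[dst][rel].append(src)

    out = {}
    for v, hv in h.items():
        agg = matvec(W_self, hv)  # self-loop term
        for rel, srcs in incoming[v].items():
            norm = 1.0 / len(srcs)  # mean over neighbors sharing this relation type
            for u in srcs:
                msg = matvec(W_rel[rel], h[u])  # relation-specific transform
                agg = [a + norm * m for a, m in zip(agg, msg)]
        out[v] = [max(0.0, a) for a in agg]  # ReLU
    return out


# Toy usage with hypothetical features and weights.
h = {"person": [1.0, 0.0], "horse": [0.0, 1.0], "hat": [1.0, 1.0]}
W_self = [[1, 0], [0, 1]]
W_rel = {"riding": [[2, 0], [0, 2]], "wearing": [[0, 1], [1, 0]]}

edges = [("horse", "person", "riding"), ("hat", "person", "wearing")]
print(relation_aware_layer(h, edges, W_rel, W_self)["person"])  # [2.0, 3.0]

# Same neighbors, swapped predicate labels: the aggregated feature changes.
swapped = [("horse", "person", "wearing"), ("hat", "person", "riding")]
print(relation_aware_layer(h, swapped, W_rel, W_self)["person"])  # [4.0, 2.0]
```

The second call shows the point the abstract makes about homogeneous graphs: a layer that ignored edge types would produce the same output for both edge lists, whereas per-type weights let the predicate between two objects shape the context each object receives.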