Graphs are ubiquitous for modeling complex systems involving structured data and relationships. Consequently, graph representation learning, which aims to automatically learn low-dimensional representations of graphs, has attracted considerable attention in recent years. The overwhelming majority of existing methods handle unsigned graphs. However, signed graphs appear in an increasing number of application domains, where they model systems involving two types of opposed relationships. Several authors have taken an interest in signed graphs and proposed methods that produce vertex-level representations, but only one method exists for whole-graph representations, and it can handle only fully connected graphs. In this article, we tackle this issue by proposing two approaches to learning whole-graph representations of general signed graphs. The first is SG2V, a signed generalization of the whole-graph embedding method Graph2vec that relies on a modification of the Weisfeiler-Lehman relabelling procedure. The second is WSGCN, a whole-graph generalization of the signed vertex embedding method SGCN that relies on the introduction of master nodes into the GCN. We propose several variants of both approaches. A bottleneck in the development of whole-graph-oriented methods is the lack of data. We therefore build a benchmark composed of three collections of signed graphs with corresponding ground truths. We assess our methods on this benchmark, and our results show that the signed whole-graph methods learn better representations for this task. Overall, the baseline obtains an F-measure score of 58.57, whereas SG2V and WSGCN reach 73.01 and 81.20, respectively. Our source code and benchmark dataset are both publicly available online.
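
To make the idea of a sign-aware Weisfeiler-Lehman relabelling concrete, the following is a minimal sketch and not the SG2V procedure itself, whose exact modification is not specified here. It assumes one plausible variant: at each iteration, a vertex's new label is derived from its current label together with two separate multisets of neighbor labels, one for positive edges and one for negative edges. The function names `build_adjacency` and `signed_wl_bag`, the edge-triple input format, and the use of a truncated SHA-1 digest as the label compressor are all illustrative assumptions.

```python
import hashlib
from collections import Counter


def build_adjacency(edges):
    """Build an undirected signed adjacency map from (u, v, sign) triples."""
    adj = {}
    for u, v, sign in edges:
        adj.setdefault(u, []).append((v, sign))
        adj.setdefault(v, []).append((u, sign))
    return adj


def signed_wl_bag(edges, iterations=2):
    """Return a bag (Counter) of labels produced by a sign-aware WL relabelling.

    Assumption: positive and negative neighbors are aggregated separately,
    so two graphs that differ only in edge signs can receive different labels.
    """
    adj = build_adjacency(edges)
    labels = {v: "0" for v in adj}      # uniform initial labels
    bag = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for v, neighbors in adj.items():
            pos = sorted(labels[u] for u, s in neighbors if s > 0)
            neg = sorted(labels[u] for u, s in neighbors if s < 0)
            signature = repr((labels[v], pos, neg))
            # Deterministic hash so that labels are comparable across graphs.
            new_labels[v] = hashlib.sha1(signature.encode()).hexdigest()[:8]
        labels = new_labels
        bag.update(labels.values())
    return bag


# Two triangles that differ only in the sign of one edge.
all_positive = [(0, 1, +1), (1, 2, +1), (0, 2, +1)]
one_negative = [(0, 1, -1), (1, 2, +1), (0, 2, +1)]
print(signed_wl_bag(all_positive) == signed_wl_bag(one_negative))
```

An unsigned relabelling would treat the two triangles identically, whereas this variant assigns them different label bags. In a Graph2vec-style pipeline, such bags would play the role of the "words" of each graph "document" passed to a document-embedding model; that downstream step is omitted from this sketch.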