For analysing real-world networks, graph representation learning is a popular tool. These methods, such as the graph autoencoder (GAE), typically rely on low-dimensional representations, also called embeddings, which are obtained by minimising a loss function; these embeddings are used with a decoder for downstream tasks such as node classification and edge prediction. While GAEs tend to be fairly accurate, they suffer from scalability issues. For improved speed, the Local2Global approach, which combines graph patch embeddings via eigenvector synchronisation, was shown to be fast while achieving good accuracy. Here we propose L2G2G, a Local2Global method which improves GAE accuracy without sacrificing scalability. This improvement is achieved by dynamically synchronising the latent node representations while training the GAEs. It also benefits from the decoder computing only a local patch loss. Hence, aligning the local embeddings in each epoch utilises more information from the graph than a single post-training alignment does, while maintaining scalability. We demonstrate on synthetic benchmarks, as well as on real-world examples, that L2G2G achieves higher accuracy than the standard Local2Global approach and scales efficiently on larger data sets. We find that for large and dense networks, it even outperforms the slower, but assumed more accurate, GAEs.
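The core alignment step combining patch embeddings can be sketched as follows. This is a simplified illustration, not the paper's implementation: it aligns a single patch to a reference frame by solving the orthogonal Procrustes problem on the embeddings of shared nodes (the full Local2Global method instead synchronises transformations across all patches jointly via an eigenvector problem). The function and variable names here are hypothetical.

```python
import numpy as np

def align_patch(ref_emb, patch_emb, overlap_ref, overlap_patch):
    """Estimate an orthogonal transform mapping patch_emb into the
    coordinate frame of ref_emb, using the embeddings of shared nodes.
    Solves the orthogonal Procrustes problem via SVD."""
    # Cross-covariance between the overlapping nodes' embeddings
    M = patch_emb[overlap_patch].T @ ref_emb[overlap_ref]
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt  # best orthogonal map (Procrustes solution)
    return patch_emb @ R

# Toy example: the second patch's embedding is a rotated copy of the
# reference patch's embedding, so alignment should recover it exactly.
rng = np.random.default_rng(0)
ref = rng.normal(size=(5, 2))           # reference patch: 5 nodes, dim 2
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
patch = ref @ Q                          # same nodes, rotated frame
aligned = align_patch(ref, patch, np.arange(5), np.arange(5))
print(np.allclose(aligned, ref))         # True: frames agree after alignment
```

In L2G2G this kind of alignment is applied at every training epoch to the latent patch representations, rather than once after training as in the original Local2Global pipeline.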