The Random Dot Product Graph (RDPG) is a generative model for relational data in which nodes are represented by latent vectors in a low-dimensional Euclidean space. RDPGs crucially postulate that edge formation probabilities are given by the dot product of the corresponding latent positions. Accordingly, the embedding task of estimating these vectors from an observed graph is typically posed as a low-rank matrix factorization problem. The workhorse Adjacency Spectral Embedding (ASE) enjoys solid statistical properties, but it formally solves a surrogate problem and can be computationally intensive. In this paper, we bring to bear recent advances in non-convex optimization and demonstrate their impact on RDPG inference. We advocate first-order gradient descent methods to better solve the embedding problem and to organically accommodate broader network embedding applications of practical relevance. Notably, we argue that RDPG embeddings of directed graphs lose interpretability unless the factor matrices are constrained to have orthogonal columns. We thus develop a novel feasible optimization method on the resulting manifold. The effectiveness of the graph representation learning framework is demonstrated in reproducible experiments with both synthetic and real network data. Our open-source algorithm implementations are scalable and, unlike ASE, they are robust to missing edge data and can track slowly varying latent positions from streaming graphs.
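To make the contrast between ASE and gradient-based embedding concrete, the following is a minimal NumPy sketch for the undirected case. It is not the authors' implementation: the function names (`ase`, `masked_loss`, `gd_embed`), the masked least-squares objective $\|M \circ (A - XX^\top)\|_F^2$, and the backtracking step-size rule are all assumptions chosen for illustration. The binary mask `M` is one plausible way to ignore the diagonal and any unobserved edges, and a warm start from a previous estimate is one plausible way to follow slowly varying latent positions.

```python
import numpy as np


def ase(A, d):
    """Spectral baseline: top-d eigenpairs of A, scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    vals, vecs = vals[-d:], vecs[:, -d:]      # keep the d largest
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


def masked_loss(X, A, M):
    """Squared Frobenius error over the entries selected by the binary mask M."""
    return float(np.sum((M * (A - X @ X.T)) ** 2))


def gd_embed(A, d, M=None, X0=None, iters=200, seed=0):
    """Gradient descent on ||M o (A - X X^T)||_F^2 with backtracking steps.

    Assumes A and M are symmetric; M defaults to all off-diagonal entries.
    """
    n = A.shape[0]
    if M is None:
        M = 1.0 - np.eye(n)                   # ignore the diagonal only
    if X0 is None:
        rng = np.random.default_rng(seed)
        X = rng.uniform(0.0, 1.0, (n, d)) / np.sqrt(d)
    else:
        X = X0.copy()                         # warm start (e.g. ASE or previous time step)
    f = masked_loss(X, A, M)
    step = 1.0 / n
    for _ in range(iters):
        G = 4.0 * (M * (X @ X.T - A)) @ X     # gradient for symmetric A and M
        g2 = float(np.sum(G ** 2))
        while True:                           # Armijo backtracking line search
            X_new = X - step * G
            f_new = masked_loss(X_new, A, M)
            if f_new <= f - 1e-4 * step * g2 or step < 1e-12:
                break
            step *= 0.5
        if f_new < f:                         # accept only improving iterates
            X, f = X_new, f_new
        step *= 2.0                           # let the step grow again
    return X


# Usage: sample a small RDPG graph and embed it with both methods.
rng = np.random.default_rng(1)
n, d = 60, 2
X_true = rng.uniform(0.2, 0.7, (n, d))        # dot products stay inside [0, 1]
P = X_true @ X_true.T
A = np.triu((rng.uniform(size=(n, n)) < P).astype(float), 1)
A = A + A.T                                   # symmetric, zero diagonal
X_ase = ase(A, d)
X_gd = gd_embed(A, d, X0=X_ase)               # refine the spectral estimate
```

Because the sketch only accepts iterates that lower the objective, refining an ASE warm start can never increase the masked loss. The estimated positions are identifiable only up to an orthogonal transformation, so any comparison with `X_true` should align the two first (for example with a Procrustes rotation).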