In this work, we discuss low-parametric approaches for approximating SimRank matrices, which estimate the similarity between pairs of nodes in a graph. Although storing and computing SimRank matrices requires a significant amount of memory, common approaches mostly address algorithmic complexity. We propose two formats for the economical embedding of the target data. The first adopts a non-symmetric form that can be computed with a specialized alternating optimization algorithm. The second is based on a symmetric representation and Newton-type iterations. For both, we propose numerical implementations that avoid working with dense matrices and keep memory consumption low. We also study both types of embeddings numerically on real data from publicly available datasets. The results show that our algorithms approximate the SimRank matrices well, both in terms of the error norm (particularly the Chebyshev norm) and in the average number of most similar elements preserved for each given node.
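To make the two evaluation criteria concrete, the following is a minimal sketch, not the algorithms proposed in this work. It assumes the standard SimRank fixed-point iteration with a decay factor `c = 0.8` (an assumed value), builds a dense reference matrix that is only feasible for small graphs, and uses a truncated eigendecomposition as a stand-in for a symmetric low-parametric format. All function names are hypothetical.

```python
import numpy as np

def simrank_dense(adj, c=0.8, iters=50):
    """Dense reference SimRank; adj[i, j] = 1 means an edge i -> j."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    indeg = adj.sum(axis=0)
    # Column-normalize; columns of nodes without in-neighbors stay zero.
    w = np.divide(adj, indeg, out=np.zeros((n, n)), where=indeg > 0)
    s = np.eye(n)
    for _ in range(iters):
        s = c * (w.T @ s @ w)
        np.fill_diagonal(s, 1.0)  # every node is fully similar to itself
    return s

def truncated_symmetric(s, rank):
    """Stand-in symmetric format: keep the `rank` largest-magnitude eigenpairs."""
    lam, u = np.linalg.eigh(s)
    keep = np.argsort(np.abs(lam))[-rank:]
    return u[:, keep], lam[keep]

def chebyshev_error(s, s_approx):
    """Chebyshev (max-abs entrywise) norm of the approximation error."""
    return np.max(np.abs(s - s_approx))

def mean_topk_overlap(s, s_approx, k):
    """Average number of each node's k most similar nodes that are preserved."""
    a, b = s.copy(), s_approx.copy()
    np.fill_diagonal(a, -np.inf)  # exclude self-similarity from the ranking
    np.fill_diagonal(b, -np.inf)
    top_a = np.argsort(-a, axis=1)[:, :k]
    top_b = np.argsort(-b, axis=1)[:, :k]
    return np.mean([len(set(x) & set(y)) for x, y in zip(top_a, top_b)])
```

Given factors `u, lam = truncated_symmetric(s, rank)`, the approximation is `(u * lam) @ u.T`, which stores `n * rank + rank` numbers instead of `n * n`. The dense reference here exists only to measure the error on toy inputs; the implementations described above avoid forming dense matrices.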