Most existing graph visualization methods based on dimension reduction are limited to relatively small graphs because of performance issues. In this work, we propose a novel dimension reduction method for graph visualization, called t-Distributed Stochastic Graph Neighbor Embedding (t-SGNE). t-SGNE is specifically designed to visualize cluster structures in a graph. As a variant of the standard t-SNE method, t-SGNE avoids the time-consuming computation of pairwise similarities. Instead, it uses the neighbor structure of the graph to reduce the time complexity from quadratic to linear, thus supporting larger graphs. In addition, to suit t-SGNE, we combine Laplacian Eigenmaps with a shortest-path algorithm on graphs to form the graph embedding algorithm ShortestPath Laplacian Eigenmaps Embedding (SPLEE). By applying SPLEE to obtain a high-dimensional embedding of a large-scale graph and then using t-SGNE to reduce its dimension for visualization, we are able to visualize graphs with up to 300K nodes and 1M edges within 5 minutes and achieve an approximately 10% improvement in visualization quality. Code and data are available at https://github.com/Charlie-XIAO/embedding-visualization-test.
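To make the complexity claim concrete, the following minimal sketch contrasts all-pairs similarities (as in standard t-SNE input) with similarities restricted to graph edges. It is an illustration under stated assumptions, not the t-SGNE formulation itself: the function names, the fixed Gaussian bandwidth `sigma`, and the toy data are all hypothetical, and standard t-SNE calibrates per-point bandwidths by perplexity rather than using a single fixed value.

```python
import math
from itertools import combinations


def gaussian_affinity(x, y, sigma=1.0):
    """Unnormalized Gaussian similarity between two embedding vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def pairwise_affinities(embedding, sigma=1.0):
    """All-pairs similarities: n*(n-1)/2 terms, i.e. quadratic in n."""
    n = len(embedding)
    raw = {(i, j): gaussian_affinity(embedding[i], embedding[j], sigma)
           for i, j in combinations(range(n), 2)}
    total = sum(raw.values())
    return {pair: w / total for pair, w in raw.items()}


def neighbor_affinities(embedding, edges, sigma=1.0):
    """Similarities only for pairs joined by a graph edge: |E| terms.

    Assumption for illustration: fixed bandwidth, undirected edges,
    self-loops ignored. The paper's actual weighting may differ.
    """
    raw = {}
    for u, v in edges:
        if u == v:
            continue
        pair = (min(u, v), max(u, v))
        raw[pair] = gaussian_affinity(embedding[u], embedding[v], sigma)
    total = sum(raw.values())
    return {pair: w / total for pair, w in raw.items()}


# Hypothetical toy input: two tight clusters joined by one bridge edge.
embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

dense = pairwise_affinities(embedding)
sparse = neighbor_affinities(embedding, edges)
print(len(dense), len(sparse))  # 15 7
```

For a sparse graph the number of edges grows roughly linearly with the number of nodes, so the neighbor-restricted version evaluates far fewer terms than the all-pairs version as the graph grows.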