Demystifying Graph Sparsification Algorithms in Graph Properties Preservation

Graph sparsification is a technique that approximates a given graph by a sparser graph containing a subset of its vertices and/or edges. The goal of an effective sparsification algorithm is to preserve the specific graph properties relevant to the downstream task while minimizing the graph's size. Graph algorithms often suffer from long execution times due to the irregularity of graph data and the large size of real-world graphs. Graph sparsification can greatly reduce the run time of graph algorithms by substituting the full graph with a much smaller sparsified graph, without significantly degrading output quality. However, the interaction between the many available sparsifiers and graph properties has not been widely explored, and the potential of graph sparsification is not fully understood. In this work, we cover 16 widely used graph metrics, 12 representative graph sparsification algorithms, and 14 real-world input graphs that span various categories and exhibit diverse characteristics, sizes, and densities. We developed a framework to extensively assess the performance of these sparsification algorithms against the graph metrics, and we provide insights into the results. Our study shows that no single sparsifier performs best at preserving all graph properties; for example, sparsifiers that preserve distance-related graph properties (e.g., eccentricity) struggle to perform well on Graph Neural Networks (GNNs). This paper presents a comprehensive experimental study evaluating how well sparsification algorithms preserve essential graph metrics. The insights can guide future research in pairing graph algorithms with suitable sparsifiers to maximize benefits while minimizing quality degradation. Furthermore, our framework facilitates future evaluation of evolving sparsification algorithms, graph metrics, and ever-growing graph data.
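To make the sparsifier-versus-metric evaluation concrete, the following is a minimal sketch under stated assumptions, not the authors' framework. It uses uniform random edge sampling, chosen here only as a simple illustrative baseline (it is not claimed to be one of the 12 sparsifiers studied), and compares one distance-related property, the eccentricity of a vertex, on the full and sparsified graphs. The helper names `random_edge_sparsify`, `build_adj`, and `bfs_eccentricity` are hypothetical, and eccentricity is computed only over vertices reachable from the source, an assumption that matters once sparsification disconnects the graph.

```python
import random
from collections import deque


def random_edge_sparsify(edges, keep_ratio, seed=0):
    """Illustrative baseline sparsifier (hypothetical helper):
    keep each edge independently with probability keep_ratio."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < keep_ratio]


def build_adj(num_vertices, edges):
    """Build an undirected adjacency list over vertices 0..num_vertices-1."""
    adj = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_eccentricity(adj, source):
    """Largest BFS hop distance from source to any vertex reachable from it.
    Assumption: unreachable vertices are ignored rather than treated as infinite."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


if __name__ == "__main__":
    # Synthetic graph: a 200-vertex ring plus "skip-7" chords.
    n = 200
    edges = [(i, (i + 1) % n) for i in range(n)] + [(i, (i + 7) % n) for i in range(n)]
    sparse_edges = random_edge_sparsify(edges, keep_ratio=0.5, seed=42)

    full = build_adj(n, edges)
    sparse = build_adj(n, sparse_edges)
    print("edges kept:", len(sparse_edges), "of", len(edges))
    print("eccentricity(0), full graph:  ", bfs_eccentricity(full, 0))
    print("eccentricity(0), sparsified:  ", bfs_eccentricity(sparse, 0))
```

Swapping in a different sparsifier or metric function while keeping the same full-versus-sparsified comparison is the basic pattern such an evaluation repeats across many sparsifier, metric, and input-graph combinations.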