Graph Neural Networks (GNNs) have demonstrated superior performance across various graph learning tasks but face significant computational challenges when applied to large-scale graphs. One effective approach to mitigating these challenges is graph sparsification, which removes non-essential edges to reduce computational overhead. However, previous graph sparsification methods often rely on a single global sparsity setting and uniform pruning criteria, failing to provide customized sparsification schemes for each node's complex local context. In this paper, we introduce Mixture-of-Graphs (MoG), which leverages the concept of Mixture-of-Experts (MoE) to dynamically select a tailored pruning solution for each node. Specifically, MoG incorporates multiple sparsifier experts, each characterized by a unique sparsity level and pruning criterion, and selects the appropriate experts for each node. Subsequently, MoG mixes the sparse graphs produced by the different experts on the Grassmann manifold to derive an optimal sparse graph. Notably, MoG operates in an entirely local manner: its pruning decisions depend only on the specific circumstances of each individual node. Extensive experiments on four large-scale OGB datasets and two superpixel datasets, equipped with five GNN backbones, demonstrate that MoG (I) identifies subgraphs at higher sparsity levels ($8.67\%\sim 50.85\%$), with performance equal to or better than the dense graph, (II) achieves a $1.47\sim 2.62\times$ speedup in GNN inference with negligible performance drop, and (III) boosts ``top-student'' GNN performance ($1.02\%\uparrow$ on RevGNN+\textsc{ogbn-proteins} and $1.74\%\uparrow$ on DeeperGCN+\textsc{ogbg-ppa}).
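The per-node expert routing described above can be illustrated with a minimal sketch. All names, scoring functions, and the random gating weights below are hypothetical stand-ins (a real gate would be trained end-to-end), and a simple gate-weighted average of edge masks replaces the paper's Grassmann-manifold mixture, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: N nodes, E directed edges as (src, dst) pairs, node features X.
N, E, D = 8, 24, 4
edges = rng.integers(0, N, size=(E, 2))
X = rng.normal(size=(N, D))

# Two illustrative pruning criteria (stand-ins for the paper's criteria).
def degree_score(edges):
    deg = np.bincount(edges.ravel(), minlength=N)
    return deg[edges[:, 0]] + deg[edges[:, 1]]      # prefer high-degree links

def similarity_score(edges):
    return (X[edges[:, 0]] * X[edges[:, 1]]).sum(1)  # prefer similar endpoints

# Each "sparsifier expert" pairs a pruning criterion with a sparsity level.
experts = [(degree_score, 0.5), (degree_score, 0.8),
           (similarity_score, 0.5), (similarity_score, 0.8)]

# Per-node gate: a linear router over node features (random weights here,
# standing in for a trained gating network).
W_gate = rng.normal(size=(D, len(experts)))
gates = np.exp(X @ W_gate)
gates /= gates.sum(axis=1, keepdims=True)            # softmax over experts

# Each expert keeps its top-(1 - sparsity) fraction of edges.
kept_masks = []
for score_fn, sparsity in experts:
    k = max(1, int(round((1 - sparsity) * E)))
    keep = np.zeros(E, dtype=bool)
    keep[np.argsort(-score_fn(edges))[:k]] = True
    kept_masks.append(keep)

# Mix expert outputs per edge, weighting each expert's mask by the gate
# values of the edge's source node; keep edges that a majority of the
# gate mass retains. (A simplified substitute for the Grassmann mixing.)
edge_gates = gates[edges[:, 0]]                      # (E, num_experts)
edge_weight = (edge_gates * np.stack(kept_masks, axis=1)).sum(axis=1)
sparse_edges = edges[edge_weight >= 0.5]
print(f"kept {len(sparse_edges)}/{E} edges")
```

Because the gate is computed per node, two nodes can receive entirely different sparsity levels and pruning criteria, which is the local, node-tailored behavior the abstract highlights.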