Over the years, many frameworks and optimization techniques have been proposed to accelerate graph neural networks (GNNs). In contrast to the optimizations explored in these systems, we observe that different matrix re-associations of GNN computations lead to novel input-sensitive performance behavior. We leverage this observation to propose SENSEi, a system that exposes different compositions of sparse and dense matrix primitives, derived from different matrix re-associations of GNN computations, and selects the best among them based on input attributes. SENSEi executes in two stages: (1) an offline compilation stage that enumerates all valid re-associations, each leading to a different sparse-dense matrix composition, and uses input-oblivious pruning techniques to discard clearly unprofitable candidates; and (2) an online runtime system that explores the remaining candidates and uses lightweight cost models to select the best re-association based on the input graph and the embedding sizes on a given hardware platform. Across a wide range of configurations, SENSEi achieves speedups of up to $2.012\times$ and $1.85\times$ on graph convolutional networks and up to $6.294\times$ and $16.274\times$ on graph attention networks, on GPUs and CPUs respectively. We also show that its techniques generalize to GNN variants, including those that require sampling. Furthermore, we show that SENSEi's techniques are agnostic to the underlying GNN system and can be used to yield synergistic improvements across a diverse set of implementations.
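To make the idea of matrix re-association concrete, the sketch below uses a simplified GCN-style layer $A H W$, where $A$ is a sparse adjacency matrix, $H$ the node embeddings, and $W$ a dense weight matrix. It is only an illustration under stated assumptions: the function names (`aggregate`, `gcn_aggregate_first`, `gcn_update_first`, `estimated_flops`, `pick_order`) are hypothetical, the FLOP-count heuristic is a stand-in rather than SENSEi's actual cost model, and the two orderings shown are not necessarily the candidates that SENSEi enumerates. Both orderings produce the same result, but the sparse step runs at a different embedding width in each, so the cheaper one depends on the graph and the embedding sizes.

```python
import numpy as np

def aggregate(src, dst, feats, num_nodes):
    """Sparse aggregation A @ feats, with A given as an edge list (dst <- src)."""
    out = np.zeros((num_nodes, feats.shape[1]), dtype=feats.dtype)
    np.add.at(out, dst, feats[src])  # scatter-add each source row into its destination
    return out

def gcn_aggregate_first(src, dst, H, W):
    # (A @ H) @ W: the sparse step runs at the input embedding width.
    return aggregate(src, dst, H, H.shape[0]) @ W

def gcn_update_first(src, dst, H, W):
    # A @ (H @ W): the sparse step runs at the output embedding width.
    return aggregate(src, dst, H @ W, H.shape[0])

def estimated_flops(num_nodes, num_edges, k_in, k_out):
    """Hypothetical FLOP estimate per ordering (an assumption, not SENSEi's model)."""
    dense = 2 * num_nodes * k_in * k_out  # H @ W costs the same in both orderings
    return {
        "aggregate_first": 2 * num_edges * k_in + dense,
        "update_first": 2 * num_edges * k_out + dense,
    }

def pick_order(num_nodes, num_edges, k_in, k_out):
    """Return the name of the ordering with the lower estimated cost."""
    costs = estimated_flops(num_nodes, num_edges, k_in, k_out)
    return min(costs, key=costs.get)
```

Under this heuristic, `pick_order(1000, 5000, 64, 8)` selects `"update_first"`, because shrinking the embeddings from 64 to 8 columns before the sparse step reduces the aggregation work. A real system would also need to account for hardware and graph sparsity patterns, which a pure FLOP count ignores.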