We describe a simple, parallel-friendly, lightweight graph reordering algorithm for graphs stored in COO format (edge lists). Our ``Batched Order By Attachment'' (BOBA) algorithm performs a number of reads that is linear in the number of edges and a number of writes through to main memory that is linear in the number of vertices. It is highly parallelizable on GPUs\@. We show that, compared to a randomized baseline, the resulting ordering improves locality of reference in sparse matrix-vector multiplication (SpMV) and in other graph algorithms. Moreover, it can substantially speed up the conversion from a COO representation to the compressed sparse row (CSR) format, a very common workflow. Thus, it can give \emph{end-to-end} speedups even in SpMV\@. Unlike other lightweight approaches, this reordering does not rely on explicitly knowing the degrees of the vertices; indeed, its runtime is comparable to that of computing the degrees. Instead, it uses the structure and edge distribution inherent in the input edge list, making it a candidate for default use in a pragmatic graph creation pipeline. The algorithm is suitable for road-type networks as well as scale-free networks. It improves cache locality on both CPUs and GPUs, achieving hit rates similar to those of heavyweight techniques (e.g., for SpMV, 7--52\% and 11--67\% in the L1 and L2 caches, respectively). Compared to randomly labeled graphs, BOBA-reordered graphs achieve end-to-end speedups of up to 3.45$\times$. The reordering itself is approximately one order of magnitude faster than existing lightweight techniques and up to 2.5 orders of magnitude faster than heavyweight techniques.
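To make the COO-to-CSR workflow concrete, the following is a minimal serial sketch in Python. It rests on an assumption the abstract does not spell out: that ``order by attachment'' can be approximated by relabeling each vertex according to its first appearance in the edge list (all sources scanned first, then all destinations). The function names `first_appearance_order`, `relabel`, and `coo_to_csr` are hypothetical, and the sketch covers neither the batched nor the GPU-parallel aspects of BOBA.

```python
def first_appearance_order(src, dst):
    """Assign new labels 0, 1, 2, ... in order of first appearance.

    Assumption (not stated in the abstract): the scan visits every
    source endpoint first, then every destination endpoint.
    Reads are linear in the number of edges; one label is written
    per distinct vertex.
    """
    new_label = {}
    for v in list(src) + list(dst):
        if v not in new_label:
            new_label[v] = len(new_label)
    return new_label


def relabel(src, dst, new_label):
    """Rewrite both endpoint arrays of the edge list with the new labels."""
    return [new_label[u] for u in src], [new_label[v] for v in dst]


def coo_to_csr(src, dst, num_vertices):
    """Convert a COO edge list to CSR (row_ptr, col_idx) by counting sort on rows."""
    row_ptr = [0] * (num_vertices + 1)
    for u in src:                      # count the out-degree of each row
        row_ptr[u + 1] += 1
    for i in range(num_vertices):      # prefix sum gives the row offsets
        row_ptr[i + 1] += row_ptr[i]
    col_idx = [0] * len(src)
    cursor = row_ptr[:-1]              # slice copy: next free slot per row
    for u, v in zip(src, dst):
        col_idx[cursor[u]] = v
        cursor[u] += 1
    return row_ptr, col_idx


# Toy edge list with arbitrary (effectively random) vertex IDs.
src = [7, 7, 3, 9]
dst = [3, 9, 9, 2]

labels = first_appearance_order(src, dst)
new_src, new_dst = relabel(src, dst, labels)
row_ptr, col_idx = coo_to_csr(new_src, new_dst, len(labels))
```

On this toy input the relabeled source array comes out already sorted, so the counting sort in `coo_to_csr` writes `col_idx` sequentially. This illustrates, on a single small example only, why a reordering of this kind might also help the conversion step.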