Semisort is a fundamental algorithmic primitive widely used in the design and analysis of efficient parallel algorithms. It takes as input an array of records and a function that extracts a \emph{key} from each record, and reorders the records so that those with equal keys are contiguous. Since many applications only need to collect equal values rather than fully sort the input, semisort is broadly applicable, e.g., in string algorithms, graph analytics, and geometry processing, among many other domains. However, despite dozens of recent papers that use semisort in their theoretical analysis and the existence of an asymptotically optimal parallel semisort algorithm, most implementations of these parallel algorithms fall back on comparison sorting or integer sorting in practice, owing to potential performance issues in existing semisort implementations. In this paper, we revisit the semisort problem, with the goal of achieving a high-performance parallel semisort implementation with a flexible interface. Our approach extends easily to two related problems, \emph{histogram} and \emph{collect-reduce}. Our algorithms achieve strong speedups in practice and, importantly, outperform state-of-the-art parallel sorting and semisorting methods in almost all settings we tested, with varying input sizes, distributions, and key types. We also test two important applications with real-world data, and show that our algorithms improve performance over existing approaches. We believe that many other parallel algorithm implementations can be accelerated using our results.
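To make the problem statement concrete, the following is a minimal sequential sketch of what semisort, histogram, and collect-reduce compute. It is not the parallel algorithm of this paper: it uses a plain hash table, and the function names `semisort`, `histogram`, and `collect_reduce`, as well as their parameters, are hypothetical names chosen here for illustration only.

```python
from typing import Callable, Dict, Hashable, List, TypeVar

R = TypeVar("R")  # record type
V = TypeVar("V")  # value type used by collect_reduce


def semisort(records: List[R], key: Callable[[R], Hashable]) -> List[R]:
    """Reorder records so that records with equal keys are contiguous.

    Sequential reference only (assumption: keys are hashable).
    Groups appear in first-occurrence order, not in sorted key order.
    """
    buckets: Dict[Hashable, List[R]] = {}
    for r in records:
        buckets.setdefault(key(r), []).append(r)
    return [r for bucket in buckets.values() for r in bucket]


def histogram(records: List[R], key: Callable[[R], Hashable]) -> Dict[Hashable, int]:
    """Count how many records carry each key."""
    counts: Dict[Hashable, int] = {}
    for r in records:
        k = key(r)
        counts[k] = counts.get(k, 0) + 1
    return counts


def collect_reduce(
    records: List[R],
    key: Callable[[R], Hashable],
    value: Callable[[R], V],
    combine: Callable[[V, V], V],
    identity: V,
) -> Dict[Hashable, V]:
    """Combine the values of all records that share a key."""
    acc: Dict[Hashable, V] = {}
    for r in records:
        k = key(r)
        acc[k] = combine(acc.get(k, identity), value(r))
    return acc


# Example: group edges (source vertex, label) by source vertex.
edges = [(2, "a"), (1, "b"), (2, "c"), (3, "d"), (1, "e")]
grouped = semisort(edges, key=lambda e: e[0])
# grouped == [(2, "a"), (2, "c"), (1, "b"), (1, "e"), (3, "d")]
```

Note that the keys in `grouped` come out in the order 2, 1, 3: equal keys are adjacent, but the groups themselves are not ordered, which is exactly the relaxation that distinguishes semisort from a full sort.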