The list ranking problem is one of the classical problems of parallel computing, with nontrivial algorithms and many applications as a subroutine for solving other problems. Although it was studied intensively in the early days of parallel computing, little has happened in the last 20 years. In particular, there is little work on scaling list ranking to large machines and input sizes. We revisit list ranking, starting from the groundbreaking results of Sibeyn a quarter century ago. We employ algorithm and performance engineering to improve his sparse ruling-set algorithm so that it scales to many processors, and we provide a more detailed analysis of the impact of the algorithm's parameters, which further guides our practical implementation. We perform an extensive experimental study across a variety of input instances with different structural properties. We demonstrate that indirect communication, exploiting input locality, and message coalescing allow scaling to billions of elements on up to 24,576 cores.
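For readers unfamiliar with the problem, the following minimal sketch illustrates what list ranking computes: given a linked list stored as a successor array, find each element's distance to the end of the list. It uses the textbook pointer-jumping technique, simulated sequentially, and is not the sparse ruling-set algorithm discussed here. The function name `list_rank` and the convention that the tail points to itself are assumptions made for this illustration.

```python
def list_rank(succ):
    """Return rank[i] = number of links from element i to the tail.

    Assumes succ describes a single linked list in which the tail
    element points to itself (succ[t] == t). Each round below stands
    in for one synchronous parallel step of pointer jumping.
    """
    n = len(succ)
    nxt = list(succ)
    # The tail starts at distance 0, every other element at distance 1.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]

    # Continue until every pointer has reached the tail. Each round
    # doubles the distance covered, so O(log n) rounds suffice.
    while any(nxt[nxt[i]] != nxt[i] for i in range(n)):
        # Build both arrays from the old values, as a synchronous
        # parallel step would.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank


if __name__ == "__main__":
    # List 0 -> 2 -> 1, where element 1 is the tail.
    print(list_rank([2, 1, 1]))
```

Pointer jumping performs O(n log n) total work and reads a remote element in every round, which on a distributed machine turns into fine-grained communication; this is one reason more involved approaches are of interest at scale.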