String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines: they either incur latency at least proportional to the number of processors $p$, or they communicate the data a large number of times (at least logarithmically many). We present practical and efficient algorithms for distributed-memory string sorting that scale to large $p$. Similar to state-of-the-art sorters for atomic objects, the algorithms have latency of about $p^{1/k}$ when the data is allowed to be communicated $k$ times. Experiments indicate good scaling behavior on a wide range of inputs on up to 49152 cores. Overall, we achieve speedups of up to a factor of 5 over the current state-of-the-art distributed string sorting algorithms.
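To make the latency trade-off concrete, the following is a minimal sketch under a simplified model that is an assumption for illustration, not the paper's cost analysis: when the data is communicated $k$ times, each processor is taken to exchange messages with roughly $\lceil p^{1/k} \rceil$ partners per round. The function names `partners_per_round` and `tradeoff` are hypothetical.

```python
def partners_per_round(p: int, k: int) -> int:
    """Smallest integer r with r**k >= p, i.e. ceil(p**(1/k)) computed exactly.

    Assumed model: with k communication rounds, each processor talks to
    about p**(1/k) partners per round.
    """
    if p < 1 or k < 1:
        raise ValueError("p and k must be positive")
    # Start from the floating-point estimate, then correct it with integer
    # arithmetic so rounding errors cannot affect the result.
    r = max(1, round(p ** (1.0 / k)))
    while r ** k < p:
        r += 1
    while r > 1 and (r - 1) ** k >= p:
        r -= 1
    return r


def tradeoff(p: int, max_k: int) -> list[tuple[int, int]]:
    """List (k, partners per round) for k = 1 .. max_k."""
    return [(k, partners_per_round(p, k)) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    # 49152 is the largest core count mentioned in the abstract.
    for k, partners in tradeoff(49152, 3):
        print(f"k={k}: about {partners} partners per round")
```

Under this model, moving from a single round to two or three rounds on 49152 cores cuts the number of partners per round from 49152 to 222 or 37, at the price of sending the data two or three times.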