Shuffling is the process of rearranging a sequence of elements into a random order such that every permutation occurs with equal probability. It is an important building block in many techniques used in virtually all scientific areas. Consequently, considerable work has been devoted to the design and implementation of shuffling algorithms. To the best of our knowledge, we engineer the first practically fast parallel shuffling algorithm with $\Oh{\sqrt{n}\log n}$ parallel depth that requires only poly-logarithmic auxiliary memory. Our reference implementations in Rust are freely available, easy to include in other projects, and can process large data sets approaching the size of the system's memory. In an empirical evaluation, we compare our implementations with a number of existing solutions on various computer architectures. Our algorithms consistently achieve the highest throughput on all machines. Further, we demonstrate that the runtime of our parallel algorithm is comparable to the time that other algorithms may take merely to acquire from the operating system the memory needed to copy the input.
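To make the definition of shuffling concrete, the following is a minimal sketch of the classic sequential Fisher-Yates shuffle in Rust. It is not the parallel algorithm described above; it only illustrates what "every permutation occurs with equal probability" requires of an in-place shuffle. The names `SplitMix64`, `below`, and `fisher_yates_shuffle` are hypothetical, and the small SplitMix64 generator is an assumption chosen so that the sketch needs no external crates. Uniformity holds only to the extent that the generator's output is uniform.

```rust
/// Tiny SplitMix64 generator (an assumption for this sketch, not part of the
/// work described above). It is not cryptographically secure.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    /// Returns the next 64-bit pseudo-random value.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Returns an integer in 0..bound. Rejection sampling avoids the modulo
    /// bias that a plain `next_u64() % bound` would introduce.
    fn below(&mut self, bound: u64) -> u64 {
        assert!(bound > 0, "bound must be positive");
        // `zone` is a multiple of `bound`, so accepted values map evenly
        // onto the residues 0..bound.
        let zone = u64::MAX - (u64::MAX % bound);
        loop {
            let x = self.next_u64();
            if x < zone {
                return x % bound;
            }
        }
    }
}

/// Sequential in-place Fisher-Yates shuffle: position i receives an element
/// chosen uniformly from positions 0..=i, for i from the last index down to 1.
fn fisher_yates_shuffle<T>(data: &mut [T], rng: &mut SplitMix64) {
    for i in (1..data.len()).rev() {
        let j = rng.below(i as u64 + 1) as usize;
        data.swap(i, j);
    }
}

fn main() {
    let mut values: Vec<u32> = (0..10).collect();
    let mut rng = SplitMix64::new(42);
    fisher_yates_shuffle(&mut values, &mut rng);
    // The output is some permutation of 0..10, determined by the seed.
    println!("{:?}", values);
}
```

This baseline works in place with constant auxiliary memory, but each swap depends on the state left by the previous ones, so it offers no parallelism as written. Removing that dependency chain is the problem a parallel shuffling algorithm has to solve.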