Context. A novel high-performance exact pair counting toolkit called Fast Correlation Function Calculator (FCFC) is presented, which is publicly available at https://github.com/cheng-zhao/FCFC. Aims. With the rapid growth of modern cosmological datasets, evaluating correlation functions from observational and simulation catalogues has become a challenge. High-efficiency pair counting codes are thus in great demand. Methods. We introduce different data structures and algorithms that can be used for pair counting problems, and perform comprehensive benchmarks to identify the most efficient ones for real-world cosmological applications. We then describe the three levels of parallelism used by FCFC (SIMD, OpenMP, and MPI) and run extensive tests to investigate their scalability. Finally, we compare the efficiency of FCFC against alternative pair counting codes. Results. The data structures and histogram update algorithms implemented in FCFC are shown to outperform alternative methods. FCFC does not benefit much from SIMD, as the bottleneck of our histogram update algorithm is mostly cache latency. Nevertheless, the efficiency of FCFC scales well with the numbers of OpenMP threads and MPI processes, although the speedup may degrade when more than a few thousand threads are used in total. FCFC is found to be faster than most (if not all) other public pair counting codes for modern cosmological pair counting applications.
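To make the terms "exact pair counting" and "histogram update" concrete, the following is a minimal brute-force sketch in Python. It is not FCFC's implementation: FCFC is a separate toolkit whose data structures and update algorithms are not specified in this abstract. The function name `count_pairs`, its arguments, and the binary-search bin lookup are all assumptions made for illustration only.

```python
import bisect

def count_pairs(points, bin_edges):
    """Exact auto pair counts of 3D points in separation bins.

    Hypothetical helper for illustration; bin i covers
    [bin_edges[i], bin_edges[i + 1]), and edges must be increasing.
    """
    counts = [0] * (len(bin_edges) - 1)
    # Compare squared distances with squared edges to avoid one sqrt per pair.
    edges2 = [e * e for e in bin_edges]
    n = len(points)
    for i in range(n):
        xi, yi, zi = points[i]
        # Visit each unordered pair exactly once (j > i).
        for j in range(i + 1, n):
            xj, yj, zj = points[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            # Skip pairs outside the range of interest.
            if d2 < edges2[0] or d2 >= edges2[-1]:
                continue
            # Histogram update: locate the separation bin by binary search.
            counts[bisect.bisect_right(edges2, d2) - 1] += 1
    return counts

points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
print(count_pairs(points, [0.5, 1.5, 2.5, 3.5, 4.5]))  # [1, 2, 2, 1]
```

This sketch visits all N(N-1)/2 pairs, so its cost grows quadratically with catalogue size. That scaling is why the choice of data structure, histogram update algorithm, and parallelisation matters for large catalogues.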