The persistence diagram, which describes the topological features of a dataset, is a key descriptor in Topological Data Analysis. The "Discrete Morse Sandwich" (DMS) method has been reported to be the most efficient algorithm for computing persistence diagrams of 3D scalar fields on a single node, using shared-memory parallelism. In this work, we extend DMS to distributed-memory parallelism for the efficient and scalable computation of persistence diagrams of massive datasets across multiple compute nodes. On the one hand, the first and most time-consuming step of DMS, namely the discrete gradient computation, is embarrassingly parallel and can be distributed directly. On the other hand, computing the subsequent DMS steps efficiently in a distributed setting is much more challenging. To address this, we extensively revised the DMS routines: we contribute a new self-correcting distributed pairing algorithm, redesign key data structures, and introduce computation tokens to coordinate distributed computations. We also introduce a dedicated communication thread to overlap communication and computation. Detailed performance analyses show the scalability of our hybrid MPI+thread approach for strong and weak scaling using up to 16 nodes of 32 cores (512 cores total). Our algorithm outperforms DIPHA, a reference method for the distributed computation of persistence diagrams, with an average speedup of 8× on 512 cores. We demonstrate the practical capabilities of our approach by computing the persistence diagram of a public 3D scalar field of 6 billion vertices in 174 seconds on 512 cores. Finally, we provide a usage example of our open-source implementation at https://github.com/eve-le-guillou/DDMS-example.
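Why the discrete gradient step is embarrassingly parallel can be illustrated with a simplified 1D sketch. It assumes a generic lower-star construction of the discrete gradient, not the exact DMS routine. Each vertex's gradient depends only on its immediate neighbors, so a worker holding its own vertices plus one ghost vertex per side produces exactly the same result as a serial pass. The function names (`lower_star_gradient`, `gradient_serial`, `gradient_blocks`) are hypothetical, and `ThreadPoolExecutor` threads merely stand in for MPI processes; this is an illustration, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def lower_star_gradient(field, v):
    """Discrete gradient restricted to the lower star of vertex v on a 1D line.

    `field` maps global vertex indices to scalar values. The lower star of v is
    v plus the edges to neighbors lower than v (ties broken by global index, so
    the vertex order is total). Returns a list of tuples:
      ("min", v)        v is a critical vertex (local minimum)
      ("pair", v, u)    v is paired with edge (v, u), i.e. a gradient arrow
      ("saddle", v, u)  edge (v, u) is a critical edge (local maximum in 1D)
    """
    def key(i):
        return (field[i], i)

    lower = sorted((u for u in (v - 1, v + 1) if u in field and key(u) < key(v)), key=key)
    if not lower:
        return [("min", v)]
    result = [("pair", v, lower[0])]                 # pair v with its steepest lower edge
    result += [("saddle", v, u) for u in lower[1:]]  # the leftover edge stays critical
    return result


def gradient_serial(values):
    """Reference single-worker pass over all vertices."""
    field = dict(enumerate(values))
    return [t for v in range(len(values)) for t in lower_star_gradient(field, v)]


def gradient_blocks(values, num_blocks):
    """Split the vertices into contiguous blocks processed independently."""
    n = len(values)
    bounds = [(b * n // num_blocks, (b + 1) * n // num_blocks) for b in range(num_blocks)]

    def work(lo_hi):
        lo, hi = lo_hi
        # Each block sees only its own vertices plus one ghost vertex per side,
        # mimicking the data a distributed process would hold locally.
        local = {i: values[i] for i in range(max(0, lo - 1), min(n, hi + 1))}
        return [t for v in range(lo, hi) for t in lower_star_gradient(local, v)]

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(work, bounds))  # map preserves block order
    return [t for part in parts for t in part]
```

For example, `gradient_serial([3, 1, 4, 0, 2])` reports minima at vertices 1 and 3 and a critical edge (2, 1), and `gradient_blocks` returns the same list for any number of blocks. No communication is needed beyond the one-vertex ghost layer, which is what makes this step easy to distribute; the subsequent pairing steps, by contrast, follow gradient paths that can cross block boundaries arbitrarily.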