A Generic Software Framework for Distributed Topological Analysis Pipelines

This system paper presents a software framework that supports topological analysis pipelines in a distributed-memory model. While several recent papers introduced topology-based approaches for distributed-memory environments, they reported experiments obtained with tailored, mono-algorithm implementations. In contrast, we describe a general-purpose, generic framework for topological analysis pipelines, i.e., sequences of topological algorithms interacting together, possibly on distinct numbers of processes. Specifically, we instantiated our framework with the MPI model, within the Topology ToolKit (TTK). While developing this framework, we faced several algorithmic and software engineering challenges, which we document in this paper. We provide a taxonomy of the distributed-memory topological algorithms supported by TTK, organized by their communication needs, and we provide examples of hybrid MPI+thread parallelizations. Detailed performance analyses show that parallel efficiencies range from $20\%$ to $80\%$ (depending on the algorithm), and that the MPI-specific preconditioning introduced by our framework induces a negligible computation time overhead. We illustrate the new distributed-memory capabilities of TTK with an example of an advanced analysis pipeline that combines multiple algorithms, run on the largest publicly available dataset we have found (120 billion vertices) on a standard cluster with 64 nodes (for a total of 1,536 cores). Finally, we provide a roadmap for the completion of TTK's MPI extension, along with generic recommendations for each algorithm communication category.
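As a reading aid for the efficiency figures quoted above, the following minimal sketch computes a parallel efficiency under the common strong-scaling definition (speedup relative to a reference run, divided by the increase in core count). This definition is an assumption: the abstract does not state which one the paper uses. The function name `parallel_efficiency` and all timings are hypothetical and are not measurements from the paper. Only the node and core counts come from the text.

```python
def parallel_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Strong-scaling efficiency of a run on p cores, relative to a
    reference run on p_ref cores (assumed definition, see lead-in)."""
    if t_ref <= 0 or t_p <= 0 or p_ref <= 0 or p <= 0:
        raise ValueError("timings and core counts must be positive")
    speedup = t_ref / t_p            # how much faster the larger run is
    ideal_speedup = p / p_ref        # speedup expected under perfect scaling
    return speedup / ideal_speedup


# Cluster size taken from the abstract: 64 nodes, 1,536 cores in total.
nodes, total_cores = 64, 1536
cores_per_node = total_cores // nodes

# Hypothetical timings (seconds), chosen only to illustrate the arithmetic:
# one node as the reference, the full cluster as the scaled run.
eff = parallel_efficiency(t_ref=1000.0, p_ref=cores_per_node,
                          t_p=31.25, p=total_cores)

print(f"cores per node: {cores_per_node}")
print(f"parallel efficiency: {eff:.0%}")
```

With these illustrative numbers, the scaled run is 32 times faster on 64 times as many cores, which gives an efficiency that falls inside the $20\%$ to $80\%$ range reported above.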
