Parallel input performance is often neglected in large-scale parallel applications in Computational Science and Engineering. Traditionally, input performance has received less attention, either because input sizes are small (as in biomolecular simulations) or because the time spent on input is insignificant compared with a simulation that runs for many timesteps. Newer applications, such as graph algorithms, place a premium on file input performance. Additionally, over-decomposed systems, such as Charm++/AMPI, present new challenges in this context compared with MPI applications. In the over-decomposition model, naive parallel I/O, in which every task issues its own I/O request, is impractical. Furthermore, the load balancing supported by models such as Charm++/AMPI precludes assuming that data is contiguous on individual nodes. We develop a new I/O abstraction that addresses these issues by separating the decomposition of the consumers of input data from that of the file-reader tasks that interact with the file system. This enables applications to scale the number of data consumers without affecting I/O behavior or performance. These ideas are implemented in a new input library, CkIO, built on Charm++, a well-known task-based system with over-decomposed partitions. CkIO is configurable via multiple parameters (such as the number of file readers and their placement) that can be tuned to characteristics of the application, such as file size and the number of application objects. CkIO input also supports capabilities such as effective overlap of input with application-level computation, as well as load balancing and migration. We describe the relevant challenges in understanding file system behavior and architecture, the design alternatives being explored, and preliminary performance data.
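To make the separation of consumers from file readers concrete, the following is a minimal sketch in plain Python. It is not CkIO's API and does not use Charm++; the names `Piece`, `reader_block`, and `split_request`, and the choice of equal-sized contiguous reader blocks, are all assumptions made purely for illustration. The sketch shows how a consumer's byte-range request could be resolved against a fixed set of reader-owned blocks, so that the number of tasks touching the file system stays at `num_readers` regardless of how many consumers exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    reader: int   # index of the (hypothetical) reader task holding these bytes
    offset: int   # absolute file offset where this piece starts
    length: int   # number of bytes in this piece


def _block_size(file_size, num_readers):
    if num_readers <= 0:
        raise ValueError("num_readers must be positive")
    return -(-file_size // num_readers)  # ceiling division


def reader_block(file_size, num_readers, reader):
    """Return (start, end) of the contiguous byte range assigned to `reader`."""
    block = _block_size(file_size, num_readers)
    start = min(reader * block, file_size)
    end = min(start + block, file_size)
    return start, end


def split_request(file_size, num_readers, offset, length):
    """Split a consumer request [offset, offset + length) across reader blocks."""
    if offset < 0 or length < 0 or offset + length > file_size:
        raise ValueError("request lies outside the file")
    block = _block_size(file_size, num_readers)
    pieces = []
    pos, end = offset, offset + length
    while pos < end:
        reader = pos // block
        _, block_end = reader_block(file_size, num_readers, reader)
        take = min(end, block_end) - pos  # bytes this reader can supply
        pieces.append(Piece(reader, pos, take))
        pos += take
    return pieces


if __name__ == "__main__":
    # A 100-byte file held by 4 readers; one consumer asks for bytes 20..59.
    for piece in split_request(100, 4, offset=20, length=40):
        print(piece)
```

In this sketch, only the reader tasks would ever issue file system reads; consumers receive their pieces from whichever readers hold them. Changing the number of consumers changes how requests are split, not how the file is read, which is the property the abstraction described above is meant to provide.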