Parallel I/O refers to the ability of scientific programs to concurrently read from and write to a single file from multiple processes executing on distributed-memory platforms such as compute clusters. In HPC, I/O is a significant bottleneck for many real-world scientific applications. Over the last two decades, there has been significant research on improving I/O performance in scientific computing for traditional languages including C, C++, and Fortran. As a result, several mature, high-performance libraries, including ROMIO (an implementation of MPI-IO), parallel HDF5, Parallel I/O (PIO), and parallel netCDF, are available today and provide efficient I/O for scientific applications. However, very little research has evaluated or improved the I/O performance of Java-based HPC applications. The main hindrance to developing efficient parallel I/O libraries for Java is the lack of a standard API (something equivalent to MPI-IO). Some ad hoc solutions have been developed and used in proprietary applications, but there is no general-purpose solution that performance-hungry applications can use. As part of this project, we plan to develop a Java-based parallel I/O API inspired by the MPI-IO bindings for C, C++, and Fortran defined in the MPI 2.0 standard document. Once this Java equivalent of the MPI-IO API has been developed, we will build a reference implementation on top of existing Java messaging libraries. We will then evaluate our reference Java parallel I/O library and compare its performance with that of its C/C++ counterparts using benchmarks and real-world applications.
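To make the idea of several writers sharing one file concrete, the following is a minimal sketch in plain Java, not the API proposed in this project. It assumes that threads inside a single JVM stand in for the distributed processes ("ranks"), and that each rank owns a fixed-size, non-overlapping region of the file, loosely analogous to an explicit-offset write in MPI-IO. The class name `SharedFileWriteSketch`, the methods `writeAt` and `writeShared`, and the constant `BLOCK_SIZE` are hypothetical names chosen for illustration; only the standard `java.nio` calls are real.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class SharedFileWriteSketch {
    // Assumption: every simulated rank owns exactly BLOCK_SIZE bytes of the file.
    static final int BLOCK_SIZE = 4;

    // Writes this rank's block at its own offset using a positional write,
    // which does not move the channel's shared file position.
    static void writeAt(FileChannel channel, int rank) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        Arrays.fill(block, (byte) ('A' + rank)); // rank 0 writes 'A', rank 1 writes 'B', ...
        ByteBuffer buf = ByteBuffer.wrap(block);
        long offset = (long) rank * BLOCK_SIZE;
        while (buf.hasRemaining()) {
            offset += channel.write(buf, offset);
        }
    }

    // Starts one thread per simulated rank, waits for all of them,
    // then returns the file contents as ASCII text.
    static String writeShared(Path file, int ranks) throws Exception {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            Thread[] workers = new Thread[ranks];
            for (int r = 0; r < ranks; r++) {
                final int rank = r;
                workers[r] = new Thread(() -> {
                    try {
                        writeAt(channel, rank);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                workers[r].start();
            }
            for (Thread t : workers) {
                t.join();
            }
        }
        return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("shared", ".dat");
        System.out.println(writeShared(file, 4)); // prints AAAABBBBCCCCDDDD
        Files.delete(file);
    }
}
```

Because the regions do not overlap and each write carries its own offset, the writers need no coordination beyond the final join. A real parallel I/O library would additionally have to handle processes on separate nodes, a parallel file system, and collective operations, none of which this sketch attempts.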