Driven by scientific and industry ambition, HPC and AI applications such as operational Numerical Weather Prediction (NWP) require processing and storing ever-increasing data volumes as fast as possible. Whilst POSIX distributed file systems and NVMe SSDs are currently a common HPC storage configuration providing I/O to applications, new storage solutions have proliferated or gained traction over the last decade, with the potential to address the performance limitations that POSIX file systems manifest at scale for certain I/O workloads. This work has primarily aimed to assess the suitability and performance of two object storage systems, namely DAOS and Ceph, for ECMWF's operational NWP as well as for HPC and AI applications in general. New software-level adapters have been developed which enable ECMWF's NWP to leverage these systems, and extensive I/O benchmarking has been conducted on a few computer systems, comparing the performance delivered by the evaluated object stores to that of equivalent Lustre file system deployments on the same hardware. The challenges of porting to object storage and its benefits with respect to the traditional POSIX I/O approach have been discussed and, where possible, domain-agnostic performance analysis has been conducted, yielding insights that are also relevant to I/O practitioners and the broader HPC community. DAOS and Ceph have both demonstrated excellent performance, but DAOS stood out relative to Ceph and Lustre, providing superior scalability and flexibility for applications to perform I/O at scale as desired. This sets a promising outlook for DAOS and object storage, which might see greater adoption at HPC centres in the years to come, although this does not necessarily imply a shift away from POSIX-like I/O.