Modern I/O applications running on HPC infrastructures are increasingly read- and metadata-intensive. When multiple concurrent applications submit large numbers of metadata operations, they can easily saturate the metadata resources of the shared parallel file system, degrading overall performance and causing I/O unfairness. We present PADLL, an application- and file-system-agnostic storage middleware that enables QoS control of data and metadata workflows in HPC storage systems. It adopts ideas from Software-Defined Storage: data plane stages mediate and rate limit POSIX requests submitted to the shared file system, while a control plane holistically coordinates how all I/O workflows are handled. We demonstrate its performance and feasibility under multiple QoS policies using synthetic benchmarks, real-world applications, and traces collected from a production file system. Results show that PADLL can enforce complex storage QoS policies over concurrent metadata-aggressive jobs, ensuring fairness and prioritization.
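To make the idea of a data plane stage that rate limits requests more concrete, the following is a minimal sketch. It is not PADLL's implementation. The abstract does not say which rate-limiting algorithm is used, so a token bucket is assumed here, and the names `TokenBucket`, `Stage`, `set_limit`, and `admit` are hypothetical. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Token-bucket limiter: `rate` operations per second, bursts up to `capacity`.

    Assumption: a token bucket stands in for whatever limiter a real stage uses.
    """

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.last = float(now)

    def try_acquire(self, now, cost=1.0):
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Stage:
    """Hypothetical data plane stage: one bucket per class of POSIX operation."""

    def __init__(self):
        self.buckets = {}

    def set_limit(self, op_class, rate, capacity, now=0.0):
        # In this sketch, a control plane would call this to install or update a limit.
        self.buckets[op_class] = TokenBucket(rate, capacity, now)

    def admit(self, op_class, now):
        # Operation classes without a configured limit pass through unthrottled.
        bucket = self.buckets.get(op_class)
        return True if bucket is None else bucket.try_acquire(now)
```

In this sketch, a job's stage would call `admit("metadata", now)` before forwarding a request such as `open` or `stat` to the file system, and would delay or queue the request when the call returns `False`. A coordinating component could then adjust per-job limits through `set_limit` to favor some jobs over others.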