GPU-initiated I/O has emerged as a key mechanism for achieving high-throughput storage access by leveraging massive GPU thread-level parallelism, while recent industry trends point toward SSDs optimized for ultra-high random-read IOPS. Together, these trends are driving the emergence of IOPS-optimized, GPU-centric storage systems. Despite this momentum, no existing framework enables quantitative, end-to-end evaluation of storage systems optimized for GPU-initiated I/O. Conventional SSD emulators offer a promising path toward end-to-end modeling of traditional storage systems, but they face three key challenges in this GPU-centric setting: limited frontend scalability for ingesting massive request streams, high software overhead in emulating the control and data paths of GPU-initiated I/O, and excessive timing-model maintenance overhead at extremely high I/O request rates. We propose SwarmIO, an SSD emulator for massively parallel, GPU-centric storage. SwarmIO faithfully models IOPS-optimized SSDs at target performance levels of up to 40 MIOPS and achieves a 303.9× speedup over the state-of-the-art baseline SSD emulator under GPU-initiated I/O. We further demonstrate its utility through a vector search case study, showing that increasing SSD IOPS from 2.5 MIOPS to 40 MIOPS yields an average end-to-end speedup of up to 9.7×.