Disaggregated storage systems separate storage resources from computing resources in data centers, which improves resource utilization and allows storage and compute to scale independently. NVMe over Fabrics (NVMeoF) is a key technology underpinning the functionality and benefits of disaggregated storage systems. Although the storage nodes in NVMeoF deployments often have substantial computing and memory capacity, these resources are rarely used for anything beyond simple I/O delegation. This study proposes OffloadFS, a user-level file system that enables I/O-intensive tasks to be offloaded primarily to a disaggregated storage node for near-data processing. It can also offload to peer compute nodes, and it does so without distributed lock management. OffloadFS optimizes cache management by reducing interference between threads performing distinct I/O operations. On top of OffloadFS, we develop two systems. OffloadDB enables RocksDB to offload MemTable flush and compaction operations. OffloadPrep offloads image pre-processing tasks for machine learning to disaggregated storage nodes. Our evaluation shows that, compared to OCFS2, OffloadFS improves the performance of RocksDB by up to 3.36x and the performance of machine learning pre-processing tasks by up to 1.85x.