Log-Structured Merge-tree-based Key-Value Stores (LSM-KVS) have been optimized and redesigned for disaggregated storage via techniques such as compaction offloading, which reduces network I/O between compute and storage. However, the constrained memory space and slow flush at the compute node severely limit the overall write throughput of existing optimizations. In this paper, we propose O3-LSM, a fundamentally new LSM-KVS architecture that leverages shared Disaggregated Memory (DM) to support three-layer offloading: memtable Offloading, flush Offloading, and the existing compaction Offloading. Compared with existing disaggregated LSM-KVS designs that offload only compaction, O3-LSM improves write performance by relieving both the memory constraint and the flush bottleneck at the compute node. First, O3-LSM leverages a novel DM-Optimized Memtable to achieve dynamic memtable offloading, which extends the write buffer while enabling fast, asynchronous, and parallel memtable transmission. Second, we propose Collaborative Flush Offloading, which decouples the flush control plane from flush execution and supports offloading memtable flushes to any node with dedicated scheduling and global optimizations. Third, O3-LSM is further improved with Shard-Level Optimization, which partitions the memtable into shards based on disjoint key ranges that can be transferred and flushed independently, unlocking parallelism across shards. In addition, to mitigate slow lookups in the disaggregated setting, O3-LSM employs an adaptive Cache-Enhanced Read Delegation mechanism that combines a compact local cache with DM-assisted delegated memtable reads. Our evaluation shows that O3-LSM achieves up to 4.5X write, 5.2X range query, and 1.8X point lookup throughput improvement, and up to 76% P99 latency reduction compared with Disaggregated-RocksDB, CaaS-LSM, and Nova-LSM.
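To make the shard-level idea concrete, the following is a minimal single-process sketch, not the O3-LSM implementation. It assumes a memtable split at fixed key boundaries and models "flush" as sorting a shard into a run; the names `ShardedMemtable`, `flush_shard`, and `boundaries` are hypothetical and chosen only for illustration. Transfer to disaggregated memory, scheduling, and node selection are omitted entirely.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def flush_shard(shard):
    """Stand-in for a flush: turn one shard into a sorted run of (key, value)."""
    return sorted(shard.items())


class ShardedMemtable:
    """Toy write buffer split into shards that own disjoint key ranges."""

    def __init__(self, boundaries):
        # boundaries ["h", "p"] yield three shards:
        # keys < "h", keys in ["h", "p"), and keys >= "p".
        self.boundaries = sorted(boundaries)
        self.shards = [dict() for _ in range(len(self.boundaries) + 1)]

    def shard_index(self, key):
        # Binary search over the boundaries picks the owning shard.
        return bisect_right(self.boundaries, key)

    def put(self, key, value):
        self.shards[self.shard_index(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_index(key)].get(key)

    def flush_all(self, max_workers=4):
        # Shards share no keys, so each one can be flushed independently.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            runs = list(pool.map(flush_shard, self.shards))
        # Replace the flushed shards with empty ones.
        self.shards = [dict() for _ in self.shards]
        return runs


mt = ShardedMemtable(["h", "p"])
for k, v in [("apple", 1), ("kiwi", 2), ("zebra", 3), ("banana", 4)]:
    mt.put(k, v)
runs = mt.flush_all()
```

Because the key ranges are disjoint and ordered, concatenating the per-shard runs in shard order gives a globally sorted sequence without a merge step, which is one reason independent per-shard flushing is plausible in this toy model.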