We present~\emph{KV-Tandem}, a modular architecture for building LSM-based storage engines on top of simple, non-ordered persistent key-value stores (KVSs). KV-Tandem enables advanced functionalities such as range queries and snapshot reads, while maintaining the native KVS performance for random reads and writes. Its modular design offers better performance trade-offs than previous KV-separation solutions, which struggle to decompose the monolithic LSM structure. Central to KV-Tandem is~\emph{LSM bypass}, a novel algorithm that offers a fast path for basic operations while ensuring the correctness of advanced APIs. We implement KV-Tandem in \emph{XDP-Rocks}, a RocksDB-compatible storage engine that leverages the XDP KVS and incorporates practical design optimizations for real-world deployment. Through extensive microbenchmarks and system-level comparisons, we demonstrate that XDP-Rocks achieves $3\times$ to $4\times$ performance improvements over RocksDB across various workloads. XDP-Rocks is already deployed in production, delivering significant operator cost savings consistent with these performance gains.