In cloud block stores, the index sits on the critical path of I/O operations and typically resides in memory. As user counts grow and denser storage media emerge, the index has become a primary memory consumer and a source of memory pressure. Our extensive analysis of production traces reveals that write requests in cloud storage systems strongly tend to target contiguous block ranges. Our insight is therefore that, instead of indexing each block individually as current designs do, the index should use block ranges directly as keys (i.e., range-as-a-key) to save memory. In this paper, we propose RASK, a memory-efficient and high-performance tree-structured index that natively indexes ranges. While range-as-a-key has the potential to save memory and improve performance, realizing it is challenging because of two issues: range overlap and range fragmentation. To handle range overlap efficiently, RASK introduces a log-structured leaf, combined with range-tailored search and garbage collection. To reduce range fragmentation, RASK employs range-aware split and merge mechanisms. Our evaluations on four production traces show that RASK reduces memory footprint by up to 98.9% and increases throughput by up to 31.0x compared to ten state-of-the-art indexes.
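To make the range-as-a-key idea and the overlap problem concrete, the following is a minimal Python sketch of a single log-structured leaf. It is not RASK's implementation: the names `RangeEntry` and `LogLeaf`, the flat append-only list, and the brute-force garbage collection are all assumptions chosen for illustration, and the tree structure and range-aware split and merge are omitted entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class RangeEntry:
    """One index entry keyed by a block range (hypothetical layout)."""
    start: int   # first logical block address in the range
    length: int  # number of contiguous blocks in the range
    value: int   # physical location of the first block

    def covers(self, lba: int) -> bool:
        return self.start <= lba < self.start + self.length


@dataclass
class LogLeaf:
    """Append-only leaf: newer entries shadow older overlapping ones."""
    entries: List[RangeEntry] = field(default_factory=list)

    def insert(self, start: int, length: int, value: int) -> None:
        # Appending avoids splitting or rewriting older overlapping ranges.
        self.entries.append(RangeEntry(start, length, value))

    def lookup(self, lba: int) -> Optional[int]:
        # Scan newest to oldest so the most recent write wins.
        for entry in reversed(self.entries):
            if entry.covers(lba):
                return entry.value + (lba - entry.start)
        return None

    def garbage_collect(self) -> None:
        # Drop entries whose every block is covered by some newer entry.
        # Brute force for clarity; a real index would compare intervals.
        live = []
        for i, entry in enumerate(self.entries):
            newer = self.entries[i + 1:]
            shadowed = all(
                any(n.covers(lba) for n in newer)
                for lba in range(entry.start, entry.start + entry.length)
            )
            if not shadowed:
                live.append(entry)
        self.entries = live
```

In this sketch, an 8-block sequential write costs one entry where a per-block index would need eight. A later overlapping write is simply appended, lookups resolve the overlap by reading the log newest-first, and garbage collection reclaims entries that have been completely overwritten.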