Indexes are critical for efficient data retrieval and updates in modern databases. Recent advances in machine learning have led to learned indexes, which model the cumulative distribution function (CDF) of the data to predict search positions and accelerate query processing. While learned indexes substantially outperform traditional structures for point lookups, they often suffer from high tail latency, suboptimal range-query performance, and inconsistent effectiveness across diverse workloads. To address these challenges, this paper proposes HIRE, a hybrid in-memory index structure designed to deliver consistently efficient performance. HIRE combines the structural and performance robustness of traditional indexes with the predictive power of learned models, reducing search overhead while maintaining worst-case stability. Specifically, it employs (1) hybrid leaf nodes that adapt to varying data distributions and workloads, (2) model-accelerated internal nodes augmented with log-based updates for efficient modification, (3) a nonblocking, cost-driven recalibration mechanism for dynamic data, and (4) an inter-level optimized bulk-loading algorithm that accounts for both leaf-node and internal-node errors. Experimental results on multiple real-world datasets demonstrate that HIRE outperforms both state-of-the-art learned indexes and traditional structures in range-query throughput, tail latency, and overall stability. Compared with these baselines, HIRE achieves up to 41.7$\times$ higher throughput under mixed workloads and reduces tail latency by up to 98% across varying scenarios.
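To make the learned-index idea in the abstract concrete (modeling the CDF of the keys to predict a search position), the following is a minimal sketch of a generic single-model learned index. It is not HIRE's design: the class name `LinearLearnedIndex`, the single linear model, and the global error bound are all illustrative assumptions. The model maps a key to an approximate position in a sorted array, and a binary search restricted to the model's maximum observed error corrects the prediction.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index (hypothetical, for illustration only):
    one linear CDF approximation over sorted keys plus an error bound."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        self.min_key = self.keys[0]
        span = self.keys[-1] - self.min_key
        # Linear model: map the key range onto the position range [0, n - 1].
        self.slope = (n - 1) / span if span > 0 else 0.0
        # Largest gap between predicted and true position over the stored keys.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        """Predicted position of `key`, clamped to a valid array index."""
        pos = round(self.slope * (key - self.min_key))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of `key` in the sorted array, or None if absent."""
        pos = self._predict(key)
        # Search only the window that the error bound guarantees to be sufficient.
        lo = max(0, pos - self.max_err)
        hi = min(len(self.keys), pos + self.max_err + 1)
        idx = bisect.bisect_left(self.keys, key, lo, hi)
        if idx < hi and self.keys[idx] == key:
            return idx
        return None


if __name__ == "__main__":
    index = LinearLearnedIndex([2, 3, 5, 8, 13, 21, 34, 55])
    print(index.lookup(13))  # position of 13 in the sorted array
    print(index.lookup(4))   # None, since 4 is not stored
```

The error bound is what determines worst-case lookup cost in this sketch: on skewed key distributions a single linear model fits poorly, the search window widens, and lookups degrade toward a full binary search. This illustrates, under the stated assumptions, why purely model-based indexes can show high tail latency.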