Indexes are critical for efficient data retrieval and updates in modern databases. Recent advances in machine learning have led to learned indexes, which model the cumulative distribution function (CDF) of the data to predict search positions and accelerate query processing. Although learned indexes substantially outperform traditional structures on point lookups, they often suffer from high tail latency, suboptimal range-query performance, and inconsistent effectiveness across diverse workloads. To address these challenges, this paper proposes HIRE, a hybrid in-memory index structure designed to deliver consistently efficient performance. HIRE combines the structural and performance robustness of traditional indexes with the predictive power of learned models to reduce search overhead while maintaining worst-case stability. Specifically, it employs (1) hybrid leaf nodes that adapt to varying data distributions and workloads, (2) model-accelerated internal nodes augmented with log-based updates for efficient modification, (3) a nonblocking, cost-driven recalibration mechanism for dynamic data, and (4) an inter-level optimized bulk-loading algorithm that accounts for both leaf-node and internal-node errors. Experimental results on multiple real-world datasets demonstrate that HIRE outperforms both state-of-the-art learned indexes and traditional structures in range-query throughput, tail latency, and overall stability. Compared with these baselines, HIRE achieves up to 41.7$\times$ higher throughput under mixed workloads and reduces tail latency by up to 98% across varying scenarios.
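To make the learned-index idea mentioned in the abstract concrete, the following is a minimal sketch of a generic single-model learned index: a linear approximation of the CDF predicts a position in a sorted array, and a binary search bounded by the model's maximum training error corrects the prediction. It is not HIRE's design. The class name `LinearLearnedIndex`, the least-squares fit, and the single global error bound are all assumptions chosen for illustration.

```python
import bisect
import math


class LinearLearnedIndex:
    """Illustrative learned index: one linear model over a sorted key array.

    Hypothetical sketch of the general technique, not the HIRE structure.
    """

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)

        # Least-squares fit of position ~ slope * key + intercept,
        # i.e. a linear approximation of the (scaled) empirical CDF.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k

        # Largest prediction error over the stored keys; it bounds the
        # window that a lookup has to search around the predicted position.
        self.max_err = max(
            abs(i - self._predict(k)) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        return self.slope * key + self.intercept

    def lookup(self, key):
        """Return an index i with keys[i] == key, or None if key is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the error window to the valid index range [0, n].
        lo = min(n, max(0, math.floor(guess - self.max_err)))
        hi = max(lo, min(n, math.ceil(guess + self.max_err) + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 100, 1000])
position = index.lookup(23)   # position of key 23 in the sorted array
missing = index.lookup(7)     # key not stored
```

In this sketch the search window grows with `max_err`, so a skewed key distribution (such as the outlier 1000 above) widens every lookup. This is one simple way to see why a pure model-based index can show unstable latency across data distributions.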