While in-memory learned indexes have shown promising performance compared to the B+-tree, most widely used databases in real applications still rely on disk-based operations. Our experiments show that directly applying existing learned indexes on disk suffers from several drawbacks and cannot outperform a standard B+-tree in most cases. Therefore, in this work we make the first attempt to show how the idea of a learned index can benefit on-disk indexing by proposing AULID, a fully on-disk updatable learned index that achieves state-of-the-art performance across multiple workload types. AULID combines the benefits of traditional indexing techniques and learned indexes to reduce I/O cost, the main overhead in the on-disk setting. Specifically, it reduces I/O cost in three ways: (1) reducing the overhead of updating the index structure; (2) shortening the path from the root to the leaf node; and (3) achieving better locality to minimize the number of block reads required to complete a scan. We propose five principles to guide the design of AULID, which delivers remarkable performance gains while remaining easy to implement. Our evaluation shows that AULID has storage costs comparable to a B+-tree and much smaller than those of other learned indexes, and that AULID is up to 2.11x, 8.63x, 1.72x, 5.51x, and 8.02x more efficient than FITing-tree, PGM, B+-tree, ALEX, and LIPP, respectively.