Growing data storage capacity and rising performance demands pose several challenges for concurrent index structures. Learned indexes are a promising solution: they fit a model to the distribution of the stored data and use it to predict the location of a target key, which significantly improves lookup performance. Despite these advantages, existing learned indexes have limitations and scale poorly on multi-core data storage. This paper introduces SALI, the Scalable Adaptive Learned Index framework, which incorporates two strategies aimed at achieving high scalability, improving efficiency, and enhancing the robustness of learned indexes. First, a set of node-evolving strategies allows the learned index to adapt to various workload skews and improves its concurrency performance under them. Second, a lightweight strategy maintains statistical information within the learned index, further improving its scalability. To validate their effectiveness, the two strategies are applied to LIPP+, a learned index structure that uses fine-grained write locks. Experimental results show that, with 64 threads, SALI improves insertion throughput by an average of 2.04x over the second-best learned index, while achieving lookup throughput similar to that of LIPP+.
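To make the basic idea of a learned index concrete, the following is a minimal sketch of model-based lookup: a model is fitted to the key distribution, predicts a position, and a bounded local search corrects the prediction. It is not SALI or LIPP+; it assumes a single linear model over a static, non-empty, sorted array of unique integer keys, with no concurrency control, node-evolving strategies, or statistics maintenance. The class and method names (`LinearLearnedIndex`, `lookup`) are hypothetical and chosen only for illustration.

```python
import bisect


class LinearLearnedIndex:
    """Toy single-model learned index (illustrative assumption, not SALI/LIPP+)."""

    def __init__(self, keys):
        # Assumes a non-empty collection of unique integer keys.
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of: position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The largest error on the stored keys bounds the search window.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Predicted position, clamped to the valid index range.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search restricted to the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


idx = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 100, 1000])
found = idx.lookup(23)    # position of a stored key
missing = idx.lookup(7)   # None for a key that is not stored
```

The better the model fits the key distribution, the smaller `max_err` becomes and the less correction work each lookup needs. Skewed or changing data degrades the fit, which is the kind of situation that adaptive strategies such as those described in the abstract are meant to address.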