Growing data storage capacity and rising performance demands pose several challenges for concurrent index structures. Learned indexes are a promising solution: they fit a model to the distribution of the stored data and use it to predict the location of a target key, which significantly improves lookup performance. Despite these advantages, existing learned indexes still have limitations and run into scalability problems on multi-core data storage. This paper introduces SALI, the Scalable Adaptive Learned Index framework, which incorporates two strategies aimed at achieving high scalability, improving efficiency, and enhancing the robustness of learned indexes. First, a set of node-evolving strategies enables the learned index to adapt to various workload skews and improves its concurrency performance under such workloads. Second, a lightweight strategy maintains statistical information within the learned index, further improving its scalability. To validate their effectiveness, SALI applies both strategies to LIPP+, a learned index structure that uses fine-grained write locks. Experimental results show that, with 64 threads, SALI improves insertion throughput by an average of 2.04x over the second-best learned index, while achieving lookup throughput similar to that of LIPP+.
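To make the general idea of a learned index concrete, the following is a minimal single-threaded sketch of "fit the key distribution, then predict the position". It is not SALI's or LIPP+'s design: the class name `LinearLearnedIndex`, the single least-squares line, and the error-bounded local search are all assumptions chosen for illustration, and the sketch ignores inserts and concurrency entirely.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Toy learned index (hypothetical): one linear model over a sorted key array."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)

        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key

        # Record the worst prediction error so lookups can bound their search.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Predicted slot, clamped to the valid index range.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        pos = self._predict(key)
        lo = max(pos - self.max_err, 0)
        hi = min(pos + self.max_err + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The closer the model fits the data, the smaller `max_err` becomes and the fewer comparisons each lookup needs, which is where the lookup advantage described above comes from in this simplified setting.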