A recent research trend treats database index structures as machine learning (ML) models. In this approach, one or more ML models are trained to learn the mapping from keys to positions within a data set. Indexes of this class are known as "learned indexes." For one-dimensional data, learned indexes have demonstrated improved search performance and reduced space requirements. The concept has naturally been extended to multi-dimensional (e.g., spatial) data, leading to the development of "learned multi-dimensional indexes." This survey focuses on learned multi-dimensional index structures. Specifically, it reviews the current state of this research area, explains the core concepts behind each proposed method, and classifies the methods according to several well-defined criteria. We present a taxonomy of learned multi-dimensional indexes and survey the existing literature according to this taxonomy. Additionally, we present a timeline that illustrates the evolution of research on learned indexes. Finally, we highlight several open challenges and future research directions in this emerging and highly active field.
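To make the key-to-position mapping concrete, the following is a minimal Python sketch of a one-dimensional learned index. It assumes a single least-squares linear model over a sorted key array, plus a recorded worst-case prediction error that bounds a final binary search. The class name `LinearLearnedIndex` and its methods are hypothetical, chosen for illustration only; the sketch is not taken from any specific method covered by the survey, and practical learned indexes generally use more elaborate model structures.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical 1-D learned index: one linear model plus an error bound."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("at least one key is required")

        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (p - mean_pos)
                  for p, k in enumerate(self.keys))
        self.slope = cov / var if var > 0 else 0.0
        self.intercept = mean_pos - self.slope * mean_key

        # Worst-case gap between predicted and true position on the
        # indexed keys; lookups only search inside this window.
        self.max_err = max(abs(self._predict(k) - p)
                           for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Round the model output and clamp it to a valid array position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of the first occurrence of key, or None."""
        pred = self._predict(key)
        lo = max(0, pred - self.max_err)
        hi = min(len(self.keys), pred + self.max_err + 1)
        # Binary search restricted to the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


# Usage: build the index once, then query it.
keys = [3, 7, 8, 15, 21, 22, 40, 41, 57, 90]
index = LinearLearnedIndex(keys)
position = index.lookup(21)   # position of key 21 in the sorted array
missing = index.lookup(5)     # None, because 5 was never indexed
```

The error bound is what keeps the lookup correct: every indexed key lies within `max_err` positions of its prediction, so the restricted binary search cannot miss it. In this sketch, space savings would come from storing two floats and one integer in place of a tree over the keys.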