Targeting in-memory one-dimensional search keys, we propose a novel DIstribution-driven Learned Index tree (DILI), in which each node uses a concise and computationally efficient linear regression model. An internal node's key range is divided equally among its child nodes, so that a key search enjoys perfect model prediction accuracy when locating the relevant leaf node. A leaf node uses machine learning models to generate a searchable data layout and can thus accurately predict the position of the data record for a key. To construct DILI, we first build a bottom-up tree with linear regression models according to global and local key distributions. Using the bottom-up tree, we then build DILI in a top-down manner, individualizing the fanouts of internal nodes according to local distributions. DILI strikes a good balance between the number of leaf nodes and the height of the tree, two critical factors in key search time. Moreover, we design flexible algorithms for DILI that efficiently insert and delete keys and automatically adjust the tree structure when necessary. Extensive experimental results show that DILI outperforms the state-of-the-art alternatives on different kinds of workloads.
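To make the two node roles concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-level tree with a single fixed fanout (DILI individualizes fanouts per node), a leaf that fits an ordinary least-squares line over sorted keys, and a local exponential search that corrects the leaf's prediction (DILI instead arranges the leaf layout so that the prediction is accurate). The names `Leaf`, `Internal`, `build`, and `search` are hypothetical and chosen only for illustration.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list            # sorted keys stored in this leaf
    slope: float = 0.0    # linear model: position ~ slope * key + intercept
    intercept: float = 0.0

    def fit(self):
        """Least-squares fit of array position against key value."""
        n = len(self.keys)
        if n < 2:
            self.slope, self.intercept = 0.0, 0.0
            return
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k

    def find(self, key):
        """Predict a position, then widen a window around it until it brackets the key."""
        n = len(self.keys)
        if n == 0:
            return None
        guess = min(max(int(round(self.slope * key + self.intercept)), 0), n - 1)
        lo = hi = guess
        step = 1
        while lo > 0 and self.keys[lo] > key:
            lo = max(0, lo - step)
            step *= 2
        step = 1
        while hi < n - 1 and self.keys[hi] < key:
            hi = min(n - 1, hi + step)
            step *= 2
        i = bisect.bisect_left(self.keys, key, lo, hi + 1)
        return i if i < n and self.keys[i] == key else None


@dataclass
class Internal:
    lower: float      # inclusive lower bound of the node's key range
    upper: float      # exclusive upper bound of the node's key range
    children: list    # equal-width children, so routing is one linear function

    def child_index(self, key):
        fanout = len(self.children)
        idx = int((key - self.lower) * fanout / (self.upper - self.lower))
        return min(max(idx, 0), fanout - 1)


def build(keys, fanout):
    """Assumption: one internal root over `fanout` leaves; keys are numeric and non-empty."""
    keys = sorted(keys)
    root = Internal(keys[0], keys[-1] + 1, [Leaf([]) for _ in range(fanout)])
    for k in keys:
        # Route with the same function used at search time, so routing is always exact.
        root.children[root.child_index(k)].keys.append(k)
    for leaf in root.children:
        leaf.fit()
    return root


def search(root, key):
    """Return (leaf index, position within leaf), or None if the key is absent."""
    i = root.child_index(key)
    pos = root.children[i].find(key)
    return None if pos is None else (i, pos)
```

Because the root's children split the key range into equal-width intervals, routing is a single multiply-and-truncate with no search inside the internal node, which is the property the abstract calls perfect prediction accuracy. The cost of this design is that skewed data leaves some children crowded and others empty, which is why fanout selection matters.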