LMG Index: A Robust and Efficient Learned Index Framework for Multi-Dimensional Performance Balance

Index structures are fundamental to efficient query processing on large-scale datasets. Learned indexes model indexing as a prediction problem to overcome the inherent trade-offs of traditional indexes. However, most existing learned indexes optimize for only a limited set of objectives, such as query latency or space usage, and neglect other practical evaluation dimensions such as update efficiency and stability. Moreover, many learned indexes rely on assumptions about data distributions or workloads and lack theoretical guarantees under unknown or evolving scenarios, which limits their generality in real-world systems. In this paper, we propose LMG, a robust and efficient learned index framework designed for multi-dimensional performance balance. LMG integrates a decoupled routing structure with theoretical $O(1)$ complexity for fixed key types and an optimal error threshold training algorithm that approaches $O(1)$ overhead in practice. The framework further improves query performance by optimizing gap allocation. Extensive evaluations show that LMG achieves competitive or leading performance across all key evaluation dimensions, including bulk loading (up to 7.55$\times$ faster), point queries (up to 1.68$\times$ faster), range queries (up to 11.41$\times$ faster), and mixed read-write throughput (up to 3.50$\times$ higher). LMG also provides robust long-term stability and high space efficiency (up to 6.26$\times$ smaller footprint). These results demonstrate that LMG significantly mitigates the multi-dimensional performance trade-offs often observed in state-of-the-art approaches, offering a balanced and efficient framework.
