Learned database components, which deeply integrate machine learning into their design, have been extensively studied in recent years. Because databases are dynamic, with data and workloads that drift continuously, learned database components must remain effective and efficient as that drift occurs. Adaptability is therefore a key factor in assessing their practical applicability. However, existing benchmarks for learned database components either overlook data and workload drift or oversimplify it, and so fail to evaluate these components across a broad range of drift scenarios. This paper presents NeurBench, a new benchmark suite that applies measurable and controllable data and workload drift to enable systematic performance evaluation of learned database components. We quantify diverse types of drift by introducing a key concept called the drift factor. Building on this formulation, we propose a drift-aware data and workload generation framework that effectively simulates real-world drift while preserving inherent correlations. We use NeurBench to evaluate state-of-the-art learned query optimizers, learned indexes, and learned concurrency control within a consistent experimental process, providing insights into their performance under diverse data and workload drift scenarios.