Learned database components, which integrate machine learning deeply into their design, have been studied extensively in recent years. Because databases are dynamic, with data and workloads drifting continuously, learned database components must remain effective and efficient as that drift occurs. Robustness is therefore a key factor in assessing their practical applicability. Although recent works examine learned database components under specific kinds of drift, they do not support systematic performance evaluation across a broad range of drift types or under drift customized as needed. This paper presents NeurBench, a new benchmark suite that supports evaluating learned database components under measurable and controllable data and workload drift. We quantify diverse types of drift by introducing a key concept called the drift factor. Building on this formulation, we propose a drift-aware data and workload generation framework that effectively simulates real-world drift while preserving inherent correlations. Experimental results demonstrate the effectiveness of NeurBench in generating realistic data and workload drift and provide insights into the performance of representative learned database components under different drift scenarios.