Database systems often rely on historical query traces to perform workload-based performance tuning. However, real production workloads evolve over time, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose SIBYL, an end-to-end machine-learning-based framework that accurately forecasts a sequence of future queries, including the entire query statements, over various prediction windows. Drawing insights from real workloads, we propose template-based featurization techniques and develop a stacked LSTM with an encoder-decoder architecture for accurate forecasting of query workloads. We also develop techniques to improve forecasting accuracy over large prediction windows and to achieve high scalability over large workloads with highly variable query arrival rates. Finally, we propose techniques to handle workload drift. Our evaluation on four real workloads demonstrates that SIBYL can forecast workloads with an $87.3\%$ median F1 score, and can yield $1.7\times$ and $1.3\times$ performance improvements when applied to materialized view selection and index selection, respectively.
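
To make the idea of template-based featurization concrete, the following is a minimal sketch of how a query statement might be split into a template and its parameter values. It is an illustration under stated assumptions, not SIBYL's actual implementation: the function name `to_template`, the `$` placeholder, and the regular expression are all hypothetical choices made for this example, and a production system would more likely use a real SQL parser.

```python
import re
from collections import Counter

# Hypothetical placeholder token; the abstract does not specify a template format.
PLACEHOLDER = "$"

# Single-quoted string literals (with '' escapes) first, then standalone numbers.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def to_template(query: str) -> tuple[str, list[str]]:
    """Split a SQL statement into a template and its literal parameter values."""
    params: list[str] = []

    def _capture(match: re.Match) -> str:
        # Record the literal and leave a placeholder in its position.
        params.append(match.group(0))
        return PLACEHOLDER

    template = LITERAL.sub(_capture, query)
    # Collapse whitespace so formatting differences do not create new templates.
    template = " ".join(template.split())
    return template, params


# Illustrative queries (made up for this sketch, not taken from the paper's workloads).
queries = [
    "SELECT * FROM orders WHERE id = 42",
    "SELECT * FROM orders   WHERE id = 7",
    "SELECT name FROM users WHERE city = 'Oslo' AND age > 30",
]

# Count how often each template occurs in the trace.
template_counts = Counter(to_template(q)[0] for q in queries)
```

Under this assumed scheme, the first two queries map to the same template and differ only in their parameter values, so a forecasting model could predict the template and the parameters separately and then reassemble a full query statement.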