Stage: Query Execution Time Prediction in Amazon Redshift

Query performance prediction (e.g., predicting execution time) is a critical component of modern DBMSes. As a pioneering cloud data warehouse, Amazon Redshift relies on accurate execution time predictions for many downstream tasks. These range from high-level optimizations, such as automatically creating materialized views, to low-level tasks on the critical path of query execution, such as admission control, scheduling, and execution resource control. Unfortunately, many existing execution time prediction techniques, including those used in Redshift, suffer from cold-start issues and inaccurate estimates, and are not robust to changes in workload or data. In this paper, we propose the Stage predictor, a novel hierarchical execution time predictor designed around the unique characteristics and challenges of Redshift. The Stage predictor consists of three model states: an execution time cache; a lightweight local model that is optimized for a specific DB instance and measures its own uncertainty; and a more complex global model that is transferable across all instances in Redshift. We design a systematic approach to combining these models that makes the best use of the optimality of the cache, the instance-level optimization of the local model, and the transferable knowledge about Redshift captured by the global model. Experimentally, we show that the Stage predictor makes more accurate and robust predictions while keeping inference latency and memory overhead practical. Overall, compared with the prior query performance predictor in Redshift, the Stage predictor reduces average query execution latency by $20\%$ on the evaluated instances.
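To make the hierarchy concrete, the following is a minimal sketch of how three model states could be chained at inference time. It is not the authors' implementation. It assumes that the cache is keyed by a hypothetical plan identifier (`plan_key`) and returns the mean of previously observed execution times, that the local model is an ensemble whose spread (relative standard deviation) serves as the uncertainty measure, and that a hypothetical `uncertainty_threshold` decides when to defer to the global model. All class names, parameters, and the toy models below are illustrative.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple

# Hypothetical feature vector for a query plan (e.g., estimated rows, costs).
Features = Tuple[float, ...]
Model = Callable[[Features], float]


@dataclass
class StagePredictorSketch:
    """Illustrative cache -> local -> global hierarchy (assumed design)."""

    local_ensemble: List[Model]          # hypothetical instance-local models
    global_model: Model                  # hypothetical cross-instance model
    uncertainty_threshold: float = 0.3   # hypothetical cutoff on std/mean
    cache: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, plan_key: str, exec_time: float) -> None:
        """Store an observed execution time for a (hypothetical) plan key."""
        self.cache.setdefault(plan_key, []).append(exec_time)

    def predict(self, plan_key: str, features: Features) -> Tuple[float, str]:
        """Return (predicted seconds, which model state produced it)."""
        # State 1: a repeated query is answered from observed history.
        if plan_key in self.cache:
            return mean(self.cache[plan_key]), "cache"

        # State 2: the local ensemble; disagreement among members is
        # used here as the uncertainty estimate.
        preds = [model(features) for model in self.local_ensemble]
        mu, sigma = mean(preds), pstdev(preds)
        if mu > 0 and sigma / mu <= self.uncertainty_threshold:
            return mu, "local"

        # State 3: fall back to the transferable global model.
        return self.global_model(features), "global"


if __name__ == "__main__":
    predictor = StagePredictorSketch(
        local_ensemble=[lambda f: f[0], lambda f: f[0] + 5 * f[1]],
        global_model=lambda f: f[0] + f[1],
    )
    predictor.record("q1", 4.0)
    predictor.record("q1", 6.0)
    print(predictor.predict("q1", (0.0, 0.0)))   # (5.0, 'cache')
    print(predictor.predict("q2", (10.0, 0.0)))  # (10.0, 'local')
    print(predictor.predict("q3", (10.0, 4.0)))  # (14.0, 'global')
```

The ordering reflects the trade-off the abstract describes: exact history is cheapest and most accurate for repeats, the local model is tuned to one instance but is trusted only when confident, and the global model covers everything else at higher cost.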
