Memory overload is a common form of resource exhaustion in cloud data warehouses. When database queries fail due to memory overload, the failure not only wastes critical resources such as CPU time but also disrupts core business processes, because memory-overloading (MO) queries are typically part of complex workflows. Identifying such queries in advance and scheduling them to memory-rich serverless clusters prevents both the resource wastage and the query execution failure. Cloud data warehouses therefore need an admission control framework that identifies MO queries with high prediction precision, interpretability, efficiency, and adaptability. However, existing admission control frameworks primarily target scenarios such as SLA satisfaction and resource isolation, and they identify MO queries with limited precision. Moreover, there is a lack of publicly available MO-labeled datasets with workloads for training and benchmarking. To tackle these challenges, we propose SafeLoad, the first query admission control framework specifically designed to identify MO queries. Alongside it, we release SafeBench, an open-source, industrial-scale benchmark for this task that includes 150 million real queries. SafeLoad first filters out memory-safe queries using an interpretable discriminative rule. It then applies a hybrid architecture that integrates a global model with cluster-level models, supplemented by a misprediction correction module, to identify MO queries. Additionally, a self-tuning quota management mechanism dynamically adjusts prediction quotas per cluster to improve precision. Experimental results show that SafeLoad achieves state-of-the-art prediction performance with low online and offline time overhead. Specifically, SafeLoad improves precision by up to 66% over the best baseline and reduces wasted CPU time by up to 8.09x compared to scenarios without SafeLoad.
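The staged flow described in the abstract (rule-based filtering, then a global model combined with cluster-level models, then a per-cluster quota) can be illustrated with a minimal Python sketch. Everything here is a hypothetical illustration rather than SafeLoad's actual design: the `Query` fields, the scan-size rule, the equal-weight score averaging, the thresholds, and the step-by-one quota update are all assumptions, and the misprediction correction module is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Query:
    cluster_id: str
    est_scan_bytes: int  # hypothetical feature, not taken from the paper


# A scorer maps a query to an estimated probability of memory overload.
Scorer = Callable[[Query], float]


def is_memory_safe(q: Query, safe_scan_bytes: int = 1_000_000) -> bool:
    """Assumed interpretable rule: small scans are treated as memory-safe."""
    return q.est_scan_bytes < safe_scan_bytes


def predict_mo(q: Query, global_model: Scorer,
               cluster_models: Dict[str, Scorer],
               threshold: float = 0.5) -> bool:
    """Return True if the query is flagged as memory-overloading (MO)."""
    if is_memory_safe(q):
        return False  # filtered out before any model is consulted
    score = global_model(q)
    cluster_model = cluster_models.get(q.cluster_id)
    if cluster_model is not None:
        # Assumed combination: plain average of global and cluster scores.
        score = 0.5 * (score + cluster_model(q))
    return score >= threshold


def adjust_quota(quota: int, observed_precision: float, target: float = 0.9,
                 min_quota: int = 1, max_quota: int = 100) -> int:
    """Assumed self-tuning step: shrink a cluster's quota of MO predictions
    when precision is below target, otherwise grow it, within bounds."""
    if observed_precision < target:
        return max(min_quota, quota - 1)
    return min(max_quota, quota + 1)
```

In this sketch a cluster without its own model falls back to the global score alone, which is one plausible way a hybrid design could cover clusters with little history.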


