Host load prediction is essential for dynamic resource scaling and job scheduling in cloud computing environments. In this setting, workload prediction is challenging for three reasons. First, predictions must be accurate to support precise scheduling decisions. Second, they must be fast so that scheduling happens at the right time. Third, the model must adapt to new workload patterns so that it performs well on both recent and previously seen patterns. Inaccurate or slow predictions, or a failure to capture new usage patterns, can lead to severe outcomes such as service level agreement (SLA) violations. To address these issues, we train a fast model based on the gated recurrent unit (GRU) that supports online adaptation. We take a multivariate approach, using several features, such as memory usage, CPU usage, disk I/O usage, and disk space, to make accurate predictions. Moreover, we predict multiple steps ahead, which is essential for making scheduling decisions in advance. Furthermore, we apply two pruning methods, L1-norm pruning and random pruning, to produce a sparse model for faster forecasts. Finally, we use online learning to create a model that adapts over time to new workload patterns.
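The pruning granularity and ratio are not specified above. As a minimal sketch, assuming unstructured pruning of individual weights (similar in spirit to PyTorch's `torch.nn.utils.prune.l1_unstructured` and `random_unstructured`), the NumPy code below builds keep-masks for GRU-style weight matrices. The function names, the 50% pruning amount, and the layer shapes (4 input features, 16 hidden units, 3 stacked gates) are hypothetical choices for illustration, not the method used in this work.

```python
import numpy as np


def l1_unstructured_mask(weights: np.ndarray, amount: float) -> np.ndarray:
    """Boolean keep-mask that drops the `amount` fraction of entries with the
    smallest absolute value (per-weight L1 magnitude)."""
    n_prune = int(round(amount * weights.size))
    keep = np.ones(weights.size, dtype=bool)
    if n_prune > 0:
        # argsort over the flattened |w|; ties are broken by sort order
        smallest = np.argsort(np.abs(weights), axis=None)[:n_prune]
        keep[smallest] = False
    return keep.reshape(weights.shape)


def random_unstructured_mask(weights: np.ndarray, amount: float,
                             rng: np.random.Generator) -> np.ndarray:
    """Boolean keep-mask that drops the same number of entries, chosen
    uniformly at random without regard to magnitude."""
    n_prune = int(round(amount * weights.size))
    keep = np.ones(weights.size, dtype=bool)
    dropped = rng.choice(weights.size, size=n_prune, replace=False)
    keep[dropped] = False
    return keep.reshape(weights.shape)


def sparsity(weights: np.ndarray) -> float:
    """Fraction of entries that are exactly zero."""
    return float(np.mean(weights == 0))


# Hypothetical GRU shapes: 4 input features (memory, CPU, disk I/O, disk
# space) and 16 hidden units; the 3 gates are stacked along the rows.
rng = np.random.default_rng(0)
hidden, n_features = 16, 4
gru_weights = {
    "weight_ih": rng.standard_normal((3 * hidden, n_features)),
    "weight_hh": rng.standard_normal((3 * hidden, hidden)),
}

for name, w in gru_weights.items():
    w_l1 = w * l1_unstructured_mask(w, amount=0.5)
    w_rand = w * random_unstructured_mask(w, amount=0.5, rng=rng)
    print(f"{name}: sparsity L1={sparsity(w_l1):.2f} random={sparsity(w_rand):.2f}, "
          f"kept |w| sum L1={np.abs(w_l1).sum():.1f} random={np.abs(w_rand).sum():.1f}")
```

Both masks zero the same number of weights; the difference is which ones. Magnitude-based pruning keeps the largest-magnitude weights, so it always retains at least as much total |w| as random pruning at the same ratio. Zeroed entries alone do not make dense matrix products faster; a speedup requires sparse storage or kernels that skip the zeros.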