Predicting future resource demand in cloud computing is essential for balancing the trade-off between serving customers' requests efficiently and minimizing provisioning cost. Modelling prediction uncertainty is also desirable, since it better informs resource decision-making, but this topic remains under-investigated. In this paper, we propose univariate and bivariate Bayesian deep learning models that predict future workload demand together with its uncertainty. We run extensive experiments on Google and Alibaba clusters, where we first train our models with datasets from different cloud providers and compare them with LSTM-based baselines. Results show that modelling the uncertainty of predictions has a positive impact on performance, especially on service level metrics, because uncertainty quantification can be tailored to the target service levels that are critical in cloud applications. Moreover, we investigate whether our models benefit from transfer learning across different domains, i.e. dataset distributions. Experiments on the same workload datasets reveal that acceptable transfer learning performance can be achieved within the same provider, because the distributions are more similar. In contrast, domain knowledge does not transfer when the source and target domains are very different (e.g. from different providers), but this performance degradation can be mitigated by increasing the training set size of the source domain.
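To illustrate how a predictive uncertainty estimate can be tailored to a target service level, the following minimal sketch assumes that a model outputs a Gaussian predictive distribution (mean and standard deviation) for the next-step demand. The Gaussian form, the function name `provision_level`, and its parameters are assumptions made for illustration only; they are not taken from the models proposed in the paper.

```python
from statistics import NormalDist


def provision_level(mean: float, std: float, service_level: float) -> float:
    """Return the amount of resources to provision so that predicted demand
    is covered with probability `service_level`.

    Hypothetical helper: assumes a Gaussian predictive distribution with the
    given mean and standard deviation, which is an illustrative assumption.
    """
    if not 0.0 < service_level < 1.0:
        raise ValueError("service_level must lie strictly between 0 and 1")
    if std < 0.0:
        raise ValueError("std must be non-negative")
    if std == 0.0:
        # No uncertainty: the point prediction is the only candidate.
        return mean
    # The quantile of the predictive distribution at the target service level.
    return NormalDist(mu=mean, sigma=std).inv_cdf(service_level)


# A higher target service level leads to more conservative provisioning.
median_plan = provision_level(mean=100.0, std=10.0, service_level=0.50)
safe_plan = provision_level(mean=100.0, std=10.0, service_level=0.95)
```

Under these assumptions, a 50% target provisions at the predicted mean, while a 95% target provisions roughly 1.64 standard deviations above it. A point-prediction baseline has no comparable knob, which is one way to read the reported gains on service level metrics.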