Computing servers have played a key role in developing and processing emerging compute-intensive applications in recent years. Consolidating multiple virtual machines (VMs) on one server to run various applications introduces severe competition among VMs for limited resources. Many techniques, such as VM scheduling and resource provisioning, have been proposed to maximize the cost-efficiency of computing servers while alleviating performance interference between VMs. However, these management techniques require accurate performance prediction for the application running inside each VM, which is difficult to obtain in the public cloud due to the black-box nature of the VMs. From this perspective, this paper proposes a novel machine learning-based performance prediction approach for applications running in the cloud. To achieve high-accuracy predictions for black-box VMs, the proposed method first identifies the application running inside the virtual machine. It then selects highly correlated runtime metrics as the input to the machine learning model to accurately predict the performance level of the cloud application. Experimental results with state-of-the-art cloud benchmarks demonstrate that the proposed method outperforms existing prediction methods by more than 2x in terms of worst-case prediction error. In addition, we successfully tackle the challenge of performance prediction for applications with variable workloads by introducing the performance degradation index, which the compared methods do not consider. The workflow versatility of the proposed approach has been verified with different modern servers and VM configurations.