In modern computing environments, users may have access to multiple systems, such as local clusters, private clouds, or public clouds. This abundance of choices makes it difficult for users to select the system and configuration for running an application that best meets their performance and cost objectives. To assist such users, we propose a tool that predicts the full performance-cost trade-off space of an application across multiple systems. Our tool runs and profiles a submitted application on a small number of configurations from some of the systems, and uses that information to predict the application's performance on all configurations in all systems. The prediction models are trained offline with data collected from running a large number of applications on a wide variety of configurations. Notable aspects of our tool include: providing different scopes of prediction with varying online profiling requirements; automating the selection of the small number of configurations and systems used for online profiling; performing online profiling using partial runs, thereby making predictions for applications without running them to completion; employing a classifier to distinguish applications that scale well from those that scale poorly; and predicting the sensitivity of applications to interference from other users. We evaluate our tool using 69 data analytics and scientific computing benchmarks executing on three different single-node CPU systems with 8-9 configurations each, and show that it can achieve low prediction error with modest profiling overhead.
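To make the notion of a performance-cost trade-off space concrete, the following minimal sketch shows one way a user might reduce a set of predicted runtimes to the configurations worth considering. It is an illustration under stated assumptions, not the tool's actual implementation: the `Prediction` type, the `pareto_front` helper, the system and configuration names, and all runtimes and prices are hypothetical, and cost is assumed to be predicted runtime multiplied by an hourly price.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    """One predicted point in the trade-off space (hypothetical structure)."""
    system: str
    config: str
    runtime_s: float        # predicted runtime in seconds
    price_per_hour: float   # assumed hourly price of this configuration


def cost(p: Prediction) -> float:
    """Assumed cost model: runtime (in hours) times hourly price."""
    return p.runtime_s / 3600.0 * p.price_per_hour


def pareto_front(preds: List[Prediction]) -> List[Prediction]:
    """Keep only predictions that no other prediction beats on both runtime and cost."""
    # Visit points from fastest to slowest; a slower point is only worth
    # keeping if it is strictly cheaper than every faster point seen so far.
    front: List[Prediction] = []
    best_cost = float("inf")
    for p in sorted(preds, key=lambda q: (q.runtime_s, cost(q))):
        c = cost(p)
        if c < best_cost:
            front.append(p)
            best_cost = c
    return front


# Hypothetical predictions for one application on two systems.
PREDICTIONS = [
    Prediction("local", "4 cores", 1200.0, 0.40),
    Prediction("local", "16 cores", 400.0, 1.60),
    Prediction("cloud", "8 cores", 600.0, 1.50),
    Prediction("cloud", "32 cores", 250.0, 4.00),
]

if __name__ == "__main__":
    for p in pareto_front(PREDICTIONS):
        print(f"{p.system:6s} {p.config:9s} {p.runtime_s:7.0f} s  {cost(p):.3f} (cost units)")
```

In this made-up data set, the 8-core cloud configuration is dropped because the 16-core local configuration is predicted to be both faster and cheaper; the remaining points represent the choices a user would weigh against their own performance and cost objectives.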