Modern computer systems are highly configurable: hundreds of configuration options interact, producing an enormous configuration space. Optimizing performance objectives (e.g., latency) in such systems is therefore challenging, all the more so because their environments are subject to frequent uncertainty (e.g., workload fluctuations). Transfer learning has recently been applied to this problem by reusing knowledge from configuration measurements taken in source environments, where interventions are cheaper, to guide optimization in the target environment, where any intervention is costly or impossible. However, recent empirical research has shown that statistical models can perform poorly when the deployment environment changes, because the behavior of certain variables in the models can change dramatically from source to target. To address this issue, we propose CAMEO, a method that identifies causal predictors that remain invariant under environmental changes. This allows the optimization process to operate in a reduced search space, leading to faster optimization of system performance. We demonstrate significant performance improvements over state-of-the-art optimization methods in MLperf deep learning systems, a video analytics pipeline, and a database system.
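The idea of keeping only predictors whose behavior is stable from source to target can be illustrated with a toy sketch. This is not CAMEO's algorithm, which the abstract does not detail: it swaps in a much simpler stand-in (comparing per-environment least-squares coefficients) and runs on synthetic data. The function name `invariant_options`, the threshold `tol`, and the four made-up configuration options are all assumptions introduced here for illustration.

```python
import numpy as np

def invariant_options(X_src, y_src, X_tgt, y_tgt, tol=1.0):
    """Toy stand-in for invariant-predictor selection (hypothetical helper).

    Keeps options whose fitted linear effect is non-negligible in the
    source environment and shifts by at most `tol` in the target.
    """
    def fit(X, y):
        # Ordinary least squares with an intercept column.
        A = np.column_stack([X, np.ones(len(X))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return coef[:-1]  # drop the intercept

    b_src, b_tgt = fit(X_src, y_src), fit(X_tgt, y_tgt)
    stable = np.abs(b_src - b_tgt) <= tol    # effect is similar in both environments
    relevant = np.abs(b_src) > tol           # effect matters in the first place
    return [int(i) for i in np.flatnonzero(stable & relevant)]

rng = np.random.default_rng(0)

def measure(n, effects):
    # Synthetic "latency" measurements: linear in the options plus small noise.
    X = rng.uniform(0.0, 1.0, size=(n, len(effects)))
    y = X @ np.array(effects) + rng.normal(0.0, 0.1, size=n)
    return X, y

# Many cheap source measurements, few costly target measurements.
# Option 2 flips its effect between environments; option 3 has no effect.
X_src, y_src = measure(200, [3.0, 2.0, 4.0, 0.0])
X_tgt, y_tgt = measure(30, [3.0, 2.0, -4.0, 0.0])

keep = invariant_options(X_src, y_src, X_tgt, y_tgt)
print("options kept for the reduced search space:", keep)
```

In this synthetic setup a subsequent optimizer would search only over the kept options and hold the others at their defaults, which is the sense in which the search space is reduced.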