The past decade has seen rapid growth of distributed stream data processing systems. In these systems, a stream application is realized as a Directed Acyclic Graph (DAG) of operators, where the level of parallelism of each operator has a substantial impact on the application's overall performance. However, finding optimal levels of parallelism remains challenging. Most existing methods are tightly coupled to the topological graph of operators and cannot efficiently tune under-provisioned jobs. They either make insufficient use of previous tuning experience by treating successive tuning runs independently, or explore the configuration space aggressively and violate Service Level Agreements (SLAs). To address these problems, we propose ContTune, a continuous tuning system for stream applications. It is equipped with a novel Big-small algorithm, in which the Big phase decouples tuning from the topological graph by decomposing the job tuning problem into sub-problems that can be solved concurrently. In the Small phase, we propose a conservative Bayesian Optimization (CBO) technique that speeds up tuning by utilizing previous observations. CBO leverages the state-of-the-art (SOTA) tuning method as conservative exploration to avoid SLA violations. Experimental results show that, compared to the SOTA method DS2, ContTune reduces the number of reconfigurations by up to 60.75% under synthetic workloads and by up to 57.5% under real workloads.
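The two ideas named in the abstract, per-operator decomposition and reuse of earlier observations with a conservative fallback, can be illustrated with a minimal sketch. This is not ContTune's implementation: every name here (`OperatorHistory`, `linear_estimate`, `tune_operator`, `tune_job`) is hypothetical, a plain lookup over past observations stands in for the surrogate model that Bayesian optimization would normally fit, and the fallback assumes that an operator's processing ability scales linearly with its parallelism, in the spirit of a DS2-style rate estimate.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class OperatorHistory:
    # Hypothetical store: parallelism -> processing ability (records/s)
    # observed for this operator in earlier tuning rounds.
    observed: Dict[int, float] = field(default_factory=dict)


def linear_estimate(parallelism: int, ability: float, target_rate: float) -> int:
    """Conservative fallback, assuming ability grows linearly with parallelism."""
    if ability <= 0:
        return parallelism  # no usable measurement; keep the current setting
    per_instance = ability / parallelism
    return max(1, math.ceil(target_rate / per_instance))


def tune_operator(history: OperatorHistory, parallelism: int,
                  ability: float, target_rate: float) -> int:
    """Pick a parallelism for one operator, independent of the rest of the DAG."""
    # Record the newest observation so later rounds can reuse it.
    history.observed[parallelism] = ability
    # Reuse history: the smallest parallelism already seen to meet the target.
    feasible = [p for p, a in history.observed.items() if a >= target_rate]
    if feasible:
        return min(feasible)
    # Nothing in the history is sufficient, so use the linear estimate.
    return linear_estimate(parallelism, ability, target_rate)


def tune_job(histories: Dict[str, OperatorHistory],
             metrics: Dict[str, Tuple[int, float]],
             target_rates: Dict[str, float]) -> Dict[str, int]:
    """Treat each operator as its own sub-problem; no topology traversal is needed."""
    return {
        op: tune_operator(histories[op], p, a, target_rates[op])
        for op, (p, a) in metrics.items()
    }
```

Because each operator only needs its own history, its current measurement, and its target rate, the sub-problems in `tune_job` share no state and could be solved concurrently. A real conservative Bayesian optimizer would replace the lookup in `tune_operator` with a probabilistic model and an acquisition rule.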