Distributed Stream Processing (DSP) focuses on the near real-time processing of large, unbounded data streams. To increase processing capacity, DSP systems can dynamically scale out across a cluster of commodity nodes, maintaining a good Quality of Service despite variable workloads. However, selecting scale-out configurations that maximize resource utilization remains a challenge, especially in environments where workloads change over time and node failures are all but inevitable. Furthermore, configuration parameters such as memory allocation and checkpointing intervals also affect performance and resource usage. Sub-optimal configurations can easily lead to high operational costs, poor performance, or unacceptable loss of service. In this paper, we present Demeter, a method for dynamically optimizing key DSP system configuration parameters for resource efficiency. Demeter uses Time Series Forecasting to predict future workloads and Multi-Objective Bayesian Optimization to model runtime behavior as a function of parameter settings and workload rates. Together, these techniques allow us to determine whether enough is known about the predicted workload rate, or whether short-lived parallel profiling runs should be initiated proactively to gather more data. Once trained, the models guide the adjustment of multiple, potentially interdependent system configuration parameters, optimizing performance and resource usage as workload rates change. Our experiments on a commodity cluster using Apache Flink demonstrate that Demeter significantly improves the operational efficiency of long-running benchmark jobs.
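The decision loop described above (forecast the workload, check whether enough is known about that rate, then either profile or reconfigure) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract. Holt's linear exponential smoothing stands in for the unspecified Time Series Forecasting model. A simple "is there a profiled observation within `rel_tol` of the predicted rate" test replaces the uncertainty estimate a Bayesian model would provide. A plain Pareto filter over profiled runs is used instead of Multi-Objective Bayesian Optimization, and the sketch does not perform Bayesian optimization at all. All names (`Observation`, `holt_forecast`, `pareto_front`, `decide`) and parameters (`alpha`, `beta`, `rel_tol`, `latency_slo_ms`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical profiling result: one configuration run at one workload rate."""
    rate: float          # workload rate the job was profiled at (records/s)
    parallelism: int     # scale-out, e.g. number of parallel task instances
    checkpoint_s: int    # checkpointing interval in seconds
    latency_ms: float    # objective 1: end-to-end latency (minimize)
    cpu_cores: float     # objective 2: resource usage (minimize)


def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear exponential smoothing (stand-in forecaster; needs >= 2 points)."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


def pareto_front(obs):
    """Keep observations not dominated in (latency_ms, cpu_cores)."""
    front = []
    for o in obs:
        dominated = any(
            p.latency_ms <= o.latency_ms and p.cpu_cores <= o.cpu_cores
            and (p.latency_ms < o.latency_ms or p.cpu_cores < o.cpu_cores)
            for p in obs
        )
        if not dominated:
            front.append(o)
    return front


def decide(history, observations, rel_tol=0.1, latency_slo_ms=500.0):
    """Return ("profile", predicted_rate) or ("reconfigure", chosen Observation)."""
    predicted = holt_forecast(history)
    # Assumed coverage check: do we have data close to the predicted rate?
    nearby = [o for o in observations
              if abs(o.rate - predicted) <= rel_tol * predicted]
    if not nearby:
        return ("profile", predicted)  # too little known: gather data first
    # Among Pareto-optimal configs that meet the latency target, use the cheapest.
    feasible = [o for o in pareto_front(nearby) if o.latency_ms <= latency_slo_ms]
    if not feasible:
        return ("profile", predicted)
    return ("reconfigure", min(feasible, key=lambda o: o.cpu_cores))


if __name__ == "__main__":
    profiled = [
        Observation(1400, 4, 30, 300.0, 4.0),
        Observation(1380, 2, 60, 800.0, 2.0),
        Observation(1420, 6, 30, 250.0, 6.0),
        Observation(500, 1, 60, 120.0, 1.0),
    ]
    print(decide([1000, 1100, 1200, 1300], profiled))
    print(decide([3000, 3000, 3000], profiled))
```

With a rising workload history, the forecast lands near profiled rates, and the cheapest configuration that meets the latency target is chosen. With a flat history at an unprofiled rate, the sketch falls back to profiling. A real system would replace the distance threshold with the predictive uncertainty of its learned models.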