Distributed Stream Processing (DSP) systems process large, unbounded data streams with high throughput and low latency. To maintain a stable Quality of Service (QoS), these systems require sufficient resources; over-provisioning, however, wastes energy and raises operating costs. To maximize resource utilization, autoscaling methods have therefore been proposed that aim to match the resource allocation to the incoming workload. Determining when and by how much to scale nevertheless remains a significant challenge. Because DSP jobs are long-running, scaling actions must be executed at runtime, and to maintain a good QoS they should be both accurate and infrequent. The concept of self-adaptive systems is particularly well suited to these challenges: such systems monitor themselves and their environment and adapt to changes with minimal expert involvement. This paper introduces Daedalus, a self-adaptive manager for autoscaling in DSP systems. Daedalus monitors a running DSP job and builds performance models that aim to predict the maximum processing capacity at different scale-outs. Combining these models with time series forecasting of future workloads, Daedalus proactively scales DSP jobs, optimizing for maximum throughput while minimizing both latency and resource usage. We conducted experiments using Apache Flink and Kafka Streams to evaluate the performance of Daedalus against two state-of-the-art approaches. Daedalus achieved comparable latencies while reducing resource usage by up to 71%.
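The core decision described above, pairing a per-scale-out capacity estimate with a workload forecast, can be illustrated with a minimal sketch. This is not Daedalus's actual implementation: the function name `choose_scale_out`, the `headroom` parameter, and the dictionary-based capacity model are hypothetical and chosen for illustration only. The sketch assumes that capacity estimates and a forecast are already available and simply selects the smallest scale-out whose predicted capacity covers the forecast peak.

```python
from typing import Dict, Sequence


def choose_scale_out(
    capacity_by_scale_out: Dict[int, float],
    forecast: Sequence[float],
    headroom: float = 0.1,
) -> int:
    """Pick the smallest scale-out whose predicted capacity covers the forecast peak.

    capacity_by_scale_out: hypothetical performance model, mapping a scale-out
        (number of parallel workers) to its predicted maximum throughput.
    forecast: predicted workload (same unit as the capacities) for the
        upcoming time window.
    headroom: assumed safety margin added on top of the forecast peak.
    """
    if not capacity_by_scale_out:
        raise ValueError("no capacity estimates available")
    if not forecast:
        raise ValueError("forecast must not be empty")

    required = max(forecast) * (1.0 + headroom)

    # Try scale-outs from smallest to largest so that resource usage is minimized.
    for scale_out in sorted(capacity_by_scale_out):
        if capacity_by_scale_out[scale_out] >= required:
            return scale_out

    # No modeled scale-out is sufficient: fall back to the one with the
    # highest predicted capacity.
    return max(capacity_by_scale_out, key=capacity_by_scale_out.get)


if __name__ == "__main__":
    # Illustrative numbers only (events per second); not measurements from the paper.
    capacities = {2: 20_000.0, 4: 38_000.0, 8: 70_000.0}
    upcoming_workload = [15_000.0, 30_000.0, 33_000.0]
    print(choose_scale_out(capacities, upcoming_workload))
```

Scaling once per forecast window to cover the predicted peak, rather than reacting to every fluctuation, is one simple way to keep scaling actions infrequent; the margin and window length would need tuning for a real job.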