Grand Perspective: Load Shedding in Distributed CEP Applications

In distributed Complex Event Processing (CEP) applications with high load but limited resources, bottleneck operators in the operator graph can significantly slow down the processing of event streams, making load shedding necessary. A high-quality load shedding strategy resolves the bottleneck while preserving output quality: it evaluates each event's importance with regard to the application's final output and drops less important events from the event stream in favor of important ones. So far, no proposed solution enables high-quality load shedding in distributed, multi-operator CEP applications. On the one hand, shedding strategies for single-operator CEP applications measure an event's importance only directly at the bottleneck operator and thereby ignore how other streams in the application affect that importance. On the other hand, shedding strategies for multi-operator applications from the area of stream processing assume that each operator has a fixed selectivity, an assumption that does not hold for conditional CEP operators. We therefore propose a load shedding solution for distributed CEP applications that maximizes the application's final output and ensures timely processing of important events by using a set of CEP-tailored selectivity functions and a linear program that abstracts the CEP application. Moreover, our solution maintains a quality-optimal shedder configuration even under dynamically changing conditions. Through extensive evaluations on both synthetic and real data, we show that our solution successfully resolves overload at bottleneck operators while maximizing the quality of the application's output.
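The abstract describes configuring shedders through a linear program that abstracts the application. As a deliberately simplified illustration, and not the authors' formulation, the sketch below assumes a single bottleneck operator whose input streams have known rates, per-event processing costs, and per-event utilities (an assumed contribution to the final output). In the described approach such importance values would be derived from the CEP-tailored selectivity functions, which are not modeled here. The names `InputStream`, `configure_shedder`, and `drop_ratios` are hypothetical. Under these assumptions the LP reduces to a fractional knapsack, which a greedy pass ordered by utility per unit of cost solves exactly.

```python
from dataclasses import dataclass


@dataclass
class InputStream:
    """One input stream of a hypothetical bottleneck operator."""
    name: str
    rate: float     # events/s arriving at the bottleneck
    cost: float     # capacity units consumed per processed event (> 0)
    utility: float  # assumed contribution of one event to the final output


def configure_shedder(streams, capacity):
    """Return how many events/s to keep per stream.

    Toy LP (assumption, not the paper's model):
        maximize   sum_i utility_i * keep_i
        subject to sum_i cost_i * keep_i <= capacity
                   0 <= keep_i <= rate_i
    This is a fractional knapsack, so admitting streams in decreasing
    order of utility per unit of cost yields an optimal LP solution.
    """
    keep = {s.name: 0.0 for s in streams}
    remaining = capacity
    for s in sorted(streams, key=lambda s: s.utility / s.cost, reverse=True):
        if remaining <= 0:
            break
        # Keep as many events of this stream as the remaining capacity allows.
        admitted = min(s.rate, remaining / s.cost)
        keep[s.name] = admitted
        remaining -= admitted * s.cost
    return keep


def drop_ratios(streams, keep):
    """Fraction of each stream's events that the shedder should discard."""
    return {s.name: (1.0 - keep[s.name] / s.rate) if s.rate > 0 else 0.0
            for s in streams}


if __name__ == "__main__":
    streams = [
        InputStream("A", rate=100, cost=1.0, utility=0.5),
        InputStream("B", rate=50, cost=2.0, utility=3.0),
        InputStream("C", rate=80, cost=1.0, utility=0.1),
    ]
    keep = configure_shedder(streams, capacity=150)
    print(drop_ratios(streams, keep))  # {'A': 0.5, 'B': 0.0, 'C': 1.0}
```

Because the utilities here are fixed constants, this toy deliberately ignores the cross-stream dependence the abstract emphasizes: the abstract argues that an event's importance depends on other streams in the application, so a faithful model would derive these values from such dependencies rather than fix them in advance.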
