In complex event processing (CEP) systems with limited resources, overload is typically handled by load shedding so that a given latency bound is maintained. However, load shedding may degrade the quality of results (QoR). To minimize this impact on QoR, CEP researchers have proposed shedding approaches that drop the events or internal state with the lowest importance (utility). Both black-box and white-box shedding approaches use various features to predict these utilities. In this work, we propose a novel black-box shedding approach that uses a new set of features to drop events from the input event stream in order to maintain a given latency bound. Our approach uses a probabilistic model to predict event utilities. Moreover, it uses Zobrist hashing and well-known machine learning models, e.g., decision trees and random forests, to handle the predicted event utilities. Through extensive evaluations on several synthetic datasets, two real-world datasets, and a representative set of CEP queries, we show that, in the majority of cases, our load shedding approach outperforms state-of-the-art black-box load shedding approaches w.r.t. QoR.
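The abstract does not specify how Zobrist hashing and the probabilistic utility model are combined, so the following is only a minimal sketch under stated assumptions. It assumes that an event's utility is the empirical probability that an event of a given type, preceded by a given context of recent event types, contributes to a complex event. It also assumes that the context is hashed with Zobrist hashing, i.e., one random 64-bit key per (position, event type) pair combined by XOR. The names `ZobristTable`, `UtilityEstimator`, `shed`, `window_size`, and `threshold` are hypothetical, and a simple frequency table stands in for the decision-tree and random-forest models mentioned above.

```python
import random
from collections import defaultdict, deque


class ZobristTable:
    """Hypothetical Zobrist table: one random 64-bit key per (position, event type)."""

    def __init__(self, window_size, event_types, seed=0):
        rng = random.Random(seed)
        self.keys = {
            (pos, et): rng.getrandbits(64)
            for pos in range(window_size)
            for et in event_types
        }

    def hash(self, window):
        # XOR together the keys of every (position, event type) in the window.
        h = 0
        for pos, et in enumerate(window):
            h ^= self.keys[(pos, et)]
        return h

    def update(self, h, pos, old_type, new_type):
        # Incremental update: XOR out the old key, XOR in the new one.
        return h ^ self.keys[(pos, old_type)] ^ self.keys[(pos, new_type)]


class UtilityEstimator:
    """Assumed utility: P(event contributes to a match | context hash, event type)."""

    def __init__(self):
        self.seen = defaultdict(int)
        self.useful = defaultdict(int)

    def observe(self, ctx_hash, event_type, contributed):
        key = (ctx_hash, event_type)
        self.seen[key] += 1
        if contributed:
            self.useful[key] += 1

    def utility(self, ctx_hash, event_type):
        key = (ctx_hash, event_type)
        n = self.seen[key]
        # Unseen (context, type) pairs get utility 1.0, so they are kept (assumption).
        return self.useful[key] / n if n else 1.0


def shed(stream, table, estimator, window_size, threshold):
    """Drop events whose estimated utility falls below the threshold."""
    context = deque(maxlen=window_size)  # last event types seen, dropped or not
    kept = []
    for et in stream:
        h = table.hash(context)
        if estimator.utility(h, et) >= threshold:
            kept.append(et)
        context.append(et)
    return kept
```

For example, if an estimator has observed that `"C"` after the context `("A", "B")` contributed to a match only half of the time, then `shed(["A", "B", "C"], table, est, 2, 0.6)` keeps `["A", "B"]` and drops `"C"`. In a real system, the threshold would have to be tuned to the overload level needed to meet the latency bound. XOR-based Zobrist keys are a natural fit here because a context hash can be updated in constant time when a single position changes.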