Streaming process mining deals with the real-time analysis of event streams. A common approach is to adopt windowing mechanisms that select event data from a stream for subsequent analysis. The window size is a crucial parameter, however, as it influences the representativeness of the window content and, by extension, of the analysis results. Because process dynamics are subject to change and potential concept drift, a static, fixed window size leads to inaccurate representations that introduce bias into the analysis. In this work, we present a novel approach for streaming process mining that addresses these limitations by adjusting window sizes. Specifically, we dynamically determine suitable window sizes based on estimators for the representativeness of samples, as developed for species estimation in biodiversity research. Evaluation results on real-world data sets show improvements over existing approaches that adopt static window sizes, in terms of both accuracy and robustness to concept drift.
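To make the idea of estimator-driven window sizing concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes the bias-corrected Chao1 species richness estimator as the representativeness measure (the abstract does not name a specific estimator), and it treats each stream element as a hashable "species" such as a directly-follows pair or a trace variant. The names `chao1_completeness` and `choose_window_size`, as well as the `threshold`, `min_size`, and `step` parameters, are hypothetical and chosen for illustration only.

```python
from collections import Counter


def chao1_completeness(items):
    """Ratio of observed distinct items to the Chao1-estimated total.

    Assumption: bias-corrected Chao1 stands in for whichever
    estimator the approach actually uses.
    """
    counts = Counter(items)
    s_obs = len(counts)
    if s_obs == 0:
        return 0.0
    f1 = sum(1 for c in counts.values() if c == 1)  # seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)  # seen exactly twice
    # Bias-corrected form, well defined even when f2 == 0.
    s_est = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs / s_est


def choose_window_size(history, threshold=0.9, min_size=10, step=10):
    """Smallest suffix of `history` (newest last) whose estimated
    completeness reaches `threshold`; falls back to the full history.

    Hypothetical helper: the growth policy (fixed `step`) is an
    illustrative choice, not taken from the source.
    """
    n = len(history)
    size = min(min_size, n)
    while size < n:
        if chao1_completeness(history[n - size:]) >= threshold:
            return size
        size = min(size + step, n)
    return n
```

Under these assumptions, a stream that keeps repeating the same few behaviours yields a small window, whereas a stream that keeps producing previously unseen behaviour (as may happen after a drift) pushes the window to grow until the sample looks sufficiently complete.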