In process discovery, the goal is to find, for a given event log, the model describing the underlying process. While process models can be represented in a variety of ways, Petri nets form a theoretically well-explored description language and are therefore often used in process mining. In this paper, we present an extension of the eST-Miner process discovery algorithm. The eST-Miner computes a set of Petri net places that are considered fitting with respect to a certain fraction of the behavior described by the given event log, as specified by a noise threshold. It evaluates all possible candidate places using token-based replay. The set of replayable traces is determined for each place in isolation, i.e., these sets do not need to be consistent across places. This allows the algorithm to abstract from infrequent behavioral patterns occurring only in some traces. When these places are combined into a Petri net by connecting them to the corresponding uniquely labeled transitions, the resulting net can replay exactly those traces from the event log that are allowed by every inserted place. Thus, inserting places one by one without considering their combined effect may result in deadlocks and low fitness of the Petri net. We explore adaptations of the eST-Miner that aim to select a subset of places such that the resulting Petri net guarantees a definable minimal fitness while maintaining high precision with respect to the input event log. Moreover, current place evaluation techniques tend to block the execution of infrequent activity labels. To address this, a refined place fitness metric is introduced and thoroughly investigated. Finally, various place selection strategies are proposed, and their impact on the returned Petri net is evaluated in experiments on both real and artificial event logs.
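
The gap between per-place fitness and the fitness of the combined net can be illustrated with a small sketch. The following Python code is not the eST-Miner implementation; it is a simplified illustration under stated assumptions: a place is modeled as a pair of activity sets (ingoing, outgoing), a trace is replayable by a place if the token count never becomes negative and is zero at the end, and all names (`replays`, `place_fitness`, `net_fitness`, `LOG`, `P1`, `P2`) are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

Trace = Sequence[str]
# Hypothetical place representation: (activities producing a token,
# activities consuming a token).
Place = Tuple[FrozenSet[str], FrozenSet[str]]


def replays(place: Place, trace: Trace) -> bool:
    """Token-based replay of one trace on one place (simplified).

    The trace is replayable if no activity consumes a missing token
    and no token remains in the place after the last activity.
    """
    ingoing, outgoing = place
    tokens = 0
    for activity in trace:
        if activity in outgoing:  # firing consumes before it produces
            tokens -= 1
            if tokens < 0:
                return False
        if activity in ingoing:
            tokens += 1
    return tokens == 0


def place_fitness(place: Place, log: Sequence[Trace]) -> float:
    """Fraction of traces replayable by this place in isolation."""
    return sum(replays(place, t) for t in log) / len(log)


def net_fitness(places: Sequence[Place], log: Sequence[Trace]) -> float:
    """Fraction of traces replayable by all places simultaneously."""
    return sum(all(replays(p, t) for p in places) for t in log) / len(log)


# Toy log (assumed data, for illustration only).
LOG = [("a", "b", "c"), ("a", "b", "c"), ("a", "c"), ("a", "b")]
P1: Place = (frozenset({"a"}), frozenset({"b"}))  # a produces, b consumes
P2: Place = (frozenset({"a"}), frozenset({"c"}))  # a produces, c consumes
```

In this toy log, each place replays three of the four traces in isolation, but the two places disagree on which trace to reject, so only two of the four traces are replayable by both. With an assumed per-place threshold of 0.75, both places would be accepted individually, while the net containing both reaches a fitness of only 0.5.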