Process mining is concerned with deriving formal models capable of reproducing the behaviour of a given organisational process by analysing observed executions collected in an event log. The elements of an event log are finite sequences (i.e., traces or words) of actions. Many effective algorithms have been proposed that produce a control-flow model (commonly a Petri net) aimed at reproducing, as precisely as possible, the language of the considered event log. However, since identical executions can be observed several times, each trace of an event log is associated with a frequency, and hence an event log also inherently induces a stochastic language. The stochastic extension of process mining therefore exploits the trace frequencies contained in the event log to derive stochastic (Petri net) models capable of reproducing the likelihood of the observed executions. In this paper, we introduce a novel stochastic process mining approach. Starting from a "standard" Petri net model mined through classical mining algorithms, we employ optimization to identify optimal weights for the transitions of the mined net, so that the stochastic language induced by the stochastic interpretation of the mined net closely resembles that of the event log. The optimization is based either on the maximum likelihood principle or on the earth mover's distance. Experiments on some popular real system logs show improved accuracy with respect to alternative approaches.
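To make the idea of fitting transition weights to trace frequencies concrete, the following is a minimal sketch of the maximum likelihood variant on a toy example. Everything in it is an assumption made for illustration: the event log, the two-choice net structure encoded in `CHOICES`, the function names, and the use of plain gradient descent with finite-difference gradients. It is not the optimization procedure of the paper, and it does not handle general Petri nets.

```python
import math
from collections import Counter

# Hypothetical toy event log: each trace is a tuple of activity labels.
log = [("a", "c")] * 50 + [("a", "d")] * 30 + [("b",)] * 20

# Hypothetical toy net: first a choice between a and b; after a, a choice
# between c and d; after b the process ends. Each tuple lists the
# transitions that compete with each other in one marking.
CHOICES = [("a", "b"), ("c", "d")]


def log_language(traces):
    """Stochastic language of the log: relative frequency of each trace."""
    counts = Counter(traces)
    total = sum(counts.values())
    return {trace: n / total for trace, n in counts.items()}


def trace_probability(trace, weights):
    """Probability of a trace in the toy net: at each step, a transition
    fires with probability proportional to its weight among its competitors."""
    p = 1.0
    for label in trace:
        group = next(g for g in CHOICES if label in g)
        p *= weights[label] / sum(weights[x] for x in group)
    return p


def neg_log_likelihood(theta, language):
    """Negative log-likelihood of the log, with weights = exp(theta) > 0."""
    weights = {k: math.exp(v) for k, v in theta.items()}
    return -sum(
        freq * math.log(trace_probability(trace, weights))
        for trace, freq in language.items()
    )


def fit_weights(language, steps=500, lr=0.5, eps=1e-6):
    """Gradient descent on log-weights using central finite differences."""
    theta = {x: 0.0 for group in CHOICES for x in group}
    for _ in range(steps):
        grad = {}
        for k in theta:
            up, down = dict(theta), dict(theta)
            up[k] += eps
            down[k] -= eps
            grad[k] = (
                neg_log_likelihood(up, language)
                - neg_log_likelihood(down, language)
            ) / (2 * eps)
        for k in theta:
            theta[k] -= lr * grad[k]
    return {k: math.exp(v) for k, v in theta.items()}


language = log_language(log)
fitted = fit_weights(language)
for trace, freq in sorted(language.items()):
    print(trace, "log:", round(freq, 3),
          "model:", round(trace_probability(trace, fitted), 3))
```

Because this toy net can represent the log's distribution exactly, the fitted trace probabilities should approach the relative frequencies in the log (0.5, 0.3 and 0.2). On a net that cannot reproduce the log exactly, the same objective would instead return the closest weights in the likelihood sense, which is where a distance-based objective such as the earth mover's distance becomes an alternative.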