The assessment of process mining techniques on real-life data is often compromised by the lack of ground truth knowledge, the presence of non-essential outliers in system behavior, and recording errors in event logs. Synthetically generated data can supply this ground truth and thereby support better evaluation. Existing log generation tools inject noise directly into the logs, which fails to capture many typical behavioral deviations. Furthermore, the link between the model and the log, which is needed for later assessment, is lost. We propose a ground-truth approach for generating process data from existing or synthetic initial process models, whether automatically generated or hand-made. The approach incorporates patterns of behavioral deviations and recording errors to produce a synthetic yet realistic deviating model and an imperfect event log. These, together with the initial model, are required to assess process mining techniques based on ground truth knowledge. We demonstrate the approach by creating datasets of synthetic process data for three processes, one of which we use in a conformance checking use case that focuses on the assessment of (relaxed) systemic alignments to expose and explain deviations in modeled and recorded behavior. Our results show that this approach, unlike traditional methods, provides detailed insights into the strengths and weaknesses of process mining techniques, both quantitatively and qualitatively.
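To make the three artifacts named in the abstract concrete (initial model, deviating model, imperfect event log), the following is a minimal sketch and not the authors' implementation. It assumes a strongly simplified setting: the initial model is a plain sequence of activities, the only behavioral deviation pattern is "an activity may be skipped", and the only recording error is "an executed event is not recorded". All names (`Trace`, `apply_skip_deviation`, `simulate`, `record`, `generate`) and the probabilities are hypothetical and chosen for illustration only.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trace:
    events: list                      # what ended up in the event log
    executed: list                    # what actually happened (ground truth)
    deviations: list = field(default_factory=list)        # behavioral deviations
    recording_errors: list = field(default_factory=list)  # recording errors


def apply_skip_deviation(initial_model, activity):
    """Hypothetical deviation pattern applied at the model level:
    the given activity becomes optional in the deviating model."""
    return [(a, a == activity) for a in initial_model]  # (activity, optional?)


def simulate(deviating_model, rng, skip_prob=0.3):
    """Play out one case of the deviating model and note every deviation."""
    executed, deviations = [], []
    for activity, optional in deviating_model:
        if optional and rng.random() < skip_prob:
            deviations.append(("skipped", activity))
        else:
            executed.append(activity)
    return executed, deviations


def record(executed, rng, miss_prob=0.1):
    """Turn executed behavior into recorded events, injecting recording errors."""
    events, errors = [], []
    for activity in executed:
        if rng.random() < miss_prob:
            errors.append(("missing_event", activity))
        else:
            events.append(activity)
    return events, errors


def generate(initial_model, skip_activity, n_traces, seed=0):
    """Return the deviating model and an imperfect log annotated with ground truth."""
    rng = random.Random(seed)
    deviating_model = apply_skip_deviation(initial_model, skip_activity)
    log = []
    for _ in range(n_traces):
        executed, deviations = simulate(deviating_model, rng)
        events, errors = record(executed, rng)
        log.append(Trace(events, executed, deviations, errors))
    return deviating_model, log


if __name__ == "__main__":
    # Assumed toy process; not one of the three processes from the paper.
    initial = ["register", "check", "decide", "notify"]
    deviating, log = generate(initial, "check", n_traces=5, seed=42)
    for trace in log:
        print(trace.events, trace.deviations, trace.recording_errors)
```

Because every trace keeps its deviations and recording errors separately, a conformance checking result on `events` could be compared against the annotated ground truth, rather than against noise whose origin is no longer known. Keeping behavioral deviations at the model level and recording errors at the log level is the design choice the sketch is meant to illustrate.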