Process mining offers powerful techniques for discovering, analyzing, and enhancing real-world business processes. In this context, Petri nets provide an expressive means of modeling process behavior. However, directly analyzing and comparing intricate Petri nets is challenging. This study introduces PetriNet2Vec, a novel unsupervised methodology that draws on Natural Language Processing concepts, inspired by Doc2Vec, to represent process models as embedding vectors and thereby facilitate their effective comparison, clustering, and classification. These embedding vectors allow us to quantify similarities and relationships between different process models. We validated our methodology experimentally on the PDC Dataset, which comprises 96 diverse Petri net models. Cluster analysis, UMAP visualizations, and a trained decision tree provide compelling evidence that PetriNet2Vec can discern meaningful patterns and relationships among process models and their constituent tasks. Through a series of experiments, we demonstrate that PetriNet2Vec learns both the structure of Petri nets and the main properties used to simulate the process models in our dataset. Furthermore, our results show the utility of the learned embeddings in two key downstream tasks for process mining enhancement: process classification and process retrieval.
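The abstract does not state which similarity measure is used for process retrieval. As a minimal sketch, assuming cosine similarity (a common choice for Doc2Vec-style embeddings) and small hypothetical toy vectors rather than learned PetriNet2Vec embeddings, retrieval can be illustrated as ranking stored model embeddings by their similarity to a query embedding. The function names `cosine_similarity` and `retrieve` and the model ids are hypothetical and for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def retrieve(query, embeddings, k=3):
    """Return the ids of the k stored models most similar to the query.

    `embeddings` maps a (hypothetical) model id to its embedding vector.
    """
    ranked = sorted(
        embeddings,
        key=lambda model_id: cosine_similarity(query, embeddings[model_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 3-dimensional toy embeddings; real embeddings would be
# learned by the model and typically have many more dimensions.
embeddings = {
    "model_a": [1.0, 0.0, 0.0],
    "model_b": [0.9, 0.1, 0.0],
    "model_c": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    print(retrieve([1.0, 0.0, 0.0], embeddings, k=2))
```

Cosine similarity ignores vector magnitude and compares only direction, which is why it is often preferred over Euclidean distance for document-style embeddings; any other measure could be substituted in `retrieve` without changing its structure.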