Process mining offers powerful techniques for discovering, analyzing, and enhancing real-world business processes. In this context, Petri nets provide an expressive means of modeling process behavior. However, directly analyzing and comparing intricate Petri nets is challenging. This study introduces PetriNet2Vec, a novel unsupervised methodology that builds on Natural Language Processing concepts, is inspired by Doc2Vec, and represents process models as embedding vectors to facilitate their comparison, clustering, and classification. These embedding vectors allow us to quantify similarities and relationships between different process models. We validated the methodology experimentally on the PDC Dataset, which contains 96 diverse Petri net models. Cluster analysis, UMAP visualizations, and a trained decision tree provide evidence that PetriNet2Vec can discern meaningful patterns and relationships among process models and their constituent tasks. Through a series of experiments, we demonstrate that PetriNet2Vec learns the structure of Petri nets as well as the main properties used to simulate the process models in our dataset. Furthermore, our results showcase the utility of the learned embeddings in two crucial downstream tasks within process mining enhancement: process classification and process retrieval.
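To make the idea of quantifying similarity between embedded process models concrete, the following is a minimal sketch of similarity-based process retrieval. It is not the PetriNet2Vec implementation: it assumes cosine similarity as the comparison measure, and the model names, the three-dimensional vectors, and the helper functions `cosine_similarity` and `retrieve` are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_id, embeddings, top_k=2):
    """Rank every other model by cosine similarity to the query model."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings (made-up values, not results from the study).
toy_embeddings = {
    "model_a": [1.0, 0.0, 0.0],
    "model_b": [0.9, 0.1, 0.0],
    "model_c": [0.0, 1.0, 0.0],
    "model_d": [-1.0, 0.0, 0.0],
}

ranking = retrieve("model_a", toy_embeddings, top_k=2)
for model_id, score in ranking:
    print(f"{model_id}: {score:.3f}")
```

In this toy setting, `model_b` ranks first because its vector points in nearly the same direction as that of `model_a`. In practice the vectors would come from a trained embedding model rather than being written by hand.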