Starting with a collection of traces generated by process executions, process discovery is the task of constructing a simple model that describes the process, where simplicity is often measured in terms of model size. The challenge of process discovery is that the process of interest is unknown and that, while the input traces constitute positive examples of process executions, no negative examples are available. Many commercial tools discover Directly-Follows Graphs, in which nodes represent the observable actions of the process and directed arcs indicate the possible execution orders of those actions. We propose a new approach for discovering sound Directly-Follows Graphs that is grounded in grammatical inference over the input traces. To promote the discovery of small graphs that also describe the process accurately, we design and evaluate a genetic algorithm that guides the inference parameters toward the regions that lead to the discovery of interesting models. Experiments over real-world datasets confirm that our new approach can construct smaller models that represent the input traces and their frequencies more accurately than the state-of-the-art technique. Reasoning over the frequencies of encoded traces also becomes possible because of the stochastic semantics of the action graphs we propose, which, for the first time, are interpreted as models that describe stochastic languages of action traces.
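To make the notion of a Directly-Follows Graph with frequencies concrete, the following minimal sketch counts how often one action directly follows another in a small set of traces and normalizes the counts into per-node arc probabilities. It is a plain frequency-based construction for illustration only, not the grammatical-inference approach proposed here; the function names, the artificial `<start>` and `<end>` nodes, and the example traces are all assumptions.

```python
from collections import Counter

# Artificial boundary nodes (an assumption of this sketch).
START, END = "<start>", "<end>"


def directly_follows_counts(traces):
    """Count how often action b directly follows action a across all traces."""
    counts = Counter()
    for trace in traces:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
    return counts


def arc_probabilities(counts):
    """Normalize arc counts by the total outgoing count of each source node."""
    out_totals = Counter()
    for (a, _), n in counts.items():
        out_totals[a] += n
    return {(a, b): n / out_totals[a] for (a, b), n in counts.items()}


# Hypothetical input: three executions over the actions a, b, and c.
traces = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
counts = directly_follows_counts(traces)
probs = arc_probabilities(counts)
```

Each key of `counts` is an arc of the graph and each value is its observed frequency. Under the simple assumption that the next action depends only on the current one, the values in `probs` give one possible stochastic reading of those arcs.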