Process mining is gaining popularity in business process analysis, including in heavy industry. It requires a specific data format called an event log, whose basic structure comprises a case identifier (case ID), an activity (event) name, and a timestamp. For industrial processes, data is very often provided by a monitoring system as time series of low-level sensor readings. Such data cannot be used directly for process mining because activities are not explicitly marked and, in some cases, no case ID is provided. We propose a novel rule-based algorithm for pattern identification that detects case IDs from significant changes in the short-term mean values of a selected variable. We demonstrate our solution on a use case from the mining industry. We compare the computed results (identified patterns) with expert labels for the same dataset. Experiments show that the developed algorithm correctly detects case IDs in most cases, on datasets with and without outliers, reaching F1 scores of 96.8% and 97%, respectively. We also evaluate the algorithm on a dataset from the manufacturing domain, where it reaches an F1 score of 92.6%.
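The idea of detecting case boundaries from changes in a short-term mean can be illustrated with a minimal sketch. It is not the algorithm evaluated in this work: the function names, the window length, the threshold, and the rule that every detected change opens a new case are all assumptions made for illustration, and the readings are synthetic.

```python
from statistics import mean


def detect_change_points(values, window=2, threshold=15.0):
    """Return indices where the short-term mean shifts by more than `threshold`.

    Hypothetical helper: `window` and `threshold` are illustrative parameters.
    """
    # Absolute difference between the mean of the `window` samples starting
    # at index i and the mean of the `window` samples just before it.
    diffs = {}
    for i in range(window, len(values) - window + 1):
        before = mean(values[i - window:i])
        after = mean(values[i:i + window])
        diffs[i] = abs(after - before)

    change_points = []
    run = []  # consecutive indices whose difference exceeds the threshold
    # Iterate one index past the last computed difference so that a run
    # reaching the end of the series is still flushed.
    for i in range(window, len(values) - window + 2):
        if diffs.get(i, 0.0) > threshold:
            run.append(i)
        elif run:
            # Keep only the strongest index of each run as the change point.
            change_points.append(max(run, key=diffs.get))
            run = []
    return change_points


def assign_case_ids(values, change_points):
    """Label every sample with a case ID that increments at each change point."""
    boundaries = set(change_points)
    case_ids = []
    case_id = 0
    for i in range(len(values)):
        if i in boundaries:
            case_id += 1
        case_ids.append(case_id)
    return case_ids


# Synthetic sensor readings with two level shifts (assumed data).
readings = [10, 10, 10, 10, 50, 50, 50, 50, 10, 10, 10, 10]
points = detect_change_points(readings, window=2, threshold=15.0)
labels = assign_case_ids(readings, points)
print(points)
print(labels)
```

Keeping only the strongest index in each run of above-threshold differences avoids reporting the same level shift several times, which a plain threshold test would do whenever the two windows straddle the shift. In practice, the window length and threshold would need tuning to the sampling rate and noise level of the monitored variable.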