Developments in machine learning, together with the increasing use of sensor data, challenge the reliance on deterministic logs and call for new process mining solutions for uncertain, and in particular stochastically known, logs. In this work we formulate trace recovery, the task of generating a deterministic log from a stochastically known log that is as faithful to reality as possible. An effective trace recovery algorithm would be a powerful aid for maintaining credible process mining tools in uncertain settings. We propose an algorithmic framework for this task that recovers the best alignment between a stochastically known log and a process model, with three innovative features. Our algorithm, SKTR, 1) handles both Markovian and non-Markovian processes; 2) offers a quality-based balance between a process model and a log, depending on the available process information, sensor quality, and the predictive power of the machine learning models; and 3) offers a novel use of a synchronous product multigraph to create the log. An empirical analysis using five publicly available datasets, three of which apply predictive models to standard video capture benchmarks, shows an average relative accuracy improvement of more than 10% over a common baseline.
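To make the task concrete, the following is a minimal sketch of what a stochastically known trace might look like and how a naive recovery could work. It is not the SKTR algorithm: it assumes each event is given as a probability distribution over activity labels and recovers a deterministic trace by picking the most probable label per event, independently of any process model. The data representation, the function names `argmax_recover` and `accuracy`, and the toy probabilities are all illustrative assumptions, and the per-event argmax is only one plausible baseline, not necessarily the one used in the evaluation.

```python
from typing import Dict, List

# Assumed representation (hypothetical): one probability distribution
# over activity labels per event, e.g. the softmax output of a classifier.
StochasticEvent = Dict[str, float]
StochasticTrace = List[StochasticEvent]


def argmax_recover(trace: StochasticTrace) -> List[str]:
    """Naive recovery: choose the most probable activity for each event,
    ignoring the process model and all neighbouring events."""
    return [max(event, key=event.get) for event in trace]


def accuracy(recovered: List[str], ground_truth: List[str]) -> float:
    """Fraction of positions where the recovered label matches the truth."""
    if len(recovered) != len(ground_truth):
        raise ValueError("traces must have the same length")
    if not ground_truth:
        raise ValueError("traces must be non-empty")
    hits = sum(r == g for r, g in zip(recovered, ground_truth))
    return hits / len(ground_truth)


# Toy stochastically known trace with three events (made-up numbers).
trace: StochasticTrace = [
    {"a": 0.70, "b": 0.30},
    {"b": 0.45, "c": 0.55},  # the classifier slightly prefers the wrong label
    {"c": 0.90, "d": 0.10},
]
ground_truth = ["a", "b", "c"]

recovered = argmax_recover(trace)
print(recovered, accuracy(recovered, ground_truth))
```

In this toy case the second event is recovered incorrectly because the classifier slightly favours the wrong label. A model-aware approach such as the alignment described above could, in principle, correct this kind of error when the process model makes the sequence a, b, c more plausible than a, c, c.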