Although event logs are a powerful source of insight into the behavior of the underlying business process, existing work primarily focuses on finding patterns in the activity sequences of an event log and ignores event attribute data. Event attribute data has mostly been used to predict event occurrences and process outcomes, but the state of the art neglects to mine succinct and interpretable rules that describe how event attribute data changes during process execution. Subgroup discovery and rule-based classification approaches cannot capture the sequential dependencies present in event logs, and thus yield unsatisfactory results with limited insight into process behavior. Given an event log, we are interested in finding accurate yet succinct and interpretable if-then rules that describe how the process modifies data. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we choose the model that gives the shortest lossless description of the data. Additionally, we propose Moody, a greedy algorithm that efficiently searches for such rules. Through extensive experiments on both synthetic and real-world data, we show that Moody indeed finds compact and interpretable rules, needs little data for accurate discovery, and is robust to noise.
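For reference, the two-part form of MDL that is commonly used for this kind of model selection can be written as below. This is the generic textbook formulation and not necessarily the exact encoding used for Moody; the symbols $D$ (the event log), $\mathcal{M}$ (the class of candidate rule sets), and $L(\cdot)$ (encoded length in bits) are assumptions introduced here for illustration.

\[
M^{*} \;=\; \operatorname*{arg\,min}_{M \in \mathcal{M}} \; \bigl( L(M) + L(D \mid M) \bigr)
\]

Here $L(M)$ is the cost of describing the rules themselves and $L(D \mid M)$ is the cost of describing the data given those rules. Under this reading, a rule is worth adding only if the bits it saves in $L(D \mid M)$ exceed the bits needed to encode it in $L(M)$, which is what favors rule sets that are both accurate and succinct.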