Various kinds of uncertainty can occur in event logs, e.g., due to flawed recording, data quality issues, or the use of probabilistic models for activity recognition. Stochastically known event logs make these uncertainties transparent by encoding multiple possible realizations for events. However, the number of realizations encoded by a stochastically known log grows exponentially with its size, making exhaustive exploration infeasible even for moderately sized event logs. Thus, the literature has proposed considering only the top-K most probable realizations. In this paper, we implement an efficient algorithm that computes a top-K realization ranking of an event log under event independence in O(Kn) time, where n is the number of uncertain events in the log. We use this algorithm to investigate the benefit of top-K rankings over top-1 interpretations of stochastically known event logs. Specifically, we analyze how the usefulness of top-K rankings varies with different properties of the input data. We show that the benefit of a top-K ranking depends on the length of the input event log and on the distribution of the event probabilities. The results highlight the potential of top-K rankings to enhance uncertainty-aware process mining techniques.
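To illustrate why event independence makes a top-K ranking tractable, here is a minimal sketch. It is not the paper's O(Kn) algorithm. It uses a simpler beam-style pruning that runs in roughly O(n·K·m log K) time for m candidate labels per event. The sketch assumes that each event is given as a dictionary mapping candidate activity labels to probabilities, and that a realization's probability is the product of its chosen labels' probabilities. The function name `top_k_realizations` is hypothetical. Under independence, pruning to the K best prefixes after each event is safe. If a prefix is not among the K best, then K other prefixes extended by the same suffix would all be more probable, so no realization built on that prefix can be in the top K.

```python
import heapq
from typing import Dict, List, Tuple

Realization = Tuple[str, ...]


def top_k_realizations(
    events: List[Dict[str, float]], k: int
) -> List[Tuple[float, Realization]]:
    """Return up to k most probable realizations, most probable first.

    Hypothetical helper (assumption): events[i] maps each candidate activity
    label of event i to its probability, and events are independent, so a
    realization's probability is the product of its labels' probabilities.
    """
    # Start from the empty prefix, which has probability 1.
    beam: List[Tuple[float, Realization]] = [(1.0, ())]
    for dist in events:
        # Extend every kept prefix with every candidate label of this event.
        candidates = (
            (p * q, prefix + (label,))
            for p, prefix in beam
            for label, q in dist.items()
        )
        # Keep only the k most probable prefixes; safe under independence.
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return beam


if __name__ == "__main__":
    # Two uncertain events followed by one certain event.
    log = [{"a": 0.7, "b": 0.3}, {"c": 0.6, "d": 0.4}, {"e": 1.0}]
    for prob, realization in top_k_realizations(log, 3):
        print(f"{prob:.2f}", realization)
```

With K = 1, the ranking reduces to the top-1 interpretation (the single most probable label per event). Larger K exposes the alternative realizations whose value the abstract investigates.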