Event logs are invaluable for process mining projects, offering insights that support process improvement and data-driven decision-making. However, data quality issues affect the correctness and trustworthiness of these insights, making preprocessing a necessity. Despite its recognized importance, preprocessing is still carried out in an ad hoc manner, with little support. This paper presents a systematic literature review that establishes a comprehensive repository of preprocessing tasks and their usage in case studies. We identify six high-level and 20 low-level preprocessing tasks in case studies. Log filtering, transformation, and abstraction are commonly used, while log enriching, integration, and reduction are less frequent. These results are a first step toward more structured and transparent event log preprocessing, which in turn enhances the reliability of process mining.