Event logs are invaluable for process mining projects, offering insights that support process improvement and data-driven decision-making. However, data quality issues undermine the correctness and trustworthiness of these insights, making preprocessing a necessity. Despite its recognized importance, preprocessing is still carried out in an ad hoc manner, with little systematic support. This paper presents a systematic literature review that establishes a comprehensive repository of preprocessing tasks and their usage in case studies. We identify six high-level and 20 low-level preprocessing tasks in case studies. Log filtering, transformation, and abstraction are commonly used, while log enriching, integration, and reduction are less frequent. These results are a first step toward more structured and transparent event log preprocessing, which in turn enhances the reliability of process mining.