Using Large Language Models (LLMs) for Process Mining (PM) tasks is becoming increasingly common, and initial approaches yield promising results. However, little attention has been given to developing strategies for evaluating and benchmarking the utility of incorporating LLMs into PM tasks. This paper reviews the current implementations of LLMs in PM and reflects on three questions: 1) What is the minimal set of capabilities an LLM requires for PM? 2) Which benchmarking strategies help select optimal LLMs for PM? 3) How do we evaluate the output of LLMs on specific PM tasks? Answering these questions is fundamental to the development of comprehensive LLM benchmarks for process mining that cover different tasks and implementation paradigms.