Large Language Models (LLMs) are increasingly used to boost organizational efficiency and automate tasks. While not originally designed for complex cognitive processes, recent efforts have extended LLMs to activities such as reasoning, planning, and decision-making. In business processes, such abilities could be invaluable for leveraging the massive corpora LLMs have been trained on to gain a deep understanding of these processes. In this work, we plant the seeds for the development of a benchmark that assesses the ability of LLMs to reason about the causal and process perspectives of business operations. We refer to this view as Causally-augmented Business Processes (BP^C). The core of the benchmark comprises a set of BP^C-related situations, a set of questions about these situations, and a set of deductive rules employed to systematically resolve the ground-truth answers to these questions. With the help of LLMs, this seed is then instantiated into a larger-scale set of domain-specific situations and questions. Reasoning about BP^C is crucial for process interventions and process improvement. Our benchmark, accessible at https://huggingface.co/datasets/ibm/BPC, can be used in one of two modalities: testing the performance of any target LLM, or training an LLM to advance its capability to reason about BP^C.