Integrating Large Language Models (LLMs) into business process management tools promises to democratize Business Process Model and Notation (BPMN) modeling for non-experts. While automated frameworks assess syntactic and semantic quality, they miss human factors such as trust, usability, and professional alignment. We conducted a mixed-methods evaluation of our proposed solution, an LLM-powered BPMN copilot, with five process modeling experts using focus groups and standardized questionnaires. Our findings reveal a critical tension between acceptable perceived usability (mean CUQ score: 67.2/100) and notably lower trust (mean score: 48.8\%), with reliability rated as the most critical concern (M=1.8/5). Furthermore, we identified output-quality issues, prompting difficulties, and a need for the LLM to ask more in-depth clarifying questions about the process. We envision five use cases ranging from domain-expert support to enterprise quality assurance. Our results demonstrate that human-centered evaluation must complement automated benchmarking for LLM-based modeling agents.