This paper investigates the capabilities of Generative Pre-trained Transformers (GPTs) to automatically generate graphical process models from multi-modal (i.e., text- and image-based) inputs. Specifically, we first introduce a small dataset and a set of evaluation metrics that enable a ground-truth-based evaluation of multi-modal process model generation. We then conduct an initial evaluation of commercial GPTs using zero-, one-, and few-shot prompting strategies. Our results indicate that GPTs can be useful tools for semi-automated process modeling from multi-modal inputs. More importantly, the dataset, the evaluation metrics, and the open-source evaluation code together provide a structured framework for continued systematic evaluation.