Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets. They offer reproducibility and dependability of analyses, as well as scalability through automatic parallelization on large compute clusters. However, implementing workflows is difficult because it involves many black-box tools and the deep infrastructure stack needed to execute them. At the same time, tools that support users are rare, and far fewer examples are available than for classical programming languages. To address these challenges, we investigate the efficiency of Large Language Models (LLMs), specifically ChatGPT, in supporting users who work with scientific workflows. We performed three user studies in two scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs interpret workflows efficiently but perform worse when exchanging components or purposefully extending workflows. We characterize their limitations in these challenging scenarios and suggest directions for future research.