Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses through automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. At the same time, user-supporting tools are rare, and far fewer examples are available than for classical programming languages. To address these challenges, we investigate the efficiency of Large Language Models (LLMs), specifically ChatGPT, in supporting users who work with scientific workflows. We performed three user studies in two scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs interpret workflows efficiently but perform worse when exchanging components or purposefully extending workflows. We characterize their limitations in these challenging scenarios and suggest future research directions.