Automating software development processes through the orchestration of GitHub Actions workflows has improved the efficiency and agility of software delivery pipelines. This paper presents a detailed investigation into the use of Large Language Models (LLMs), specifically GPT 3.5 and GPT 4, to generate and evaluate GitHub Actions workflows for DevOps tasks. Our methodology comprises data collection from public GitHub repositories, prompt engineering for the LLMs, and evaluation with three metrics: exact match score, BLEU score, and a novel DevOps Aware score. We examine how well GPT 3.5 and GPT 4 generate GitHub workflows and assess how individual prompt elements influence the quality of the resulting pipeline. Results indicate that GPT 4 improves substantially over GPT 3.5, particularly in DevOps awareness and syntax correctness. We also introduce a GitHub App built on Probot that lets users automate workflow generation within the GitHub ecosystem. This study contributes insights into the evolving landscape of AI-driven automation in DevOps practices.
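To make the two standard metrics concrete, the following is a minimal sketch of how an exact match check and a BLEU score could be computed for a generated workflow against a reference workflow. It is an illustration under stated assumptions, not the evaluation code used in this study: the function names `exact_match` and `simple_bleu` are hypothetical, tokenization is plain whitespace splitting, and the BLEU variant is an unsmoothed, single-reference simplification. The DevOps Aware score is not sketched because its definition is not given here.

```python
import math
from collections import Counter


def exact_match(generated: str, reference: str) -> bool:
    """True when both workflows are identical, ignoring surrounding
    whitespace and trailing whitespace on each line (an assumed normalization)."""
    def normalize(text):
        return [line.rstrip() for line in text.strip().splitlines()]
    return normalize(generated) == normalize(reference)


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(generated: str, reference: str, max_n: int = 4) -> float:
    """Simplified, unsmoothed, single-reference BLEU over whitespace tokens."""
    cand, ref = generated.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Counter intersection keeps the minimum count, i.e. clipped matches.
        overlap = sum((cand_counts & ref_counts).values())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any empty n-gram order zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical reference workflow and a generated variant with one changed token.
reference = "name: CI\non: push\njobs:\n  build:\n    runs-on: ubuntu-latest"
generated = reference.replace("ubuntu-latest", "windows-latest")

print(exact_match(generated, reference))
print(round(simple_bleu(generated, reference), 3))
```

Exact match is strict and rewards only verbatim reproduction, whereas BLEU gives partial credit for overlapping token sequences; neither reflects whether a workflow is valid or appropriate for the task, which is the gap a DevOps-specific score is meant to address.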