Software testing is a crucial but time-consuming aspect of software development, and recently, Large Language Models (LLMs) have gained popularity for automated test case generation. However, because LLMs are trained on vast amounts of open-source code, they often generate test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we propose Reinforcement Learning from Static Quality Metrics (RLSQM), wherein we utilize Reinforcement Learning to generate high-quality unit tests based on static analysis-based quality metrics. First, we analyzed LLM-generated tests and found that LLMs frequently generate undesirable test smells -- up to 37% of the time. Then, we implemented a lightweight static analysis-based reward model and trained LLMs with this reward model to optimize for five code quality metrics. Our experimental results demonstrate that the RL-optimized Codex model consistently generated higher-quality test cases than the base LLM, improving quality metrics by up to 23% and producing nearly 100% syntactically correct code. RLSQM also outperformed GPT-4 on all code quality metrics, despite being trained on a substantially cheaper Codex model. We provide insights into how to reliably utilize RL to improve test generation quality and show that RLSQM is a significant step toward enhancing the overall efficiency and reliability of automated software testing. Our data are available at https://doi.org/10.6084/m9.figshare.25983166.