In this work, we explore the potential of large language models (LLMs) for generating functional test scripts, which requires understanding the dynamically evolving code structure of the target software. To this end, we propose a case-based reasoning (CBR) system built on a 4R cycle (retrieve, reuse, revise, and retain), which maintains and leverages a case bank of test intent descriptions and their corresponding test scripts to assist LLMs in test script generation. To further improve user experience, we introduce Re4, an optimization method for the CBR system comprising reranking-based retrieval finetuning and reinforced reuse finetuning. Specifically, we first identify positive examples with high semantic and script similarity, which provide reliable pseudo-labels for finetuning the retriever model without costly manual labeling. We then apply supervised finetuning, followed by a reinforcement learning stage, to align LLMs with our production scenarios and ensure faithful reuse of the retrieved cases. Extensive experiments on two product development units from Huawei Datacom demonstrate the superiority of the proposed CBR+Re4. Notably, we also show that Re4 helps alleviate repetitive generation issues in LLMs.
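As a rough illustration of the pseudo-labeling idea described above, the sketch below marks a pair of cases as a positive retrieval example only when both their test intent descriptions and their test scripts are sufficiently similar. The similarity functions (a string-ratio stand-in for semantic similarity, token-overlap Jaccard for script similarity) and the thresholds are illustrative assumptions, not the paper's actual implementation, which would use an embedding model for semantic similarity.

```python
from difflib import SequenceMatcher

def text_similarity(a: str, b: str) -> float:
    """Cheap stand-in for semantic similarity between intent descriptions."""
    return SequenceMatcher(None, a, b).ratio()

def script_similarity(a: str, b: str) -> float:
    """Token-overlap (Jaccard) similarity between two test scripts."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def mine_positive_pairs(cases, sem_thresh=0.6, script_thresh=0.5):
    """Return index pairs (i, j) usable as pseudo-labeled positives
    for retriever finetuning: both intent and script must be similar."""
    positives = []
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            sem_ok = text_similarity(cases[i]["intent"], cases[j]["intent"]) >= sem_thresh
            scr_ok = script_similarity(cases[i]["script"], cases[j]["script"]) >= script_thresh
            if sem_ok and scr_ok:
                positives.append((i, j))
    return positives

# Toy case bank: intent descriptions paired with (simplified) test scripts.
cases = [
    {"intent": "verify login succeeds with valid credentials",
     "script": "open_page login ; enter user pass ; assert dashboard"},
    {"intent": "verify login succeeds with valid username and password",
     "script": "open_page login ; enter user pass ; assert dashboard shown"},
    {"intent": "check file upload rejects oversized files",
     "script": "open_page upload ; select big_file ; assert error"},
]

print(mine_positive_pairs(cases))  # only the two login cases pair up
```

Requiring agreement on both signals is what makes the pseudo-labels reliable: intent wording alone can match cases whose scripts diverge, while script overlap alone can match boilerplate-heavy but semantically unrelated tests.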