Over the past several years, we have worked to automate the user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system named XUAT has been developed for this purpose. However, one stage of the current system, test script generation, still requires intensive human labor. In this paper, we therefore concentrate on raising the automation level of the current system, particularly at the test script generation stage. Following their recent notable successes, large language models (LLMs) show significant potential for attaining human-like intelligence, and a growing body of research employs LLMs as autonomous agents to obtain human-like decision-making capabilities. Inspired by these works, we propose an LLM-powered multi-agent collaborative system, named XUAT-Copilot, for automated UAT. The proposed system mainly consists of three LLM-based agents, responsible for action planning, state checking, and parameter selecting, respectively, and two additional modules for state sensing and case rewriting. The agents interact with the testing device, make human-like decisions, and generate action commands in a collaborative way. In our experimental studies, the proposed multi-agent system achieves effectiveness close to that of human testers and a significant improvement in Pass@1 accuracy over a single-agent architecture. More importantly, the proposed system has been launched in the formal testing environment of the WeChat Pay mobile app, where it saves a considerable amount of manpower in daily development work.
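To make the division of labor among the three agents and two modules more concrete, the following is a minimal sketch of one way such a loop could be wired together. It is not the XUAT-Copilot implementation: the abstract does not specify any interfaces, so every name here (`Step`, `rewrite_case`, `sense_state`, `run_case`, the retry limit, and the dictionary standing in for a testing device) is a hypothetical assumption, and plain Python functions stand in for the LLM-based agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical; the abstract defines no interfaces.


@dataclass
class Step:
    instruction: str  # one step of a rewritten test case


def rewrite_case(case: str) -> List[Step]:
    """Case-rewriting module (stub): split a test case into steps on ';'."""
    return [Step(s.strip()) for s in case.split(";") if s.strip()]


def sense_state(device: Dict[str, str]) -> str:
    """State-sensing module (stub): return a description of the screen."""
    return device["screen"]


def run_case(
    case: str,
    device: Dict[str, str],
    plan: Callable[[str, str], str],
    select_params: Callable[[str, str], str],
    check_state: Callable[[str, str], bool],
    max_retries: int = 2,
) -> List[str]:
    """Generate a script (list of commands) for one test case."""
    script: List[str] = []
    for step in rewrite_case(case):
        for _ in range(max_retries + 1):
            state = sense_state(device)
            action = plan(step.instruction, state)   # action-planning agent
            command = select_params(action, state)   # parameter-selecting agent
            device["screen"] = command               # stand-in for device execution
            if check_state(step.instruction, sense_state(device)):  # state-checking agent
                script.append(command)
                break
        else:
            # No attempt passed the state check within the retry budget.
            raise RuntimeError(f"step failed: {step.instruction}")
    return script


# Toy stand-ins for the three LLM-based agents (assumptions for illustration).
def toy_plan(instruction: str, state: str) -> str:
    return instruction.split()[0].lower()


def toy_select_params(action: str, state: str) -> str:
    return f"{action}()"


def toy_check_state(instruction: str, state: str) -> bool:
    return state.startswith(instruction.split()[0].lower())


if __name__ == "__main__":
    device = {"screen": "home"}
    print(run_case("Tap pay button; Input amount", device,
                   toy_plan, toy_select_params, toy_check_state))
```

In this sketch the agents are passed in as callables so that a single-agent variant (one function playing all three roles) could be swapped in for comparison; whether the actual system is structured this way is not stated in the source.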