Over the past several years, we have worked on automating the user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system named XUAT has been developed for this purpose. However, one stage of the current system, test script generation, still requires intensive human labor. In this paper, we therefore concentrate on methods for raising the automation level of the current system, particularly in the test script generation stage. Following their recent notable successes, large language models (LLMs) show significant potential for attaining human-like intelligence, and a growing body of research employs LLMs as autonomous agents with human-like decision-making capabilities. Inspired by these works, we propose an LLM-powered multi-agent collaborative system, named XUAT-Copilot, for automated UAT. The proposed system consists mainly of three LLM-based agents, responsible for action planning, state checking, and parameter selecting, respectively, and two additional modules for state sensing and case rewriting. The agents interact with the testing device, make human-like decisions, and generate action commands in a collaborative way. In our experimental studies, the proposed multi-agent system achieves effectiveness close to that of human testers and significantly improves Pass@1 accuracy over a single-agent architecture. More importantly, the proposed system has been launched in the formal testing environment of the WeChat Pay mobile app, where it saves a considerable amount of manpower in daily development work.
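The division of labor among the three agents and two modules can be illustrated with a minimal Python sketch. It is not the XUAT-Copilot implementation: the abstract specifies no interfaces, so every name here (`Action`, `FakeDevice`, `rewrite_case`, `plan_action`, `select_parameter`, `check_state`, `run_case`) is hypothetical, the order in which the agents are called is an assumption, and each LLM-based agent is replaced by a trivial rule so that the example runs without a model.

```python
from dataclasses import dataclass
from typing import Dict, List

# Every name in this sketch is hypothetical; the abstract defines no API.


@dataclass
class Action:
    name: str        # e.g. "tap" or "input"
    target: str      # UI element the action applies to
    value: str = ""  # filled in by the parameter-selecting agent


class FakeDevice:
    """Stand-in for the testing device; records the commands it receives."""

    def __init__(self) -> None:
        self.screen = "home"
        self.log: List[str] = []

    def sense_state(self) -> str:
        """State sensing module: report what the current screen shows."""
        return self.screen

    def execute(self, action: Action) -> None:
        self.log.append(f"{action.name}({action.target},{action.value})")
        self.screen = f"after:{action.target}"


def rewrite_case(raw_case: str) -> List[str]:
    """Case rewriting module: split a free-form test case into ordered steps."""
    return [step.strip() for step in raw_case.split(";") if step.strip()]


def plan_action(step: str, state: str) -> Action:
    """Action-planning agent (an LLM call in a real system; a rule here)."""
    verb, _, target = step.partition(" ")
    return Action(name=verb, target=target)


def select_parameter(action: Action, params: Dict[str, str]) -> Action:
    """Parameter-selecting agent: choose the value for the planned action."""
    action.value = params.get(action.target, "")
    return action


def check_state(action: Action, state: str) -> bool:
    """State-checking agent: decide whether the step had the expected effect."""
    return state == f"after:{action.target}"


def run_case(raw_case: str, params: Dict[str, str], device: FakeDevice) -> bool:
    """Run one test case; return True only if every step passes its check."""
    for step in rewrite_case(raw_case):
        state = device.sense_state()
        action = select_parameter(plan_action(step, state), params)
        device.execute(action)
        if not check_state(action, device.sense_state()):
            return False
    return True


device = FakeDevice()
passed = run_case("tap pay_button; input amount_field",
                  {"amount_field": "10"}, device)
```

In this toy run, `passed` is `True` and `device.log` holds the two generated action commands. Under the assumption that Pass@1 counts a case as passed only when its first attempt succeeds, the metric would be the fraction of cases for which a single call to `run_case` returns `True`.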