XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model

In recent years, we have been dedicated to automating the user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system named XUAT has been developed for this purpose. However, the current system still contains one labor-intensive stage, namely test script generation. In this paper, we therefore concentrate on methods for raising the automation level of the current system, particularly at the test script generation stage. Following their recent notable successes, large language models (LLMs) show significant potential for attaining human-like intelligence, and a growing body of research employs LLMs as autonomous agents to obtain human-like decision-making capabilities. Inspired by these works, we propose an LLM-powered multi-agent collaborative system, named XUAT-Copilot, for automated UAT. The proposed system mainly consists of three LLM-based agents responsible for action planning, state checking, and parameter selecting, respectively, and two additional modules for state sensing and case rewriting. The agents interact with the testing device, make human-like decisions, and generate action commands in a collaborative way. In our experimental studies, the proposed multi-agent system achieves effectiveness close to that of human testers and a significant improvement in Pass@1 accuracy over a single-agent architecture. More importantly, the proposed system has been launched in the formal testing environment of the WeChat Pay mobile app, where it saves a considerable amount of manpower in daily development work.
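The division of labor described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: every class, function, command name, and UI element id below is a hypothetical stand-in, and each agent is a rule-based stub where the real system would presumably call an LLM. The order in which the components are chained is also an assumption, since the abstract only names the roles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Action:
    name: str                    # hypothetical command name, e.g. "tap" or "input"
    target: str                  # hypothetical UI element id
    value: Optional[str] = None  # filled in by the parameter-selecting agent


@dataclass
class FakeDevice:
    """Stand-in for a test device; a real system would drive an actual app."""
    screen: str = "amount_page"
    fields: Dict[str, str] = field(default_factory=dict)

    def execute(self, action: Action) -> None:
        if action.name == "input":
            self.fields[action.target] = action.value or ""
        elif action.name == "tap" and action.target == "pay_button":
            self.screen = "result_page"


def rewrite_case(step: str) -> str:
    """Case rewriting module (stub): normalize the wording of a test step."""
    return " ".join(step.lower().split())


def sense_state(device: FakeDevice) -> str:
    """State sensing module (stub): describe the current screen as text."""
    return device.screen


def plan_action(step: str, state: str) -> Action:
    """Action planning agent (stub): choose the next command for this step."""
    if "enter" in step:
        return Action("input", "amount_field")
    return Action("tap", "pay_button")


def select_parameter(action: Action, test_params: Dict[str, str]) -> Action:
    """Parameter selecting agent (stub): fill in the value an action needs."""
    if action.name == "input":
        action.value = test_params.get(action.target)
    return action


def check_state(state: str, expected: str) -> bool:
    """State checking agent (stub): decide whether the step reached its goal."""
    return state == expected


def run_case(
    steps: List[Tuple[str, str]],
    test_params: Dict[str, str],
    device: FakeDevice,
) -> List[bool]:
    """Run each (step, expected_screen) pair; return one pass/fail flag per step."""
    results = []
    for raw_step, expected in steps:
        step = rewrite_case(raw_step)
        action = plan_action(step, sense_state(device))
        action = select_parameter(action, test_params)
        device.execute(action)
        results.append(check_state(sense_state(device), expected))
    return results


device = FakeDevice()
steps = [("Enter  the payment amount", "amount_page"), ("Tap Pay", "result_page")]
results = run_case(steps, {"amount_field": "10.00"}, device)
print(results)
```

Keeping each role in its own function mirrors the separation of responsibilities named in the abstract: swapping a stub for an LLM-backed agent would change one function without touching the loop in `run_case`.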
