XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model

Over the past several years, we have worked on automating the user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system named XUAT was developed for this purpose. However, the current system still contains one labor-intensive stage: test script generation. In this paper, we therefore concentrate on methods for raising the automation level of the current system, particularly in the test script generation stage. Following their recent notable successes, large language models (LLMs) show significant potential for attaining human-like intelligence, and a growing body of research employs LLMs as autonomous agents to obtain human-like decision-making capabilities. Inspired by this work, we propose an LLM-powered multi-agent collaborative system, named XUAT-Copilot, for automated UAT. The proposed system consists mainly of three LLM-based agents, responsible for action planning, state checking, and parameter selecting, respectively, and two additional modules for state sensing and case rewriting. The agents interact with the test device, make human-like decisions, and generate action commands collaboratively. In our experimental studies, the proposed multi-agent system achieves effectiveness close to that of human testers and a significant improvement in Pass@1 accuracy over a single-agent architecture. More importantly, the proposed system has been launched in the formal testing environment of the WeChat Pay mobile app, where it saves a considerable amount of manpower in daily development work.
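
The division of labor described above (three agents plus state sensing and case rewriting modules) can be pictured as a simple control loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: every name in it (`FakeDevice`, `rewrite_case`, `sense_state`, `plan_action`, `select_params`, `check_state`, `run_case`) is hypothetical, the ordering of the steps is a guess, and plain rules stand in for what the paper describes as LLM-based agents.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Action:
    name: str
    params: Dict[str, str] = field(default_factory=dict)


class FakeDevice:
    """Toy stand-in for a test device; it tracks only a screen name."""

    def __init__(self) -> None:
        self.screen = "home"

    def execute(self, action: Action) -> None:
        if action.name == "tap":
            self.screen = action.params["target"]
        elif action.name == "input_text":
            self.screen = self.screen + ":filled"


def rewrite_case(step: str) -> str:
    # Case rewriting module (assumed behavior): normalize the step text.
    return " ".join(step.lower().split())


def sense_state(device: FakeDevice) -> str:
    # State sensing module (assumed behavior): report the current screen.
    return device.screen


def plan_action(step: str, state: str) -> Action:
    # Action planning agent; a rule replaces the LLM call in this sketch.
    return Action("input_text" if step.startswith("enter") else "tap")


def select_params(action: Action, step: str, test_data: Dict[str, str]) -> Action:
    # Parameter selecting agent; picks a value based on the step's last word.
    last_word = step.split()[-1]
    if action.name == "input_text":
        return Action(action.name, {"text": test_data[last_word]})
    return Action(action.name, {"target": last_word})


def check_state(before: str, after: str) -> bool:
    # State checking agent; here it only requires that the screen changed.
    return before != after


def run_case(steps: List[str], test_data: Dict[str, str],
             device: FakeDevice) -> List[Action]:
    """Turn natural-language steps into a list of executed action commands."""
    script: List[Action] = []
    for raw in steps:
        step = rewrite_case(raw)
        before = sense_state(device)
        action = select_params(plan_action(step, before), step, test_data)
        device.execute(action)
        if not check_state(before, sense_state(device)):
            raise RuntimeError(f"step did not change state: {raw!r}")
        script.append(action)
    return script


device = FakeDevice()
script = run_case(["Tap  Transfer", "Enter amount"], {"amount": "10.00"}, device)
for action in script:
    print(action.name, action.params)
```

In this toy run the two steps become a `tap` action followed by an `input_text` action. Replacing each rule-based function with a model call would be the natural next step, but how the real agents exchange information is not specified in the abstract.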

