Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat

UI automation tests play a crucial role in ensuring the quality of mobile applications. Despite the growing popularity of machine learning techniques for generating these tests, such techniques still face several challenges, such as mismatched UI elements. Recent advances in Large Language Models (LLMs) have addressed these issues by leveraging their semantic understanding capabilities. However, a significant gap remains in applying these models to industrial-level app testing, particularly in terms of cost optimization and knowledge limitations. To address this, we introduce CAT, which creates cost-effective UI automation tests for industry apps by combining machine learning and LLMs with best practices. Given a task description, CAT employs Retrieval Augmented Generation (RAG) to source examples of industrial app usage as the few-shot learning context, assisting LLMs in generating the specific sequence of actions. CAT then employs machine learning techniques, with LLMs serving as a complementary optimizer, to map each action to its target element on the UI screen. Our evaluations on the WeChat testing dataset demonstrate CAT's performance and cost-effectiveness, achieving 90% UI automation at a cost of $0.34 and outperforming the state of the art. We have also integrated our approach into the real-world WeChat testing platform, demonstrating its usefulness in detecting 141 bugs and enhancing the developers' testing process.
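The two stages described in the abstract can be illustrated with a minimal sketch. This is not CAT's implementation: the function names, the bag-of-words cosine retriever, the confidence threshold, and the toy corpus are all assumptions made for illustration, and the LLM is represented only by a caller-supplied callback. The sketch shows (1) retrieving similar past usage examples to build a few-shot prompt and (2) trying a cheap matcher first and deferring to the LLM only when the match is weak, which is one plausible reading of "LLMs serving as a complementary optimizer".

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased bag-of-words counts (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_examples(task, corpus, k=2):
    """Return the k stored usage examples most similar to the task description."""
    query = tokens(task)
    ranked = sorted(
        corpus, key=lambda ex: cosine(query, tokens(ex["task"])), reverse=True
    )
    return ranked[:k]


def build_prompt(task, examples):
    """Format retrieved examples as few-shot context, ending with the new task."""
    shots = "\n\n".join(
        f"Task: {ex['task']}\nActions: {'; '.join(ex['actions'])}" for ex in examples
    )
    return f"{shots}\n\nTask: {task}\nActions:"


def map_element(action, screen_elements, llm_fallback, threshold=0.5):
    """Map an action to a UI element, calling the LLM only on a weak match.

    Assumes screen_elements is non-empty. The threshold is a hypothetical value.
    """
    query = tokens(action)
    best = max(screen_elements, key=lambda el: cosine(query, tokens(el)))
    if cosine(query, tokens(best)) >= threshold:
        return best, "matcher"
    return llm_fallback(action, screen_elements), "llm"


# Hypothetical usage records; a real system would draw these from app usage data.
corpus = [
    {"task": "send a text message to a friend",
     "actions": ["tap Chats", "tap the friend", "type the message", "tap Send"]},
    {"task": "post a photo to Moments",
     "actions": ["tap Discover", "tap Moments", "tap the camera icon"]},
    {"task": "scan a QR code to pay",
     "actions": ["tap the plus icon", "tap Scan"]},
]
```

In this sketch the LLM is invoked once to generate the action sequence from `build_prompt(...)` and again only when `map_element` cannot find a confident match, so most element lookups incur no model cost. Whether CAT draws the line in the same way is not stated in the abstract.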

