GUI testing checks whether a software system behaves as expected when users interact with its graphical interface, e.g., by testing specific functionality or validating relevant use case scenarios. Currently, deciding what to test at this high level is a manual task, since automated GUI testing tools target lower-level adequacy metrics such as structural code coverage or activity coverage. We propose DroidAgent, an autonomous GUI testing agent for Android, for semantic, intent-driven automation of GUI testing. It is based on Large Language Models and support mechanisms such as long- and short-term memory. Given an Android app, DroidAgent sets relevant task goals and subsequently tries to achieve them by interacting with the app. Our empirical evaluation of DroidAgent using 15 apps from the Themis benchmark shows that it can set up and perform realistic tasks with a higher level of autonomy. For example, when testing a messaging app, DroidAgent created a second account and added the first account as a friend, exercising a realistic use case without human intervention. On average, DroidAgent achieved 61% activity coverage, compared to 51% for current state-of-the-art GUI testing techniques. Further, manual analysis shows that 317 of the 374 autonomously created tasks (about 85%) are realistic and relevant to app functionalities, and that DroidAgent interacts deeply with the apps and covers more features.