A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs

As modern web services increasingly rely on REST APIs, thorough testing of these APIs has become crucial. The advent of REST API specifications such as the OpenAPI Specification has in turn led to the emergence of many black-box REST API testing tools. However, these tools often treat individual test elements (e.g., APIs, parameters, values) in isolation, which results in lower coverage and less effective fault detection (i.e., fewer triggered 500 response codes). To address these limitations, we present AutoRestTest, the first black-box framework to adopt a dependency-embedded multi-agent approach for REST API testing, integrating Multi-Agent Reinforcement Learning (MARL) with a Semantic Property Dependency Graph (SPDG) and Large Language Models (LLMs). Our approach treats REST API testing as a separable problem in which four agents (API, dependency, parameter, and value) collaborate to optimize API exploration. LLMs handle domain-specific value restrictions, the SPDG narrows the search space for dependencies using a similarity score between API operations, and MARL dynamically optimizes the agents' behavior. Evaluated on 12 real-world REST services, AutoRestTest outperforms the four leading black-box REST API testing tools, including those assisted by RESTGPT (which uses LLMs to supply realistic test inputs), in terms of code coverage, operation coverage, and fault detection. Notably, AutoRestTest is the only tool able to identify an internal server error in Spotify. Our ablation study underscores the significant contributions of the agent learning, SPDG, and LLM components.
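To make the idea of pruning dependencies with a similarity score more concrete, the following is a minimal sketch and not AutoRestTest's implementation. It assumes a hypothetical in-memory description of each operation (its parameter names and its response property names) and swaps in a simple Jaccard overlap of name tokens as a stand-in for whatever semantic similarity measure the SPDG actually uses. The names `tokens`, `similarity`, `build_dependency_edges`, and the `threshold` value are all illustrative assumptions.

```python
import re
from itertools import permutations


def tokens(name):
    """Split a camelCase or snake_case name into a set of lowercase tokens."""
    parts = re.findall(r"[A-Za-z][a-z]*|\d+", name)
    return {p.lower() for p in parts}


def similarity(a, b):
    """Jaccard overlap of name tokens (an assumed stand-in for a semantic score)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_dependency_edges(operations, threshold=0.5):
    """Return candidate edges (consumer, param, producer, prop, score).

    An edge suggests that `param` of the consumer operation might be filled
    with the value of `prop` taken from the producer operation's response.
    Only pairs scoring at or above `threshold` are kept, which is what
    shrinks the space of dependencies a tester has to explore.
    """
    edges = []
    for consumer, producer in permutations(operations, 2):
        for param in operations[consumer]["params"]:
            for prop in operations[producer]["responses"]:
                score = similarity(param, prop)
                if score >= threshold:
                    edges.append((consumer, param, producer, prop, score))
    # Highest-scoring candidates first, so they can be tried first.
    edges.sort(key=lambda e: e[4], reverse=True)
    return edges


# Hypothetical two-operation service used only for illustration.
ops = {
    "createUser": {"params": ["name"], "responses": ["userId", "name"]},
    "getOrder": {"params": ["user_id"], "responses": ["orderId", "total"]},
}
edges = build_dependency_edges(ops)
```

In this toy service the only surviving edge links the `user_id` parameter of `getOrder` to the `userId` property returned by `createUser`. The weaker match between `user_id` and `orderId` falls below the threshold and is discarded. In a learning-based setup such as the one described above, scores like these would plausibly serve only as a starting point that the dependency agent refines from observed responses.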
