OOPS: Automated generation of REST API specification via LLMs

REST APIs, based on the REpresentational State Transfer (REST) architecture, are the primary type of Web API. The OpenAPI Specification (OAS) serves as the de facto standard for describing REST APIs and is crucial for multiple software engineering tasks. Automated OAS generation can help developers identify and correct issues in manually maintained OAS, but existing approaches rely on technology-specific rules and human expert intervention. The strong code understanding capabilities of LLMs offer the potential to overcome these limitations, but they introduce additional challenges such as context length limitations and hallucinations. To address these challenges, we propose OOPS, the first technology-agnostic approach that leverages LLM-based static analysis of server code for OAS generation. Through an LLM agent workflow comprising two key steps, endpoint method extraction and OAS generation, OOPS eliminates the need for technology-specific rules or human expert intervention. By constructing an API dependency graph, it establishes the necessary file associations to address LLMs' context length limitations. Through multi-stage generation and self-refinement, it mitigates both syntactic and semantic hallucinations during OAS generation. We evaluated OOPS on 12 real-world REST APIs spanning 5 programming languages and 8 development frameworks. Experimental results demonstrate that OOPS accurately generates high-quality OAS for REST APIs implemented with diverse technologies, achieving an average F1-score exceeding 98% for endpoint method inference, 97% for both request parameter and response inference, and 92% for parameter constraint inference. Input tokens average below 5.6K, with a maximum of 16.13K, and output tokens average below 0.9K, with a maximum of 7.63K.
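
To illustrate how a dependency graph can bound the code handed to an LLM, the following is a minimal sketch and not the OOPS implementation. It assumes the graph is already available as a mapping from each source file to the files it references. The function name `reachable_files` and all file paths are hypothetical. Starting from the file that declares an endpoint, it collects only the files reachable through dependencies, so unrelated code stays out of the prompt.

```python
from collections import deque


def reachable_files(deps: dict[str, list[str]], start: str) -> list[str]:
    """Return `start` plus every file it depends on, directly or
    transitively, in breadth-first order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Files with no recorded dependencies simply contribute nothing.
        for dep in deps.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


# Hypothetical project layout, used only for illustration.
deps = {
    "routes/users.py": ["services/user_service.py", "schemas/user.py"],
    "services/user_service.py": ["models/user.py", "schemas/user.py"],
    "routes/orders.py": ["services/order_service.py"],
    "services/order_service.py": ["models/order.py", "models/user.py"],
}

# Only these files would be placed in the context for the users endpoint.
context_files = reachable_files(deps, "routes/users.py")
```

In this sketch, files that belong only to the orders endpoint are never visited, which is one simple way to keep per-endpoint input small regardless of the size of the whole repository.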
