Software Testing at the Network Layer: Automated HTTP API Quality Assessment and Security Analysis of Production Web Applications

Modern web applications rely heavily on client-side API calls to fetch data, render content, and communicate with backend services. However, the quality of these network interactions (redundant requests, missing cache headers, oversized payloads, and excessive third-party dependencies) is rarely tested systematically. Moreover, many of these quality deficiencies carry security implications: missing cache headers can enable cache poisoning, excessive third-party dependencies expand the supply-chain attack surface, and error responses risk leaking server internals. In this study, we present an automated software testing framework that captures and analyzes the complete HTTP traffic of 18 production websites spanning 11 categories (e-commerce, news, government, developer tools, and travel, among others). Using automated browser instrumentation via Playwright, we record 108 HAR (HTTP Archive) files, with 3 independent runs per page, then apply 8 heuristic-based anti-pattern detectors to produce a composite quality score (0-100) for each site. Our results reveal a wide quality spectrum: minimalist server-rendered sites achieve perfect scores of 100, while content-heavy commercial sites score as low as 56.8. We identify redundant API calls and missing cache headers as the two most pervasive anti-patterns, each affecting 67% of sites, while third-party overhead exceeds 20% on 72% of sites. One utility site makes 2,684 requests per page load, 447 times as many as the most minimal site. To protect site reputations, all site identities are anonymized using category-based pseudonyms. We provide all analysis scripts, anonymized results, and reproducibility instructions as an open artifact. This work establishes an empirical baseline for HTTP API call quality across the modern web and offers a reproducible testing framework that researchers and practitioners can apply to their own applications.
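
To make the detector-and-score idea concrete, the following is a minimal sketch of how three of the anti-patterns named above (redundant calls, missing cache headers, third-party overhead) might be detected from a parsed HAR file and folded into a 0-100 score. It is not the framework's actual implementation: the function names, the set of headers treated as cache headers, the request-count definition of third-party share, and the penalty weights are all assumptions made for illustration. Only the 20% third-party threshold is taken from the text.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical penalty weights; the real scoring formula is not specified here.
PENALTIES = {
    "redundant_calls": 15,
    "missing_cache_headers": 15,
    "third_party_overhead": 10,
}
THIRD_PARTY_THRESHOLD = 0.20  # the 20% level mentioned in the abstract
# Assumption: any one of these response headers counts as cache metadata.
CACHE_HEADERS = {"cache-control", "etag", "expires", "last-modified"}


def is_third_party(url, site_host):
    """Assumption: first-party means the page host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return not (host == site_host or host.endswith("." + site_host))


def lacks_cache_headers(entry):
    """True when the response carries none of the headers in CACHE_HEADERS."""
    names = {h["name"].lower() for h in entry["response"]["headers"]}
    return names.isdisjoint(CACHE_HEADERS)


def analyze_har(har, site_host):
    """Run three illustrative detectors over a parsed HAR dict."""
    entries = har["log"]["entries"]
    total = len(entries)

    # Redundant calls: every repeat of an identical (method, URL) pair.
    counts = Counter((e["request"]["method"], e["request"]["url"]) for e in entries)
    redundant = sum(n - 1 for n in counts.values())

    # Missing cache headers: successful GET responses with no cache metadata.
    uncached = sum(
        1
        for e in entries
        if e["request"]["method"] == "GET"
        and 200 <= e["response"]["status"] < 300
        and lacks_cache_headers(e)
    )

    # Third-party overhead, measured here as a share of request count (assumption).
    third_party = sum(is_third_party(e["request"]["url"], site_host) for e in entries)
    share = third_party / total if total else 0.0

    flags = {
        "redundant_calls": redundant > 0,
        "missing_cache_headers": uncached > 0,
        "third_party_overhead": share > THIRD_PARTY_THRESHOLD,
    }
    # Start from 100 and subtract a fixed penalty for each detector that fires.
    score = 100 - sum(PENALTIES[name] for name, hit in flags.items() if hit)
    return {
        "total_requests": total,
        "redundant_requests": redundant,
        "uncached_responses": uncached,
        "third_party_share": share,
        "flags": flags,
        "score": score,
    }
```

A HAR file loaded with `json.load` can be passed directly as `har`, with `site_host` set to the hostname of the page under test. With Playwright for Python, such a file can be produced by passing `record_har_path` to `browser.new_context(...)` and closing the context after the page load; whether the framework described here records its traffic this way is not stated.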
