APIs often transmit far more data to client applications than they need, and in the context of web applications, they often do so over public channels. This issue, termed Excessive Data Exposure (EDE), ranked third in OWASP's 2019 list of the most significant API vulnerabilities. However, there are few automated tools, in either research or industry, that can effectively find and remediate such issues. This is unsurprising, as the problem lacks an explicit test oracle: the vulnerability does not manifest through explicit abnormal behaviours (e.g., program crashes or memory access violations). In this work, we develop a metamorphic relation to address this challenge and build EDEFuzz, the first fuzzing tool to systematically detect EDEs. EDEFuzz can significantly reduce the false negatives that occur with manual inspection and ad-hoc text-matching techniques, currently the most widely used approaches. We tested EDEFuzz against the sixty-nine applicable targets from the Alexa Top-200 and found 33,365 potential leaks, illustrating our tool's broad applicability and scalability. In a more tightly controlled experiment on eight popular websites in Australia, EDEFuzz achieved a high true positive rate of 98.65% with minimal configuration, illustrating our tool's accuracy and efficiency.