The Ethereum ecosystem, which secures over $381 billion in assets, fundamentally relies on client APIs as the sole interface between users and the blockchain. However, these critical APIs suffer from widespread implementation inconsistencies, which can lead to financial discrepancies, degraded user experiences, and threats to network reliability. Despite this criticality, existing testing approaches remain manual and incomplete: they require extensive domain expertise, struggle to keep pace with Ethereum's rapid evolution, and fail to distinguish genuine bugs from acceptable implementation variations. We present APIDiffer, the first specification-guided differential testing framework designed to automatically detect API inconsistencies across Ethereum's diverse client ecosystem. APIDiffer transforms API specifications into comprehensive test suites through two key innovations: (1) specification-guided test input generation that creates both syntactically valid and invalid requests enriched with real-time blockchain data, and (2) specification-aware false positive filtering that leverages large language models to distinguish genuine bugs from acceptable variations. Our evaluation across all 11 major Ethereum clients reveals the pervasiveness of API bugs in production systems. APIDiffer uncovered 72 bugs, with 90.28% already confirmed or fixed by developers. Beyond these raw numbers, APIDiffer achieves up to 89.67% higher code coverage than existing tools and reduces false positive rates by 37.38%. The Ethereum community's response validates our impact: developers have integrated our test cases, expressed interest in adopting our methodology, and escalated one bug to the official Ethereum Project Management meeting.
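The core loop described above (derive valid and invalid requests from a specification, send them to several clients, and flag divergent responses) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not APIDiffer's implementation: the `SPEC` dictionary is a hypothetical, heavily simplified stand-in for a real API specification, `client_a` and `client_b` are in-process mock clients rather than real Ethereum nodes, and the rule-based `classify` function stands in for the LLM-based false positive filter by treating differing error codes as an acceptable variation.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical, simplified spec entry: each parameter lists example
# valid and invalid values (an assumption, not a real spec format).
SPEC = {
    "method": "eth_getBlockByNumber",
    "params": [
        {"name": "block", "valid": ["latest", "0x1"], "invalid": ["0x", 1, None]},
        {"name": "hydrated", "valid": [True, False], "invalid": ["yes"]},
    ],
}


def generate_requests(spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one all-valid baseline request, then vary one parameter at a time."""
    base = [p["valid"][0] for p in spec["params"]]
    cases = [list(base)]
    for i, param in enumerate(spec["params"]):
        for value in param["valid"][1:] + param["invalid"]:
            case = list(base)
            case[i] = value
            cases.append(case)
    return [
        {"jsonrpc": "2.0", "id": n, "method": spec["method"], "params": c}
        for n, c in enumerate(cases, start=1)
    ]


def _is_valid_block(tag: Any) -> bool:
    if isinstance(tag, str) and tag in ("latest", "earliest", "pending"):
        return True
    return isinstance(tag, str) and tag.startswith("0x") and len(tag) > 2


def client_a(req: Dict[str, Any]) -> Dict[str, Any]:
    """Mock client that rejects every malformed parameter."""
    block, hydrated = req["params"]
    if not _is_valid_block(block) or not isinstance(hydrated, bool):
        return {"error": {"code": -32602, "message": "invalid params"}}
    return {"result": {"number": block}}


def client_b(req: Dict[str, Any]) -> Dict[str, Any]:
    """Mock client with a seeded inconsistency: it accepts the empty hex "0x"."""
    block, hydrated = req["params"]
    if block == "0x" and isinstance(hydrated, bool):
        return {"result": {"number": "0x0"}}
    if not _is_valid_block(block) or not isinstance(hydrated, bool):
        return {"error": {"code": -32000, "message": "bad argument"}}
    return {"result": {"number": block}}


def classify(resp: Dict[str, Any]) -> tuple:
    """Reduce a response to a comparable outcome.

    Stand-in for false positive filtering: all errors collapse to one
    outcome, so differing error codes or messages are not reported.
    """
    if "error" in resp:
        return ("error", None)
    return ("result", json.dumps(resp["result"], sort_keys=True))


def differential_test(
    requests: List[Dict[str, Any]],
    clients: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Return the requests on which the clients' outcomes disagree."""
    findings = []
    for req in requests:
        outcomes = {name: classify(fn(req)) for name, fn in clients.items()}
        if len(set(outcomes.values())) > 1:
            findings.append({"request": req, "outcomes": outcomes})
    return findings


if __name__ == "__main__":
    clients = {"client_a": client_a, "client_b": client_b}
    for finding in differential_test(generate_requests(SPEC), clients):
        print(finding["request"]["params"], finding["outcomes"])
```

In this toy setup, only the request carrying the malformed block value `"0x"` is reported, because one mock client rejects it while the other returns a result. Requests that both clients reject are not flagged even though their error codes differ, which mirrors, in a very reduced form, the distinction between genuine bugs and acceptable implementation variations.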