Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations have driven the proliferation of $\textit{shadow APIs}$: third-party services that claim to provide indirect access to official model services without regional limitations. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit comparing official LLM APIs with their corresponding shadow APIs. We first identify 17 shadow APIs that have been used in 187 academic papers, the most popular of which had reached 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deceptive practices in shadow APIs. Specifically, we reveal performance divergence of up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.