Millions of users rely on a market of cloud-based services for access to state-of-the-art large language models. However, it has recently been shown that the de facto pay-per-token pricing mechanism used by providers creates a financial incentive for them to strategize and misreport the number of tokens a model used to generate an output. In this paper, we develop an auditing framework based on martingale theory that enables a trusted third-party auditor, who sequentially queries a provider, to detect token misreporting. Crucially, we show that our framework is guaranteed to always detect token misreporting, regardless of the provider's (mis-)reporting policy, and, with high probability, to never falsely flag a faithful provider as unfaithful. To validate our auditing framework, we conduct experiments across a wide range of (mis-)reporting policies using several large language models from the $\texttt{Llama}$, $\texttt{Gemma}$ and $\texttt{Ministral}$ families, and input prompts from a popular crowdsourced benchmarking platform. The results show that our framework detects an unfaithful provider after observing fewer than $\sim 70$ reported outputs, while keeping the probability of falsely flagging a faithful provider below $\alpha = 0.05$.
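To make the sequential-auditing idea concrete, here is a minimal sketch of one standard martingale-based sequential test: the auditor accumulates a running product of per-query e-values (nonnegative statistics with expectation at most one under faithful reporting) and flags the provider once the product crosses $1/\alpha$, which by Ville's inequality bounds the false-alarm probability by $\alpha$. The `audit` function and the toy e-value simulation below are illustrative assumptions, not the paper's actual construction.

```python
import random

ALPHA = 0.05  # target false-alarm rate for a faithful provider


def audit(e_values, alpha=ALPHA):
    """Sequential test based on a nonnegative test martingale.

    `e_values` yields one e-value per audited query: a nonnegative
    statistic whose expectation is at most 1 when the provider reports
    token counts faithfully.  The wealth process is the running product
    of e-values.  By Ville's inequality, the probability that the
    wealth ever reaches 1/alpha under faithful reporting is at most
    alpha, so flagging at that threshold controls false alarms.
    """
    wealth = 1.0
    for t, e in enumerate(e_values, start=1):
        wealth *= e
        if wealth >= 1.0 / alpha:
            return t  # number of reported outputs before flagging
    return None  # provider never flagged


# Toy simulation (hypothetical numbers): under misreporting, the
# auditor's e-values are favorable on average (mean > 1), so the
# wealth grows geometrically and the provider is eventually flagged.
random.seed(0)
unfaithful = (1.0 + random.uniform(-0.3, 0.5) for _ in range(500))
print("flagged after", audit(unfaithful), "reported outputs")
```

The key design point is that the guarantee is anytime-valid: the auditor may monitor the wealth after every query and stop as soon as it crosses the threshold, without any correction for repeated looks at the data.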