Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models (LLMs). However, this strategy has in turn increased the cost users pay to cloud-based providers offering LLM-as-a-service, since providers charge users for the amount of test-time compute used to generate an output. In our work, we show that the LLM-as-a-service market is socially inefficient: providers have a financial incentive to increase the amount of test-time compute, even when this increase contributes little to the quality of the outputs. To address this inefficiency, we introduce a reverse second-price auction mechanism in which providers bid their offered price and (expected) quality for the opportunity to serve a user, and users pay proportionally to the marginal value generated by the winning provider relative to the second-highest bidder. To illustrate and complement our theoretical results, we conduct experiments with multiple instruct models from the $\texttt{Llama}$ and $\texttt{Qwen}$ families, as well as reasoning models distilled from $\texttt{DeepSeek-R1}$, on math and science benchmark datasets.
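The payment rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual mechanism: it assumes, purely for concreteness, that a user's value for a bid is quality minus price and that "proportionally" means a constant factor $\alpha$; the names and the scoring rule are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    provider: str
    price: float    # provider's offered price
    quality: float  # provider's (expected) output quality

def run_auction(bids: list[Bid], alpha: float = 1.0) -> tuple[str, float]:
    """Sketch of a reverse second-price auction.

    Assumption (not from the abstract): a user's value for a bid is
    quality - price, and the payment is alpha times the winner's
    marginal value over the runner-up's value.
    """
    # Rank bids by the assumed user value, best first.
    ranked = sorted(bids, key=lambda b: b.quality - b.price, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # Winner's marginal value relative to the second-highest bidder.
    marginal = (winner.quality - winner.price) - (runner_up.quality - runner_up.price)
    return winner.provider, alpha * marginal
```

Under this toy scoring rule, a bid of (price 1.0, quality 5.0) beats (price 2.0, quality 5.5), and the winner is paid in proportion to the 0.5 gap in user value between the two.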