This paper provides the first systematic economic analysis of token pricing in the large language model (LLM) inference market. Assembling a novel dataset that integrates OpenRouter API data (318 models), Epoch AI records (3,237 models), and 62 cross-validated milestone observations spanning 2020-2026, we document an approximately 600-fold decline in token prices and propose the "Tiered Super-Moore" hypothesis. Economy-tier models exhibit a price half-life of 1.10 years and mid-tier models 1.55 years. Both decline significantly faster than the two-year halving benchmark of Moore's Law. Flagship model prices, by contrast, show almost no exponential trend (R^2 = 0.031) because reasoning models carry a premium averaging 31.5 times non-reasoning prices. A Chow structural break test identifies May 2024 as the critical market inflection point (F = 5.74, p = 0.005), marking a shift from technology-driven to competition-driven acceleration of price declines. Cost decomposition attributes approximately 103.7% of the cost reduction to the total factor productivity (TFP) residual, while GPU hardware contributes -0.9%, confirming that software and architectural innovation, not hardware advances, drive the decline. Data Envelopment Analysis (DEA) yields a Malmquist Productivity Index that peaks at 4.11 over 2024Q1-Q4, driven mainly by a shift of the technological frontier (technical change, TC = 4.13). The elasticity of inference pricing with respect to training cost is 0.432. The 63-fold training cost gap between U.S. and Chinese firms is statistically attributable to architectural innovation rather than to factor price differentials, since the difference in cost per FLOP is insignificant (p = 0.228). Market concentration declined sharply, with the Herfindahl-Hirschman Index (HHI) falling from 4,558 to 2,086 over three years. These findings establish token economics as a distinct subfield of digital goods pricing, with implications for competition policy, AI accessibility, and international technology governance.
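The half-life figures and the comparison with Moore's Law rest on fitting an exponential decay to prices over time, and the concentration result rests on the HHI. The sketch below is a minimal illustration of both calculations, assuming a plain log-linear ordinary least squares fit of ln(price) on time. The paper's exact specification, weighting, and tier assignment are not given here. Under that assumption, a fitted slope b < 0 implies a half-life of ln(2)/(-b) years, so Moore's Law corresponds to a half-life of 2 years and anything shorter is "super-Moore". The HHI is computed as the sum of squared market shares in percent, on a 0 to 10,000 scale. The function names are hypothetical, and the sample data are synthetic, not the paper's.

```python
import math


def price_half_life(years, prices):
    """Fit ln(price) = a + b * year by ordinary least squares.

    Returns (half_life_in_years, r_squared) for the log-linear fit.
    Raises ValueError if the fitted trend is not declining.
    """
    n = len(years)
    logs = [math.log(p) for p in prices]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 of the fit in log space: a low value means no clear exponential trend.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    if slope >= 0:
        raise ValueError("prices are not declining; half-life undefined")
    # Time for the fitted price to halve: exp(slope * t) = 1/2.
    return math.log(2) / -slope, r_squared


def hhi(shares_percent):
    """Herfindahl-Hirschman Index: sum of squared shares in percent (0..10,000)."""
    return sum(s ** 2 for s in shares_percent)


if __name__ == "__main__":
    # Synthetic series that halves exactly every 1.1 years.
    years = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    prices = [10.0 * 0.5 ** (t / 1.1) for t in years]
    half_life, r2 = price_half_life(years, prices)
    print(round(half_life, 2))  # prints 1.1
    print(half_life < 2.0)      # prints True: faster than the Moore's Law benchmark
    print(hhi([50, 30, 20]))    # prints 3800
```

A log-linear OLS fit is used because it turns the exponential model into a straight line, so the half-life follows directly from the slope. A price series dominated by a few large level jumps, such as a reasoning premium, fits this line poorly and produces a low R^2.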