Tiered Super-Moore's Law: Price Evolution, Production Frontiers, and Market Competition in Large Language Model Inference Services

This paper provides the first systematic economic analysis of token pricing in the large language model (LLM) inference market. Assembling a novel dataset that integrates OpenRouter API data (318 models), Epoch AI records (3,237 models), and 62 cross-validated milestone observations spanning 2020-2026, we document an approximately 600-fold decline in token prices and propose the "Tiered Super-Moore" hypothesis. Economy-tier models exhibit a price half-life of 1.10 years and mid-tier models a half-life of 1.55 years, both implying significantly faster declines than the two-year benchmark of Moore's Law. Flagship models, by contrast, show almost no exponential trend (R^2 = 0.031) because of a reasoning premium: reasoning models are priced on average at 31.5 times the level of non-reasoning models. A Chow structural break test identifies May 2024 as the critical market inflection point (F = 5.74, p = 0.005), marking a shift from technology-driven to competition-driven acceleration of price declines. Cost decomposition attributes approximately 103.7% of the cost reduction to total factor productivity (TFP) residuals, while GPU hardware contributes a negligible -0.9%, indicating that software and architectural innovation, not hardware advances, drive the decline. Data Envelopment Analysis (DEA) yields a Malmquist Productivity Index that peaks at 4.11 over 2024Q1-Q4, driven mainly by technical change, that is, an outward shift of the technological frontier (TC = 4.13). The estimated elasticity linking training cost and inference pricing is 0.432. The 63-fold training cost gap between U.S. and Chinese firms is attributable to architectural innovation rather than to factor price differentials: the difference in cost per FLOP is statistically insignificant (p = 0.228). Market concentration declined sharply, with the Herfindahl-Hirschman Index (HHI) falling from 4,558 to 2,086 over three years. These findings establish token economics as a distinct subfield of digital goods pricing, with implications for competition policy, AI accessibility, and international technology governance.
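The tier half-lives above summarize exponential (log-linear) price trends. A minimal sketch of how such a half-life and its R^2 can be estimated, assuming an ordinary least squares fit of ln(price) on time in years; the function names `fit_log_linear`, `half_life_years`, and `annual_price_factor` and the synthetic price series are illustrative assumptions, not the paper's code or data.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(years: Sequence[float], prices: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of ln(price) = a + b * year; returns (a, b, r_squared).

    R^2 is computed in log space, i.e. for the exponential trend.
    """
    if len(years) != len(prices) or len(years) < 2:
        raise ValueError("need at least two (year, price) pairs of equal length")
    n = len(years)
    logs = [math.log(p) for p in prices]  # prices must be strictly positive
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(years, logs))
    ss_tot = sum((y - mean_y) ** 2 for y in logs)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r_squared


def half_life_years(slope: float) -> float:
    """Years for the fitted price to halve: ln(2) / -b, defined only for b < 0."""
    if slope >= 0:
        raise ValueError("prices are not declining; half-life is undefined")
    return math.log(2) / -slope


def annual_price_factor(half_life: float) -> float:
    """Share of the price remaining after one year: 2 ** (-1 / half_life)."""
    return 2.0 ** (-1.0 / half_life)


# Hypothetical synthetic series that halves exactly every 1.10 years.
years = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
prices = [10.0 * 0.5 ** (t / 1.10) for t in years]
_, slope, r2 = fit_log_linear(years, prices)
print(round(half_life_years(slope), 2), round(r2, 4))  # 1.1 1.0
# Per-year price factor for a 1.10-year half-life vs. a two-year (Moore-style) one.
print(round(annual_price_factor(1.10), 3), round(annual_price_factor(2.0), 3))  # 0.533 0.707
```

Read this way, a 1.10-year half-life means prices fall by roughly 47% per year, against roughly 29% per year for a two-year halving period.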
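The Chow test compares one pooled regression with separate regressions fitted before and after a candidate break date. The sketch below computes the F statistic for a simple linear trend with K = 2 parameters per regime, assuming the regression is of (log) price on time; the function names, break index, and synthetic data are hypothetical, and the paper's actual specification may include other regressors.

```python
from typing import Sequence

K = 2  # parameters per regime: intercept and slope


def ols_rss(x: Sequence[float], y: Sequence[float]) -> float:
    """Residual sum of squares from an OLS fit of y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return syy - sxy * sxy / sxx


def chow_f(x: Sequence[float], y: Sequence[float], split: int) -> float:
    """Chow F statistic for a break between observations split - 1 and split.

    F = ((RSS_pooled - (RSS_1 + RSS_2)) / K) / ((RSS_1 + RSS_2) / (n - 2K))
    """
    n = len(x)
    if split < K + 1 or n - split < K + 1:
        raise ValueError("each regime needs at least K + 1 observations")
    rss_pooled = ols_rss(x, y)
    rss_split = ols_rss(x[:split], y[:split]) + ols_rss(x[split:], y[split:])
    return ((rss_pooled - rss_split) / K) / (rss_split / (n - 2 * K))


# Hypothetical data: small alternating noise around a linear trend.
x = [float(i) for i in range(10)]
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(10)]
no_break = [xi + e for xi, e in zip(x, noise)]
with_break = [(xi if xi < 5 else 3 * xi - 10) + e for xi, e in zip(x, noise)]
print(round(chow_f(x, no_break, split=5), 3))  # 0.03: no evidence of a break
print(chow_f(x, with_break, split=5) > 100)  # True: the slope changes at x = 5
```

The p-value is the upper tail of an F(K, n - 2K) distribution, which can be obtained, for example, with `scipy.stats.f.sf(F, K, n - 2 * K)`.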
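The Malmquist index is conventionally decomposed multiplicatively into efficiency change (EC, catch-up toward the frontier) and technical change (TC, the shift of the frontier itself). Assuming the paper uses this standard decomposition and that the reported MPI and TC figures refer to the same period and aggregation, the implied efficiency change for 2024Q1-Q4 can be backed out as follows (EC is derived here for illustration, not reported in the source):

```latex
M = EC \times TC
\quad\Longrightarrow\quad
EC = \frac{M}{TC} \approx \frac{4.11}{4.13} \approx 0.995
```

An EC this close to 1 means the average distance to the frontier barely changed, so nearly all of the measured productivity gain comes from the frontier shift, consistent with TC being the dominant driver.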
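To read the 0.432 elasticity, assume, purely as an illustration since the source does not state its specification, a constant-elasticity (log-log) relation in which inference price P responds to training cost C. Under that assumption, a 10% higher training cost corresponds to roughly a 4.2% higher inference price:

```latex
\begin{aligned}
\ln P &= \alpha + \beta \ln C, \qquad \beta = 0.432 \\
\frac{P_1}{P_0} &= \left(\frac{C_1}{C_0}\right)^{\beta} = 1.10^{0.432} \approx 1.042
\end{aligned}
```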
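The HHI is the sum of squared market shares, with shares in percentage points, so it ranges from near 0 (many small firms) to 10,000 (monopoly). A minimal sketch with hypothetical function names; its reciprocal scaled by 10,000 gives the "numbers-equivalent" count of equal-sized firms, which offers one intuitive reading of the reported decline.

```python
from typing import Sequence


def hhi(shares_percent: Sequence[float], tol: float = 1e-6) -> float:
    """Herfindahl-Hirschman Index on the 0-10,000 scale.

    `shares_percent` are market shares in percent and must sum to 100.
    """
    total = sum(shares_percent)
    if abs(total - 100.0) > tol:
        raise ValueError(f"shares must sum to 100, got {total}")
    return sum(s * s for s in shares_percent)


def equivalent_firms(hhi_value: float) -> float:
    """Number of equal-sized firms that would produce this HHI: 10,000 / HHI."""
    return 10_000.0 / hhi_value


# Hypothetical share vectors, not the paper's data.
print(hhi([100.0]))  # 10000.0: monopoly
print(hhi([25.0, 25.0, 25.0, 25.0]))  # 2500.0: four equal firms
# The reported HHI endpoints, read as equal-firm equivalents.
print(round(equivalent_firms(4558), 2), round(equivalent_firms(2086), 2))  # 2.19 4.79
```

Under this reading, the market moved from the concentration of roughly two equal-sized providers to that of roughly five.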

翻译：本文首次对大型语言模型（LLM）推理市场中的代币定价进行了系统性经济分析。通过整合OpenRouter API数据（318个模型）、Epoch AI记录（3237个模型）以及覆盖2020-2026年间的62个交叉验证里程碑观测值，我们构建了一个新颖的数据集，记录了代币价格约600倍的下降，并提出了“分级超摩尔”假说。经济型模型的价格半衰期为1.10年，中端模型为1.55年——两者均显著快于摩尔定律的两年基准——而旗舰模型由于平均高达31.5倍于非推理价格的推理溢价，其指数拟合近乎为零（R² = 0.031）。邹氏结构断点检验将2024年5月识别为关键市场拐点（F = 5.74, p = 0.005），标志着从技术驱动向竞争驱动的价格加速转变。成本分解表明，全要素生产率残差约占成本降低的103.7%，GPU硬件仅贡献-0.9%，证实软件和架构创新——而非硬件进步——才是价格下降的驱动因素。数据包络分析显示，Malmquist生产率指数在2024年第一季度至第四季度达到峰值4.11，技术进步前沿移动（TC = 4.13）是主导驱动因素。训练成本-推理定价弹性为0.432，中美企业之间63倍的训练成本差距在统计上归因于架构创新（每FLOP成本差异不显著，p = 0.228），而非要素价格差异。市场集中度大幅下降，HHI在三年内从4558降至2086。这些发现确立了代币经济学作为数字商品定价的一个独特子领域，并对竞争政策、人工智能可及性以及国际技术治理具有重要启示。