We propose a statistical framework, based on latent variable modeling, for the scaling laws of large language models (LLMs). The work is motivated by the rapid emergence of many new LLM families with distinct architectures and training strategies, evaluated on a growing number of benchmarks. This heterogeneity makes a single global scaling curve inadequate for capturing how performance varies across families and benchmarks. In our framework, each LLM family is associated with a latent variable that captures the features shared by the models in that family. An LLM's performance on each benchmark is then driven by its latent skills, which are jointly determined by the family's latent variable and the model's own observable features. We develop an estimation procedure for this latent variable model and establish its statistical properties. We also design efficient numerical algorithms that support estimation and various downstream tasks. Empirically, we evaluate the approach on 12 widely used benchmarks from the Open LLM Leaderboard (v1/v2).
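The generative sketch below makes the hierarchical structure concrete: family latent variable, then per-model latent skills, then per-benchmark scores. It is not the estimation procedure described above. The linear map from observable features to skills, the standard Gaussian prior on each family's latent vector, the sigmoid link to benchmark accuracy, and every name in it (`simulate_scores`, `B`, `A`, `c`, the toy families and feature values) are illustrative assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Map a real-valued score to (0, 1), e.g. a benchmark accuracy."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_scores(families, B, A, c, noise_sd=0.0, seed=0):
    """Simulate benchmark scores under a hypothetical latent variable model.

    families: dict of family name -> list of (model name, observable features)
    B: d x p matrix mapping p observable features to d latent skills (assumed linear)
    A: J x d loadings, one row per benchmark
    c: length-J benchmark intercepts
    Returns dict of model name -> list of J scores in (0, 1).
    """
    rng = random.Random(seed)
    d = len(B)
    scores = {}
    for models in families.values():
        # One latent vector per family, shared by all of its models (assumed N(0, I) prior).
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for name, x in models:
            # Latent skills: effect of the model's own features plus the family's latent vector.
            skills = [sum(b * xk for b, xk in zip(B[k], x)) + theta[k] for k in range(d)]
            row = []
            for a_j, c_j in zip(A, c):
                # Benchmark-specific combination of skills, optional noise, then sigmoid link.
                z = sum(a * s for a, s in zip(a_j, skills)) + c_j + rng.gauss(0.0, noise_sd)
                row.append(sigmoid(z))
            scores[name] = row
    return scores


# Toy usage: one latent skill, one observable feature (hypothetical size-like covariate).
families = {
    "family_a": [("a-small", [0.0]), ("a-large", [1.0])],
    "family_b": [("b-small", [0.0])],
}
B = [[1.0]]
A = [[1.5], [0.5]]  # two benchmarks with different sensitivity to the skill
c = [0.0, -1.0]
scores = simulate_scores(families, B, A, c)
```

Because `a-small` and `a-large` share the same family latent vector, their score gap comes only from their observable features. `b-small` has the same features as `a-small` but a different family draw, so its scores differ from those of `a-small`. This within-family versus across-family distinction is what a single global scaling curve cannot capture.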