Scaling laws describe aggregate large language model performance, but none has yet linked factual recall to both model size and training-data composition. We evaluated 38 models on over 8,900 scholarly references, each checked by an automated reference verification system. Recall quality follows a sigmoid function of a log-linear combination of model parameter count and topic representation in the training data. These two variables alone explain 60% of the variance across 16 dense models from four families, rising to 74-94% within individual families. This functional form matches a superposition-inspired account in which recall is gated by a signal-to-noise ratio: signal strength scales with concept frequency, and the noise floor scales with model capacity.
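As a rough illustration of the functional form described above, the following sketch evaluates a sigmoid of a log-linear combination of parameter count and topic representation. It is not the fitted model: the function name `recall_quality`, the coefficients `a`, `b`, `c`, the choice of base-10 logarithms, and the use of a raw frequency count for topic representation are all assumptions made for the example, and the numeric values are placeholders rather than reported results.

```python
import math


def recall_quality(n_params: float, topic_freq: float,
                   a: float, b: float, c: float) -> float:
    """Sigmoid of a log-linear combination of model size and topic frequency.

    a, b, c are hypothetical coefficients that would have to be fitted to data;
    the base-10 logarithm is an assumption made for readability.
    """
    if n_params <= 0 or topic_freq <= 0:
        raise ValueError("n_params and topic_freq must be positive")
    # Linear combination of the two log-transformed predictors plus an intercept.
    z = a * math.log10(n_params) + b * math.log10(topic_freq) + c
    # Logistic squashing maps the combined score into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Placeholder coefficients, chosen only to show the shape of the curve.
    a, b, c = 1.0, 1.0, -12.0
    for n_params in (1e8, 1e9, 1e10):
        for topic_freq in (1e2, 1e3, 1e4):
            q = recall_quality(n_params, topic_freq, a, b, c)
            print(f"params={n_params:.0e} freq={topic_freq:.0e} quality={q:.3f}")
```

With positive `a` and `b`, the predicted quality rises with either more parameters or better topic representation, and a shortfall in one can be offset by the other, which is the trade-off that a log-linear combination implies.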