Neural scaling laws have garnered significant interest due to their ability to predict model performance as a function of increasing parameters, data, and compute. In this work, we propose a simple statistical ansatz based on memorization to study scaling laws in the context of inference, specifically how performance improves with multiple inference attempts. We explore coverage, or the pass@k metric, which measures the probability of success over repeated attempts, and motivate the observed functional form of the inference scaling behavior of coverage in large language models (LLMs) on reasoning tasks. We then define an "inference loss", which exhibits power-law decay as the number of trials increases, and connect this result to prompting costs. We further test our construction through experiments on a simple generative model, and find that our predictions agree with the empirical coverage curves in a controlled setting. Our simple framework lays the groundwork for combining inference scaling with other known scaling laws.
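As a concrete illustration of the quantities above (a minimal sketch, not the authors' derivation): for a single problem with per-attempt success probability $p$ and $k$ independent attempts, coverage reduces to
$$\mathrm{pass@}k \;=\; 1 - (1-p)^{k},$$
so the per-problem inference loss $1 - \mathrm{pass@}k = (1-p)^{k}$ decays exponentially in $k$. A power-law decay of the aggregate loss can arise once $p$ varies across problems; for instance, if the difficulty distribution has density proportional to $p^{\alpha-1}$ near $p = 0$, then $\mathbb{E}_p\!\left[(1-p)^{k}\right] \propto k^{-\alpha}$ up to constants for large $k$. Whether this mixture argument matches the paper's memorization ansatz is an assumption made only to make the abstract's claims concrete.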