While scaling laws are used to optimize training configurations for large language models (LLMs) through experiments on smaller or early-stage models, they fail to predict emergent abilities because those capabilities are absent from such models. To address this, we propose a method that predicts emergent abilities by leveraging proxy tasks. We first establish relevance metrics between the target task and candidate tasks based on performance differences across multiple models. The candidate tasks are then validated for robustness with small model ensembles, and the most appropriate proxy tasks are selected. Finally, the predicted performance on the target task is derived by integrating the evaluation results of these proxies. In a case study on tool-utilization capability, our method shows a strong correlation between predicted and actual performance, confirming its effectiveness.
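The abstract does not specify the concrete form of the relevance metric or the aggregation rule, so the following Python sketch is only an assumed illustration: the names `relevance` and `predict_target` and all numbers are hypothetical, using Pearson correlation of per-model scores as the relevance measure and a relevance-weighted average as the predictor.

```python
import numpy as np

# Assumed setup: each task's results on a shared set of reference models
# form a score vector; correlation between two such vectors serves as the
# task-relevance metric.

def relevance(target_scores: np.ndarray, candidate_scores: np.ndarray) -> float:
    """Pearson correlation of per-model performance between two tasks."""
    return float(np.corrcoef(target_scores, candidate_scores)[0, 1])

def predict_target(proxy_scores: np.ndarray, relevances: np.ndarray) -> float:
    """Relevance-weighted combination of a new model's proxy-task scores."""
    w = np.clip(relevances, 0.0, None)  # drop anti-correlated candidates
    return float(np.dot(w, proxy_scores) / w.sum())

# Hypothetical usage: 5 reference models, 2 candidate proxy tasks.
target  = np.array([0.10, 0.15, 0.40, 0.65, 0.80])  # target-task scores
proxy_a = np.array([0.20, 0.28, 0.45, 0.70, 0.85])  # candidate task A
proxy_b = np.array([0.50, 0.48, 0.52, 0.49, 0.51])  # candidate task B (flat)

r = np.array([relevance(target, proxy_a), relevance(target, proxy_b)])
print(predict_target(np.array([0.75, 0.50]), r))    # predicted target score
```

Clipping negative relevances is a stand-in heuristic for proxy selection; in the method as described, candidate tasks are instead validated for robustness with small model ensembles before being retained.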