Large language models (LLMs) have been shown to exhibit emergent abilities on some downstream tasks, where performance seems to stagnate at first and then improve sharply and unpredictably with scale beyond a threshold. By grouping questions in the datasets into difficulty levels based on average model performance, we observe U-shaped scaling for hard questions, and inverted-U scaling followed by steady improvement for easy questions. Moreover, the emergence threshold roughly coincides with the point at which performance on easy questions reverts from inverse scaling to standard scaling. Capitalizing on the observable, though opposing, scaling trends on easy and hard questions, we propose a simple yet effective pipeline, called Slice-and-Sandwich, to predict both the emergence threshold and model performance beyond the threshold.
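The "slice" step described above can be sketched in code. The following is a minimal, hypothetical illustration, not the paper's actual pipeline: it splits questions into easy and hard slices by average accuracy across models, then fits a quadratic to the easy slice's accuracy against log scale to locate the turning point where inverse scaling reverts to standard scaling. The function names, the 0.5 difficulty cutoff, the quadratic functional form, and the toy data are all illustrative assumptions.

```python
import numpy as np

def slice_by_difficulty(acc, cutoff=0.5):
    """Split question indices into (easy, hard) slices from a
    models x questions accuracy matrix, using per-question mean
    accuracy across models as a difficulty proxy (an assumption)."""
    avg = acc.mean(axis=0)
    return np.where(avg >= cutoff)[0], np.where(avg < cutoff)[0]

def easy_turning_point(scales, easy_acc):
    """Fit a quadratic to easy-slice accuracy vs. log10(scale) and
    return the stationary point of the fit -- a rough stand-in for
    the reversal from inverse to standard scaling."""
    a, b, _ = np.polyfit(np.log10(scales), easy_acc, deg=2)
    return 10 ** (-b / (2 * a))

# Toy data: 6 models of increasing scale, 4 questions.
# Columns 0-1 show inverted-U-then-rise behavior (easy questions);
# columns 2-3 stay low until the largest models (hard questions).
scales = np.array([1e8, 3e8, 1e9, 3e9, 1e10, 3e10])
acc = np.array([
    [0.70, 0.65, 0.10, 0.05],
    [0.68, 0.62, 0.12, 0.06],
    [0.66, 0.60, 0.15, 0.08],
    [0.70, 0.66, 0.20, 0.12],
    [0.80, 0.78, 0.35, 0.30],
    [0.90, 0.88, 0.60, 0.55],
])
easy, hard = slice_by_difficulty(acc)
threshold_estimate = easy_turning_point(scales, acc[:, easy].mean(axis=1))
```

On this toy data the easy slice is questions 0 and 1, and the fitted turning point falls in the interior of the scale range, mimicking how the easy-question reversal could anchor an emergence-threshold forecast before hard-question performance has moved much.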